From: "Zurcher, Christopher J" <christopher.j.zurcher@intel.com>
To: devel@edk2.groups.io
Cc: Jian J Wang <jian.j.wang@intel.com>,
	Xiaoyu Lu <xiaoyux.lu@intel.com>, Eugene Cohen <eugene@hp.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
Date: Tue, 17 Mar 2020 03:26:56 -0700	[thread overview]
Message-ID: <20200317102656.20032-2-christopher.j.zurcher@intel.com> (raw)
In-Reply-To: <20200317102656.20032-1-christopher.j.zurcher@intel.com>

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507

Add IA32 and X64 versions of OpensslLib.inf, along with their respective
architecture-specific assembly files. This also introduces the
modifications to process_files.pl required to generate these files.
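For reference, a platform that wants the native-instruction build would
map the OpensslLib library class to the new architecture-specific INFs
in its DSC, while the existing OpensslLib.inf / OpensslLibCrypto.inf
(now built with -DOPENSSL_NO_ASM) remain the portable C-only option.
A minimal sketch of such a DSC mapping follows; the exact placement in
any given platform DSC is an assumption here, not part of this patch:

  [LibraryClasses.IA32]
    OpensslLib|CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf

  [LibraryClasses.X64]
    OpensslLib|CryptoPkg/Library/OpensslLib/OpensslLibX64.inf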

Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
Cc: Eugene Cohen <eugene@hp.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christopher J Zurcher <christopher.j.zurcher@intel.com>
---
 CryptoPkg/Library/OpensslLib/OpensslLib.inf                          |    2 +-
 CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf                    |    2 +-
 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf                      |  680 ++
 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf                       |  691 ++
 CryptoPkg/Library/Include/openssl/opensslconf.h                      |    3 -
 CryptoPkg/Library/OpensslLib/ApiHooks.c                              |   18 +
 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c                 |   34 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm          | 3209 ++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm          |  648 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm              | 1522 ++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm              | 1259 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm            |  352 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm            |  486 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm           |  887 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm            | 1835 +++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm            |  690 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm        | 1264 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm            |  381 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm           | 3977 ++++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm         | 6796 ++++++++++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm         | 2842 +++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm               |  513 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm     | 1772 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm   | 3271 ++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm        | 5084 ++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm        | 1170 +++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm            | 1989 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm          | 2242 ++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm          |  432 +
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm          | 1479 ++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm         | 4033 ++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm          |  794 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm  |  984 +++
 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm      | 2077 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm      | 1395 ++++
 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm          |  784 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm   |  532 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm      | 7581 ++++++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm         | 5773 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm    | 8262 ++++++++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm       | 5712 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm       | 5668 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm             |  472 ++
 CryptoPkg/Library/OpensslLib/process_files.pl                        |  208 +-
 CryptoPkg/Library/OpensslLib/uefi-asm.conf                           |   14 +
 46 files changed, 94478 insertions(+), 50 deletions(-)

diff --git a/CryptoPkg/Library/OpensslLib/OpensslLib.inf b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
index 3519a66885..542507a534 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLib.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
@@ -15,7 +15,7 @@
   VERSION_STRING                 = 1.0
   LIBRARY_CLASS                  = OpensslLib
   DEFINE OPENSSL_PATH            = openssl
-  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
 
 #
 #  VALID_ARCHITECTURES           = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
index 8a723cb8cd..f0c588284c 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
@@ -15,7 +15,7 @@
   VERSION_STRING                 = 1.0
   LIBRARY_CLASS                  = OpensslLib
   DEFINE OPENSSL_PATH            = openssl
-  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
 
 #
 #  VALID_ARCHITECTURES           = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
new file mode 100644
index 0000000000..14f4d4ab1a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
@@ -0,0 +1,680 @@
+## @file
+#  This module provides OpenSSL Library implementation.
+#
+#  Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+#  SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+  INF_VERSION                    = 0x00010005
+  BASE_NAME                      = OpensslLibIa32
+  MODULE_UNI_FILE                = OpensslLib.uni
+  FILE_GUID                      = 5805D1D4-F8EE-4FBA-BDD8-74465F16A534
+  MODULE_TYPE                    = BASE
+  VERSION_STRING                 = 1.0
+  LIBRARY_CLASS                  = OpensslLib
+  DEFINE OPENSSL_PATH            = openssl
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS_CONFIG    = -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM
+  CONSTRUCTOR                    = OpensslLibConstructor
+
+#
+#  VALID_ARCHITECTURES           = IA32
+#
+
+[Sources]
+  OpensslLibConstructor.c
+  $(OPENSSL_PATH)/e_os.h
+  $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+  Ia32/crypto/aes/aesni-x86.nasm
+  Ia32/crypto/aes/vpaes-x86.nasm
+  Ia32/crypto/bn/bn-586.nasm
+  Ia32/crypto/bn/co-586.nasm
+  Ia32/crypto/bn/x86-gf2m.nasm
+  Ia32/crypto/bn/x86-mont.nasm
+  Ia32/crypto/des/crypt586.nasm
+  Ia32/crypto/des/des-586.nasm
+  Ia32/crypto/md5/md5-586.nasm
+  Ia32/crypto/modes/ghash-x86.nasm
+  Ia32/crypto/rc4/rc4-586.nasm
+  Ia32/crypto/sha/sha1-586.nasm
+  Ia32/crypto/sha/sha256-586.nasm
+  Ia32/crypto/sha/sha512-586.nasm
+  Ia32/crypto/x86cpuid.nasm
+  $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_core.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+  $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+  $(OPENSSL_PATH)/crypto/aria/aria.c
+  $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+  $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+  $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_int.c
+  $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_object.c
+  $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+  $(OPENSSL_PATH)/crypto/asn1/a_print.c
+  $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+  $(OPENSSL_PATH)/crypto/asn1/a_time.c
+  $(OPENSSL_PATH)/crypto/asn1/a_type.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+  $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+  $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/f_int.c
+  $(OPENSSL_PATH)/crypto/asn1/f_string.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+  $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+  $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+  $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+  $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+  $(OPENSSL_PATH)/crypto/asn1/x_info.c
+  $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+  $(OPENSSL_PATH)/crypto/asn1/x_long.c
+  $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+  $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/x_val.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+  $(OPENSSL_PATH)/crypto/async/async.c
+  $(OPENSSL_PATH)/crypto/async/async_err.c
+  $(OPENSSL_PATH)/crypto/async/async_wait.c
+  $(OPENSSL_PATH)/crypto/bio/b_addr.c
+  $(OPENSSL_PATH)/crypto/bio/b_dump.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+  $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+  $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+  $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+  $(OPENSSL_PATH)/crypto/bio/bf_null.c
+  $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+  $(OPENSSL_PATH)/crypto/bio/bio_err.c
+  $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+  $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+  $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+  $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+  $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+  $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+  $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+  $(OPENSSL_PATH)/crypto/bio/bss_file.c
+  $(OPENSSL_PATH)/crypto/bio/bss_log.c
+  $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+  $(OPENSSL_PATH)/crypto/bio/bss_null.c
+  $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+  $(OPENSSL_PATH)/crypto/bn/bn_add.c
+  $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+  $(OPENSSL_PATH)/crypto/bn/bn_const.c
+  $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+  $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+  $(OPENSSL_PATH)/crypto/bn/bn_div.c
+  $(OPENSSL_PATH)/crypto/bn/bn_err.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+  $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+  $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+  $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+  $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+  $(OPENSSL_PATH)/crypto/bn/bn_print.c
+  $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+  $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+  $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_word.c
+  $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+  $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+  $(OPENSSL_PATH)/crypto/buffer/buffer.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+  $(OPENSSL_PATH)/crypto/cmac/cmac.c
+  $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+  $(OPENSSL_PATH)/crypto/comp/comp_err.c
+  $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_api.c
+  $(OPENSSL_PATH)/crypto/conf/conf_def.c
+  $(OPENSSL_PATH)/crypto/conf/conf_err.c
+  $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+  $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+  $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+  $(OPENSSL_PATH)/crypto/cpt_err.c
+  $(OPENSSL_PATH)/crypto/cryptlib.c
+  $(OPENSSL_PATH)/crypto/ctype.c
+  $(OPENSSL_PATH)/crypto/cversion.c
+  $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+  $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+  $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+  $(OPENSSL_PATH)/crypto/des/fcrypt.c
+  $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+  $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+  $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+  $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+  $(OPENSSL_PATH)/crypto/des/rand_key.c
+  $(OPENSSL_PATH)/crypto/des/set_key.c
+  $(OPENSSL_PATH)/crypto/des/str2key.c
+  $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+  $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+  $(OPENSSL_PATH)/crypto/dh/dh_check.c
+  $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+  $(OPENSSL_PATH)/crypto/dh/dh_err.c
+  $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+  $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+  $(OPENSSL_PATH)/crypto/dh/dh_key.c
+  $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+  $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+  $(OPENSSL_PATH)/crypto/dso/dso_err.c
+  $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+  $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+  $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+  $(OPENSSL_PATH)/crypto/ebcdic.c
+  $(OPENSSL_PATH)/crypto/err/err.c
+  $(OPENSSL_PATH)/crypto/err/err_prn.c
+  $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+  $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+  $(OPENSSL_PATH)/crypto/evp/bio_md.c
+  $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+  $(OPENSSL_PATH)/crypto/evp/c_allc.c
+  $(OPENSSL_PATH)/crypto/evp/c_alld.c
+  $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+  $(OPENSSL_PATH)/crypto/evp/digest.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+  $(OPENSSL_PATH)/crypto/evp/e_aria.c
+  $(OPENSSL_PATH)/crypto/evp/e_bf.c
+  $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+  $(OPENSSL_PATH)/crypto/evp/e_cast.c
+  $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+  $(OPENSSL_PATH)/crypto/evp/e_des.c
+  $(OPENSSL_PATH)/crypto/evp/e_des3.c
+  $(OPENSSL_PATH)/crypto/evp/e_idea.c
+  $(OPENSSL_PATH)/crypto/evp/e_null.c
+  $(OPENSSL_PATH)/crypto/evp/e_old.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+  $(OPENSSL_PATH)/crypto/evp/e_seed.c
+  $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+  $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+  $(OPENSSL_PATH)/crypto/evp/encode.c
+  $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+  $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+  $(OPENSSL_PATH)/crypto/evp/evp_err.c
+  $(OPENSSL_PATH)/crypto/evp/evp_key.c
+  $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+  $(OPENSSL_PATH)/crypto/evp/m_md2.c
+  $(OPENSSL_PATH)/crypto/evp/m_md4.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+  $(OPENSSL_PATH)/crypto/evp/m_null.c
+  $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+  $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+  $(OPENSSL_PATH)/crypto/evp/m_wp.c
+  $(OPENSSL_PATH)/crypto/evp/names.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+  $(OPENSSL_PATH)/crypto/evp/p_dec.c
+  $(OPENSSL_PATH)/crypto/evp/p_enc.c
+  $(OPENSSL_PATH)/crypto/evp/p_lib.c
+  $(OPENSSL_PATH)/crypto/evp/p_open.c
+  $(OPENSSL_PATH)/crypto/evp/p_seal.c
+  $(OPENSSL_PATH)/crypto/evp/p_sign.c
+  $(OPENSSL_PATH)/crypto/evp/p_verify.c
+  $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+  $(OPENSSL_PATH)/crypto/ex_data.c
+  $(OPENSSL_PATH)/crypto/getenv.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+  $(OPENSSL_PATH)/crypto/hmac/hmac.c
+  $(OPENSSL_PATH)/crypto/init.c
+  $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+  $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+  $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+  $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+  $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+  $(OPENSSL_PATH)/crypto/lhash/lhash.c
+  $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+  $(OPENSSL_PATH)/crypto/md4/md4_one.c
+  $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+  $(OPENSSL_PATH)/crypto/md5/md5_one.c
+  $(OPENSSL_PATH)/crypto/mem.c
+  $(OPENSSL_PATH)/crypto/mem_dbg.c
+  $(OPENSSL_PATH)/crypto/mem_sec.c
+  $(OPENSSL_PATH)/crypto/modes/cbc128.c
+  $(OPENSSL_PATH)/crypto/modes/ccm128.c
+  $(OPENSSL_PATH)/crypto/modes/cfb128.c
+  $(OPENSSL_PATH)/crypto/modes/ctr128.c
+  $(OPENSSL_PATH)/crypto/modes/cts128.c
+  $(OPENSSL_PATH)/crypto/modes/gcm128.c
+  $(OPENSSL_PATH)/crypto/modes/ocb128.c
+  $(OPENSSL_PATH)/crypto/modes/ofb128.c
+  $(OPENSSL_PATH)/crypto/modes/wrap128.c
+  $(OPENSSL_PATH)/crypto/modes/xts128.c
+  $(OPENSSL_PATH)/crypto/o_dir.c
+  $(OPENSSL_PATH)/crypto/o_fips.c
+  $(OPENSSL_PATH)/crypto/o_fopen.c
+  $(OPENSSL_PATH)/crypto/o_init.c
+  $(OPENSSL_PATH)/crypto/o_str.c
+  $(OPENSSL_PATH)/crypto/o_time.c
+  $(OPENSSL_PATH)/crypto/objects/o_names.c
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+  $(OPENSSL_PATH)/crypto/objects/obj_err.c
+  $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+  $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+  $(OPENSSL_PATH)/crypto/pem/pem_all.c
+  $(OPENSSL_PATH)/crypto/pem/pem_err.c
+  $(OPENSSL_PATH)/crypto/pem/pem_info.c
+  $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+  $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+  $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+  $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+  $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+  $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+  $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+  $(OPENSSL_PATH)/crypto/rand/rand_err.c
+  $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+  $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+  $(OPENSSL_PATH)/crypto/rand/rand_win.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+  $(OPENSSL_PATH)/crypto/sha/keccak1600.c
+  $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+  $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+  $(OPENSSL_PATH)/crypto/sha/sha256.c
+  $(OPENSSL_PATH)/crypto/sha/sha512.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+  $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+  $(OPENSSL_PATH)/crypto/sm3/sm3.c
+  $(OPENSSL_PATH)/crypto/sm4/sm4.c
+  $(OPENSSL_PATH)/crypto/stack/stack.c
+  $(OPENSSL_PATH)/crypto/threads_none.c
+  $(OPENSSL_PATH)/crypto/threads_pthread.c
+  $(OPENSSL_PATH)/crypto/threads_win.c
+  $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+  $(OPENSSL_PATH)/crypto/ui/ui_err.c
+  $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+  $(OPENSSL_PATH)/crypto/ui/ui_null.c
+  $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+  $(OPENSSL_PATH)/crypto/ui/ui_util.c
+  $(OPENSSL_PATH)/crypto/uid.c
+  $(OPENSSL_PATH)/crypto/x509/by_dir.c
+  $(OPENSSL_PATH)/crypto/x509/by_file.c
+  $(OPENSSL_PATH)/crypto/x509/t_crl.c
+  $(OPENSSL_PATH)/crypto/x509/t_req.c
+  $(OPENSSL_PATH)/crypto/x509/t_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x509_att.c
+  $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+  $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+  $(OPENSSL_PATH)/crypto/x509/x509_def.c
+  $(OPENSSL_PATH)/crypto/x509/x509_err.c
+  $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+  $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+  $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+  $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+  $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+  $(OPENSSL_PATH)/crypto/x509/x509_req.c
+  $(OPENSSL_PATH)/crypto/x509/x509_set.c
+  $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+  $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+  $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+  $(OPENSSL_PATH)/crypto/x509/x509cset.c
+  $(OPENSSL_PATH)/crypto/x509/x509name.c
+  $(OPENSSL_PATH)/crypto/x509/x509rset.c
+  $(OPENSSL_PATH)/crypto/x509/x509spki.c
+  $(OPENSSL_PATH)/crypto/x509/x509type.c
+  $(OPENSSL_PATH)/crypto/x509/x_all.c
+  $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+  $(OPENSSL_PATH)/crypto/x509/x_crl.c
+  $(OPENSSL_PATH)/crypto/x509/x_exten.c
+  $(OPENSSL_PATH)/crypto/x509/x_name.c
+  $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+  $(OPENSSL_PATH)/crypto/x509/x_req.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+  $(OPENSSL_PATH)/crypto/arm_arch.h
+  $(OPENSSL_PATH)/crypto/mips_arch.h
+  $(OPENSSL_PATH)/crypto/ppc_arch.h
+  $(OPENSSL_PATH)/crypto/s390x_arch.h
+  $(OPENSSL_PATH)/crypto/sparc_arch.h
+  $(OPENSSL_PATH)/crypto/vms_rms.h
+  $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/charmap.h
+  $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+  $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+  $(OPENSSL_PATH)/crypto/async/async_locl.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+  $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+  $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+  $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+  $(OPENSSL_PATH)/crypto/conf/conf_def.h
+  $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+  $(OPENSSL_PATH)/crypto/des/des_locl.h
+  $(OPENSSL_PATH)/crypto/des/spr.h
+  $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+  $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+  $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+  $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+  $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+  $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+  $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+  $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+  $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+  $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+  $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+  $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+  $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+  $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+  $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+  $(OPENSSL_PATH)/crypto/store/store_locl.h
+  $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+  $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+  $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+  $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+  $(OPENSSL_PATH)/ssl/bio_ssl.c
+  $(OPENSSL_PATH)/ssl/d1_lib.c
+  $(OPENSSL_PATH)/ssl/d1_msg.c
+  $(OPENSSL_PATH)/ssl/d1_srtp.c
+  $(OPENSSL_PATH)/ssl/methods.c
+  $(OPENSSL_PATH)/ssl/packet.c
+  $(OPENSSL_PATH)/ssl/pqueue.c
+  $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+  $(OPENSSL_PATH)/ssl/s3_cbc.c
+  $(OPENSSL_PATH)/ssl/s3_enc.c
+  $(OPENSSL_PATH)/ssl/s3_lib.c
+  $(OPENSSL_PATH)/ssl/s3_msg.c
+  $(OPENSSL_PATH)/ssl/ssl_asn1.c
+  $(OPENSSL_PATH)/ssl/ssl_cert.c
+  $(OPENSSL_PATH)/ssl/ssl_ciph.c
+  $(OPENSSL_PATH)/ssl/ssl_conf.c
+  $(OPENSSL_PATH)/ssl/ssl_err.c
+  $(OPENSSL_PATH)/ssl/ssl_init.c
+  $(OPENSSL_PATH)/ssl/ssl_lib.c
+  $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+  $(OPENSSL_PATH)/ssl/ssl_rsa.c
+  $(OPENSSL_PATH)/ssl/ssl_sess.c
+  $(OPENSSL_PATH)/ssl/ssl_stat.c
+  $(OPENSSL_PATH)/ssl/ssl_txt.c
+  $(OPENSSL_PATH)/ssl/ssl_utst.c
+  $(OPENSSL_PATH)/ssl/statem/extensions.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+  $(OPENSSL_PATH)/ssl/statem/statem.c
+  $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+  $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+  $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+  $(OPENSSL_PATH)/ssl/t1_enc.c
+  $(OPENSSL_PATH)/ssl/t1_lib.c
+  $(OPENSSL_PATH)/ssl/t1_trce.c
+  $(OPENSSL_PATH)/ssl/tls13_enc.c
+  $(OPENSSL_PATH)/ssl/tls_srp.c
+  $(OPENSSL_PATH)/ssl/packet_locl.h
+  $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+  $(OPENSSL_PATH)/ssl/ssl_locl.h
+  $(OPENSSL_PATH)/ssl/record/record.h
+  $(OPENSSL_PATH)/ssl/record/record_locl.h
+  $(OPENSSL_PATH)/ssl/statem/statem.h
+  $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+  buildinf.h
+  rand_pool_noise.h
+  ossl_store.c
+  rand_pool.c
+
+[Sources.Ia32]
+  rand_pool_noise_tsc.c
+
+[Packages]
+  MdePkg/MdePkg.dec
+  CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+  BaseLib
+  DebugLib
+  TimerLib
+  PrintLib
+
+[BuildOptions]
+  #
+  # Disables the following Visual Studio compiler warnings brought by openssl source,
+  # so we do not break the build with /WX option:
+  #   C4090: 'function' : different 'const' qualifiers
+  #   C4132: 'object' : const object should be initialized (tls13_enc.c)
+  #   C4210: nonstandard extension used: function given file scope
+  #   C4244: conversion from type1 to type2, possible loss of data
+  #   C4245: conversion from type1 to type2, signed/unsigned mismatch
+  #   C4267: conversion from size_t to type, possible loss of data
+  #   C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+  #   C4310: cast truncates constant value
+  #   C4389: 'operator' : signed/unsigned mismatch (xxxx)
+  #   C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+  #   C4702: unreachable code
+  #   C4706: assignment within conditional expression
+  #   C4819: The file contains a character that cannot be represented in the current code page
+  #
+  MSFT:*_*_IA32_CC_FLAGS   = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4310 /wd4389 /wd4700 /wd4702 /wd4706 /wd4819
+
+  INTEL:*_*_IA32_CC_FLAGS  = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+  #
+  # Suppress the following build warnings in openssl so we don't break the build with -Werror
+  #   -Werror=maybe-uninitialized: there exist some other paths for which the variable is not initialized.
+  #   -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have
+  #                   types appropriate to the format string specified.
+  #   -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration).
+  #
+  GCC:*_*_IA32_CC_FLAGS    = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=unused-but-set-variable
+
+  # suppress the following warnings in openssl so we don't break the build with warnings-as-errors:
+  # 1295: Deprecated declaration <entity> - give arg types
+  #  550: <entity> was set but never used
+  # 1293: assignment in condition
+  #  111: statement is unreachable (invariably "break;" after "return X;" in case statement)
+  #   68: integer conversion resulted in a change of sign ("if (Status == -1)")
+  #  177: <entity> was declared but never referenced
+  #  223: function <entity> declared implicitly
+  #  144: a value of type <type> cannot be used to initialize an entity of type <type>
+  #  513: a value of type <type> cannot be assigned to an entity of type <type>
+  #  188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+  # 1296: Extended constant initialiser used
+  #  128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+  #       from the function that evaluates to true at compile time
+  #  546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+  #       variable is never referenced after the jump
+  #    1: ignore "#1-D: last line of file ends without a newline"
+  # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+  #       commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+  XCODE:*_*_IA32_CC_FLAGS   = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
new file mode 100644
index 0000000000..fcebc6d6de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
@@ -0,0 +1,691 @@
+## @file
+#  This module provides OpenSSL Library implementation.
+#
+#  Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+#  SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+  INF_VERSION                    = 0x00010005
+  BASE_NAME                      = OpensslLibX64
+  MODULE_UNI_FILE                = OpensslLib.uni
+  FILE_GUID                      = 18125E50-0117-4DD0-BE54-4784AD995FEF
+  MODULE_TYPE                    = BASE
+  VERSION_STRING                 = 1.0
+  LIBRARY_CLASS                  = OpensslLib
+  DEFINE OPENSSL_PATH            = openssl
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS_CONFIG    = -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM
+  CONSTRUCTOR                    = OpensslLibConstructor
+
+#
+#  VALID_ARCHITECTURES           = X64
+#
+
+[Sources]
+  OpensslLibConstructor.c
+  $(OPENSSL_PATH)/e_os.h
+  $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+  X64/crypto/aes/aesni-mb-x86_64.nasm
+  X64/crypto/aes/aesni-sha1-x86_64.nasm
+  X64/crypto/aes/aesni-sha256-x86_64.nasm
+  X64/crypto/aes/aesni-x86_64.nasm
+  X64/crypto/aes/vpaes-x86_64.nasm
+  X64/crypto/bn/rsaz-avx2.nasm
+  X64/crypto/bn/rsaz-x86_64.nasm
+  X64/crypto/bn/x86_64-gf2m.nasm
+  X64/crypto/bn/x86_64-mont.nasm
+  X64/crypto/bn/x86_64-mont5.nasm
+  X64/crypto/md5/md5-x86_64.nasm
+  X64/crypto/modes/aesni-gcm-x86_64.nasm
+  X64/crypto/modes/ghash-x86_64.nasm
+  X64/crypto/rc4/rc4-md5-x86_64.nasm
+  X64/crypto/rc4/rc4-x86_64.nasm
+  X64/crypto/sha/keccak1600-x86_64.nasm
+  X64/crypto/sha/sha1-mb-x86_64.nasm
+  X64/crypto/sha/sha1-x86_64.nasm
+  X64/crypto/sha/sha256-mb-x86_64.nasm
+  X64/crypto/sha/sha256-x86_64.nasm
+  X64/crypto/sha/sha512-x86_64.nasm
+  X64/crypto/x86_64cpuid.nasm
+  $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_core.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+  $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+  $(OPENSSL_PATH)/crypto/aria/aria.c
+  $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+  $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+  $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_int.c
+  $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_object.c
+  $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+  $(OPENSSL_PATH)/crypto/asn1/a_print.c
+  $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+  $(OPENSSL_PATH)/crypto/asn1/a_time.c
+  $(OPENSSL_PATH)/crypto/asn1/a_type.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+  $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+  $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/f_int.c
+  $(OPENSSL_PATH)/crypto/asn1/f_string.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+  $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+  $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+  $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+  $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+  $(OPENSSL_PATH)/crypto/asn1/x_info.c
+  $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+  $(OPENSSL_PATH)/crypto/asn1/x_long.c
+  $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+  $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/x_val.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+  $(OPENSSL_PATH)/crypto/async/async.c
+  $(OPENSSL_PATH)/crypto/async/async_err.c
+  $(OPENSSL_PATH)/crypto/async/async_wait.c
+  $(OPENSSL_PATH)/crypto/bio/b_addr.c
+  $(OPENSSL_PATH)/crypto/bio/b_dump.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+  $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+  $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+  $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+  $(OPENSSL_PATH)/crypto/bio/bf_null.c
+  $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+  $(OPENSSL_PATH)/crypto/bio/bio_err.c
+  $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+  $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+  $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+  $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+  $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+  $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+  $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+  $(OPENSSL_PATH)/crypto/bio/bss_file.c
+  $(OPENSSL_PATH)/crypto/bio/bss_log.c
+  $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+  $(OPENSSL_PATH)/crypto/bio/bss_null.c
+  $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+  $(OPENSSL_PATH)/crypto/bn/asm/x86_64-gcc.c
+  $(OPENSSL_PATH)/crypto/bn/bn_add.c
+  $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+  $(OPENSSL_PATH)/crypto/bn/bn_const.c
+  $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+  $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+  $(OPENSSL_PATH)/crypto/bn/bn_div.c
+  $(OPENSSL_PATH)/crypto/bn/bn_err.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+  $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+  $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+  $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+  $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+  $(OPENSSL_PATH)/crypto/bn/bn_print.c
+  $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+  $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+  $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_word.c
+  $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+  $(OPENSSL_PATH)/crypto/bn/rsaz_exp.c
+  $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+  $(OPENSSL_PATH)/crypto/buffer/buffer.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+  $(OPENSSL_PATH)/crypto/cmac/cmac.c
+  $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+  $(OPENSSL_PATH)/crypto/comp/comp_err.c
+  $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_api.c
+  $(OPENSSL_PATH)/crypto/conf/conf_def.c
+  $(OPENSSL_PATH)/crypto/conf/conf_err.c
+  $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+  $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+  $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+  $(OPENSSL_PATH)/crypto/cpt_err.c
+  $(OPENSSL_PATH)/crypto/cryptlib.c
+  $(OPENSSL_PATH)/crypto/ctype.c
+  $(OPENSSL_PATH)/crypto/cversion.c
+  $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+  $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+  $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+  $(OPENSSL_PATH)/crypto/des/des_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+  $(OPENSSL_PATH)/crypto/des/fcrypt.c
+  $(OPENSSL_PATH)/crypto/des/fcrypt_b.c
+  $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+  $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+  $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+  $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+  $(OPENSSL_PATH)/crypto/des/rand_key.c
+  $(OPENSSL_PATH)/crypto/des/set_key.c
+  $(OPENSSL_PATH)/crypto/des/str2key.c
+  $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+  $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+  $(OPENSSL_PATH)/crypto/dh/dh_check.c
+  $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+  $(OPENSSL_PATH)/crypto/dh/dh_err.c
+  $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+  $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+  $(OPENSSL_PATH)/crypto/dh/dh_key.c
+  $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+  $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+  $(OPENSSL_PATH)/crypto/dso/dso_err.c
+  $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+  $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+  $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+  $(OPENSSL_PATH)/crypto/ebcdic.c
+  $(OPENSSL_PATH)/crypto/err/err.c
+  $(OPENSSL_PATH)/crypto/err/err_prn.c
+  $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+  $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+  $(OPENSSL_PATH)/crypto/evp/bio_md.c
+  $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+  $(OPENSSL_PATH)/crypto/evp/c_allc.c
+  $(OPENSSL_PATH)/crypto/evp/c_alld.c
+  $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+  $(OPENSSL_PATH)/crypto/evp/digest.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+  $(OPENSSL_PATH)/crypto/evp/e_aria.c
+  $(OPENSSL_PATH)/crypto/evp/e_bf.c
+  $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+  $(OPENSSL_PATH)/crypto/evp/e_cast.c
+  $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+  $(OPENSSL_PATH)/crypto/evp/e_des.c
+  $(OPENSSL_PATH)/crypto/evp/e_des3.c
+  $(OPENSSL_PATH)/crypto/evp/e_idea.c
+  $(OPENSSL_PATH)/crypto/evp/e_null.c
+  $(OPENSSL_PATH)/crypto/evp/e_old.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+  $(OPENSSL_PATH)/crypto/evp/e_seed.c
+  $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+  $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+  $(OPENSSL_PATH)/crypto/evp/encode.c
+  $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+  $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+  $(OPENSSL_PATH)/crypto/evp/evp_err.c
+  $(OPENSSL_PATH)/crypto/evp/evp_key.c
+  $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+  $(OPENSSL_PATH)/crypto/evp/m_md2.c
+  $(OPENSSL_PATH)/crypto/evp/m_md4.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+  $(OPENSSL_PATH)/crypto/evp/m_null.c
+  $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+  $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+  $(OPENSSL_PATH)/crypto/evp/m_wp.c
+  $(OPENSSL_PATH)/crypto/evp/names.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+  $(OPENSSL_PATH)/crypto/evp/p_dec.c
+  $(OPENSSL_PATH)/crypto/evp/p_enc.c
+  $(OPENSSL_PATH)/crypto/evp/p_lib.c
+  $(OPENSSL_PATH)/crypto/evp/p_open.c
+  $(OPENSSL_PATH)/crypto/evp/p_seal.c
+  $(OPENSSL_PATH)/crypto/evp/p_sign.c
+  $(OPENSSL_PATH)/crypto/evp/p_verify.c
+  $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+  $(OPENSSL_PATH)/crypto/ex_data.c
+  $(OPENSSL_PATH)/crypto/getenv.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+  $(OPENSSL_PATH)/crypto/hmac/hmac.c
+  $(OPENSSL_PATH)/crypto/init.c
+  $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+  $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+  $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+  $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+  $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+  $(OPENSSL_PATH)/crypto/lhash/lhash.c
+  $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+  $(OPENSSL_PATH)/crypto/md4/md4_one.c
+  $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+  $(OPENSSL_PATH)/crypto/md5/md5_one.c
+  $(OPENSSL_PATH)/crypto/mem.c
+  $(OPENSSL_PATH)/crypto/mem_dbg.c
+  $(OPENSSL_PATH)/crypto/mem_sec.c
+  $(OPENSSL_PATH)/crypto/modes/cbc128.c
+  $(OPENSSL_PATH)/crypto/modes/ccm128.c
+  $(OPENSSL_PATH)/crypto/modes/cfb128.c
+  $(OPENSSL_PATH)/crypto/modes/ctr128.c
+  $(OPENSSL_PATH)/crypto/modes/cts128.c
+  $(OPENSSL_PATH)/crypto/modes/gcm128.c
+  $(OPENSSL_PATH)/crypto/modes/ocb128.c
+  $(OPENSSL_PATH)/crypto/modes/ofb128.c
+  $(OPENSSL_PATH)/crypto/modes/wrap128.c
+  $(OPENSSL_PATH)/crypto/modes/xts128.c
+  $(OPENSSL_PATH)/crypto/o_dir.c
+  $(OPENSSL_PATH)/crypto/o_fips.c
+  $(OPENSSL_PATH)/crypto/o_fopen.c
+  $(OPENSSL_PATH)/crypto/o_init.c
+  $(OPENSSL_PATH)/crypto/o_str.c
+  $(OPENSSL_PATH)/crypto/o_time.c
+  $(OPENSSL_PATH)/crypto/objects/o_names.c
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+  $(OPENSSL_PATH)/crypto/objects/obj_err.c
+  $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+  $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+  $(OPENSSL_PATH)/crypto/pem/pem_all.c
+  $(OPENSSL_PATH)/crypto/pem/pem_err.c
+  $(OPENSSL_PATH)/crypto/pem/pem_info.c
+  $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+  $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+  $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+  $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+  $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+  $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+  $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+  $(OPENSSL_PATH)/crypto/rand/rand_err.c
+  $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+  $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+  $(OPENSSL_PATH)/crypto/rand/rand_win.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+  $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+  $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+  $(OPENSSL_PATH)/crypto/sha/sha256.c
+  $(OPENSSL_PATH)/crypto/sha/sha512.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+  $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+  $(OPENSSL_PATH)/crypto/sm3/sm3.c
+  $(OPENSSL_PATH)/crypto/sm4/sm4.c
+  $(OPENSSL_PATH)/crypto/stack/stack.c
+  $(OPENSSL_PATH)/crypto/threads_none.c
+  $(OPENSSL_PATH)/crypto/threads_pthread.c
+  $(OPENSSL_PATH)/crypto/threads_win.c
+  $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+  $(OPENSSL_PATH)/crypto/ui/ui_err.c
+  $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+  $(OPENSSL_PATH)/crypto/ui/ui_null.c
+  $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+  $(OPENSSL_PATH)/crypto/ui/ui_util.c
+  $(OPENSSL_PATH)/crypto/uid.c
+  $(OPENSSL_PATH)/crypto/x509/by_dir.c
+  $(OPENSSL_PATH)/crypto/x509/by_file.c
+  $(OPENSSL_PATH)/crypto/x509/t_crl.c
+  $(OPENSSL_PATH)/crypto/x509/t_req.c
+  $(OPENSSL_PATH)/crypto/x509/t_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x509_att.c
+  $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+  $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+  $(OPENSSL_PATH)/crypto/x509/x509_def.c
+  $(OPENSSL_PATH)/crypto/x509/x509_err.c
+  $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+  $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+  $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+  $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+  $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+  $(OPENSSL_PATH)/crypto/x509/x509_req.c
+  $(OPENSSL_PATH)/crypto/x509/x509_set.c
+  $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+  $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+  $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+  $(OPENSSL_PATH)/crypto/x509/x509cset.c
+  $(OPENSSL_PATH)/crypto/x509/x509name.c
+  $(OPENSSL_PATH)/crypto/x509/x509rset.c
+  $(OPENSSL_PATH)/crypto/x509/x509spki.c
+  $(OPENSSL_PATH)/crypto/x509/x509type.c
+  $(OPENSSL_PATH)/crypto/x509/x_all.c
+  $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+  $(OPENSSL_PATH)/crypto/x509/x_crl.c
+  $(OPENSSL_PATH)/crypto/x509/x_exten.c
+  $(OPENSSL_PATH)/crypto/x509/x_name.c
+  $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+  $(OPENSSL_PATH)/crypto/x509/x_req.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+  $(OPENSSL_PATH)/crypto/arm_arch.h
+  $(OPENSSL_PATH)/crypto/mips_arch.h
+  $(OPENSSL_PATH)/crypto/ppc_arch.h
+  $(OPENSSL_PATH)/crypto/s390x_arch.h
+  $(OPENSSL_PATH)/crypto/sparc_arch.h
+  $(OPENSSL_PATH)/crypto/vms_rms.h
+  $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/charmap.h
+  $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+  $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+  $(OPENSSL_PATH)/crypto/async/async_locl.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+  $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+  $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+  $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+  $(OPENSSL_PATH)/crypto/conf/conf_def.h
+  $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+  $(OPENSSL_PATH)/crypto/des/des_locl.h
+  $(OPENSSL_PATH)/crypto/des/spr.h
+  $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+  $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+  $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+  $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+  $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+  $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+  $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+  $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+  $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+  $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+  $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+  $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+  $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+  $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+  $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+  $(OPENSSL_PATH)/crypto/store/store_locl.h
+  $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+  $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+  $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+  $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+  $(OPENSSL_PATH)/ssl/bio_ssl.c
+  $(OPENSSL_PATH)/ssl/d1_lib.c
+  $(OPENSSL_PATH)/ssl/d1_msg.c
+  $(OPENSSL_PATH)/ssl/d1_srtp.c
+  $(OPENSSL_PATH)/ssl/methods.c
+  $(OPENSSL_PATH)/ssl/packet.c
+  $(OPENSSL_PATH)/ssl/pqueue.c
+  $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+  $(OPENSSL_PATH)/ssl/s3_cbc.c
+  $(OPENSSL_PATH)/ssl/s3_enc.c
+  $(OPENSSL_PATH)/ssl/s3_lib.c
+  $(OPENSSL_PATH)/ssl/s3_msg.c
+  $(OPENSSL_PATH)/ssl/ssl_asn1.c
+  $(OPENSSL_PATH)/ssl/ssl_cert.c
+  $(OPENSSL_PATH)/ssl/ssl_ciph.c
+  $(OPENSSL_PATH)/ssl/ssl_conf.c
+  $(OPENSSL_PATH)/ssl/ssl_err.c
+  $(OPENSSL_PATH)/ssl/ssl_init.c
+  $(OPENSSL_PATH)/ssl/ssl_lib.c
+  $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+  $(OPENSSL_PATH)/ssl/ssl_rsa.c
+  $(OPENSSL_PATH)/ssl/ssl_sess.c
+  $(OPENSSL_PATH)/ssl/ssl_stat.c
+  $(OPENSSL_PATH)/ssl/ssl_txt.c
+  $(OPENSSL_PATH)/ssl/ssl_utst.c
+  $(OPENSSL_PATH)/ssl/statem/extensions.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+  $(OPENSSL_PATH)/ssl/statem/statem.c
+  $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+  $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+  $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+  $(OPENSSL_PATH)/ssl/t1_enc.c
+  $(OPENSSL_PATH)/ssl/t1_lib.c
+  $(OPENSSL_PATH)/ssl/t1_trce.c
+  $(OPENSSL_PATH)/ssl/tls13_enc.c
+  $(OPENSSL_PATH)/ssl/tls_srp.c
+  $(OPENSSL_PATH)/ssl/packet_locl.h
+  $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+  $(OPENSSL_PATH)/ssl/ssl_locl.h
+  $(OPENSSL_PATH)/ssl/record/record.h
+  $(OPENSSL_PATH)/ssl/record/record_locl.h
+  $(OPENSSL_PATH)/ssl/statem/statem.h
+  $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+  buildinf.h
+  rand_pool_noise.h
+  ossl_store.c
+  rand_pool.c
+
+[Sources.X64]
+  ApiHooks.c
+  rand_pool_noise_tsc.c
+
+[Packages]
+  MdePkg/MdePkg.dec
+  CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+  BaseLib
+  DebugLib
+  TimerLib
+  PrintLib
+
+[BuildOptions]
+  #
+  # Disable the following Visual Studio compiler warnings brought in by the openssl source
+  # so that we do not break the build with the /WX option:
+  #   C4090: 'function' : different 'const' qualifiers
+  #   C4132: 'object' : const object should be initialized (tls13_enc.c)
+  #   C4210: nonstandard extension used: function given file scope
+  #   C4244: conversion from type1 to type2, possible loss of data
+  #   C4245: conversion from type1 to type2, signed/unsigned mismatch
+  #   C4267: conversion from size_t to type, possible loss of data
+  #   C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+  #   C4310: cast truncates constant value
+  #   C4389: 'operator' : signed/unsigned mismatch (xxxx)
+  #   C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+  #   C4702: unreachable code
+  #   C4706: assignment within conditional expression
+  #   C4819: The file contains a character that cannot be represented in the current code page
+  #
+  MSFT:*_*_X64_CC_FLAGS    = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4700 /wd4389 /wd4702 /wd4706 /wd4819
+
+  INTEL:*_*_X64_CC_FLAGS   = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+  #
+  # Suppress the following build warnings in openssl so we don't break the build with -Werror:
+  #   -Werror=maybe-uninitialized: the compiler cannot prove the variable is initialized on every path.
+  #   -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have
+  #                   types appropriate to the format string specified.
+  #   -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration).
+  #
+  GCC:*_*_X64_CC_FLAGS     = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable -DNO_MSABI_VA_FUNCS
+
+  # suppress the following warnings in openssl so we don't break the build with warnings-as-errors:
+  # 1295: Deprecated declaration <entity> - give arg types
+  #  550: <entity> was set but never used
+  # 1293: assignment in condition
+  #  111: statement is unreachable (invariably "break;" after "return X;" in case statement)
+  #   68: integer conversion resulted in a change of sign ("if (Status == -1)")
+  #  177: <entity> was declared but never referenced
+  #  223: function <entity> declared implicitly
+  #  144: a value of type <type> cannot be used to initialize an entity of type <type>
+  #  513: a value of type <type> cannot be assigned to an entity of type <type>
+  #  188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+  # 1296: Extended constant initialiser used
+  #  128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+  #       from the function that evaluates to true at compile time
+  #  546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+  #       variable is never referenced after the jump
+  #    1: ignore "#1-D: last line of file ends without a newline"
+  # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+  #       commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+  XCODE:*_*_X64_CC_FLAGS    = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
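
A platform opts into the accelerated implementation by pointing the OpensslLib
library class at the new architecture-specific instance in its DSC; a minimal
sketch, assuming the paths added by this patch (other architectures keep the
generic OpensslLib.inf mapping):

  [LibraryClasses.IA32]
    OpensslLib|CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf

  [LibraryClasses.X64]
    OpensslLib|CryptoPkg/Library/OpensslLib/OpensslLibX64.inf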
diff --git a/CryptoPkg/Library/Include/openssl/opensslconf.h b/CryptoPkg/Library/Include/openssl/opensslconf.h
index bd34e53ef2..20f32cc6fe 100644
--- a/CryptoPkg/Library/Include/openssl/opensslconf.h
+++ b/CryptoPkg/Library/Include/openssl/opensslconf.h
@@ -103,9 +103,6 @@ extern "C" {
 #ifndef OPENSSL_NO_ASAN
 # define OPENSSL_NO_ASAN
 #endif
-#ifndef OPENSSL_NO_ASM
-# define OPENSSL_NO_ASM
-#endif
 #ifndef OPENSSL_NO_ASYNC
 # define OPENSSL_NO_ASYNC
 #endif
diff --git a/CryptoPkg/Library/OpensslLib/ApiHooks.c b/CryptoPkg/Library/OpensslLib/ApiHooks.c
new file mode 100644
index 0000000000..58cff16838
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/ApiHooks.c
@@ -0,0 +1,18 @@
+/** @file
+  OpenSSL Library API hooks.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
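+//
+// The pre-generated OpenSSL X64 assembly contains Windows SEH unwind handlers
+// that reference __imp_RtlVirtualUnwind. UEFI has no such unwinder and the
+// handlers are not expected to run, so this stub (assumed to exist purely to
+// satisfy the linker) simply returns NULL.
+//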
+VOID *
+__imp_RtlVirtualUnwind (
+  VOID *    Args
+  )
+{
+  return NULL;
+}
+
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
new file mode 100644
index 0000000000..ef20d2b84e
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
@@ -0,0 +1,34 @@
+/** @file
+  Constructor to initialize CPUID data for OpenSSL assembly operations.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
+extern void OPENSSL_cpuid_setup (void);
+
+/**
+  Constructor routine for OpensslLib.
+
+  The constructor calls OPENSSL_cpuid_setup(), an internal OpenSSL routine that caches
+  the hardware (CPUID) capability flags used to select the native crypto implementations.
+
+  @param  None
+
+  @retval EFI_SUCCESS         The construction succeeded.
+
+**/
+EFI_STATUS
+EFIAPI
+OpensslLibConstructor (
+  VOID
+  )
+{
+  OPENSSL_cpuid_setup ();
+
+  return EFI_SUCCESS;
+}
+
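
The constructor takes effect only when the library instance declares it; a
minimal sketch of the [Defines] entries assumed for the new IA32/X64 INFs:

  [Defines]
    LIBRARY_CLASS  = OpensslLib
    CONSTRUCTOR    = OpensslLibConstructor

With that declaration the build tools run OpensslLibConstructor() before any
code in the linking module, so the capability flags are populated before the
first crypto call.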
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
new file mode 100644
index 0000000000..30879d3cf5
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
@@ -0,0 +1,3209 @@
+; Copyright 2009-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _aesni_encrypt
+align   16
+_aesni_encrypt:
+L$_aesni_encrypt_begin:
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [12+esp]
+        movups  xmm2,[eax]
+        mov     ecx,DWORD [240+edx]
+        mov     eax,DWORD [8+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$000enc1_loop_1:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$000enc1_loop_1
+db      102,15,56,221,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  [eax],xmm2
+        pxor    xmm2,xmm2
+        ret
+global  _aesni_decrypt
+align   16
+_aesni_decrypt:
+L$_aesni_decrypt_begin:
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [12+esp]
+        movups  xmm2,[eax]
+        mov     ecx,DWORD [240+edx]
+        mov     eax,DWORD [8+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$001dec1_loop_2:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$001dec1_loop_2
+db      102,15,56,223,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  [eax],xmm2
+        pxor    xmm2,xmm2
+        ret
+align   16
+__aesni_encrypt2:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$002enc2_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$002enc2_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,221,208
+db      102,15,56,221,216
+        ret
+align   16
+__aesni_decrypt2:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$003dec2_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$003dec2_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,223,208
+db      102,15,56,223,216
+        ret
+align   16
+__aesni_encrypt3:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$004enc3_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+db      102,15,56,220,224
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$004enc3_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,221,208
+db      102,15,56,221,216
+db      102,15,56,221,224
+        ret
+align   16
+__aesni_decrypt3:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$005dec3_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+db      102,15,56,222,224
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$005dec3_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,223,208
+db      102,15,56,223,216
+db      102,15,56,223,224
+        ret
+align   16
+__aesni_encrypt4:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        shl     ecx,4
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      15,31,64,0
+        add     ecx,16
+L$006enc4_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+db      102,15,56,220,224
+db      102,15,56,220,232
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$006enc4_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,221,208
+db      102,15,56,221,216
+db      102,15,56,221,224
+db      102,15,56,221,232
+        ret
+align   16
+__aesni_decrypt4:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        shl     ecx,4
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      15,31,64,0
+        add     ecx,16
+L$007dec4_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+db      102,15,56,222,224
+db      102,15,56,222,232
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$007dec4_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,223,208
+db      102,15,56,223,216
+db      102,15,56,223,224
+db      102,15,56,223,232
+        ret
+align   16
+__aesni_encrypt6:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+db      102,15,56,220,209
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+db      102,15,56,220,217
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      102,15,56,220,225
+        pxor    xmm7,xmm0
+        movups  xmm0,[ecx*1+edx]
+        add     ecx,16
+        jmp     NEAR L$008_aesni_encrypt6_inner
+align   16
+L$009enc6_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+L$008_aesni_encrypt6_inner:
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+L$_aesni_encrypt6_enter:
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+db      102,15,56,220,224
+db      102,15,56,220,232
+db      102,15,56,220,240
+db      102,15,56,220,248
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$009enc6_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+db      102,15,56,221,208
+db      102,15,56,221,216
+db      102,15,56,221,224
+db      102,15,56,221,232
+db      102,15,56,221,240
+db      102,15,56,221,248
+        ret
+align   16
+__aesni_decrypt6:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+db      102,15,56,222,209
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+db      102,15,56,222,217
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      102,15,56,222,225
+        pxor    xmm7,xmm0
+        movups  xmm0,[ecx*1+edx]
+        add     ecx,16
+        jmp     NEAR L$010_aesni_decrypt6_inner
+align   16
+L$011dec6_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+L$010_aesni_decrypt6_inner:
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+L$_aesni_decrypt6_enter:
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+db      102,15,56,222,224
+db      102,15,56,222,232
+db      102,15,56,222,240
+db      102,15,56,222,248
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$011dec6_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+db      102,15,56,223,208
+db      102,15,56,223,216
+db      102,15,56,223,224
+db      102,15,56,223,232
+db      102,15,56,223,240
+db      102,15,56,223,248
+        ret
+global  _aesni_ecb_encrypt
+align   16
+_aesni_ecb_encrypt:
+L$_aesni_ecb_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        and     eax,-16
+        jz      NEAR L$012ecb_ret
+        mov     ecx,DWORD [240+edx]
+        test    ebx,ebx
+        jz      NEAR L$013ecb_decrypt
+        mov     ebp,edx
+        mov     ebx,ecx
+        cmp     eax,96
+        jb      NEAR L$014ecb_enc_tail
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        sub     eax,96
+        jmp     NEAR L$015ecb_enc_loop6_enter
+align   16
+L$016ecb_enc_loop6:
+        movups  [edi],xmm2
+        movdqu  xmm2,[esi]
+        movups  [16+edi],xmm3
+        movdqu  xmm3,[16+esi]
+        movups  [32+edi],xmm4
+        movdqu  xmm4,[32+esi]
+        movups  [48+edi],xmm5
+        movdqu  xmm5,[48+esi]
+        movups  [64+edi],xmm6
+        movdqu  xmm6,[64+esi]
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+L$015ecb_enc_loop6_enter:
+        call    __aesni_encrypt6
+        mov     edx,ebp
+        mov     ecx,ebx
+        sub     eax,96
+        jnc     NEAR L$016ecb_enc_loop6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        add     eax,96
+        jz      NEAR L$012ecb_ret
+L$014ecb_enc_tail:
+        movups  xmm2,[esi]
+        cmp     eax,32
+        jb      NEAR L$017ecb_enc_one
+        movups  xmm3,[16+esi]
+        je      NEAR L$018ecb_enc_two
+        movups  xmm4,[32+esi]
+        cmp     eax,64
+        jb      NEAR L$019ecb_enc_three
+        movups  xmm5,[48+esi]
+        je      NEAR L$020ecb_enc_four
+        movups  xmm6,[64+esi]
+        xorps   xmm7,xmm7
+        call    __aesni_encrypt6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        jmp     NEAR L$012ecb_ret
+align   16
+L$017ecb_enc_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$021enc1_loop_3:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$021enc1_loop_3
+db      102,15,56,221,209
+        movups  [edi],xmm2
+        jmp     NEAR L$012ecb_ret
+align   16
+L$018ecb_enc_two:
+        call    __aesni_encrypt2
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        jmp     NEAR L$012ecb_ret
+align   16
+L$019ecb_enc_three:
+        call    __aesni_encrypt3
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        jmp     NEAR L$012ecb_ret
+align   16
+L$020ecb_enc_four:
+        call    __aesni_encrypt4
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        jmp     NEAR L$012ecb_ret
+align   16
+L$013ecb_decrypt:
+        mov     ebp,edx
+        mov     ebx,ecx
+        cmp     eax,96
+        jb      NEAR L$022ecb_dec_tail
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        sub     eax,96
+        jmp     NEAR L$023ecb_dec_loop6_enter
+align   16
+L$024ecb_dec_loop6:
+        movups  [edi],xmm2
+        movdqu  xmm2,[esi]
+        movups  [16+edi],xmm3
+        movdqu  xmm3,[16+esi]
+        movups  [32+edi],xmm4
+        movdqu  xmm4,[32+esi]
+        movups  [48+edi],xmm5
+        movdqu  xmm5,[48+esi]
+        movups  [64+edi],xmm6
+        movdqu  xmm6,[64+esi]
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+L$023ecb_dec_loop6_enter:
+        call    __aesni_decrypt6
+        mov     edx,ebp
+        mov     ecx,ebx
+        sub     eax,96
+        jnc     NEAR L$024ecb_dec_loop6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        add     eax,96
+        jz      NEAR L$012ecb_ret
+L$022ecb_dec_tail:
+        movups  xmm2,[esi]
+        cmp     eax,32
+        jb      NEAR L$025ecb_dec_one
+        movups  xmm3,[16+esi]
+        je      NEAR L$026ecb_dec_two
+        movups  xmm4,[32+esi]
+        cmp     eax,64
+        jb      NEAR L$027ecb_dec_three
+        movups  xmm5,[48+esi]
+        je      NEAR L$028ecb_dec_four
+        movups  xmm6,[64+esi]
+        xorps   xmm7,xmm7
+        call    __aesni_decrypt6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        jmp     NEAR L$012ecb_ret
+align   16
+L$025ecb_dec_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$029dec1_loop_4:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$029dec1_loop_4
+db      102,15,56,223,209
+        movups  [edi],xmm2
+        jmp     NEAR L$012ecb_ret
+align   16
+L$026ecb_dec_two:
+        call    __aesni_decrypt2
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        jmp     NEAR L$012ecb_ret
+align   16
+L$027ecb_dec_three:
+        call    __aesni_decrypt3
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        jmp     NEAR L$012ecb_ret
+align   16
+L$028ecb_dec_four:
+        call    __aesni_decrypt4
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+L$012ecb_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ccm64_encrypt_blocks
+align   16
+_aesni_ccm64_encrypt_blocks:
+L$_aesni_ccm64_encrypt_blocks_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        mov     ecx,DWORD [40+esp]
+        mov     ebp,esp
+        sub     esp,60
+        and     esp,-16
+        mov     DWORD [48+esp],ebp
+        movdqu  xmm7,[ebx]
+        movdqu  xmm3,[ecx]
+        mov     ecx,DWORD [240+edx]
+        mov     DWORD [esp],202182159
+        mov     DWORD [4+esp],134810123
+        mov     DWORD [8+esp],67438087
+        mov     DWORD [12+esp],66051
+        mov     ebx,1
+        xor     ebp,ebp
+        mov     DWORD [16+esp],ebx
+        mov     DWORD [20+esp],ebp
+        mov     DWORD [24+esp],ebp
+        mov     DWORD [28+esp],ebp
+        shl     ecx,4
+        mov     ebx,16
+        lea     ebp,[edx]
+        movdqa  xmm5,[esp]
+        movdqa  xmm2,xmm7
+        lea     edx,[32+ecx*1+edx]
+        sub     ebx,ecx
+db      102,15,56,0,253
+L$030ccm64_enc_outer:
+        movups  xmm0,[ebp]
+        mov     ecx,ebx
+        movups  xmm6,[esi]
+        xorps   xmm2,xmm0
+        movups  xmm1,[16+ebp]
+        xorps   xmm0,xmm6
+        xorps   xmm3,xmm0
+        movups  xmm0,[32+ebp]
+L$031ccm64_enc2_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$031ccm64_enc2_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+        paddq   xmm7,[16+esp]
+        dec     eax
+db      102,15,56,221,208
+db      102,15,56,221,216
+        lea     esi,[16+esi]
+        xorps   xmm6,xmm2
+        movdqa  xmm2,xmm7
+        movups  [edi],xmm6
+db      102,15,56,0,213
+        lea     edi,[16+edi]
+        jnz     NEAR L$030ccm64_enc_outer
+        mov     esp,DWORD [48+esp]
+        mov     edi,DWORD [40+esp]
+        movups  [edi],xmm3
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ccm64_decrypt_blocks
+align   16
+_aesni_ccm64_decrypt_blocks:
+L$_aesni_ccm64_decrypt_blocks_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        mov     ecx,DWORD [40+esp]
+        mov     ebp,esp
+        sub     esp,60
+        and     esp,-16
+        mov     DWORD [48+esp],ebp
+        movdqu  xmm7,[ebx]
+        movdqu  xmm3,[ecx]
+        mov     ecx,DWORD [240+edx]
+        mov     DWORD [esp],202182159
+        mov     DWORD [4+esp],134810123
+        mov     DWORD [8+esp],67438087
+        mov     DWORD [12+esp],66051
+        mov     ebx,1
+        xor     ebp,ebp
+        mov     DWORD [16+esp],ebx
+        mov     DWORD [20+esp],ebp
+        mov     DWORD [24+esp],ebp
+        mov     DWORD [28+esp],ebp
+        movdqa  xmm5,[esp]
+        movdqa  xmm2,xmm7
+        mov     ebp,edx
+        mov     ebx,ecx
+db      102,15,56,0,253
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$032enc1_loop_5:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$032enc1_loop_5
+db      102,15,56,221,209
+        shl     ebx,4
+        mov     ecx,16
+        movups  xmm6,[esi]
+        paddq   xmm7,[16+esp]
+        lea     esi,[16+esi]
+        sub     ecx,ebx
+        lea     edx,[32+ebx*1+ebp]
+        mov     ebx,ecx
+        jmp     NEAR L$033ccm64_dec_outer
+align   16
+L$033ccm64_dec_outer:
+        xorps   xmm6,xmm2
+        movdqa  xmm2,xmm7
+        movups  [edi],xmm6
+        lea     edi,[16+edi]
+db      102,15,56,0,213
+        sub     eax,1
+        jz      NEAR L$034ccm64_dec_break
+        movups  xmm0,[ebp]
+        mov     ecx,ebx
+        movups  xmm1,[16+ebp]
+        xorps   xmm6,xmm0
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm6
+        movups  xmm0,[32+ebp]
+L$035ccm64_dec2_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$035ccm64_dec2_loop
+        movups  xmm6,[esi]
+        paddq   xmm7,[16+esp]
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,221,208
+db      102,15,56,221,216
+        lea     esi,[16+esi]
+        jmp     NEAR L$033ccm64_dec_outer
+align   16
+L$034ccm64_dec_break:
+        mov     ecx,DWORD [240+ebp]
+        mov     edx,ebp
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        xorps   xmm6,xmm0
+        lea     edx,[32+edx]
+        xorps   xmm3,xmm6
+L$036enc1_loop_6:
+db      102,15,56,220,217
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$036enc1_loop_6
+db      102,15,56,221,217
+        mov     esp,DWORD [48+esp]
+        mov     edi,DWORD [40+esp]
+        movups  [edi],xmm3
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ctr32_encrypt_blocks
+align   16
+_aesni_ctr32_encrypt_blocks:
+L$_aesni_ctr32_encrypt_blocks_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        mov     ebp,esp
+        sub     esp,88
+        and     esp,-16
+        mov     DWORD [80+esp],ebp
+        cmp     eax,1
+        je      NEAR L$037ctr32_one_shortcut
+        movdqu  xmm7,[ebx]
+        mov     DWORD [esp],202182159
+        mov     DWORD [4+esp],134810123
+        mov     DWORD [8+esp],67438087
+        mov     DWORD [12+esp],66051
+        mov     ecx,6
+        xor     ebp,ebp
+        mov     DWORD [16+esp],ecx
+        mov     DWORD [20+esp],ecx
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],ebp
+db      102,15,58,22,251,3
+db      102,15,58,34,253,3
+        mov     ecx,DWORD [240+edx]
+        bswap   ebx
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movdqa  xmm2,[esp]
+db      102,15,58,34,195,0
+        lea     ebp,[3+ebx]
+db      102,15,58,34,205,0
+        inc     ebx
+db      102,15,58,34,195,1
+        inc     ebp
+db      102,15,58,34,205,1
+        inc     ebx
+db      102,15,58,34,195,2
+        inc     ebp
+db      102,15,58,34,205,2
+        movdqa  [48+esp],xmm0
+db      102,15,56,0,194
+        movdqu  xmm6,[edx]
+        movdqa  [64+esp],xmm1
+db      102,15,56,0,202
+        pshufd  xmm2,xmm0,192
+        pshufd  xmm3,xmm0,128
+        cmp     eax,6
+        jb      NEAR L$038ctr32_tail
+        pxor    xmm7,xmm6
+        shl     ecx,4
+        mov     ebx,16
+        movdqa  [32+esp],xmm7
+        mov     ebp,edx
+        sub     ebx,ecx
+        lea     edx,[32+ecx*1+edx]
+        sub     eax,6
+        jmp     NEAR L$039ctr32_loop6
+align   16
+L$039ctr32_loop6:
+        pshufd  xmm4,xmm0,64
+        movdqa  xmm0,[32+esp]
+        pshufd  xmm5,xmm1,192
+        pxor    xmm2,xmm0
+        pshufd  xmm6,xmm1,128
+        pxor    xmm3,xmm0
+        pshufd  xmm7,xmm1,64
+        movups  xmm1,[16+ebp]
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+db      102,15,56,220,209
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+db      102,15,56,220,217
+        movups  xmm0,[32+ebp]
+        mov     ecx,ebx
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        call    L$_aesni_encrypt6_enter
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm3,xmm0
+        movups  [edi],xmm2
+        movdqa  xmm0,[16+esp]
+        xorps   xmm4,xmm1
+        movdqa  xmm1,[64+esp]
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        paddd   xmm1,xmm0
+        paddd   xmm0,[48+esp]
+        movdqa  xmm2,[esp]
+        movups  xmm3,[48+esi]
+        movups  xmm4,[64+esi]
+        xorps   xmm5,xmm3
+        movups  xmm3,[80+esi]
+        lea     esi,[96+esi]
+        movdqa  [48+esp],xmm0
+db      102,15,56,0,194
+        xorps   xmm6,xmm4
+        movups  [48+edi],xmm5
+        xorps   xmm7,xmm3
+        movdqa  [64+esp],xmm1
+db      102,15,56,0,202
+        movups  [64+edi],xmm6
+        pshufd  xmm2,xmm0,192
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        pshufd  xmm3,xmm0,128
+        sub     eax,6
+        jnc     NEAR L$039ctr32_loop6
+        add     eax,6
+        jz      NEAR L$040ctr32_ret
+        movdqu  xmm7,[ebp]
+        mov     edx,ebp
+        pxor    xmm7,[32+esp]
+        mov     ecx,DWORD [240+ebp]
+L$038ctr32_tail:
+        por     xmm2,xmm7
+        cmp     eax,2
+        jb      NEAR L$041ctr32_one
+        pshufd  xmm4,xmm0,64
+        por     xmm3,xmm7
+        je      NEAR L$042ctr32_two
+        pshufd  xmm5,xmm1,192
+        por     xmm4,xmm7
+        cmp     eax,4
+        jb      NEAR L$043ctr32_three
+        pshufd  xmm6,xmm1,128
+        por     xmm5,xmm7
+        je      NEAR L$044ctr32_four
+        por     xmm6,xmm7
+        call    __aesni_encrypt6
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm3,xmm0
+        movups  xmm0,[48+esi]
+        xorps   xmm4,xmm1
+        movups  xmm1,[64+esi]
+        xorps   xmm5,xmm0
+        movups  [edi],xmm2
+        xorps   xmm6,xmm1
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$037ctr32_one_shortcut:
+        movups  xmm2,[ebx]
+        mov     ecx,DWORD [240+edx]
+L$041ctr32_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$045enc1_loop_7:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$045enc1_loop_7
+db      102,15,56,221,209
+        movups  xmm6,[esi]
+        xorps   xmm6,xmm2
+        movups  [edi],xmm6
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$042ctr32_two:
+        call    __aesni_encrypt2
+        movups  xmm5,[esi]
+        movups  xmm6,[16+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$043ctr32_three:
+        call    __aesni_encrypt3
+        movups  xmm5,[esi]
+        movups  xmm6,[16+esi]
+        xorps   xmm2,xmm5
+        movups  xmm7,[32+esi]
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        xorps   xmm4,xmm7
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$044ctr32_four:
+        call    __aesni_encrypt4
+        movups  xmm6,[esi]
+        movups  xmm7,[16+esi]
+        movups  xmm1,[32+esi]
+        xorps   xmm2,xmm6
+        movups  xmm0,[48+esi]
+        xorps   xmm3,xmm7
+        movups  [edi],xmm2
+        xorps   xmm4,xmm1
+        movups  [16+edi],xmm3
+        xorps   xmm5,xmm0
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+L$040ctr32_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        movdqa  [32+esp],xmm0
+        pxor    xmm5,xmm5
+        movdqa  [48+esp],xmm0
+        pxor    xmm6,xmm6
+        movdqa  [64+esp],xmm0
+        pxor    xmm7,xmm7
+        mov     esp,DWORD [80+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_xts_encrypt
+align   16
+_aesni_xts_encrypt:
+L$_aesni_xts_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edx,DWORD [36+esp]
+        mov     esi,DWORD [40+esp]
+        mov     ecx,DWORD [240+edx]
+        movups  xmm2,[esi]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$046enc1_loop_8:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$046enc1_loop_8
+db      102,15,56,221,209
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebp,esp
+        sub     esp,120
+        mov     ecx,DWORD [240+edx]
+        and     esp,-16
+        mov     DWORD [96+esp],135
+        mov     DWORD [100+esp],0
+        mov     DWORD [104+esp],1
+        mov     DWORD [108+esp],0
+        mov     DWORD [112+esp],eax
+        mov     DWORD [116+esp],ebp
+        movdqa  xmm1,xmm2
+        pxor    xmm0,xmm0
+        movdqa  xmm3,[96+esp]
+        pcmpgtd xmm0,xmm1
+        and     eax,-16
+        mov     ebp,edx
+        mov     ebx,ecx
+        sub     eax,96
+        jc      NEAR L$047xts_enc_short
+        shl     ecx,4
+        mov     ebx,16
+        sub     ebx,ecx
+        lea     edx,[32+ecx*1+edx]
+        jmp     NEAR L$048xts_enc_loop6
+align   16
+L$048xts_enc_loop6:
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [16+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [32+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm7,xmm0,19
+        movdqa  [64+esp],xmm1
+        paddq   xmm1,xmm1
+        movups  xmm0,[ebp]
+        pand    xmm7,xmm3
+        movups  xmm2,[esi]
+        pxor    xmm7,xmm1
+        mov     ecx,ebx
+        movdqu  xmm3,[16+esi]
+        xorps   xmm2,xmm0
+        movdqu  xmm4,[32+esi]
+        pxor    xmm3,xmm0
+        movdqu  xmm5,[48+esi]
+        pxor    xmm4,xmm0
+        movdqu  xmm6,[64+esi]
+        pxor    xmm5,xmm0
+        movdqu  xmm1,[80+esi]
+        pxor    xmm6,xmm0
+        lea     esi,[96+esi]
+        pxor    xmm2,[esp]
+        movdqa  [80+esp],xmm7
+        pxor    xmm7,xmm1
+        movups  xmm1,[16+ebp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+db      102,15,56,220,209
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+db      102,15,56,220,217
+        pxor    xmm7,xmm0
+        movups  xmm0,[32+ebp]
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        call    L$_aesni_encrypt6_enter
+        movdqa  xmm1,[80+esp]
+        pxor    xmm0,xmm0
+        xorps   xmm2,[esp]
+        pcmpgtd xmm0,xmm1
+        xorps   xmm3,[16+esp]
+        movups  [edi],xmm2
+        xorps   xmm4,[32+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm5,[48+esp]
+        movups  [32+edi],xmm4
+        xorps   xmm6,[64+esp]
+        movups  [48+edi],xmm5
+        xorps   xmm7,xmm1
+        movups  [64+edi],xmm6
+        pshufd  xmm2,xmm0,19
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqa  xmm3,[96+esp]
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        sub     eax,96
+        jnc     NEAR L$048xts_enc_loop6
+        mov     ecx,DWORD [240+ebp]
+        mov     edx,ebp
+        mov     ebx,ecx
+L$047xts_enc_short:
+        add     eax,96
+        jz      NEAR L$049xts_enc_done6x
+        movdqa  xmm5,xmm1
+        cmp     eax,32
+        jb      NEAR L$050xts_enc_one
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        je      NEAR L$051xts_enc_two
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm6,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        cmp     eax,64
+        jb      NEAR L$052xts_enc_three
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm7,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        movdqa  [esp],xmm5
+        movdqa  [16+esp],xmm6
+        je      NEAR L$053xts_enc_four
+        movdqa  [32+esp],xmm7
+        pshufd  xmm7,xmm0,19
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm7,xmm3
+        pxor    xmm7,xmm1
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        pxor    xmm2,[esp]
+        movdqu  xmm5,[48+esi]
+        pxor    xmm3,[16+esp]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm4,[32+esp]
+        lea     esi,[80+esi]
+        pxor    xmm5,[48+esp]
+        movdqa  [64+esp],xmm7
+        pxor    xmm6,xmm7
+        call    __aesni_encrypt6
+        movaps  xmm1,[64+esp]
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,[32+esp]
+        movups  [edi],xmm2
+        xorps   xmm5,[48+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm6,xmm1
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        lea     edi,[80+edi]
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$050xts_enc_one:
+        movups  xmm2,[esi]
+        lea     esi,[16+esi]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$055enc1_loop_9:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$055enc1_loop_9
+db      102,15,56,221,209
+        xorps   xmm2,xmm5
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+        movdqa  xmm1,xmm5
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$051xts_enc_two:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        lea     esi,[32+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        call    __aesni_encrypt2
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        lea     edi,[32+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$052xts_enc_three:
+        movaps  xmm7,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        lea     esi,[48+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        call    __aesni_encrypt3
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        lea     edi,[48+edi]
+        movdqa  xmm1,xmm7
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$053xts_enc_four:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        xorps   xmm2,[esp]
+        movups  xmm5,[48+esi]
+        lea     esi,[64+esi]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        xorps   xmm5,xmm6
+        call    __aesni_encrypt4
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        xorps   xmm5,xmm6
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        lea     edi,[64+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$049xts_enc_done6x:
+        mov     eax,DWORD [112+esp]
+        and     eax,15
+        jz      NEAR L$056xts_enc_ret
+        movdqa  xmm5,xmm1
+        mov     DWORD [112+esp],eax
+        jmp     NEAR L$057xts_enc_steal
+align   16
+L$054xts_enc_done:
+        mov     eax,DWORD [112+esp]
+        pxor    xmm0,xmm0
+        and     eax,15
+        jz      NEAR L$056xts_enc_ret
+        pcmpgtd xmm0,xmm1
+        mov     DWORD [112+esp],eax
+        pshufd  xmm5,xmm0,19
+        paddq   xmm1,xmm1
+        pand    xmm5,[96+esp]
+        pxor    xmm5,xmm1
+L$057xts_enc_steal:
+        movzx   ecx,BYTE [esi]
+        movzx   edx,BYTE [edi-16]
+        lea     esi,[1+esi]
+        mov     BYTE [edi-16],cl
+        mov     BYTE [edi],dl
+        lea     edi,[1+edi]
+        sub     eax,1
+        jnz     NEAR L$057xts_enc_steal
+        sub     edi,DWORD [112+esp]
+        mov     edx,ebp
+        mov     ecx,ebx
+        movups  xmm2,[edi-16]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$058enc1_loop_10:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$058enc1_loop_10
+db      102,15,56,221,209
+        xorps   xmm2,xmm5
+        movups  [edi-16],xmm2
+L$056xts_enc_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movdqa  [esp],xmm0
+        pxor    xmm3,xmm3
+        movdqa  [16+esp],xmm0
+        pxor    xmm4,xmm4
+        movdqa  [32+esp],xmm0
+        pxor    xmm5,xmm5
+        movdqa  [48+esp],xmm0
+        pxor    xmm6,xmm6
+        movdqa  [64+esp],xmm0
+        pxor    xmm7,xmm7
+        movdqa  [80+esp],xmm0
+        mov     esp,DWORD [116+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_xts_decrypt
+align   16
+_aesni_xts_decrypt:
+L$_aesni_xts_decrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edx,DWORD [36+esp]
+        mov     esi,DWORD [40+esp]
+        mov     ecx,DWORD [240+edx]
+        movups  xmm2,[esi]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$059enc1_loop_11:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$059enc1_loop_11
+db      102,15,56,221,209
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebp,esp
+        sub     esp,120
+        and     esp,-16
+        xor     ebx,ebx
+        test    eax,15
+        setnz   bl
+        shl     ebx,4
+        sub     eax,ebx
+        mov     DWORD [96+esp],135
+        mov     DWORD [100+esp],0
+        mov     DWORD [104+esp],1
+        mov     DWORD [108+esp],0
+        mov     DWORD [112+esp],eax
+        mov     DWORD [116+esp],ebp
+        mov     ecx,DWORD [240+edx]
+        mov     ebp,edx
+        mov     ebx,ecx
+        movdqa  xmm1,xmm2
+        pxor    xmm0,xmm0
+        movdqa  xmm3,[96+esp]
+        pcmpgtd xmm0,xmm1
+        and     eax,-16
+        sub     eax,96
+        jc      NEAR L$060xts_dec_short
+        shl     ecx,4
+        mov     ebx,16
+        sub     ebx,ecx
+        lea     edx,[32+ecx*1+edx]
+        jmp     NEAR L$061xts_dec_loop6
+align   16
+L$061xts_dec_loop6:
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [16+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [32+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm7,xmm0,19
+        movdqa  [64+esp],xmm1
+        paddq   xmm1,xmm1
+        movups  xmm0,[ebp]
+        pand    xmm7,xmm3
+        movups  xmm2,[esi]
+        pxor    xmm7,xmm1
+        mov     ecx,ebx
+        movdqu  xmm3,[16+esi]
+        xorps   xmm2,xmm0
+        movdqu  xmm4,[32+esi]
+        pxor    xmm3,xmm0
+        movdqu  xmm5,[48+esi]
+        pxor    xmm4,xmm0
+        movdqu  xmm6,[64+esi]
+        pxor    xmm5,xmm0
+        movdqu  xmm1,[80+esi]
+        pxor    xmm6,xmm0
+        lea     esi,[96+esi]
+        pxor    xmm2,[esp]
+        movdqa  [80+esp],xmm7
+        pxor    xmm7,xmm1
+        movups  xmm1,[16+ebp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+db      102,15,56,222,209
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+db      102,15,56,222,217
+        pxor    xmm7,xmm0
+        movups  xmm0,[32+ebp]
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+        call    L$_aesni_decrypt6_enter
+        movdqa  xmm1,[80+esp]
+        pxor    xmm0,xmm0
+        xorps   xmm2,[esp]
+        pcmpgtd xmm0,xmm1
+        xorps   xmm3,[16+esp]
+        movups  [edi],xmm2
+        xorps   xmm4,[32+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm5,[48+esp]
+        movups  [32+edi],xmm4
+        xorps   xmm6,[64+esp]
+        movups  [48+edi],xmm5
+        xorps   xmm7,xmm1
+        movups  [64+edi],xmm6
+        pshufd  xmm2,xmm0,19
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqa  xmm3,[96+esp]
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        sub     eax,96
+        jnc     NEAR L$061xts_dec_loop6
+        mov     ecx,DWORD [240+ebp]
+        mov     edx,ebp
+        mov     ebx,ecx
+L$060xts_dec_short:
+        add     eax,96
+        jz      NEAR L$062xts_dec_done6x
+        movdqa  xmm5,xmm1
+        cmp     eax,32
+        jb      NEAR L$063xts_dec_one
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        je      NEAR L$064xts_dec_two
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm6,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        cmp     eax,64
+        jb      NEAR L$065xts_dec_three
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm7,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        movdqa  [esp],xmm5
+        movdqa  [16+esp],xmm6
+        je      NEAR L$066xts_dec_four
+        movdqa  [32+esp],xmm7
+        pshufd  xmm7,xmm0,19
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm7,xmm3
+        pxor    xmm7,xmm1
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        pxor    xmm2,[esp]
+        movdqu  xmm5,[48+esi]
+        pxor    xmm3,[16+esp]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm4,[32+esp]
+        lea     esi,[80+esi]
+        pxor    xmm5,[48+esp]
+        movdqa  [64+esp],xmm7
+        pxor    xmm6,xmm7
+        call    __aesni_decrypt6
+        movaps  xmm1,[64+esp]
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,[32+esp]
+        movups  [edi],xmm2
+        xorps   xmm5,[48+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm6,xmm1
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        lea     edi,[80+edi]
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$063xts_dec_one:
+        movups  xmm2,[esi]
+        lea     esi,[16+esi]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$068dec1_loop_12:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$068dec1_loop_12
+db      102,15,56,223,209
+        xorps   xmm2,xmm5
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+        movdqa  xmm1,xmm5
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$064xts_dec_two:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        lea     esi,[32+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        call    __aesni_decrypt2
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        lea     edi,[32+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$065xts_dec_three:
+        movaps  xmm7,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        lea     esi,[48+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        call    __aesni_decrypt3
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        lea     edi,[48+edi]
+        movdqa  xmm1,xmm7
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$066xts_dec_four:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        xorps   xmm2,[esp]
+        movups  xmm5,[48+esi]
+        lea     esi,[64+esi]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        xorps   xmm5,xmm6
+        call    __aesni_decrypt4
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        xorps   xmm5,xmm6
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        lea     edi,[64+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$062xts_dec_done6x:
+        mov     eax,DWORD [112+esp]
+        and     eax,15
+        jz      NEAR L$069xts_dec_ret
+        mov     DWORD [112+esp],eax
+        jmp     NEAR L$070xts_dec_only_one_more
+align   16
+L$067xts_dec_done:
+        mov     eax,DWORD [112+esp]
+        pxor    xmm0,xmm0
+        and     eax,15
+        jz      NEAR L$069xts_dec_ret
+        pcmpgtd xmm0,xmm1
+        mov     DWORD [112+esp],eax
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm3,[96+esp]
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+L$070xts_dec_only_one_more:
+        pshufd  xmm5,xmm0,19
+        movdqa  xmm6,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm5,xmm3
+        pxor    xmm5,xmm1
+        mov     edx,ebp
+        mov     ecx,ebx
+        movups  xmm2,[esi]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$071dec1_loop_13:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$071dec1_loop_13
+db      102,15,56,223,209
+        xorps   xmm2,xmm5
+        movups  [edi],xmm2
+L$072xts_dec_steal:
+        movzx   ecx,BYTE [16+esi]
+        movzx   edx,BYTE [edi]
+        lea     esi,[1+esi]
+        mov     BYTE [edi],cl
+        mov     BYTE [16+edi],dl
+        lea     edi,[1+edi]
+        sub     eax,1
+        jnz     NEAR L$072xts_dec_steal
+        sub     edi,DWORD [112+esp]
+        mov     edx,ebp
+        mov     ecx,ebx
+        movups  xmm2,[edi]
+        xorps   xmm2,xmm6
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$073dec1_loop_14:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$073dec1_loop_14
+db      102,15,56,223,209
+        xorps   xmm2,xmm6
+        movups  [edi],xmm2
+L$069xts_dec_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movdqa  [esp],xmm0
+        pxor    xmm3,xmm3
+        movdqa  [16+esp],xmm0
+        pxor    xmm4,xmm4
+        movdqa  [32+esp],xmm0
+        pxor    xmm5,xmm5
+        movdqa  [48+esp],xmm0
+        pxor    xmm6,xmm6
+        movdqa  [64+esp],xmm0
+        pxor    xmm7,xmm7
+        movdqa  [80+esp],xmm0
+        mov     esp,DWORD [116+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ocb_encrypt
+align   16
+_aesni_ocb_encrypt:
+L$_aesni_ocb_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        movdqu  xmm0,[ecx]
+        mov     ebp,DWORD [36+esp]
+        movdqu  xmm1,[ebx]
+        mov     ebx,DWORD [44+esp]
+        mov     ecx,esp
+        sub     esp,132
+        and     esp,-16
+        sub     edi,esi
+        shl     eax,4
+        lea     eax,[eax*1+esi-96]
+        mov     DWORD [120+esp],edi
+        mov     DWORD [124+esp],eax
+        mov     DWORD [128+esp],ecx
+        mov     ecx,DWORD [240+edx]
+        test    ebp,1
+        jnz     NEAR L$074odd
+        bsf     eax,ebp
+        add     ebp,1
+        shl     eax,4
+        movdqu  xmm7,[eax*1+ebx]
+        mov     eax,edx
+        movdqu  xmm2,[esi]
+        lea     esi,[16+esi]
+        pxor    xmm7,xmm0
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$075enc1_loop_15:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$075enc1_loop_15
+db      102,15,56,221,209
+        xorps   xmm2,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,xmm6
+        movups  [esi*1+edi-16],xmm2
+        mov     ecx,DWORD [240+eax]
+        mov     edx,eax
+        mov     eax,DWORD [124+esp]
+L$074odd:
+        shl     ecx,4
+        mov     edi,16
+        sub     edi,ecx
+        mov     DWORD [112+esp],edx
+        lea     edx,[32+ecx*1+edx]
+        mov     DWORD [116+esp],edi
+        cmp     esi,eax
+        ja      NEAR L$076short
+        jmp     NEAR L$077grandloop
+align   32
+L$077grandloop:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        lea     edi,[5+ebp]
+        add     ebp,6
+        bsf     ecx,ecx
+        bsf     eax,eax
+        bsf     edi,edi
+        shl     ecx,4
+        shl     eax,4
+        shl     edi,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        movdqu  xmm7,[edi*1+ebx]
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movdqa  [80+esp],xmm7
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm0
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        pxor    xmm1,xmm5
+        pxor    xmm5,xmm0
+        pxor    xmm1,xmm6
+        pxor    xmm6,xmm0
+        pxor    xmm1,xmm7
+        pxor    xmm7,xmm0
+        movdqa  [96+esp],xmm1
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,[80+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        mov     edi,DWORD [120+esp]
+        mov     eax,DWORD [124+esp]
+        call    L$_aesni_encrypt6_enter
+        movdqa  xmm0,[80+esp]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,xmm0
+        movdqa  xmm1,[96+esp]
+        movdqu  [esi*1+edi-96],xmm2
+        movdqu  [esi*1+edi-80],xmm3
+        movdqu  [esi*1+edi-64],xmm4
+        movdqu  [esi*1+edi-48],xmm5
+        movdqu  [esi*1+edi-32],xmm6
+        movdqu  [esi*1+edi-16],xmm7
+        cmp     esi,eax
+        jb      NEAR L$077grandloop
+L$076short:
+        add     eax,96
+        sub     eax,esi
+        jz      NEAR L$078done
+        cmp     eax,32
+        jb      NEAR L$079one
+        je      NEAR L$080two
+        cmp     eax,64
+        jb      NEAR L$081three
+        je      NEAR L$082four
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm7,xmm7
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm0
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        pxor    xmm1,xmm5
+        pxor    xmm5,xmm0
+        pxor    xmm1,xmm6
+        pxor    xmm6,xmm0
+        movdqa  [96+esp],xmm1
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        mov     edi,DWORD [120+esp]
+        call    L$_aesni_encrypt6_enter
+        movdqa  xmm0,[64+esp]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,xmm0
+        movdqa  xmm1,[96+esp]
+        movdqu  [esi*1+edi],xmm2
+        movdqu  [16+esi*1+edi],xmm3
+        movdqu  [32+esi*1+edi],xmm4
+        movdqu  [48+esi*1+edi],xmm5
+        movdqu  [64+esi*1+edi],xmm6
+        jmp     NEAR L$078done
+align   16
+L$079one:
+        movdqu  xmm7,[ebx]
+        mov     edx,DWORD [112+esp]
+        movdqu  xmm2,[esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm7,xmm0
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        mov     edi,DWORD [120+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$083enc1_loop_16:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$083enc1_loop_16
+db      102,15,56,221,209
+        xorps   xmm2,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,xmm6
+        movups  [esi*1+edi],xmm2
+        jmp     NEAR L$078done
+align   16
+L$080two:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm6,[ebx]
+        movdqu  xmm7,[ecx*1+ebx]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm6
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm6
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm7
+        movdqa  xmm5,xmm1
+        mov     edi,DWORD [120+esp]
+        call    __aesni_encrypt2
+        xorps   xmm2,xmm6
+        xorps   xmm3,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,xmm5
+        movups  [esi*1+edi],xmm2
+        movups  [16+esi*1+edi],xmm3
+        jmp     NEAR L$078done
+align   16
+L$081three:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm5,[ebx]
+        movdqu  xmm6,[ecx*1+ebx]
+        movdqa  xmm7,xmm5
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm5
+        pxor    xmm7,xmm6
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm5
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm6
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm7
+        movdqa  [96+esp],xmm1
+        mov     edi,DWORD [120+esp]
+        call    __aesni_encrypt3
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,[96+esp]
+        movups  [esi*1+edi],xmm2
+        movups  [16+esi*1+edi],xmm3
+        movups  [32+esi*1+edi],xmm4
+        jmp     NEAR L$078done
+align   16
+L$082four:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        mov     edx,DWORD [112+esp]
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm4,[ebx]
+        movdqu  xmm5,[ecx*1+ebx]
+        movdqa  xmm6,xmm4
+        movdqu  xmm7,[eax*1+ebx]
+        pxor    xmm4,xmm0
+        movdqu  xmm2,[esi]
+        pxor    xmm5,xmm4
+        movdqu  xmm3,[16+esi]
+        pxor    xmm6,xmm5
+        movdqa  [esp],xmm4
+        pxor    xmm7,xmm6
+        movdqa  [16+esp],xmm5
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm1,xmm2
+        pxor    xmm2,[esp]
+        pxor    xmm1,xmm3
+        pxor    xmm3,[16+esp]
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm6
+        pxor    xmm1,xmm5
+        pxor    xmm5,xmm7
+        movdqa  [96+esp],xmm1
+        mov     edi,DWORD [120+esp]
+        call    __aesni_encrypt4
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm6
+        movups  [esi*1+edi],xmm2
+        xorps   xmm5,xmm7
+        movups  [16+esi*1+edi],xmm3
+        movdqa  xmm0,xmm7
+        movups  [32+esi*1+edi],xmm4
+        movdqa  xmm1,[96+esp]
+        movups  [48+esi*1+edi],xmm5
+L$078done:
+        mov     edx,DWORD [128+esp]
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm4
+        movdqa  [16+esp],xmm2
+        pxor    xmm5,xmm5
+        movdqa  [32+esp],xmm2
+        pxor    xmm6,xmm6
+        movdqa  [48+esp],xmm2
+        pxor    xmm7,xmm7
+        movdqa  [64+esp],xmm2
+        movdqa  [80+esp],xmm2
+        movdqa  [96+esp],xmm2
+        lea     esp,[edx]
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        movdqu  [ecx],xmm0
+        pxor    xmm0,xmm0
+        movdqu  [ebx],xmm1
+        pxor    xmm1,xmm1
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ocb_decrypt
+align   16
+_aesni_ocb_decrypt:
+L$_aesni_ocb_decrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        movdqu  xmm0,[ecx]
+        mov     ebp,DWORD [36+esp]
+        movdqu  xmm1,[ebx]
+        mov     ebx,DWORD [44+esp]
+        mov     ecx,esp
+        sub     esp,132
+        and     esp,-16
+        sub     edi,esi
+        shl     eax,4
+        lea     eax,[eax*1+esi-96]
+        mov     DWORD [120+esp],edi
+        mov     DWORD [124+esp],eax
+        mov     DWORD [128+esp],ecx
+        mov     ecx,DWORD [240+edx]
+        test    ebp,1
+        jnz     NEAR L$084odd
+        bsf     eax,ebp
+        add     ebp,1
+        shl     eax,4
+        movdqu  xmm7,[eax*1+ebx]
+        mov     eax,edx
+        movdqu  xmm2,[esi]
+        lea     esi,[16+esi]
+        pxor    xmm7,xmm0
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$085dec1_loop_17:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$085dec1_loop_17
+db      102,15,56,223,209
+        xorps   xmm2,xmm7
+        movaps  xmm1,xmm6
+        movdqa  xmm0,xmm7
+        xorps   xmm1,xmm2
+        movups  [esi*1+edi-16],xmm2
+        mov     ecx,DWORD [240+eax]
+        mov     edx,eax
+        mov     eax,DWORD [124+esp]
+L$084odd:
+        shl     ecx,4
+        mov     edi,16
+        sub     edi,ecx
+        mov     DWORD [112+esp],edx
+        lea     edx,[32+ecx*1+edx]
+        mov     DWORD [116+esp],edi
+        cmp     esi,eax
+        ja      NEAR L$086short
+        jmp     NEAR L$087grandloop
+align   32
+L$087grandloop:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        lea     edi,[5+ebp]
+        add     ebp,6
+        bsf     ecx,ecx
+        bsf     eax,eax
+        bsf     edi,edi
+        shl     ecx,4
+        shl     eax,4
+        shl     edi,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        movdqu  xmm7,[edi*1+ebx]
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movdqa  [80+esp],xmm7
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        movdqa  [96+esp],xmm1
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,[80+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+        mov     edi,DWORD [120+esp]
+        mov     eax,DWORD [124+esp]
+        call    L$_aesni_decrypt6_enter
+        movdqa  xmm0,[80+esp]
+        pxor    xmm2,[esp]
+        movdqa  xmm1,[96+esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,xmm0
+        pxor    xmm1,xmm2
+        movdqu  [esi*1+edi-96],xmm2
+        pxor    xmm1,xmm3
+        movdqu  [esi*1+edi-80],xmm3
+        pxor    xmm1,xmm4
+        movdqu  [esi*1+edi-64],xmm4
+        pxor    xmm1,xmm5
+        movdqu  [esi*1+edi-48],xmm5
+        pxor    xmm1,xmm6
+        movdqu  [esi*1+edi-32],xmm6
+        pxor    xmm1,xmm7
+        movdqu  [esi*1+edi-16],xmm7
+        cmp     esi,eax
+        jb      NEAR L$087grandloop
+L$086short:
+        add     eax,96
+        sub     eax,esi
+        jz      NEAR L$088done
+        cmp     eax,32
+        jb      NEAR L$089one
+        je      NEAR L$090two
+        cmp     eax,64
+        jb      NEAR L$091three
+        je      NEAR L$092four
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm7,xmm7
+        movdqa  [96+esp],xmm1
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+        mov     edi,DWORD [120+esp]
+        call    L$_aesni_decrypt6_enter
+        movdqa  xmm0,[64+esp]
+        pxor    xmm2,[esp]
+        movdqa  xmm1,[96+esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,xmm0
+        pxor    xmm1,xmm2
+        movdqu  [esi*1+edi],xmm2
+        pxor    xmm1,xmm3
+        movdqu  [16+esi*1+edi],xmm3
+        pxor    xmm1,xmm4
+        movdqu  [32+esi*1+edi],xmm4
+        pxor    xmm1,xmm5
+        movdqu  [48+esi*1+edi],xmm5
+        pxor    xmm1,xmm6
+        movdqu  [64+esi*1+edi],xmm6
+        jmp     NEAR L$088done
+align   16
+L$089one:
+        movdqu  xmm7,[ebx]
+        mov     edx,DWORD [112+esp]
+        movdqu  xmm2,[esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm7,xmm0
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        mov     edi,DWORD [120+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$093dec1_loop_18:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$093dec1_loop_18
+db      102,15,56,223,209
+        xorps   xmm2,xmm7
+        movaps  xmm1,xmm6
+        movdqa  xmm0,xmm7
+        xorps   xmm1,xmm2
+        movups  [esi*1+edi],xmm2
+        jmp     NEAR L$088done
+align   16
+L$090two:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm6,[ebx]
+        movdqu  xmm7,[ecx*1+ebx]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        mov     ecx,DWORD [240+edx]
+        movdqa  xmm5,xmm1
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm6
+        pxor    xmm2,xmm6
+        pxor    xmm3,xmm7
+        mov     edi,DWORD [120+esp]
+        call    __aesni_decrypt2
+        xorps   xmm2,xmm6
+        xorps   xmm3,xmm7
+        movdqa  xmm0,xmm7
+        xorps   xmm5,xmm2
+        movups  [esi*1+edi],xmm2
+        xorps   xmm5,xmm3
+        movups  [16+esi*1+edi],xmm3
+        movaps  xmm1,xmm5
+        jmp     NEAR L$088done
+align   16
+L$091three:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm5,[ebx]
+        movdqu  xmm6,[ecx*1+ebx]
+        movdqa  xmm7,xmm5
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        mov     ecx,DWORD [240+edx]
+        movdqa  [96+esp],xmm1
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm5
+        pxor    xmm7,xmm6
+        pxor    xmm2,xmm5
+        pxor    xmm3,xmm6
+        pxor    xmm4,xmm7
+        mov     edi,DWORD [120+esp]
+        call    __aesni_decrypt3
+        movdqa  xmm1,[96+esp]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movups  [esi*1+edi],xmm2
+        pxor    xmm1,xmm2
+        movdqa  xmm0,xmm7
+        movups  [16+esi*1+edi],xmm3
+        pxor    xmm1,xmm3
+        movups  [32+esi*1+edi],xmm4
+        pxor    xmm1,xmm4
+        jmp     NEAR L$088done
+align   16
+L$092four:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        mov     edx,DWORD [112+esp]
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm4,[ebx]
+        movdqu  xmm5,[ecx*1+ebx]
+        movdqa  xmm6,xmm4
+        movdqu  xmm7,[eax*1+ebx]
+        pxor    xmm4,xmm0
+        movdqu  xmm2,[esi]
+        pxor    xmm5,xmm4
+        movdqu  xmm3,[16+esi]
+        pxor    xmm6,xmm5
+        movdqa  [esp],xmm4
+        pxor    xmm7,xmm6
+        movdqa  [16+esp],xmm5
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        mov     ecx,DWORD [240+edx]
+        movdqa  [96+esp],xmm1
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,xmm6
+        pxor    xmm5,xmm7
+        mov     edi,DWORD [120+esp]
+        call    __aesni_decrypt4
+        movdqa  xmm1,[96+esp]
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm6
+        movups  [esi*1+edi],xmm2
+        pxor    xmm1,xmm2
+        xorps   xmm5,xmm7
+        movups  [16+esi*1+edi],xmm3
+        pxor    xmm1,xmm3
+        movdqa  xmm0,xmm7
+        movups  [32+esi*1+edi],xmm4
+        pxor    xmm1,xmm4
+        movups  [48+esi*1+edi],xmm5
+        pxor    xmm1,xmm5
+L$088done:
+        mov     edx,DWORD [128+esp]
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm4
+        movdqa  [16+esp],xmm2
+        pxor    xmm5,xmm5
+        movdqa  [32+esp],xmm2
+        pxor    xmm6,xmm6
+        movdqa  [48+esp],xmm2
+        pxor    xmm7,xmm7
+        movdqa  [64+esp],xmm2
+        movdqa  [80+esp],xmm2
+        movdqa  [96+esp],xmm2
+        lea     esp,[edx]
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        movdqu  [ecx],xmm0
+        pxor    xmm0,xmm0
+        movdqu  [ebx],xmm1
+        pxor    xmm1,xmm1
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_cbc_encrypt
+align   16
+_aesni_cbc_encrypt:
+L$_aesni_cbc_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     ebx,esp
+        mov     edi,DWORD [24+esp]
+        sub     ebx,24
+        mov     eax,DWORD [28+esp]
+        and     ebx,-16
+        mov     edx,DWORD [32+esp]
+        mov     ebp,DWORD [36+esp]
+        test    eax,eax
+        jz      NEAR L$094cbc_abort
+        cmp     DWORD [40+esp],0
+        xchg    ebx,esp
+        movups  xmm7,[ebp]
+        mov     ecx,DWORD [240+edx]
+        mov     ebp,edx
+        mov     DWORD [16+esp],ebx
+        mov     ebx,ecx
+        je      NEAR L$095cbc_decrypt
+        movaps  xmm2,xmm7
+        cmp     eax,16
+        jb      NEAR L$096cbc_enc_tail
+        sub     eax,16
+        jmp     NEAR L$097cbc_enc_loop
+align   16
+L$097cbc_enc_loop:
+        movups  xmm7,[esi]
+        lea     esi,[16+esi]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        xorps   xmm7,xmm0
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm7
+L$098enc1_loop_19:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$098enc1_loop_19
+db      102,15,56,221,209
+        mov     ecx,ebx
+        mov     edx,ebp
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+        sub     eax,16
+        jnc     NEAR L$097cbc_enc_loop
+        add     eax,16
+        jnz     NEAR L$096cbc_enc_tail
+        movaps  xmm7,xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR L$099cbc_ret
+L$096cbc_enc_tail:
+        mov     ecx,eax
+dd      2767451785
+        mov     ecx,16
+        sub     ecx,eax
+        xor     eax,eax
+dd      2868115081
+        lea     edi,[edi-16]
+        mov     ecx,ebx
+        mov     esi,edi
+        mov     edx,ebp
+        jmp     NEAR L$097cbc_enc_loop
+align   16
+L$095cbc_decrypt:
+        cmp     eax,80
+        jbe     NEAR L$100cbc_dec_tail
+        movaps  [esp],xmm7
+        sub     eax,80
+        jmp     NEAR L$101cbc_dec_loop6_enter
+align   16
+L$102cbc_dec_loop6:
+        movaps  [esp],xmm0
+        movups  [edi],xmm7
+        lea     edi,[16+edi]
+L$101cbc_dec_loop6_enter:
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        call    __aesni_decrypt6
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,[esp]
+        xorps   xmm3,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm4,xmm0
+        movups  xmm0,[48+esi]
+        xorps   xmm5,xmm1
+        movups  xmm1,[64+esi]
+        xorps   xmm6,xmm0
+        movups  xmm0,[80+esi]
+        xorps   xmm7,xmm1
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        lea     esi,[96+esi]
+        movups  [32+edi],xmm4
+        mov     ecx,ebx
+        movups  [48+edi],xmm5
+        mov     edx,ebp
+        movups  [64+edi],xmm6
+        lea     edi,[80+edi]
+        sub     eax,96
+        ja      NEAR L$102cbc_dec_loop6
+        movaps  xmm2,xmm7
+        movaps  xmm7,xmm0
+        add     eax,80
+        jle     NEAR L$103cbc_dec_clear_tail_collected
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+L$100cbc_dec_tail:
+        movups  xmm2,[esi]
+        movaps  xmm6,xmm2
+        cmp     eax,16
+        jbe     NEAR L$104cbc_dec_one
+        movups  xmm3,[16+esi]
+        movaps  xmm5,xmm3
+        cmp     eax,32
+        jbe     NEAR L$105cbc_dec_two
+        movups  xmm4,[32+esi]
+        cmp     eax,48
+        jbe     NEAR L$106cbc_dec_three
+        movups  xmm5,[48+esi]
+        cmp     eax,64
+        jbe     NEAR L$107cbc_dec_four
+        movups  xmm6,[64+esi]
+        movaps  [esp],xmm7
+        movups  xmm2,[esi]
+        xorps   xmm7,xmm7
+        call    __aesni_decrypt6
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,[esp]
+        xorps   xmm3,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm4,xmm0
+        movups  xmm0,[48+esi]
+        xorps   xmm5,xmm1
+        movups  xmm7,[64+esi]
+        xorps   xmm6,xmm0
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        pxor    xmm3,xmm3
+        movups  [32+edi],xmm4
+        pxor    xmm4,xmm4
+        movups  [48+edi],xmm5
+        pxor    xmm5,xmm5
+        lea     edi,[64+edi]
+        movaps  xmm2,xmm6
+        pxor    xmm6,xmm6
+        sub     eax,80
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$104cbc_dec_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$109dec1_loop_20:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$109dec1_loop_20
+db      102,15,56,223,209
+        xorps   xmm2,xmm7
+        movaps  xmm7,xmm6
+        sub     eax,16
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$105cbc_dec_two:
+        call    __aesni_decrypt2
+        xorps   xmm2,xmm7
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movaps  xmm2,xmm3
+        pxor    xmm3,xmm3
+        lea     edi,[16+edi]
+        movaps  xmm7,xmm5
+        sub     eax,32
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$106cbc_dec_three:
+        call    __aesni_decrypt3
+        xorps   xmm2,xmm7
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm5
+        movups  [edi],xmm2
+        movaps  xmm2,xmm4
+        pxor    xmm4,xmm4
+        movups  [16+edi],xmm3
+        pxor    xmm3,xmm3
+        lea     edi,[32+edi]
+        movups  xmm7,[32+esi]
+        sub     eax,48
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$107cbc_dec_four:
+        call    __aesni_decrypt4
+        movups  xmm1,[16+esi]
+        movups  xmm0,[32+esi]
+        xorps   xmm2,xmm7
+        movups  xmm7,[48+esi]
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        xorps   xmm4,xmm1
+        movups  [16+edi],xmm3
+        pxor    xmm3,xmm3
+        xorps   xmm5,xmm0
+        movups  [32+edi],xmm4
+        pxor    xmm4,xmm4
+        lea     edi,[48+edi]
+        movaps  xmm2,xmm5
+        pxor    xmm5,xmm5
+        sub     eax,64
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$103cbc_dec_clear_tail_collected:
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+L$108cbc_dec_tail_collected:
+        and     eax,15
+        jnz     NEAR L$110cbc_dec_tail_partial
+        movups  [edi],xmm2
+        pxor    xmm0,xmm0
+        jmp     NEAR L$099cbc_ret
+align   16
+L$110cbc_dec_tail_partial:
+        movaps  [esp],xmm2
+        pxor    xmm0,xmm0
+        mov     ecx,16
+        mov     esi,esp
+        sub     ecx,eax
+dd      2767451785
+        movdqa  [esp],xmm2
+L$099cbc_ret:
+        mov     esp,DWORD [16+esp]
+        mov     ebp,DWORD [36+esp]
+        pxor    xmm2,xmm2
+        pxor    xmm1,xmm1
+        movups  [ebp],xmm7
+        pxor    xmm7,xmm7
+L$094cbc_abort:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__aesni_set_encrypt_key:
+        push    ebp
+        push    ebx
+        test    eax,eax
+        jz      NEAR L$111bad_pointer
+        test    edx,edx
+        jz      NEAR L$111bad_pointer
+        call    L$112pic
+L$112pic:
+        pop     ebx
+        lea     ebx,[(L$key_const-L$112pic)+ebx]
+        lea     ebp,[_OPENSSL_ia32cap_P]
+        movups  xmm0,[eax]
+        xorps   xmm4,xmm4
+        mov     ebp,DWORD [4+ebp]
+        lea     edx,[16+edx]
+        and     ebp,268437504
+        cmp     ecx,256
+        je      NEAR L$11314rounds
+        cmp     ecx,192
+        je      NEAR L$11412rounds
+        cmp     ecx,128
+        jne     NEAR L$115bad_keybits
+align   16
+L$11610rounds:
+        cmp     ebp,268435456
+        je      NEAR L$11710rounds_alt
+        mov     ecx,9
+        movups  [edx-16],xmm0
+db      102,15,58,223,200,1
+        call    L$118key_128_cold
+db      102,15,58,223,200,2
+        call    L$119key_128
+db      102,15,58,223,200,4
+        call    L$119key_128
+db      102,15,58,223,200,8
+        call    L$119key_128
+db      102,15,58,223,200,16
+        call    L$119key_128
+db      102,15,58,223,200,32
+        call    L$119key_128
+db      102,15,58,223,200,64
+        call    L$119key_128
+db      102,15,58,223,200,128
+        call    L$119key_128
+db      102,15,58,223,200,27
+        call    L$119key_128
+db      102,15,58,223,200,54
+        call    L$119key_128
+        movups  [edx],xmm0
+        mov     DWORD [80+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$119key_128:
+        movups  [edx],xmm0
+        lea     edx,[16+edx]
+L$118key_128_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        ret
+align   16
+L$11710rounds_alt:
+        movdqa  xmm5,[ebx]
+        mov     ecx,8
+        movdqa  xmm4,[32+ebx]
+        movdqa  xmm2,xmm0
+        movdqu  [edx-16],xmm0
+L$121loop_key128:
+db      102,15,56,0,197
+db      102,15,56,221,196
+        pslld   xmm4,1
+        lea     edx,[16+edx]
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm2
+        movdqu  [edx-16],xmm0
+        movdqa  xmm2,xmm0
+        dec     ecx
+        jnz     NEAR L$121loop_key128
+        movdqa  xmm4,[48+ebx]
+db      102,15,56,0,197
+db      102,15,56,221,196
+        pslld   xmm4,1
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm2
+        movdqu  [edx],xmm0
+        movdqa  xmm2,xmm0
+db      102,15,56,0,197
+db      102,15,56,221,196
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm2
+        movdqu  [16+edx],xmm0
+        mov     ecx,9
+        mov     DWORD [96+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$11412rounds:
+        movq    xmm2,[16+eax]
+        cmp     ebp,268435456
+        je      NEAR L$12212rounds_alt
+        mov     ecx,11
+        movups  [edx-16],xmm0
+db      102,15,58,223,202,1
+        call    L$123key_192a_cold
+db      102,15,58,223,202,2
+        call    L$124key_192b
+db      102,15,58,223,202,4
+        call    L$125key_192a
+db      102,15,58,223,202,8
+        call    L$124key_192b
+db      102,15,58,223,202,16
+        call    L$125key_192a
+db      102,15,58,223,202,32
+        call    L$124key_192b
+db      102,15,58,223,202,64
+        call    L$125key_192a
+db      102,15,58,223,202,128
+        call    L$124key_192b
+        movups  [edx],xmm0
+        mov     DWORD [48+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$125key_192a:
+        movups  [edx],xmm0
+        lea     edx,[16+edx]
+align   16
+L$123key_192a_cold:
+        movaps  xmm5,xmm2
+L$126key_192b_warm:
+        shufps  xmm4,xmm0,16
+        movdqa  xmm3,xmm2
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        pslldq  xmm3,4
+        xorps   xmm0,xmm4
+        pshufd  xmm1,xmm1,85
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm0,255
+        pxor    xmm2,xmm3
+        ret
+align   16
+L$124key_192b:
+        movaps  xmm3,xmm0
+        shufps  xmm5,xmm0,68
+        movups  [edx],xmm5
+        shufps  xmm3,xmm2,78
+        movups  [16+edx],xmm3
+        lea     edx,[32+edx]
+        jmp     NEAR L$126key_192b_warm
+align   16
+L$12212rounds_alt:
+        movdqa  xmm5,[16+ebx]
+        movdqa  xmm4,[32+ebx]
+        mov     ecx,8
+        movdqu  [edx-16],xmm0
+L$127loop_key192:
+        movq    [edx],xmm2
+        movdqa  xmm1,xmm2
+db      102,15,56,0,213
+db      102,15,56,221,212
+        pslld   xmm4,1
+        lea     edx,[24+edx]
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+        pshufd  xmm3,xmm0,255
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pxor    xmm0,xmm2
+        pxor    xmm2,xmm3
+        movdqu  [edx-16],xmm0
+        dec     ecx
+        jnz     NEAR L$127loop_key192
+        mov     ecx,11
+        mov     DWORD [32+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$11314rounds:
+        movups  xmm2,[16+eax]
+        lea     edx,[16+edx]
+        cmp     ebp,268435456
+        je      NEAR L$12814rounds_alt
+        mov     ecx,13
+        movups  [edx-32],xmm0
+        movups  [edx-16],xmm2
+db      102,15,58,223,202,1
+        call    L$129key_256a_cold
+db      102,15,58,223,200,1
+        call    L$130key_256b
+db      102,15,58,223,202,2
+        call    L$131key_256a
+db      102,15,58,223,200,2
+        call    L$130key_256b
+db      102,15,58,223,202,4
+        call    L$131key_256a
+db      102,15,58,223,200,4
+        call    L$130key_256b
+db      102,15,58,223,202,8
+        call    L$131key_256a
+db      102,15,58,223,200,8
+        call    L$130key_256b
+db      102,15,58,223,202,16
+        call    L$131key_256a
+db      102,15,58,223,200,16
+        call    L$130key_256b
+db      102,15,58,223,202,32
+        call    L$131key_256a
+db      102,15,58,223,200,32
+        call    L$130key_256b
+db      102,15,58,223,202,64
+        call    L$131key_256a
+        movups  [edx],xmm0
+        mov     DWORD [16+edx],ecx
+        xor     eax,eax
+        jmp     NEAR L$120good_key
+align   16
+L$131key_256a:
+        movups  [edx],xmm2
+        lea     edx,[16+edx]
+L$129key_256a_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        ret
+align   16
+L$130key_256b:
+        movups  [edx],xmm0
+        lea     edx,[16+edx]
+        shufps  xmm4,xmm2,16
+        xorps   xmm2,xmm4
+        shufps  xmm4,xmm2,140
+        xorps   xmm2,xmm4
+        shufps  xmm1,xmm1,170
+        xorps   xmm2,xmm1
+        ret
+align   16
+L$12814rounds_alt:
+        movdqa  xmm5,[ebx]
+        movdqa  xmm4,[32+ebx]
+        mov     ecx,7
+        movdqu  [edx-32],xmm0
+        movdqa  xmm1,xmm2
+        movdqu  [edx-16],xmm2
+L$132loop_key256:
+db      102,15,56,0,213
+db      102,15,56,221,212
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+        pslld   xmm4,1
+        pxor    xmm0,xmm2
+        movdqu  [edx],xmm0
+        dec     ecx
+        jz      NEAR L$133done_key256
+        pshufd  xmm2,xmm0,255
+        pxor    xmm3,xmm3
+db      102,15,56,221,211
+        movdqa  xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm1,xmm3
+        pxor    xmm2,xmm1
+        movdqu  [16+edx],xmm2
+        lea     edx,[32+edx]
+        movdqa  xmm1,xmm2
+        jmp     NEAR L$132loop_key256
+L$133done_key256:
+        mov     ecx,13
+        mov     DWORD [16+edx],ecx
+L$120good_key:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        xor     eax,eax
+        pop     ebx
+        pop     ebp
+        ret
+align   4
+L$111bad_pointer:
+        mov     eax,-1
+        pop     ebx
+        pop     ebp
+        ret
+align   4
+L$115bad_keybits:
+        pxor    xmm0,xmm0
+        mov     eax,-2
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_set_encrypt_key
+align   16
+_aesni_set_encrypt_key:
+L$_aesni_set_encrypt_key_begin:
+        mov     eax,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        call    __aesni_set_encrypt_key
+        ret
+global  _aesni_set_decrypt_key
+align   16
+_aesni_set_decrypt_key:
+L$_aesni_set_decrypt_key_begin:
+        mov     eax,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        call    __aesni_set_encrypt_key
+        mov     edx,DWORD [12+esp]
+        shl     ecx,4
+        test    eax,eax
+        jnz     NEAR L$134dec_key_ret
+        lea     eax,[16+ecx*1+edx]
+        movups  xmm0,[edx]
+        movups  xmm1,[eax]
+        movups  [eax],xmm0
+        movups  [edx],xmm1
+        lea     edx,[16+edx]
+        lea     eax,[eax-16]
+L$135dec_key_inverse:
+        movups  xmm0,[edx]
+        movups  xmm1,[eax]
+db      102,15,56,219,192
+db      102,15,56,219,201
+        lea     edx,[16+edx]
+        lea     eax,[eax-16]
+        movups  [16+eax],xmm0
+        movups  [edx-16],xmm1
+        cmp     eax,edx
+        ja      NEAR L$135dec_key_inverse
+        movups  xmm0,[edx]
+db      102,15,56,219,192
+        movups  [edx],xmm0
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        xor     eax,eax
+L$134dec_key_ret:
+        ret
+align   64
+L$key_const:
+dd      202313229,202313229,202313229,202313229
+dd      67569157,67569157,67569157,67569157
+dd      1,1,1,1
+dd      27,27,27,27
+db      65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+db      83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+db      32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+db      115,108,46,111,114,103,62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
new file mode 100644
index 0000000000..5eecfdba3d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
@@ -0,0 +1,648 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+align   64
+L$_vpaes_consts:
+dd      218628480,235210255,168496130,67568393
+dd      252381056,17041926,33884169,51187212
+dd      252645135,252645135,252645135,252645135
+dd      1512730624,3266504856,1377990664,3401244816
+dd      830229760,1275146365,2969422977,3447763452
+dd      3411033600,2979783055,338359620,2782886510
+dd      4209124096,907596821,221174255,1006095553
+dd      191964160,3799684038,3164090317,1589111125
+dd      182528256,1777043520,2877432650,3265356744
+dd      1874708224,3503451415,3305285752,363511674
+dd      1606117888,3487855781,1093350906,2384367825
+dd      197121,67569157,134941193,202313229
+dd      67569157,134941193,202313229,197121
+dd      134941193,202313229,197121,67569157
+dd      202313229,197121,67569157,134941193
+dd      33619971,100992007,168364043,235736079
+dd      235736079,33619971,100992007,168364043
+dd      168364043,235736079,33619971,100992007
+dd      100992007,168364043,235736079,33619971
+dd      50462976,117835012,185207048,252579084
+dd      252314880,51251460,117574920,184942860
+dd      184682752,252054788,50987272,118359308
+dd      118099200,185467140,251790600,50727180
+dd      2946363062,528716217,1300004225,1881839624
+dd      1532713819,1532713819,1532713819,1532713819
+dd      3602276352,4288629033,3737020424,4153884961
+dd      1354558464,32357713,2958822624,3775749553
+dd      1201988352,132424512,1572796698,503232858
+dd      2213177600,1597421020,4103937655,675398315
+dd      2749646592,4273543773,1511898873,121693092
+dd      3040248576,1103263732,2871565598,1608280554
+dd      2236667136,2588920351,482954393,64377734
+dd      3069987328,291237287,2117370568,3650299247
+dd      533321216,3573750986,2572112006,1401264716
+dd      1339849704,2721158661,548607111,3445553514
+dd      2128193280,3054596040,2183486460,1257083700
+dd      655635200,1165381986,3923443150,2344132524
+dd      190078720,256924420,290342170,357187870
+dd      1610966272,2263057382,4103205268,309794674
+dd      2592527872,2233205587,1335446729,3402964816
+dd      3973531904,3225098121,3002836325,1918774430
+dd      3870401024,2102906079,2284471353,4117666579
+dd      617007872,1021508343,366931923,691083277
+dd      2528395776,3491914898,2968704004,1613121270
+dd      3445188352,3247741094,844474987,4093578302
+dd      651481088,1190302358,1689581232,574775300
+dd      4289380608,206939853,2555985458,2489840491
+dd      2130264064,327674451,3566485037,3349835193
+dd      2470714624,316102159,3636825756,3393945945
+db      86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+db      111,110,32,65,69,83,32,102,111,114,32,120,56,54,47,83
+db      83,83,69,51,44,32,77,105,107,101,32,72,97,109,98,117
+db      114,103,32,40,83,116,97,110,102,111,114,100,32,85,110,105
+db      118,101,114,115,105,116,121,41,0
+align   64
+align   16
+__vpaes_preheat:
+        add     ebp,DWORD [esp]
+        movdqa  xmm7,[ebp-48]
+        movdqa  xmm6,[ebp-16]
+        ret
+align   16
+__vpaes_encrypt_core:
+        mov     ecx,16
+        mov     eax,DWORD [240+edx]
+        movdqa  xmm1,xmm6
+        movdqa  xmm2,[ebp]
+        pandn   xmm1,xmm0
+        pand    xmm0,xmm6
+        movdqu  xmm5,[edx]
+db      102,15,56,0,208
+        movdqa  xmm0,[16+ebp]
+        pxor    xmm2,xmm5
+        psrld   xmm1,4
+        add     edx,16
+db      102,15,56,0,193
+        lea     ebx,[192+ebp]
+        pxor    xmm0,xmm2
+        jmp     NEAR L$000enc_entry
+align   16
+L$001enc_loop:
+        movdqa  xmm4,[32+ebp]
+        movdqa  xmm0,[48+ebp]
+db      102,15,56,0,226
+db      102,15,56,0,195
+        pxor    xmm4,xmm5
+        movdqa  xmm5,[64+ebp]
+        pxor    xmm0,xmm4
+        movdqa  xmm1,[ecx*1+ebx-64]
+db      102,15,56,0,234
+        movdqa  xmm2,[80+ebp]
+        movdqa  xmm4,[ecx*1+ebx]
+db      102,15,56,0,211
+        movdqa  xmm3,xmm0
+        pxor    xmm2,xmm5
+db      102,15,56,0,193
+        add     edx,16
+        pxor    xmm0,xmm2
+db      102,15,56,0,220
+        add     ecx,16
+        pxor    xmm3,xmm0
+db      102,15,56,0,193
+        and     ecx,48
+        sub     eax,1
+        pxor    xmm0,xmm3
+L$000enc_entry:
+        movdqa  xmm1,xmm6
+        movdqa  xmm5,[ebp-32]
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm6
+db      102,15,56,0,232
+        movdqa  xmm3,xmm7
+        pxor    xmm0,xmm1
+db      102,15,56,0,217
+        movdqa  xmm4,xmm7
+        pxor    xmm3,xmm5
+db      102,15,56,0,224
+        movdqa  xmm2,xmm7
+        pxor    xmm4,xmm5
+db      102,15,56,0,211
+        movdqa  xmm3,xmm7
+        pxor    xmm2,xmm0
+db      102,15,56,0,220
+        movdqu  xmm5,[edx]
+        pxor    xmm3,xmm1
+        jnz     NEAR L$001enc_loop
+        movdqa  xmm4,[96+ebp]
+        movdqa  xmm0,[112+ebp]
+db      102,15,56,0,226
+        pxor    xmm4,xmm5
+db      102,15,56,0,195
+        movdqa  xmm1,[64+ecx*1+ebx]
+        pxor    xmm0,xmm4
+db      102,15,56,0,193
+        ret
+align   16
+__vpaes_decrypt_core:
+        lea     ebx,[608+ebp]
+        mov     eax,DWORD [240+edx]
+        movdqa  xmm1,xmm6
+        movdqa  xmm2,[ebx-64]
+        pandn   xmm1,xmm0
+        mov     ecx,eax
+        psrld   xmm1,4
+        movdqu  xmm5,[edx]
+        shl     ecx,4
+        pand    xmm0,xmm6
+db      102,15,56,0,208
+        movdqa  xmm0,[ebx-48]
+        xor     ecx,48
+db      102,15,56,0,193
+        and     ecx,48
+        pxor    xmm2,xmm5
+        movdqa  xmm5,[176+ebp]
+        pxor    xmm0,xmm2
+        add     edx,16
+        lea     ecx,[ecx*1+ebx-352]
+        jmp     NEAR L$002dec_entry
+align   16
+L$003dec_loop:
+        movdqa  xmm4,[ebx-32]
+        movdqa  xmm1,[ebx-16]
+db      102,15,56,0,226
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,[ebx]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,[16+ebx]
+db      102,15,56,0,226
+db      102,15,56,0,197
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,[32+ebx]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,[48+ebx]
+db      102,15,56,0,226
+db      102,15,56,0,197
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,[64+ebx]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,[80+ebx]
+db      102,15,56,0,226
+db      102,15,56,0,197
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        add     edx,16
+db      102,15,58,15,237,12
+        pxor    xmm0,xmm1
+        sub     eax,1
+L$002dec_entry:
+        movdqa  xmm1,xmm6
+        movdqa  xmm2,[ebp-32]
+        pandn   xmm1,xmm0
+        pand    xmm0,xmm6
+        psrld   xmm1,4
+db      102,15,56,0,208
+        movdqa  xmm3,xmm7
+        pxor    xmm0,xmm1
+db      102,15,56,0,217
+        movdqa  xmm4,xmm7
+        pxor    xmm3,xmm2
+db      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm7
+db      102,15,56,0,211
+        movdqa  xmm3,xmm7
+        pxor    xmm2,xmm0
+db      102,15,56,0,220
+        movdqu  xmm0,[edx]
+        pxor    xmm3,xmm1
+        jnz     NEAR L$003dec_loop
+        movdqa  xmm4,[96+ebx]
+db      102,15,56,0,226
+        pxor    xmm4,xmm0
+        movdqa  xmm0,[112+ebx]
+        movdqa  xmm2,[ecx]
+db      102,15,56,0,195
+        pxor    xmm0,xmm4
+db      102,15,56,0,194
+        ret
+align   16
+__vpaes_schedule_core:
+        add     ebp,DWORD [esp]
+        movdqu  xmm0,[esi]
+        movdqa  xmm2,[320+ebp]
+        movdqa  xmm3,xmm0
+        lea     ebx,[ebp]
+        movdqa  [4+esp],xmm2
+        call    __vpaes_schedule_transform
+        movdqa  xmm7,xmm0
+        test    edi,edi
+        jnz     NEAR L$004schedule_am_decrypting
+        movdqu  [edx],xmm0
+        jmp     NEAR L$005schedule_go
+L$004schedule_am_decrypting:
+        movdqa  xmm1,[256+ecx*1+ebp]
+db      102,15,56,0,217
+        movdqu  [edx],xmm3
+        xor     ecx,48
+L$005schedule_go:
+        cmp     eax,192
+        ja      NEAR L$006schedule_256
+        je      NEAR L$007schedule_192
+L$008schedule_128:
+        mov     eax,10
+L$009loop_schedule_128:
+        call    __vpaes_schedule_round
+        dec     eax
+        jz      NEAR L$010schedule_mangle_last
+        call    __vpaes_schedule_mangle
+        jmp     NEAR L$009loop_schedule_128
+align   16
+L$007schedule_192:
+        movdqu  xmm0,[8+esi]
+        call    __vpaes_schedule_transform
+        movdqa  xmm6,xmm0
+        pxor    xmm4,xmm4
+        movhlps xmm6,xmm4
+        mov     eax,4
+L$011loop_schedule_192:
+        call    __vpaes_schedule_round
+db      102,15,58,15,198,8
+        call    __vpaes_schedule_mangle
+        call    __vpaes_schedule_192_smear
+        call    __vpaes_schedule_mangle
+        call    __vpaes_schedule_round
+        dec     eax
+        jz      NEAR L$010schedule_mangle_last
+        call    __vpaes_schedule_mangle
+        call    __vpaes_schedule_192_smear
+        jmp     NEAR L$011loop_schedule_192
+align   16
+L$006schedule_256:
+        movdqu  xmm0,[16+esi]
+        call    __vpaes_schedule_transform
+        mov     eax,7
+L$012loop_schedule_256:
+        call    __vpaes_schedule_mangle
+        movdqa  xmm6,xmm0
+        call    __vpaes_schedule_round
+        dec     eax
+        jz      NEAR L$010schedule_mangle_last
+        call    __vpaes_schedule_mangle
+        pshufd  xmm0,xmm0,255
+        movdqa  [20+esp],xmm7
+        movdqa  xmm7,xmm6
+        call    L$_vpaes_schedule_low_round
+        movdqa  xmm7,[20+esp]
+        jmp     NEAR L$012loop_schedule_256
+align   16
+L$010schedule_mangle_last:
+        lea     ebx,[384+ebp]
+        test    edi,edi
+        jnz     NEAR L$013schedule_mangle_last_dec
+        movdqa  xmm1,[256+ecx*1+ebp]
+db      102,15,56,0,193
+        lea     ebx,[352+ebp]
+        add     edx,32
+L$013schedule_mangle_last_dec:
+        add     edx,-16
+        pxor    xmm0,[336+ebp]
+        call    __vpaes_schedule_transform
+        movdqu  [edx],xmm0
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        ret
+align   16
+__vpaes_schedule_192_smear:
+        pshufd  xmm1,xmm6,128
+        pshufd  xmm0,xmm7,254
+        pxor    xmm6,xmm1
+        pxor    xmm1,xmm1
+        pxor    xmm6,xmm0
+        movdqa  xmm0,xmm6
+        movhlps xmm6,xmm1
+        ret
+align   16
+__vpaes_schedule_round:
+        movdqa  xmm2,[8+esp]
+        pxor    xmm1,xmm1
+db      102,15,58,15,202,15
+db      102,15,58,15,210,15
+        pxor    xmm7,xmm1
+        pshufd  xmm0,xmm0,255
+db      102,15,58,15,192,1
+        movdqa  [8+esp],xmm2
+L$_vpaes_schedule_low_round:
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,4
+        pxor    xmm7,xmm1
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,8
+        pxor    xmm7,xmm1
+        pxor    xmm7,[336+ebp]
+        movdqa  xmm4,[ebp-16]
+        movdqa  xmm5,[ebp-48]
+        movdqa  xmm1,xmm4
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm4
+        movdqa  xmm2,[ebp-32]
+db      102,15,56,0,208
+        pxor    xmm0,xmm1
+        movdqa  xmm3,xmm5
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+        movdqa  xmm4,xmm5
+db      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm5
+db      102,15,56,0,211
+        pxor    xmm2,xmm0
+        movdqa  xmm3,xmm5
+db      102,15,56,0,220
+        pxor    xmm3,xmm1
+        movdqa  xmm4,[32+ebp]
+db      102,15,56,0,226
+        movdqa  xmm0,[48+ebp]
+db      102,15,56,0,195
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm7
+        movdqa  xmm7,xmm0
+        ret
+align   16
+__vpaes_schedule_transform:
+        movdqa  xmm2,[ebp-16]
+        movdqa  xmm1,xmm2
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm2
+        movdqa  xmm2,[ebx]
+db      102,15,56,0,208
+        movdqa  xmm0,[16+ebx]
+db      102,15,56,0,193
+        pxor    xmm0,xmm2
+        ret
+align   16
+__vpaes_schedule_mangle:
+        movdqa  xmm4,xmm0
+        movdqa  xmm5,[128+ebp]
+        test    edi,edi
+        jnz     NEAR L$014schedule_mangle_dec
+        add     edx,16
+        pxor    xmm4,[336+ebp]
+db      102,15,56,0,229
+        movdqa  xmm3,xmm4
+db      102,15,56,0,229
+        pxor    xmm3,xmm4
+db      102,15,56,0,229
+        pxor    xmm3,xmm4
+        jmp     NEAR L$015schedule_mangle_both
+align   16
+L$014schedule_mangle_dec:
+        movdqa  xmm2,[ebp-16]
+        lea     esi,[416+ebp]
+        movdqa  xmm1,xmm2
+        pandn   xmm1,xmm4
+        psrld   xmm1,4
+        pand    xmm4,xmm2
+        movdqa  xmm2,[esi]
+db      102,15,56,0,212
+        movdqa  xmm3,[16+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+db      102,15,56,0,221
+        movdqa  xmm2,[32+esi]
+db      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,[48+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+db      102,15,56,0,221
+        movdqa  xmm2,[64+esi]
+db      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,[80+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+db      102,15,56,0,221
+        movdqa  xmm2,[96+esi]
+db      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,[112+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+        add     edx,-16
+L$015schedule_mangle_both:
+        movdqa  xmm1,[256+ecx*1+ebp]
+db      102,15,56,0,217
+        add     ecx,-16
+        and     ecx,48
+        movdqu  [edx],xmm3
+        ret
+global  _vpaes_set_encrypt_key
+align   16
+_vpaes_set_encrypt_key:
+L$_vpaes_set_encrypt_key_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     eax,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        mov     ebx,eax
+        shr     ebx,5
+        add     ebx,5
+        mov     DWORD [240+edx],ebx
+        mov     ecx,48
+        mov     edi,0
+        lea     ebp,[(L$_vpaes_consts+0x30-L$016pic_point)]
+        call    __vpaes_schedule_core
+L$016pic_point:
+        mov     esp,DWORD [48+esp]
+        xor     eax,eax
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_set_decrypt_key
+align   16
+_vpaes_set_decrypt_key:
+L$_vpaes_set_decrypt_key_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     eax,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        mov     ebx,eax
+        shr     ebx,5
+        add     ebx,5
+        mov     DWORD [240+edx],ebx
+        shl     ebx,4
+        lea     edx,[16+ebx*1+edx]
+        mov     edi,1
+        mov     ecx,eax
+        shr     ecx,1
+        and     ecx,32
+        xor     ecx,32
+        lea     ebp,[(L$_vpaes_consts+0x30-L$017pic_point)]
+        call    __vpaes_schedule_core
+L$017pic_point:
+        mov     esp,DWORD [48+esp]
+        xor     eax,eax
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_encrypt
+align   16
+_vpaes_encrypt:
+L$_vpaes_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        lea     ebp,[(L$_vpaes_consts+0x30-L$018pic_point)]
+        call    __vpaes_preheat
+L$018pic_point:
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     edi,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        movdqu  xmm0,[esi]
+        call    __vpaes_encrypt_core
+        movdqu  [edi],xmm0
+        mov     esp,DWORD [48+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_decrypt
+align   16
+_vpaes_decrypt:
+L$_vpaes_decrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        lea     ebp,[(L$_vpaes_consts+0x30-L$019pic_point)]
+        call    __vpaes_preheat
+L$019pic_point:
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     edi,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        movdqu  xmm0,[esi]
+        call    __vpaes_decrypt_core
+        movdqu  [edi],xmm0
+        mov     esp,DWORD [48+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_cbc_encrypt
+align   16
+_vpaes_cbc_encrypt:
+L$_vpaes_cbc_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        sub     eax,16
+        jc      NEAR L$020cbc_abort
+        lea     ebx,[esp-56]
+        mov     ebp,DWORD [36+esp]
+        and     ebx,-16
+        mov     ecx,DWORD [40+esp]
+        xchg    ebx,esp
+        movdqu  xmm1,[ebp]
+        sub     edi,esi
+        mov     DWORD [48+esp],ebx
+        mov     DWORD [esp],edi
+        mov     DWORD [4+esp],edx
+        mov     DWORD [8+esp],ebp
+        mov     edi,eax
+        lea     ebp,[(L$_vpaes_consts+0x30-L$021pic_point)]
+        call    __vpaes_preheat
+L$021pic_point:
+        cmp     ecx,0
+        je      NEAR L$022cbc_dec_loop
+        jmp     NEAR L$023cbc_enc_loop
+align   16
+L$023cbc_enc_loop:
+        movdqu  xmm0,[esi]
+        pxor    xmm0,xmm1
+        call    __vpaes_encrypt_core
+        mov     ebx,DWORD [esp]
+        mov     edx,DWORD [4+esp]
+        movdqa  xmm1,xmm0
+        movdqu  [esi*1+ebx],xmm0
+        lea     esi,[16+esi]
+        sub     edi,16
+        jnc     NEAR L$023cbc_enc_loop
+        jmp     NEAR L$024cbc_done
+align   16
+L$022cbc_dec_loop:
+        movdqu  xmm0,[esi]
+        movdqa  [16+esp],xmm1
+        movdqa  [32+esp],xmm0
+        call    __vpaes_decrypt_core
+        mov     ebx,DWORD [esp]
+        mov     edx,DWORD [4+esp]
+        pxor    xmm0,[16+esp]
+        movdqa  xmm1,[32+esp]
+        movdqu  [esi*1+ebx],xmm0
+        lea     esi,[16+esi]
+        sub     edi,16
+        jnc     NEAR L$022cbc_dec_loop
+L$024cbc_done:
+        mov     ebx,DWORD [8+esp]
+        mov     esp,DWORD [48+esp]
+        movdqu  [ebx],xmm1
+L$020cbc_abort:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
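
For readers not fluent in the generated assembly, the encrypt loop in _vpaes_cbc_encrypt above is plain CBC chaining: each plaintext block is XORed with the previous ciphertext block (the IV for the first block) before encryption, and the last ciphertext block is written back as the updated IV. A rough C sketch of that behaviour follows; block_encrypt is a hypothetical stand-in for the __vpaes_encrypt_core call, and all names are illustrative, not the OpenSSL prototypes.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for __vpaes_encrypt_core: encrypts one
       16-byte block in place using the expanded key schedule. */
    typedef void (*block_encrypt_fn)(uint8_t block[16], const void *key);

    /* CBC encryption as done by the cbc_enc_loop above: XOR each input
       block into the running chain value, encrypt it, emit it as
       ciphertext, and finally store the last ciphertext block as the
       new IV (the movdqu [ebx],xmm1 at cbc_done). */
    static void cbc_encrypt_sketch(const uint8_t *in, uint8_t *out, size_t len,
                                   const void *key, uint8_t iv[16],
                                   block_encrypt_fn block_encrypt)
    {
        uint8_t chain[16];
        memcpy(chain, iv, 16);
        for (size_t off = 0; off + 16 <= len; off += 16) {
            for (int i = 0; i < 16; i++)
                chain[i] ^= in[off + i];   /* XOR with previous ciphertext */
            block_encrypt(chain, key);     /* encrypt the chained block    */
            memcpy(out + off, chain, 16);  /* emit ciphertext, keep chain  */
        }
        memcpy(iv, chain, 16);             /* updated IV for the caller    */
    }
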
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
new file mode 100644
index 0000000000..75bba13387
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
@@ -0,0 +1,1522 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _bn_mul_add_words
+align   16
+_bn_mul_add_words:
+L$_bn_mul_add_words_begin:
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$000maw_non_sse2
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+        movd    mm0,DWORD [16+esp]
+        pxor    mm1,mm1
+        jmp     NEAR L$001maw_sse2_entry
+align   16
+L$002maw_sse2_unrolled:
+        movd    mm3,DWORD [eax]
+        paddq   mm1,mm3
+        movd    mm2,DWORD [edx]
+        pmuludq mm2,mm0
+        movd    mm4,DWORD [4+edx]
+        pmuludq mm4,mm0
+        movd    mm6,DWORD [8+edx]
+        pmuludq mm6,mm0
+        movd    mm7,DWORD [12+edx]
+        pmuludq mm7,mm0
+        paddq   mm1,mm2
+        movd    mm3,DWORD [4+eax]
+        paddq   mm3,mm4
+        movd    mm5,DWORD [8+eax]
+        paddq   mm5,mm6
+        movd    mm4,DWORD [12+eax]
+        paddq   mm7,mm4
+        movd    DWORD [eax],mm1
+        movd    mm2,DWORD [16+edx]
+        pmuludq mm2,mm0
+        psrlq   mm1,32
+        movd    mm4,DWORD [20+edx]
+        pmuludq mm4,mm0
+        paddq   mm1,mm3
+        movd    mm6,DWORD [24+edx]
+        pmuludq mm6,mm0
+        movd    DWORD [4+eax],mm1
+        psrlq   mm1,32
+        movd    mm3,DWORD [28+edx]
+        add     edx,32
+        pmuludq mm3,mm0
+        paddq   mm1,mm5
+        movd    mm5,DWORD [16+eax]
+        paddq   mm2,mm5
+        movd    DWORD [8+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm7
+        movd    mm5,DWORD [20+eax]
+        paddq   mm4,mm5
+        movd    DWORD [12+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm2
+        movd    mm5,DWORD [24+eax]
+        paddq   mm6,mm5
+        movd    DWORD [16+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm4
+        movd    mm5,DWORD [28+eax]
+        paddq   mm3,mm5
+        movd    DWORD [20+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm6
+        movd    DWORD [24+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm3
+        movd    DWORD [28+eax],mm1
+        lea     eax,[32+eax]
+        psrlq   mm1,32
+        sub     ecx,8
+        jz      NEAR L$003maw_sse2_exit
+L$001maw_sse2_entry:
+        test    ecx,4294967288
+        jnz     NEAR L$002maw_sse2_unrolled
+align   4
+L$004maw_sse2_loop:
+        movd    mm2,DWORD [edx]
+        movd    mm3,DWORD [eax]
+        pmuludq mm2,mm0
+        lea     edx,[4+edx]
+        paddq   mm1,mm3
+        paddq   mm1,mm2
+        movd    DWORD [eax],mm1
+        sub     ecx,1
+        psrlq   mm1,32
+        lea     eax,[4+eax]
+        jnz     NEAR L$004maw_sse2_loop
+L$003maw_sse2_exit:
+        movd    eax,mm1
+        emms
+        ret
+align   16
+L$000maw_non_sse2:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        xor     esi,esi
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [28+esp]
+        mov     ebx,DWORD [24+esp]
+        and     ecx,4294967288
+        mov     ebp,DWORD [32+esp]
+        push    ecx
+        jz      NEAR L$005maw_finish
+align   16
+L$006maw_loop:
+        ; Round 0
+        mov     eax,DWORD [ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [edi]
+        adc     edx,0
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        ; Round 4
+        mov     eax,DWORD [4+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [4+edi]
+        adc     edx,0
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        ; Round 8
+        mov     eax,DWORD [8+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [8+edi]
+        adc     edx,0
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        ; Round 12
+        mov     eax,DWORD [12+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [12+edi]
+        adc     edx,0
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        ; Round 16
+        mov     eax,DWORD [16+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [16+edi]
+        adc     edx,0
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        ; Round 20
+        mov     eax,DWORD [20+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [20+edi]
+        adc     edx,0
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        ; Round 24
+        mov     eax,DWORD [24+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [24+edi]
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+        ; Round 28
+        mov     eax,DWORD [28+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [28+edi]
+        adc     edx,0
+        mov     DWORD [28+edi],eax
+        mov     esi,edx
+        ;
+        sub     ecx,8
+        lea     ebx,[32+ebx]
+        lea     edi,[32+edi]
+        jnz     NEAR L$006maw_loop
+L$005maw_finish:
+        mov     ecx,DWORD [32+esp]
+        and     ecx,7
+        jnz     NEAR L$007maw_finish2
+        jmp     NEAR L$008maw_end
+L$007maw_finish2:
+        ; Tail Round 0
+        mov     eax,DWORD [ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 1
+        mov     eax,DWORD [4+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [4+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 2
+        mov     eax,DWORD [8+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [8+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 3
+        mov     eax,DWORD [12+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [12+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 4
+        mov     eax,DWORD [16+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [16+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 5
+        mov     eax,DWORD [20+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [20+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 6
+        mov     eax,DWORD [24+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [24+edi]
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+L$008maw_end:
+        mov     eax,esi
+        pop     ecx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_mul_words
+align   16
+_bn_mul_words:
+L$_bn_mul_words_begin:
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$009mw_non_sse2
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+        movd    mm0,DWORD [16+esp]
+        pxor    mm1,mm1
+align   16
+L$010mw_sse2_loop:
+        movd    mm2,DWORD [edx]
+        pmuludq mm2,mm0
+        lea     edx,[4+edx]
+        paddq   mm1,mm2
+        movd    DWORD [eax],mm1
+        sub     ecx,1
+        psrlq   mm1,32
+        lea     eax,[4+eax]
+        jnz     NEAR L$010mw_sse2_loop
+        movd    eax,mm1
+        emms
+        ret
+align   16
+L$009mw_non_sse2:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        xor     esi,esi
+        mov     edi,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        mov     ebp,DWORD [28+esp]
+        mov     ecx,DWORD [32+esp]
+        and     ebp,4294967288
+        jz      NEAR L$011mw_finish
+L$012mw_loop:
+        ; Round 0
+        mov     eax,DWORD [ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        ; Round 4
+        mov     eax,DWORD [4+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        ; Round 8
+        mov     eax,DWORD [8+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        ; Round 12
+        mov     eax,DWORD [12+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        ; Round 16
+        mov     eax,DWORD [16+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        ; Round 20
+        mov     eax,DWORD [20+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        ; Round 24
+        mov     eax,DWORD [24+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+        ; Round 28
+        mov     eax,DWORD [28+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [28+edi],eax
+        mov     esi,edx
+        ;
+        add     ebx,32
+        add     edi,32
+        sub     ebp,8
+        jz      NEAR L$011mw_finish
+        jmp     NEAR L$012mw_loop
+L$011mw_finish:
+        mov     ebp,DWORD [28+esp]
+        and     ebp,7
+        jnz     NEAR L$013mw_finish2
+        jmp     NEAR L$014mw_end
+L$013mw_finish2:
+        ; Tail Round 0
+        mov     eax,DWORD [ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 1
+        mov     eax,DWORD [4+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 2
+        mov     eax,DWORD [8+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 3
+        mov     eax,DWORD [12+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 4
+        mov     eax,DWORD [16+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 5
+        mov     eax,DWORD [20+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 6
+        mov     eax,DWORD [24+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+L$014mw_end:
+        mov     eax,esi
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_sqr_words
+align   16
+_bn_sqr_words:
+L$_bn_sqr_words_begin:
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$015sqr_non_sse2
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+align   16
+L$016sqr_sse2_loop:
+        movd    mm0,DWORD [edx]
+        pmuludq mm0,mm0
+        lea     edx,[4+edx]
+        movq    [eax],mm0
+        sub     ecx,1
+        lea     eax,[8+eax]
+        jnz     NEAR L$016sqr_sse2_loop
+        emms
+        ret
+align   16
+L$015sqr_non_sse2:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     ebx,DWORD [28+esp]
+        and     ebx,4294967288
+        jz      NEAR L$017sw_finish
+L$018sw_loop:
+        ; Round 0
+        mov     eax,DWORD [edi]
+        mul     eax
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],edx
+        ; Round 4
+        mov     eax,DWORD [4+edi]
+        mul     eax
+        mov     DWORD [8+esi],eax
+        mov     DWORD [12+esi],edx
+        ; Round 8
+        mov     eax,DWORD [8+edi]
+        mul     eax
+        mov     DWORD [16+esi],eax
+        mov     DWORD [20+esi],edx
+        ; Round 12
+        mov     eax,DWORD [12+edi]
+        mul     eax
+        mov     DWORD [24+esi],eax
+        mov     DWORD [28+esi],edx
+        ; Round 16
+        mov     eax,DWORD [16+edi]
+        mul     eax
+        mov     DWORD [32+esi],eax
+        mov     DWORD [36+esi],edx
+        ; Round 20
+        mov     eax,DWORD [20+edi]
+        mul     eax
+        mov     DWORD [40+esi],eax
+        mov     DWORD [44+esi],edx
+        ; Round 24
+        mov     eax,DWORD [24+edi]
+        mul     eax
+        mov     DWORD [48+esi],eax
+        mov     DWORD [52+esi],edx
+        ; Round 28
+        mov     eax,DWORD [28+edi]
+        mul     eax
+        mov     DWORD [56+esi],eax
+        mov     DWORD [60+esi],edx
+        ;
+        add     edi,32
+        add     esi,64
+        sub     ebx,8
+        jnz     NEAR L$018sw_loop
+L$017sw_finish:
+        mov     ebx,DWORD [28+esp]
+        and     ebx,7
+        jz      NEAR L$019sw_end
+        ; Tail Round 0
+        mov     eax,DWORD [edi]
+        mul     eax
+        mov     DWORD [esi],eax
+        dec     ebx
+        mov     DWORD [4+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 1
+        mov     eax,DWORD [4+edi]
+        mul     eax
+        mov     DWORD [8+esi],eax
+        dec     ebx
+        mov     DWORD [12+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 2
+        mov     eax,DWORD [8+edi]
+        mul     eax
+        mov     DWORD [16+esi],eax
+        dec     ebx
+        mov     DWORD [20+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 3
+        mov     eax,DWORD [12+edi]
+        mul     eax
+        mov     DWORD [24+esi],eax
+        dec     ebx
+        mov     DWORD [28+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 4
+        mov     eax,DWORD [16+edi]
+        mul     eax
+        mov     DWORD [32+esi],eax
+        dec     ebx
+        mov     DWORD [36+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 5
+        mov     eax,DWORD [20+edi]
+        mul     eax
+        mov     DWORD [40+esi],eax
+        dec     ebx
+        mov     DWORD [44+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 6
+        mov     eax,DWORD [24+edi]
+        mul     eax
+        mov     DWORD [48+esi],eax
+        mov     DWORD [52+esi],edx
+L$019sw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_div_words
+align   16
+_bn_div_words:
+L$_bn_div_words_begin:
+        mov     edx,DWORD [4+esp]
+        mov     eax,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+        div     ecx
+        ret
+global  _bn_add_words
+align   16
+_bn_add_words:
+L$_bn_add_words_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     ebx,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     edi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        and     ebp,4294967288
+        jz      NEAR L$020aw_finish
+L$021aw_loop:
+        ; Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; Round 7
+        mov     ecx,DWORD [28+esi]
+        mov     edx,DWORD [28+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     esi,32
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$021aw_loop
+L$020aw_finish:
+        mov     ebp,DWORD [32+esp]
+        and     ebp,7
+        jz      NEAR L$022aw_end
+        ; Tail Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [4+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [8+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [12+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [16+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [20+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+L$022aw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_sub_words
+align   16
+_bn_sub_words:
+L$_bn_sub_words_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     ebx,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     edi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        and     ebp,4294967288
+        jz      NEAR L$023aw_finish
+L$024aw_loop:
+        ; Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; Round 7
+        mov     ecx,DWORD [28+esi]
+        mov     edx,DWORD [28+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     esi,32
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$024aw_loop
+L$023aw_finish:
+        mov     ebp,DWORD [32+esp]
+        and     ebp,7
+        jz      NEAR L$025aw_end
+        ; Tail Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [4+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [8+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [12+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [16+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [20+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+L$025aw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_sub_part_words
+align   16
+_bn_sub_part_words:
+L$_bn_sub_part_words_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     ebx,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     edi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        and     ebp,4294967288
+        jz      NEAR L$026aw_finish
+L$027aw_loop:
+        ; Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; Round 7
+        mov     ecx,DWORD [28+esi]
+        mov     edx,DWORD [28+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     esi,32
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$027aw_loop
+L$026aw_finish:
+        mov     ebp,DWORD [32+esp]
+        and     ebp,7
+        jz      NEAR L$028aw_end
+        ; Tail Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 1
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 2
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 3
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 4
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 5
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 6
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+L$028aw_end:
+        cmp     DWORD [36+esp],0
+        je      NEAR L$029pw_end
+        mov     ebp,DWORD [36+esp]
+        cmp     ebp,0
+        je      NEAR L$029pw_end
+        jge     NEAR L$030pw_pos
+        ; pw_neg
+        mov     edx,0
+        sub     edx,ebp
+        mov     ebp,edx
+        and     ebp,4294967288
+        jz      NEAR L$031pw_neg_finish
+L$032pw_neg_loop:
+        ; dl<0 Round 0
+        mov     ecx,0
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; dl<0 Round 1
+        mov     ecx,0
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; dl<0 Round 2
+        mov     ecx,0
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; dl<0 Round 3
+        mov     ecx,0
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; dl<0 Round 4
+        mov     ecx,0
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; dl<0 Round 5
+        mov     ecx,0
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; dl<0 Round 6
+        mov     ecx,0
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; dl<0 Round 7
+        mov     ecx,0
+        mov     edx,DWORD [28+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$032pw_neg_loop
+L$031pw_neg_finish:
+        mov     edx,DWORD [36+esp]
+        mov     ebp,0
+        sub     ebp,edx
+        and     ebp,7
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 0
+        mov     ecx,0
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 1
+        mov     ecx,0
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [4+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 2
+        mov     ecx,0
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [8+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 3
+        mov     ecx,0
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [12+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 4
+        mov     ecx,0
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [16+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 5
+        mov     ecx,0
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [20+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 6
+        mov     ecx,0
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        jmp     NEAR L$029pw_end
+L$030pw_pos:
+        and     ebp,4294967288
+        jz      NEAR L$033pw_pos_finish
+L$034pw_pos_loop:
+        ; dl>0 Round 0
+        mov     ecx,DWORD [esi]
+        sub     ecx,eax
+        mov     DWORD [ebx],ecx
+        jnc     NEAR L$035pw_nc0
+        ; dl>0 Round 1
+        mov     ecx,DWORD [4+esi]
+        sub     ecx,eax
+        mov     DWORD [4+ebx],ecx
+        jnc     NEAR L$036pw_nc1
+        ; dl>0 Round 2
+        mov     ecx,DWORD [8+esi]
+        sub     ecx,eax
+        mov     DWORD [8+ebx],ecx
+        jnc     NEAR L$037pw_nc2
+        ; dl>0 Round 3
+        mov     ecx,DWORD [12+esi]
+        sub     ecx,eax
+        mov     DWORD [12+ebx],ecx
+        jnc     NEAR L$038pw_nc3
+        ; dl>0 Round 4
+        mov     ecx,DWORD [16+esi]
+        sub     ecx,eax
+        mov     DWORD [16+ebx],ecx
+        jnc     NEAR L$039pw_nc4
+        ; dl>0 Round 5
+        mov     ecx,DWORD [20+esi]
+        sub     ecx,eax
+        mov     DWORD [20+ebx],ecx
+        jnc     NEAR L$040pw_nc5
+        ; dl>0 Round 6
+        mov     ecx,DWORD [24+esi]
+        sub     ecx,eax
+        mov     DWORD [24+ebx],ecx
+        jnc     NEAR L$041pw_nc6
+        ; dl>0 Round 7
+        mov     ecx,DWORD [28+esi]
+        sub     ecx,eax
+        mov     DWORD [28+ebx],ecx
+        jnc     NEAR L$042pw_nc7
+        ;
+        add     esi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$034pw_pos_loop
+L$033pw_pos_finish:
+        mov     ebp,DWORD [36+esp]
+        and     ebp,7
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 0
+        mov     ecx,DWORD [esi]
+        sub     ecx,eax
+        mov     DWORD [ebx],ecx
+        jnc     NEAR L$043pw_tail_nc0
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 1
+        mov     ecx,DWORD [4+esi]
+        sub     ecx,eax
+        mov     DWORD [4+ebx],ecx
+        jnc     NEAR L$044pw_tail_nc1
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 2
+        mov     ecx,DWORD [8+esi]
+        sub     ecx,eax
+        mov     DWORD [8+ebx],ecx
+        jnc     NEAR L$045pw_tail_nc2
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 3
+        mov     ecx,DWORD [12+esi]
+        sub     ecx,eax
+        mov     DWORD [12+ebx],ecx
+        jnc     NEAR L$046pw_tail_nc3
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 4
+        mov     ecx,DWORD [16+esi]
+        sub     ecx,eax
+        mov     DWORD [16+ebx],ecx
+        jnc     NEAR L$047pw_tail_nc4
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 5
+        mov     ecx,DWORD [20+esi]
+        sub     ecx,eax
+        mov     DWORD [20+ebx],ecx
+        jnc     NEAR L$048pw_tail_nc5
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 6
+        mov     ecx,DWORD [24+esi]
+        sub     ecx,eax
+        mov     DWORD [24+ebx],ecx
+        jnc     NEAR L$049pw_tail_nc6
+        mov     eax,1
+        jmp     NEAR L$029pw_end
+L$050pw_nc_loop:
+        mov     ecx,DWORD [esi]
+        mov     DWORD [ebx],ecx
+L$035pw_nc0:
+        mov     ecx,DWORD [4+esi]
+        mov     DWORD [4+ebx],ecx
+L$036pw_nc1:
+        mov     ecx,DWORD [8+esi]
+        mov     DWORD [8+ebx],ecx
+L$037pw_nc2:
+        mov     ecx,DWORD [12+esi]
+        mov     DWORD [12+ebx],ecx
+L$038pw_nc3:
+        mov     ecx,DWORD [16+esi]
+        mov     DWORD [16+ebx],ecx
+L$039pw_nc4:
+        mov     ecx,DWORD [20+esi]
+        mov     DWORD [20+ebx],ecx
+L$040pw_nc5:
+        mov     ecx,DWORD [24+esi]
+        mov     DWORD [24+ebx],ecx
+L$041pw_nc6:
+        mov     ecx,DWORD [28+esi]
+        mov     DWORD [28+ebx],ecx
+L$042pw_nc7:
+        ;
+        add     esi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$050pw_nc_loop
+        mov     ebp,DWORD [36+esp]
+        and     ebp,7
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [esi]
+        mov     DWORD [ebx],ecx
+L$043pw_tail_nc0:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [4+esi]
+        mov     DWORD [4+ebx],ecx
+L$044pw_tail_nc1:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [8+esi]
+        mov     DWORD [8+ebx],ecx
+L$045pw_tail_nc2:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [12+esi]
+        mov     DWORD [12+ebx],ecx
+L$046pw_tail_nc3:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [16+esi]
+        mov     DWORD [16+ebx],ecx
+L$047pw_tail_nc4:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [20+esi]
+        mov     DWORD [20+ebx],ecx
+L$048pw_tail_nc5:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [24+esi]
+        mov     DWORD [24+ebx],ecx
+L$049pw_tail_nc6:
+L$051pw_nc_end:
+        mov     eax,0
+L$029pw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+segment .bss
+common  _OPENSSL_ia32cap_P 16
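
The multiply-accumulate primitive at the top of this file (bn_mul_add_words) computes rp[i] += ap[i] * w across num 32-bit words, carrying through the 64-bit intermediate product and returning the final carry (the value left in eax); both the SSE2 and the non-SSE2 paths implement the same loop. A minimal C sketch of that contract, with illustrative names and types rather than the OpenSSL BN_ULONG typedefs:

    #include <stddef.h>
    #include <stdint.h>

    /* Word-wise multiply-accumulate with carry propagation, mirroring
       the mul/adc chain in the rounds above: rp[i] += ap[i] * w, and
       the final carry word is returned to the caller. */
    static uint32_t mul_add_words_sketch(uint32_t *rp, const uint32_t *ap,
                                         size_t num, uint32_t w)
    {
        uint32_t carry = 0;
        for (size_t i = 0; i < num; i++) {
            uint64_t t = (uint64_t)ap[i] * w + carry + rp[i];
            rp[i] = (uint32_t)t;            /* low word back into rp[i] */
            carry = (uint32_t)(t >> 32);    /* high word becomes carry  */
        }
        return carry;
    }
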
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
new file mode 100644
index 0000000000..08eb9fe372
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
@@ -0,0 +1,1259 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _bn_mul_comba8
+align   16
+_bn_mul_comba8:
+L$_bn_mul_comba8_begin:
+        push    esi
+        mov     esi,DWORD [12+esp]
+        push    edi
+        mov     edi,DWORD [20+esp]
+        push    ebp
+        push    ebx
+        xor     ebx,ebx
+        mov     eax,DWORD [esi]
+        xor     ecx,ecx
+        mov     edx,DWORD [edi]
+        ; ################## Calculate word 0
+        xor     ebp,ebp
+        ; mul a[0]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [eax],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ################## Calculate word 1
+        xor     ebx,ebx
+        ; mul a[1]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[0]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [edi]
+        adc     ebx,0
+        mov     DWORD [4+eax],ecx
+        mov     eax,DWORD [8+esi]
+        ; saved r[1]
+        ; ################## Calculate word 2
+        xor     ecx,ecx
+        ; mul a[2]*b[0]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ecx,0
+        ; mul a[1]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[0]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [edi]
+        adc     ecx,0
+        mov     DWORD [8+eax],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ################## Calculate word 3
+        xor     ebp,ebp
+        ; mul a[3]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        ; mul a[2]*b[1]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebp,0
+        ; mul a[1]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[0]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [12+eax],ebx
+        mov     eax,DWORD [16+esi]
+        ; saved r[3]
+        ; ################## Calculate word 4
+        xor     ebx,ebx
+        ; mul a[4]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[3]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        ; mul a[2]*b[2]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebx,0
+        ; mul a[1]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        ; mul a[0]*b[4]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [edi]
+        adc     ebx,0
+        mov     DWORD [16+eax],ecx
+        mov     eax,DWORD [20+esi]
+        ; saved r[4]
+        ; ################## Calculate word 5
+        xor     ecx,ecx
+        ; mul a[5]*b[0]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ecx,0
+        ; mul a[4]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[3]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        ; mul a[2]*b[3]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ecx,0
+        ; mul a[1]*b[4]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [esi]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        ; mul a[0]*b[5]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [edi]
+        adc     ecx,0
+        mov     DWORD [20+eax],ebp
+        mov     eax,DWORD [24+esi]
+        ; saved r[5]
+        ; ################## Calculate word 6
+        xor     ebp,ebp
+        ; mul a[6]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        ; mul a[5]*b[1]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebp,0
+        ; mul a[4]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[3]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebp,0
+        ; mul a[2]*b[4]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebp,0
+        ; mul a[1]*b[5]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [esi]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        ; mul a[0]*b[6]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [24+eax],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[6]
+        ; ################## Calculate word 7
+        xor     ebx,ebx
+        ; mul a[7]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[6]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        ; mul a[5]*b[2]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebx,0
+        ; mul a[4]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        ; mul a[3]*b[4]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebx,0
+        ; mul a[2]*b[5]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebx,0
+        ; mul a[1]*b[6]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        ; mul a[0]*b[7]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        mov     DWORD [28+eax],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[7]
+        ; ################## Calculate word 8
+        xor     ecx,ecx
+        ; mul a[7]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[6]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        ; mul a[5]*b[3]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ecx,0
+        ; mul a[4]*b[4]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        ; mul a[3]*b[5]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ecx,0
+        ; mul a[2]*b[6]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ecx,0
+        ; mul a[1]*b[7]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        mov     DWORD [32+eax],ebp
+        mov     eax,DWORD [28+esi]
+        ; saved r[8]
+        ; ################## Calculate word 9
+        xor     ebp,ebp
+        ; mul a[7]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[6]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebp,0
+        ; mul a[5]*b[4]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebp,0
+        ; mul a[4]*b[5]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        ; mul a[3]*b[6]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebp,0
+        ; mul a[2]*b[7]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        mov     DWORD [36+eax],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[9]
+        ; ################## Calculate word 10
+        xor     ebx,ebx
+        ; mul a[7]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        ; mul a[6]*b[4]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebx,0
+        ; mul a[5]*b[5]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebx,0
+        ; mul a[4]*b[6]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        ; mul a[3]*b[7]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        mov     DWORD [40+eax],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[10]
+        ; ################## Calculate word 11
+        xor     ecx,ecx
+        ; mul a[7]*b[4]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        ; mul a[6]*b[5]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ecx,0
+        ; mul a[5]*b[6]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ecx,0
+        ; mul a[4]*b[7]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        mov     DWORD [44+eax],ebp
+        mov     eax,DWORD [28+esi]
+        ; saved r[11]
+        ; ################## Calculate word 12
+        xor     ebp,ebp
+        ; mul a[7]*b[5]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        ; mul a[6]*b[6]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebp,0
+        ; mul a[5]*b[7]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        mov     DWORD [48+eax],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[12]
+        ; ################## Calculate word 13
+        xor     ebx,ebx
+        ; mul a[7]*b[6]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        ; mul a[6]*b[7]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        mov     DWORD [52+eax],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[13]
+        ; ################## Calculate word 14
+        xor     ecx,ecx
+        ; mul a[7]*b[7]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        adc     ecx,0
+        mov     DWORD [56+eax],ebp
+        ; saved r[14]
+        ; save r[15]
+        mov     DWORD [60+eax],ebx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
+global  _bn_mul_comba4
+align   16
+_bn_mul_comba4:
+L$_bn_mul_comba4_begin:
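+        ; Comba 4x4-word multiplication: r (arg 1) = a (arg 2) * b (arg 3).
+        ; Each output word r[k] accumulates every partial product a[i]*b[j]
+        ; with i+j == k in the rotating ebx/ecx/ebp accumulator; the carry
+        ; spills into the accumulator for the next word.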
+        push    esi
+        mov     esi,DWORD [12+esp]
+        push    edi
+        mov     edi,DWORD [20+esp]
+        push    ebp
+        push    ebx
+        xor     ebx,ebx
+        mov     eax,DWORD [esi]
+        xor     ecx,ecx
+        mov     edx,DWORD [edi]
+        ; ################## Calculate word 0
+        xor     ebp,ebp
+        ; mul a[0]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [eax],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ################## Calculate word 1
+        xor     ebx,ebx
+        ; mul a[1]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[0]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [edi]
+        adc     ebx,0
+        mov     DWORD [4+eax],ecx
+        mov     eax,DWORD [8+esi]
+        ; saved r[1]
+        ; ################## Calculate word 2
+        xor     ecx,ecx
+        ; mul a[2]*b[0]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ecx,0
+        ; mul a[1]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[0]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [edi]
+        adc     ecx,0
+        mov     DWORD [8+eax],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ################## Calculate word 3
+        xor     ebp,ebp
+        ; mul a[3]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        ; mul a[2]*b[1]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebp,0
+        ; mul a[1]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[0]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        mov     DWORD [12+eax],ebx
+        mov     eax,DWORD [12+esi]
+        ; saved r[3]
+        ; ################## Calculate word 4
+        xor     ebx,ebx
+        ; mul a[3]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        ; mul a[2]*b[2]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebx,0
+        ; mul a[1]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        mov     DWORD [16+eax],ecx
+        mov     eax,DWORD [12+esi]
+        ; saved r[4]
+        ; ################## Calculate word 5
+        xor     ecx,ecx
+        ; mul a[3]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        ; mul a[2]*b[3]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        mov     DWORD [20+eax],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[5]
+        ; ################## Calculate word 6
+        xor     ebp,ebp
+        ; mul a[3]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        adc     ebp,0
+        mov     DWORD [24+eax],ebx
+        ; saved r[6]
+        ; save r[7]
+        mov     DWORD [28+eax],ecx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
+global  _bn_sqr_comba8
+align   16
+_bn_sqr_comba8:
+L$_bn_sqr_comba8_begin:
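+        ; Comba squaring of an 8-word input: r (arg 1, edi) = a (arg 2, esi)^2.
+        ; Off-diagonal products a[i]*a[j] (i != j) are computed once and then
+        ; doubled with add/adc before being accumulated; the diagonal squares
+        ; a[i]*a[i] are added only once.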
+        push    esi
+        push    edi
+        push    ebp
+        push    ebx
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        xor     ebx,ebx
+        xor     ecx,ecx
+        mov     eax,DWORD [esi]
+        ; ############### Calculate word 0
+        xor     ebp,ebp
+        ; sqr a[0]*a[0]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [edi],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ############### Calculate word 1
+        xor     ebx,ebx
+        ; sqr a[1]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        mov     DWORD [4+edi],ecx
+        mov     edx,DWORD [esi]
+        ; saved r[1]
+        ; ############### Calculate word 2
+        xor     ecx,ecx
+        ; sqr a[2]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [4+esi]
+        adc     ecx,0
+        ; sqr a[1]*a[1]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        mov     edx,DWORD [esi]
+        adc     ecx,0
+        mov     DWORD [8+edi],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ############### Calculate word 3
+        xor     ebp,ebp
+        ; sqr a[3]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebp,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[2]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ebp,0
+        mov     DWORD [12+edi],ebx
+        mov     edx,DWORD [esi]
+        ; saved r[3]
+        ; ############### Calculate word 4
+        xor     ebx,ebx
+        ; sqr a[4]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [12+esi]
+        adc     ebx,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[3]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        ; sqr a[2]*a[2]
+        mul     eax
+        add     ecx,eax
+        adc     ebp,edx
+        mov     edx,DWORD [esi]
+        adc     ebx,0
+        mov     DWORD [16+edi],ecx
+        mov     eax,DWORD [20+esi]
+        ; saved r[4]
+        ; ############### Calculate word 5
+        xor     ecx,ecx
+        ; sqr a[5]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ecx,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[4]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ecx,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[3]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ecx,0
+        mov     DWORD [20+edi],ebp
+        mov     edx,DWORD [esi]
+        ; saved r[5]
+        ; ############### Calculate word 6
+        xor     ebp,ebp
+        ; sqr a[6]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebp,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[5]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ebp,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[4]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ebp,0
+        ; sqr a[3]*a[3]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [24+edi],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[6]
+        ; ############### Calculate word 7
+        xor     ebx,ebx
+        ; sqr a[7]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebx,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[6]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebx,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[5]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [16+esi]
+        adc     ebx,0
+        mov     edx,DWORD [12+esi]
+        ; sqr a[4]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [28+esi]
+        adc     ebx,0
+        mov     DWORD [28+edi],ecx
+        mov     edx,DWORD [4+esi]
+        ; saved r[7]
+        ; ############### Calculate word 8
+        xor     ecx,ecx
+        ; sqr a[7]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ecx,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[6]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [20+esi]
+        adc     ecx,0
+        mov     edx,DWORD [12+esi]
+        ; sqr a[5]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ecx,0
+        ; sqr a[4]*a[4]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        mov     edx,DWORD [8+esi]
+        adc     ecx,0
+        mov     DWORD [32+edi],ebp
+        mov     eax,DWORD [28+esi]
+        ; saved r[8]
+        ; ############### Calculate word 9
+        xor     ebp,ebp
+        ; sqr a[7]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebp,0
+        mov     edx,DWORD [12+esi]
+        ; sqr a[6]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebp,0
+        mov     edx,DWORD [16+esi]
+        ; sqr a[5]*a[4]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [28+esi]
+        adc     ebp,0
+        mov     DWORD [36+edi],ebx
+        mov     edx,DWORD [12+esi]
+        ; saved r[9]
+        ; ############### Calculate word 10
+        xor     ebx,ebx
+        ; sqr a[7]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebx,0
+        mov     edx,DWORD [16+esi]
+        ; sqr a[6]*a[4]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebx,0
+        ; sqr a[5]*a[5]
+        mul     eax
+        add     ecx,eax
+        adc     ebp,edx
+        mov     edx,DWORD [16+esi]
+        adc     ebx,0
+        mov     DWORD [40+edi],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[10]
+        ; ############### Calculate word 11
+        xor     ecx,ecx
+        ; sqr a[7]*a[4]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ecx,0
+        mov     edx,DWORD [20+esi]
+        ; sqr a[6]*a[5]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [28+esi]
+        adc     ecx,0
+        mov     DWORD [44+edi],ebp
+        mov     edx,DWORD [20+esi]
+        ; saved r[11]
+        ; ############### Calculate word 12
+        xor     ebp,ebp
+        ; sqr a[7]*a[5]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebp,0
+        ; sqr a[6]*a[6]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [24+esi]
+        adc     ebp,0
+        mov     DWORD [48+edi],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[12]
+        ; ############### Calculate word 13
+        xor     ebx,ebx
+        ; sqr a[7]*a[6]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [28+esi]
+        adc     ebx,0
+        mov     DWORD [52+edi],ecx
+        ; saved r[13]
+        ; ############### Calculate word 14
+        xor     ecx,ecx
+        ; sqr a[7]*a[7]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        adc     ecx,0
+        mov     DWORD [56+edi],ebp
+        ; saved r[14]
+        mov     DWORD [60+edi],ebx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
+global  _bn_sqr_comba4
+align   16
+_bn_sqr_comba4:
+L$_bn_sqr_comba4_begin:
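+        ; Comba squaring of a 4-word input into an 8-word result, using the
+        ; same doubling of off-diagonal products as bn_sqr_comba8 above.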
+        push    esi
+        push    edi
+        push    ebp
+        push    ebx
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        xor     ebx,ebx
+        xor     ecx,ecx
+        mov     eax,DWORD [esi]
+        ; ############### Calculate word 0
+        xor     ebp,ebp
+        ; sqr a[0]*a[0]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [edi],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ############### Calculate word 1
+        xor     ebx,ebx
+        ; sqr a[1]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        mov     DWORD [4+edi],ecx
+        mov     edx,DWORD [esi]
+        ; saved r[1]
+        ; ############### Calculate word 2
+        xor     ecx,ecx
+        ; sqr a[2]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [4+esi]
+        adc     ecx,0
+        ; sqr a[1]*a[1]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        mov     edx,DWORD [esi]
+        adc     ecx,0
+        mov     DWORD [8+edi],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ############### Calculate word 3
+        xor     ebp,ebp
+        ; sqr a[3]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebp,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[2]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ebp,0
+        mov     DWORD [12+edi],ebx
+        mov     edx,DWORD [4+esi]
+        ; saved r[3]
+        ; ############### Calculate word 4
+        xor     ebx,ebx
+        ; sqr a[3]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        ; sqr a[2]*a[2]
+        mul     eax
+        add     ecx,eax
+        adc     ebp,edx
+        mov     edx,DWORD [8+esi]
+        adc     ebx,0
+        mov     DWORD [16+edi],ecx
+        mov     eax,DWORD [12+esi]
+        ; saved r[4]
+        ; ############### Calculate word 5
+        xor     ecx,ecx
+        ; sqr a[3]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ecx,0
+        mov     DWORD [20+edi],ebp
+        ; saved r[5]
+        ; ############### Calculate word 6
+        xor     ebp,ebp
+        ; sqr a[3]*a[3]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        adc     ebp,0
+        mov     DWORD [24+edi],ebx
+        ; saved r[6]
+        mov     DWORD [28+edi],ecx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
new file mode 100644
index 0000000000..5f2f4f65de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
@@ -0,0 +1,352 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+align   16
+__mul_1x1_mmx:
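+        ; Carry-less (GF(2)[x]) multiply of the 32-bit polynomials in eax and
+        ; ebx.  An 8-entry table of 0*a .. 7*a is built on the stack, b is
+        ; scanned three bits at a time, and the top two bits of a are folded
+        ; in through the mm4/mm5 masks; the 64-bit product is left in mm0.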
+        sub     esp,36
+        mov     ecx,eax
+        lea     edx,[eax*1+eax]
+        and     ecx,1073741823
+        lea     ebp,[edx*1+edx]
+        mov     DWORD [esp],0
+        and     edx,2147483647
+        movd    mm2,eax
+        movd    mm3,ebx
+        mov     DWORD [4+esp],ecx
+        xor     ecx,edx
+        pxor    mm5,mm5
+        pxor    mm4,mm4
+        mov     DWORD [8+esp],edx
+        xor     edx,ebp
+        mov     DWORD [12+esp],ecx
+        pcmpgtd mm5,mm2
+        paddd   mm2,mm2
+        xor     ecx,edx
+        mov     DWORD [16+esp],ebp
+        xor     ebp,edx
+        pand    mm5,mm3
+        pcmpgtd mm4,mm2
+        mov     DWORD [20+esp],ecx
+        xor     ebp,ecx
+        psllq   mm5,31
+        pand    mm4,mm3
+        mov     DWORD [24+esp],edx
+        mov     esi,7
+        mov     DWORD [28+esp],ebp
+        mov     ebp,esi
+        and     esi,ebx
+        shr     ebx,3
+        mov     edi,ebp
+        psllq   mm4,30
+        and     edi,ebx
+        shr     ebx,3
+        movd    mm0,DWORD [esi*4+esp]
+        mov     esi,ebp
+        and     esi,ebx
+        shr     ebx,3
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,3
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,6
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,9
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,12
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,15
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,18
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,21
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,24
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        pxor    mm0,mm4
+        psllq   mm2,27
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        pxor    mm0,mm5
+        psllq   mm1,30
+        add     esp,36
+        pxor    mm0,mm1
+        ret
+align   16
+__mul_1x1_ialu:
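+        ; Same carry-less 1x1 multiply using only general-purpose registers:
+        ; the 0*a .. 7*a table lives at [esp .. 28+esp], the top two bits of a
+        ; are handled via sign-extension masks, and the 64-bit product is
+        ; returned in edx:eax.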
+        sub     esp,36
+        mov     ecx,eax
+        lea     edx,[eax*1+eax]
+        lea     ebp,[eax*4]
+        and     ecx,1073741823
+        lea     edi,[eax*1+eax]
+        sar     eax,31
+        mov     DWORD [esp],0
+        and     edx,2147483647
+        mov     DWORD [4+esp],ecx
+        xor     ecx,edx
+        mov     DWORD [8+esp],edx
+        xor     edx,ebp
+        mov     DWORD [12+esp],ecx
+        xor     ecx,edx
+        mov     DWORD [16+esp],ebp
+        xor     ebp,edx
+        mov     DWORD [20+esp],ecx
+        xor     ebp,ecx
+        sar     edi,31
+        and     eax,ebx
+        mov     DWORD [24+esp],edx
+        and     edi,ebx
+        mov     DWORD [28+esp],ebp
+        mov     edx,eax
+        shl     eax,31
+        mov     ecx,edi
+        shr     edx,1
+        mov     esi,7
+        shl     edi,30
+        and     esi,ebx
+        shr     ecx,2
+        xor     eax,edi
+        shr     ebx,3
+        mov     edi,7
+        and     edi,ebx
+        shr     ebx,3
+        xor     edx,ecx
+        xor     eax,DWORD [esi*4+esp]
+        mov     esi,7
+        and     esi,ebx
+        shr     ebx,3
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,3
+        and     edi,ebx
+        shr     ecx,29
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,6
+        and     esi,ebx
+        shr     ebp,26
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,9
+        and     edi,ebx
+        shr     ecx,23
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,12
+        and     esi,ebx
+        shr     ebp,20
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,15
+        and     edi,ebx
+        shr     ecx,17
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,18
+        and     esi,ebx
+        shr     ebp,14
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,21
+        and     edi,ebx
+        shr     ecx,11
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,24
+        and     esi,ebx
+        shr     ebp,8
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     ecx,ebp
+        shl     ebp,27
+        mov     edi,DWORD [esi*4+esp]
+        shr     ecx,5
+        mov     esi,edi
+        xor     eax,ebp
+        shl     edi,30
+        xor     edx,ecx
+        shr     esi,2
+        xor     eax,edi
+        xor     edx,esi
+        add     esp,36
+        ret
+global  _bn_GF2m_mul_2x2
+align   16
+_bn_GF2m_mul_2x2:
+L$_bn_GF2m_mul_2x2_begin:
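+        ; 2x2-word GF(2^m) polynomial multiplication.  OPENSSL_ia32cap_P
+        ; selects the implementation: the PCLMULQDQ path (the db bytes below
+        ; encode pclmulqdq xmm0,xmm0,1) when available, otherwise the MMX
+        ; path, otherwise plain integer code.  The MMX and integer paths
+        ; combine three __mul_1x1 products Karatsuba-style.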
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     eax,DWORD [edx]
+        mov     edx,DWORD [4+edx]
+        test    eax,8388608
+        jz      NEAR L$000ialu
+        test    eax,16777216
+        jz      NEAR L$001mmx
+        test    edx,2
+        jz      NEAR L$001mmx
+        movups  xmm0,[8+esp]
+        shufps  xmm0,xmm0,177
+db      102,15,58,68,192,1
+        mov     eax,DWORD [4+esp]
+        movups  [eax],xmm0
+        ret
+align   16
+L$001mmx:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [32+esp]
+        call    __mul_1x1_mmx
+        movq    mm7,mm0
+        mov     eax,DWORD [28+esp]
+        mov     ebx,DWORD [36+esp]
+        call    __mul_1x1_mmx
+        movq    mm6,mm0
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [32+esp]
+        xor     eax,DWORD [28+esp]
+        xor     ebx,DWORD [36+esp]
+        call    __mul_1x1_mmx
+        pxor    mm0,mm7
+        mov     eax,DWORD [20+esp]
+        pxor    mm0,mm6
+        movq    mm2,mm0
+        psllq   mm0,32
+        pop     edi
+        psrlq   mm2,32
+        pop     esi
+        pxor    mm0,mm6
+        pop     ebx
+        pxor    mm2,mm7
+        movq    [eax],mm0
+        pop     ebp
+        movq    [8+eax],mm2
+        emms
+        ret
+align   16
+L$000ialu:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        sub     esp,20
+        mov     eax,DWORD [44+esp]
+        mov     ebx,DWORD [52+esp]
+        call    __mul_1x1_ialu
+        mov     DWORD [8+esp],eax
+        mov     DWORD [12+esp],edx
+        mov     eax,DWORD [48+esp]
+        mov     ebx,DWORD [56+esp]
+        call    __mul_1x1_ialu
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],edx
+        mov     eax,DWORD [44+esp]
+        mov     ebx,DWORD [52+esp]
+        xor     eax,DWORD [48+esp]
+        xor     ebx,DWORD [56+esp]
+        call    __mul_1x1_ialu
+        mov     ebp,DWORD [40+esp]
+        mov     ebx,DWORD [esp]
+        mov     ecx,DWORD [4+esp]
+        mov     edi,DWORD [8+esp]
+        mov     esi,DWORD [12+esp]
+        xor     eax,edx
+        xor     edx,ecx
+        xor     eax,ebx
+        mov     DWORD [ebp],ebx
+        xor     edx,edi
+        mov     DWORD [12+ebp],esi
+        xor     eax,esi
+        add     esp,20
+        xor     edx,esi
+        pop     edi
+        xor     eax,edx
+        pop     esi
+        mov     DWORD [8+ebp],edx
+        pop     ebx
+        mov     DWORD [4+ebp],eax
+        pop     ebp
+        ret
+db      71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+db      99,97,116,105,111,110,32,102,111,114,32,120,56,54,44,32
+db      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db      62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
new file mode 100644
index 0000000000..904526ffbf
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
@@ -0,0 +1,486 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _bn_mul_mont
+align   16
+_bn_mul_mont:
+L$_bn_mul_mont_begin:
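+        ; Montgomery multiplication: rp = ap * bp * R^-1 mod np.  Returns 0
+        ; when the word count is below 4; otherwise an aligned scratch frame
+        ; is carved out of the stack and the SSE2 (pmuludq) path is taken when
+        ; OPENSSL_ia32cap_P bit 26 is set, with the integer code at
+        ; L$003non_sse2 (and its ap == bp squaring path) as the fallback.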
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        xor     eax,eax
+        mov     edi,DWORD [40+esp]
+        cmp     edi,4
+        jl      NEAR L$000just_leave
+        lea     esi,[20+esp]
+        lea     edx,[24+esp]
+        add     edi,2
+        neg     edi
+        lea     ebp,[edi*4+esp-32]
+        neg     edi
+        mov     eax,ebp
+        sub     eax,edx
+        and     eax,2047
+        sub     ebp,eax
+        xor     edx,ebp
+        and     edx,2048
+        xor     edx,2048
+        sub     ebp,edx
+        and     ebp,-64
+        mov     eax,esp
+        sub     eax,ebp
+        and     eax,-4096
+        mov     edx,esp
+        lea     esp,[eax*1+ebp]
+        mov     eax,DWORD [esp]
+        cmp     esp,ebp
+        ja      NEAR L$001page_walk
+        jmp     NEAR L$002page_walk_done
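+        ; touch the new stack frame one 4KB page at a time so guard pages
+        ; are committed in order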
+align   16
+L$001page_walk:
+        lea     esp,[esp-4096]
+        mov     eax,DWORD [esp]
+        cmp     esp,ebp
+        ja      NEAR L$001page_walk
+L$002page_walk_done:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     ebp,DWORD [12+esi]
+        mov     esi,DWORD [16+esi]
+        mov     esi,DWORD [esi]
+        mov     DWORD [4+esp],eax
+        mov     DWORD [8+esp],ebx
+        mov     DWORD [12+esp],ecx
+        mov     DWORD [16+esp],ebp
+        mov     DWORD [20+esp],esi
+        lea     ebx,[edi-3]
+        mov     DWORD [24+esp],edx
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$003non_sse2
+        mov     eax,-1
+        movd    mm7,eax
+        mov     esi,DWORD [8+esp]
+        mov     edi,DWORD [12+esp]
+        mov     ebp,DWORD [16+esp]
+        xor     edx,edx
+        xor     ecx,ecx
+        movd    mm4,DWORD [edi]
+        movd    mm5,DWORD [esi]
+        movd    mm3,DWORD [ebp]
+        pmuludq mm5,mm4
+        movq    mm2,mm5
+        movq    mm0,mm5
+        pand    mm0,mm7
+        pmuludq mm5,[20+esp]
+        pmuludq mm3,mm5
+        paddq   mm3,mm0
+        movd    mm1,DWORD [4+ebp]
+        movd    mm0,DWORD [4+esi]
+        psrlq   mm2,32
+        psrlq   mm3,32
+        inc     ecx
+align   16
+L$0041st:
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        pand    mm0,mm7
+        movd    mm1,DWORD [4+ecx*4+ebp]
+        paddq   mm3,mm0
+        movd    mm0,DWORD [4+ecx*4+esi]
+        psrlq   mm2,32
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm3,32
+        lea     ecx,[1+ecx]
+        cmp     ecx,ebx
+        jl      NEAR L$0041st
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        pand    mm0,mm7
+        paddq   mm3,mm0
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm2,32
+        psrlq   mm3,32
+        paddq   mm3,mm2
+        movq    [32+ebx*4+esp],mm3
+        inc     edx
+L$005outer:
+        xor     ecx,ecx
+        movd    mm4,DWORD [edx*4+edi]
+        movd    mm5,DWORD [esi]
+        movd    mm6,DWORD [32+esp]
+        movd    mm3,DWORD [ebp]
+        pmuludq mm5,mm4
+        paddq   mm5,mm6
+        movq    mm0,mm5
+        movq    mm2,mm5
+        pand    mm0,mm7
+        pmuludq mm5,[20+esp]
+        pmuludq mm3,mm5
+        paddq   mm3,mm0
+        movd    mm6,DWORD [36+esp]
+        movd    mm1,DWORD [4+ebp]
+        movd    mm0,DWORD [4+esi]
+        psrlq   mm2,32
+        psrlq   mm3,32
+        paddq   mm2,mm6
+        inc     ecx
+        dec     ebx
+L$006inner:
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        movd    mm6,DWORD [36+ecx*4+esp]
+        pand    mm0,mm7
+        movd    mm1,DWORD [4+ecx*4+ebp]
+        paddq   mm3,mm0
+        movd    mm0,DWORD [4+ecx*4+esi]
+        psrlq   mm2,32
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm3,32
+        paddq   mm2,mm6
+        dec     ebx
+        lea     ecx,[1+ecx]
+        jnz     NEAR L$006inner
+        mov     ebx,ecx
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        pand    mm0,mm7
+        paddq   mm3,mm0
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm2,32
+        psrlq   mm3,32
+        movd    mm6,DWORD [36+ebx*4+esp]
+        paddq   mm3,mm2
+        paddq   mm3,mm6
+        movq    [32+ebx*4+esp],mm3
+        lea     edx,[1+edx]
+        cmp     edx,ebx
+        jle     NEAR L$005outer
+        emms
+        jmp     NEAR L$007common_tail
+align   16
+L$003non_sse2:
+        mov     esi,DWORD [8+esp]
+        lea     ebp,[1+ebx]
+        mov     edi,DWORD [12+esp]
+        xor     ecx,ecx
+        mov     edx,esi
+        and     ebp,1
+        sub     edx,edi
+        lea     eax,[4+ebx*4+edi]
+        or      ebp,edx
+        mov     edi,DWORD [edi]
+        jz      NEAR L$008bn_sqr_mont
+        mov     DWORD [28+esp],eax
+        mov     eax,DWORD [esi]
+        xor     edx,edx
+align   16
+L$009mull:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,eax
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        mov     eax,DWORD [ecx*4+esi]
+        cmp     ecx,ebx
+        mov     DWORD [28+ecx*4+esp],ebp
+        jl      NEAR L$009mull
+        mov     ebp,edx
+        mul     edi
+        mov     edi,DWORD [20+esp]
+        add     eax,ebp
+        mov     esi,DWORD [16+esp]
+        adc     edx,0
+        imul    edi,DWORD [32+esp]
+        mov     DWORD [32+ebx*4+esp],eax
+        xor     ecx,ecx
+        mov     DWORD [36+ebx*4+esp],edx
+        mov     DWORD [40+ebx*4+esp],ecx
+        mov     eax,DWORD [esi]
+        mul     edi
+        add     eax,DWORD [32+esp]
+        mov     eax,DWORD [4+esi]
+        adc     edx,0
+        inc     ecx
+        jmp     NEAR L$0102ndmadd
+align   16
+L$0111stmadd:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ecx*4+esp]
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [ecx*4+esi]
+        adc     edx,0
+        cmp     ecx,ebx
+        mov     DWORD [28+ecx*4+esp],ebp
+        jl      NEAR L$0111stmadd
+        mov     ebp,edx
+        mul     edi
+        add     eax,DWORD [32+ebx*4+esp]
+        mov     edi,DWORD [20+esp]
+        adc     edx,0
+        mov     esi,DWORD [16+esp]
+        add     ebp,eax
+        adc     edx,0
+        imul    edi,DWORD [32+esp]
+        xor     ecx,ecx
+        add     edx,DWORD [36+ebx*4+esp]
+        mov     DWORD [32+ebx*4+esp],ebp
+        adc     ecx,0
+        mov     eax,DWORD [esi]
+        mov     DWORD [36+ebx*4+esp],edx
+        mov     DWORD [40+ebx*4+esp],ecx
+        mul     edi
+        add     eax,DWORD [32+esp]
+        mov     eax,DWORD [4+esi]
+        adc     edx,0
+        mov     ecx,1
+align   16
+L$0102ndmadd:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ecx*4+esp]
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [ecx*4+esi]
+        adc     edx,0
+        cmp     ecx,ebx
+        mov     DWORD [24+ecx*4+esp],ebp
+        jl      NEAR L$0102ndmadd
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ebx*4+esp]
+        adc     edx,0
+        add     ebp,eax
+        adc     edx,0
+        mov     DWORD [28+ebx*4+esp],ebp
+        xor     eax,eax
+        mov     ecx,DWORD [12+esp]
+        add     edx,DWORD [36+ebx*4+esp]
+        adc     eax,DWORD [40+ebx*4+esp]
+        lea     ecx,[4+ecx]
+        mov     DWORD [32+ebx*4+esp],edx
+        cmp     ecx,DWORD [28+esp]
+        mov     DWORD [36+ebx*4+esp],eax
+        je      NEAR L$007common_tail
+        mov     edi,DWORD [ecx]
+        mov     esi,DWORD [8+esp]
+        mov     DWORD [12+esp],ecx
+        xor     ecx,ecx
+        xor     edx,edx
+        mov     eax,DWORD [esi]
+        jmp     NEAR L$0111stmadd
+align   16
+L$008bn_sqr_mont:
+        mov     DWORD [esp],ebx
+        mov     DWORD [12+esp],ecx
+        mov     eax,edi
+        mul     edi
+        mov     DWORD [32+esp],eax
+        mov     ebx,edx
+        shr     edx,1
+        and     ebx,1
+        inc     ecx
+align   16
+L$012sqr:
+        mov     eax,DWORD [ecx*4+esi]
+        mov     ebp,edx
+        mul     edi
+        add     eax,ebp
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        lea     ebp,[eax*2+ebx]
+        shr     eax,31
+        cmp     ecx,DWORD [esp]
+        mov     ebx,eax
+        mov     DWORD [28+ecx*4+esp],ebp
+        jl      NEAR L$012sqr
+        mov     eax,DWORD [ecx*4+esi]
+        mov     ebp,edx
+        mul     edi
+        add     eax,ebp
+        mov     edi,DWORD [20+esp]
+        adc     edx,0
+        mov     esi,DWORD [16+esp]
+        lea     ebp,[eax*2+ebx]
+        imul    edi,DWORD [32+esp]
+        shr     eax,31
+        mov     DWORD [32+ecx*4+esp],ebp
+        lea     ebp,[edx*2+eax]
+        mov     eax,DWORD [esi]
+        shr     edx,31
+        mov     DWORD [36+ecx*4+esp],ebp
+        mov     DWORD [40+ecx*4+esp],edx
+        mul     edi
+        add     eax,DWORD [32+esp]
+        mov     ebx,ecx
+        adc     edx,0
+        mov     eax,DWORD [4+esi]
+        mov     ecx,1
+align   16
+L$0133rdmadd:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ecx*4+esp]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [4+ecx*4+esi]
+        adc     edx,0
+        mov     DWORD [28+ecx*4+esp],ebp
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [36+ecx*4+esp]
+        lea     ecx,[2+ecx]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [ecx*4+esi]
+        adc     edx,0
+        cmp     ecx,ebx
+        mov     DWORD [24+ecx*4+esp],ebp
+        jl      NEAR L$0133rdmadd
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ebx*4+esp]
+        adc     edx,0
+        add     ebp,eax
+        adc     edx,0
+        mov     DWORD [28+ebx*4+esp],ebp
+        mov     ecx,DWORD [12+esp]
+        xor     eax,eax
+        mov     esi,DWORD [8+esp]
+        add     edx,DWORD [36+ebx*4+esp]
+        adc     eax,DWORD [40+ebx*4+esp]
+        mov     DWORD [32+ebx*4+esp],edx
+        cmp     ecx,ebx
+        mov     DWORD [36+ebx*4+esp],eax
+        je      NEAR L$007common_tail
+        mov     edi,DWORD [4+ecx*4+esi]
+        lea     ecx,[1+ecx]
+        mov     eax,edi
+        mov     DWORD [12+esp],ecx
+        mul     edi
+        add     eax,DWORD [32+ecx*4+esp]
+        adc     edx,0
+        mov     DWORD [32+ecx*4+esp],eax
+        xor     ebp,ebp
+        cmp     ecx,ebx
+        lea     ecx,[1+ecx]
+        je      NEAR L$014sqrlast
+        mov     ebx,edx
+        shr     edx,1
+        and     ebx,1
+align   16
+L$015sqradd:
+        mov     eax,DWORD [ecx*4+esi]
+        mov     ebp,edx
+        mul     edi
+        add     eax,ebp
+        lea     ebp,[eax*1+eax]
+        adc     edx,0
+        shr     eax,31
+        add     ebp,DWORD [32+ecx*4+esp]
+        lea     ecx,[1+ecx]
+        adc     eax,0
+        add     ebp,ebx
+        adc     eax,0
+        cmp     ecx,DWORD [esp]
+        mov     DWORD [28+ecx*4+esp],ebp
+        mov     ebx,eax
+        jle     NEAR L$015sqradd
+        mov     ebp,edx
+        add     edx,edx
+        shr     ebp,31
+        add     edx,ebx
+        adc     ebp,0
+L$014sqrlast:
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [16+esp]
+        imul    edi,DWORD [32+esp]
+        add     edx,DWORD [32+ecx*4+esp]
+        mov     eax,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [32+ecx*4+esp],edx
+        mov     DWORD [36+ecx*4+esp],ebp
+        mul     edi
+        add     eax,DWORD [32+esp]
+        lea     ebx,[ecx-1]
+        adc     edx,0
+        mov     ecx,1
+        mov     eax,DWORD [4+esi]
+        jmp     NEAR L$0133rdmadd
+align   16
+L$007common_tail:
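+        ; final reduction: L$016sub subtracts the modulus from the result
+        ; word by word, then L$017copy uses the borrow-derived masks in
+        ; eax/edx to select either the reduced or the original value into rp
+        ; without branching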
+        mov     ebp,DWORD [16+esp]
+        mov     edi,DWORD [4+esp]
+        lea     esi,[32+esp]
+        mov     eax,DWORD [esi]
+        mov     ecx,ebx
+        xor     edx,edx
+align   16
+L$016sub:
+        sbb     eax,DWORD [edx*4+ebp]
+        mov     DWORD [edx*4+edi],eax
+        dec     ecx
+        mov     eax,DWORD [4+edx*4+esi]
+        lea     edx,[1+edx]
+        jge     NEAR L$016sub
+        sbb     eax,0
+        mov     edx,-1
+        xor     edx,eax
+        jmp     NEAR L$017copy
+align   16
+L$017copy:
+        mov     esi,DWORD [32+ebx*4+esp]
+        mov     ebp,DWORD [ebx*4+edi]
+        mov     DWORD [32+ebx*4+esp],ecx
+        and     esi,eax
+        and     ebp,edx
+        or      ebp,esi
+        mov     DWORD [ebx*4+edi],ebp
+        dec     ebx
+        jge     NEAR L$017copy
+        mov     esp,DWORD [24+esp]
+        mov     eax,1
+L$000just_leave:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+db      77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+db      112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+db      54,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+db      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+db      111,114,103,62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
new file mode 100644
index 0000000000..dd69f436c4
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
@@ -0,0 +1,887 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+extern  _DES_SPtrans
+global  _fcrypt_body
+align   16
+_fcrypt_body:
+L$_fcrypt_body_begin:
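+        ; DES crypt() core: the data halves (esi/edi) start at zero and the
+        ; 16 unrolled rounds below are executed 25 times (the pushed constant),
+        ; indexing the _DES_SPtrans S-box tables through the key schedule in
+        ; ebp; the two extra arguments supply the salt-derived expansion masks.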
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        ; Load the 2 words
+        xor     edi,edi
+        xor     esi,esi
+        lea     edx,[_DES_SPtrans]
+        push    edx
+        mov     ebp,DWORD [28+esp]
+        push    DWORD 25
+L$000start:
+        ;
+        ; Round 0
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [4+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 1
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [8+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [12+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 2
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [16+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [20+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 3
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [24+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [28+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 4
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [32+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [36+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 5
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [40+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [44+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 6
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [48+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [52+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 7
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [56+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [60+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 8
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [64+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [68+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 9
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [72+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [76+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 10
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [80+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [84+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 11
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [88+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [92+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 12
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [96+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [100+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 13
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [104+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [108+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 14
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [112+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [116+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 15
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [120+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [124+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        mov     ebx,DWORD [esp]
+        mov     eax,edi
+        dec     ebx
+        mov     edi,esi
+        mov     esi,eax
+        mov     DWORD [esp],ebx
+        jnz     NEAR L$000start
+        ;
+        ; FP
+        mov     edx,DWORD [28+esp]
+        ror     edi,1
+        mov     eax,esi
+        xor     esi,edi
+        and     esi,0xaaaaaaaa
+        xor     eax,esi
+        xor     edi,esi
+        ;
+        rol     eax,23
+        mov     esi,eax
+        xor     eax,edi
+        and     eax,0x03fc03fc
+        xor     esi,eax
+        xor     edi,eax
+        ;
+        rol     esi,10
+        mov     eax,esi
+        xor     esi,edi
+        and     esi,0x33333333
+        xor     eax,esi
+        xor     edi,esi
+        ;
+        rol     edi,18
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xfff0000f
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        rol     esi,12
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xf0f0f0f0
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        ror     eax,4
+        mov     DWORD [edx],eax
+        mov     DWORD [4+edx],edi
+        add     esp,8
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
new file mode 100644
index 0000000000..980d488316
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
@@ -0,0 +1,1835 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _DES_SPtrans
+align   16
+__x86_DES_encrypt:
+        push    ecx
+        ; Round 0
+        mov     eax,DWORD [ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [4+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 1
+        mov     eax,DWORD [8+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [12+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 2
+        mov     eax,DWORD [16+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [20+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 3
+        mov     eax,DWORD [24+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [28+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 4
+        mov     eax,DWORD [32+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [36+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 5
+        mov     eax,DWORD [40+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [44+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 6
+        mov     eax,DWORD [48+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [52+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 7
+        mov     eax,DWORD [56+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [60+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 8
+        mov     eax,DWORD [64+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [68+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 9
+        mov     eax,DWORD [72+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [76+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 10
+        mov     eax,DWORD [80+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [84+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 11
+        mov     eax,DWORD [88+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [92+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 12
+        mov     eax,DWORD [96+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [100+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 13
+        mov     eax,DWORD [104+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [108+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 14
+        mov     eax,DWORD [112+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [116+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 15
+        mov     eax,DWORD [120+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [124+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        add     esp,4
+        ret
+align   16
+__x86_DES_decrypt:
+        push    ecx
+        ; Round 15
+        mov     eax,DWORD [120+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [124+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 14
+        mov     eax,DWORD [112+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [116+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 13
+        mov     eax,DWORD [104+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [108+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 12
+        mov     eax,DWORD [96+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [100+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 11
+        mov     eax,DWORD [88+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [92+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 10
+        mov     eax,DWORD [80+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [84+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 9
+        mov     eax,DWORD [72+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [76+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 8
+        mov     eax,DWORD [64+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [68+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 7
+        mov     eax,DWORD [56+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [60+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 6
+        mov     eax,DWORD [48+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [52+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 5
+        mov     eax,DWORD [40+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [44+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 4
+        mov     eax,DWORD [32+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [36+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 3
+        mov     eax,DWORD [24+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [28+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 2
+        mov     eax,DWORD [16+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [20+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 1
+        mov     eax,DWORD [8+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [12+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 0
+        mov     eax,DWORD [ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [4+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        add     esp,4
+        ret
+global  _DES_encrypt1
+align   16
+_DES_encrypt1:
+L$_DES_encrypt1_begin:
+        push    esi
+        push    edi
+        ;
+        ; Load the 2 words
+        mov     esi,DWORD [12+esp]
+        xor     ecx,ecx
+        push    ebx
+        push    ebp
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [28+esp]
+        mov     edi,DWORD [4+esi]
+        ;
+        ; IP
+        rol     eax,4
+        mov     esi,eax
+        xor     eax,edi
+        and     eax,0xf0f0f0f0
+        xor     esi,eax
+        xor     edi,eax
+        ;
+        rol     edi,20
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xfff0000f
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,14
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x33333333
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     esi,22
+        mov     eax,esi
+        xor     esi,edi
+        and     esi,0x03fc03fc
+        xor     eax,esi
+        xor     edi,esi
+        ;
+        rol     eax,9
+        mov     esi,eax
+        xor     eax,edi
+        and     eax,0xaaaaaaaa
+        xor     esi,eax
+        xor     edi,eax
+        ;
+        rol     edi,1
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     ebp,[(L$des_sptrans-L$000pic_point)+ebp]
+        mov     ecx,DWORD [24+esp]
+        cmp     ebx,0
+        je      NEAR L$001decrypt
+        call    __x86_DES_encrypt
+        jmp     NEAR L$002done
+L$001decrypt:
+        call    __x86_DES_decrypt
+L$002done:
+        ;
+        ; FP
+        mov     edx,DWORD [20+esp]
+        ror     esi,1
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,23
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x03fc03fc
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     edi,10
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0x33333333
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     esi,18
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        rol     edi,12
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xf0f0f0f0
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        ror     eax,4
+        mov     DWORD [edx],eax
+        mov     DWORD [4+edx],esi
+        pop     ebp
+        pop     ebx
+        pop     edi
+        pop     esi
+        ret
+global  _DES_encrypt2
+align   16
+_DES_encrypt2:
+L$_DES_encrypt2_begin:
+        push    esi
+        push    edi
+        ;
+        ; Load the 2 words
+        mov     eax,DWORD [12+esp]
+        xor     ecx,ecx
+        push    ebx
+        push    ebp
+        mov     esi,DWORD [eax]
+        mov     ebx,DWORD [28+esp]
+        rol     esi,3
+        mov     edi,DWORD [4+eax]
+        rol     edi,3
+        call    L$003pic_point
+L$003pic_point:
+        pop     ebp
+        lea     ebp,[(L$des_sptrans-L$003pic_point)+ebp]
+        mov     ecx,DWORD [24+esp]
+        cmp     ebx,0
+        je      NEAR L$004decrypt
+        call    __x86_DES_encrypt
+        jmp     NEAR L$005done
+L$004decrypt:
+        call    __x86_DES_decrypt
+L$005done:
+        ;
+        ; Fixup
+        ror     edi,3
+        mov     eax,DWORD [20+esp]
+        ror     esi,3
+        mov     DWORD [eax],edi
+        mov     DWORD [4+eax],esi
+        pop     ebp
+        pop     ebx
+        pop     edi
+        pop     esi
+        ret
+global  _DES_encrypt3
+align   16
+_DES_encrypt3:
+L$_DES_encrypt3_begin:
+        push    ebx
+        mov     ebx,DWORD [8+esp]
+        push    ebp
+        push    esi
+        push    edi
+        ;
+        ; Load the data words
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        sub     esp,12
+        ;
+        ; IP
+        rol     edi,4
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xf0f0f0f0
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        rol     esi,20
+        mov     edi,esi
+        xor     esi,edx
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     edx,esi
+        ;
+        rol     edi,14
+        mov     esi,edi
+        xor     edi,edx
+        and     edi,0x33333333
+        xor     esi,edi
+        xor     edx,edi
+        ;
+        rol     edx,22
+        mov     edi,edx
+        xor     edx,esi
+        and     edx,0x03fc03fc
+        xor     edi,edx
+        xor     esi,edx
+        ;
+        rol     edi,9
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        ror     edx,3
+        ror     esi,2
+        mov     DWORD [4+ebx],esi
+        mov     eax,DWORD [36+esp]
+        mov     DWORD [ebx],edx
+        mov     edi,DWORD [40+esp]
+        mov     esi,DWORD [44+esp]
+        mov     DWORD [8+esp],DWORD 1
+        mov     DWORD [4+esp],eax
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 0
+        mov     DWORD [4+esp],edi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 1
+        mov     DWORD [4+esp],esi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        add     esp,12
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        ;
+        ; FP
+        rol     esi,2
+        rol     edi,3
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,23
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x03fc03fc
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     edi,10
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0x33333333
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     esi,18
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        rol     edi,12
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xf0f0f0f0
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        ror     eax,4
+        mov     DWORD [ebx],eax
+        mov     DWORD [4+ebx],esi
+        pop     edi
+        pop     esi
+        pop     ebp
+        pop     ebx
+        ret
+global  _DES_decrypt3
+align   16
+_DES_decrypt3:
+L$_DES_decrypt3_begin:
+        push    ebx
+        mov     ebx,DWORD [8+esp]
+        push    ebp
+        push    esi
+        push    edi
+        ;
+        ; Load the data words
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        sub     esp,12
+        ;
+        ; IP
+        rol     edi,4
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xf0f0f0f0
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        rol     esi,20
+        mov     edi,esi
+        xor     esi,edx
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     edx,esi
+        ;
+        rol     edi,14
+        mov     esi,edi
+        xor     edi,edx
+        and     edi,0x33333333
+        xor     esi,edi
+        xor     edx,edi
+        ;
+        rol     edx,22
+        mov     edi,edx
+        xor     edx,esi
+        and     edx,0x03fc03fc
+        xor     edi,edx
+        xor     esi,edx
+        ;
+        rol     edi,9
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        ror     edx,3
+        ror     esi,2
+        mov     DWORD [4+ebx],esi
+        mov     esi,DWORD [36+esp]
+        mov     DWORD [ebx],edx
+        mov     edi,DWORD [40+esp]
+        mov     eax,DWORD [44+esp]
+        mov     DWORD [8+esp],DWORD 0
+        mov     DWORD [4+esp],eax
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 1
+        mov     DWORD [4+esp],edi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 0
+        mov     DWORD [4+esp],esi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        add     esp,12
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        ;
+        ; FP
+        rol     esi,2
+        rol     edi,3
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,23
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x03fc03fc
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     edi,10
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0x33333333
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     esi,18
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        rol     edi,12
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xf0f0f0f0
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        ror     eax,4
+        mov     DWORD [ebx],eax
+        mov     DWORD [4+ebx],esi
+        pop     edi
+        pop     esi
+        pop     ebp
+        pop     ebx
+        ret
+global  _DES_ncbc_encrypt
+align   16
+_DES_ncbc_encrypt:
+L$_DES_ncbc_encrypt_begin:
+        ;
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ebp,DWORD [28+esp]
+        ; getting iv ptr from parameter 4
+        mov     ebx,DWORD [36+esp]
+        mov     esi,DWORD [ebx]
+        mov     edi,DWORD [4+ebx]
+        push    edi
+        push    esi
+        push    edi
+        push    esi
+        mov     ebx,esp
+        mov     esi,DWORD [36+esp]
+        mov     edi,DWORD [40+esp]
+        ; getting encrypt flag from parameter 5
+        mov     ecx,DWORD [56+esp]
+        ; get and push parameter 5
+        push    ecx
+        ; get and push parameter 3
+        mov     eax,DWORD [52+esp]
+        push    eax
+        push    ebx
+        cmp     ecx,0
+        jz      NEAR L$006decrypt
+        and     ebp,4294967288
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        jz      NEAR L$007encrypt_finish
+L$008encrypt_loop:
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [4+esi]
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$008encrypt_loop
+L$007encrypt_finish:
+        mov     ebp,DWORD [56+esp]
+        and     ebp,7
+        jz      NEAR L$009finish
+        call    L$010PIC_point
+L$010PIC_point:
+        pop     edx
+        lea     ecx,[(L$011cbc_enc_jmp_table-L$010PIC_point)+edx]
+        mov     ebp,DWORD [ebp*4+ecx]
+        add     ebp,edx
+        xor     ecx,ecx
+        xor     edx,edx
+        jmp     ebp
+L$012ej7:
+        mov     dh,BYTE [6+esi]
+        shl     edx,8
+L$013ej6:
+        mov     dh,BYTE [5+esi]
+L$014ej5:
+        mov     dl,BYTE [4+esi]
+L$015ej4:
+        mov     ecx,DWORD [esi]
+        jmp     NEAR L$016ejend
+L$017ej3:
+        mov     ch,BYTE [2+esi]
+        shl     ecx,8
+L$018ej2:
+        mov     ch,BYTE [1+esi]
+L$019ej1:
+        mov     cl,BYTE [esi]
+L$016ejend:
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        jmp     NEAR L$009finish
+L$006decrypt:
+        and     ebp,4294967288
+        mov     eax,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        jz      NEAR L$020decrypt_finish
+L$021decrypt_loop:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [20+esp],eax
+        mov     DWORD [24+esp],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$021decrypt_loop
+L$020decrypt_finish:
+        mov     ebp,DWORD [56+esp]
+        and     ebp,7
+        jz      NEAR L$009finish
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+L$022dj7:
+        ror     edx,16
+        mov     BYTE [6+edi],dl
+        shr     edx,16
+L$023dj6:
+        mov     BYTE [5+edi],dh
+L$024dj5:
+        mov     BYTE [4+edi],dl
+L$025dj4:
+        mov     DWORD [edi],ecx
+        jmp     NEAR L$026djend
+L$027dj3:
+        ror     ecx,16
+        mov     BYTE [2+edi],cl
+        shl     ecx,16
+L$028dj2:
+        mov     BYTE [1+esi],ch
+L$029dj1:
+        mov     BYTE [esi],cl
+L$026djend:
+        jmp     NEAR L$009finish
+L$009finish:
+        mov     ecx,DWORD [64+esp]
+        add     esp,28
+        mov     DWORD [ecx],eax
+        mov     DWORD [4+ecx],ebx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$011cbc_enc_jmp_table:
+dd      0
+dd      L$019ej1-L$010PIC_point
+dd      L$018ej2-L$010PIC_point
+dd      L$017ej3-L$010PIC_point
+dd      L$015ej4-L$010PIC_point
+dd      L$014ej5-L$010PIC_point
+dd      L$013ej6-L$010PIC_point
+dd      L$012ej7-L$010PIC_point
+align   64
+global  _DES_ede3_cbc_encrypt
+align   16
+_DES_ede3_cbc_encrypt:
+L$_DES_ede3_cbc_encrypt_begin:
+        ;
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ebp,DWORD [28+esp]
+        ; getting iv ptr from parameter 6
+        mov     ebx,DWORD [44+esp]
+        mov     esi,DWORD [ebx]
+        mov     edi,DWORD [4+ebx]
+        push    edi
+        push    esi
+        push    edi
+        push    esi
+        mov     ebx,esp
+        mov     esi,DWORD [36+esp]
+        mov     edi,DWORD [40+esp]
+        ; getting encrypt flag from parameter 7
+        mov     ecx,DWORD [64+esp]
+        ; get and push parameter 5
+        mov     eax,DWORD [56+esp]
+        push    eax
+        ; get and push parameter 4
+        mov     eax,DWORD [56+esp]
+        push    eax
+        ; get and push parameter 3
+        mov     eax,DWORD [56+esp]
+        push    eax
+        push    ebx
+        cmp     ecx,0
+        jz      NEAR L$030decrypt
+        and     ebp,4294967288
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        jz      NEAR L$031encrypt_finish
+L$032encrypt_loop:
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [4+esi]
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_encrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$032encrypt_loop
+L$031encrypt_finish:
+        mov     ebp,DWORD [60+esp]
+        and     ebp,7
+        jz      NEAR L$033finish
+        call    L$034PIC_point
+L$034PIC_point:
+        pop     edx
+        lea     ecx,[(L$035cbc_enc_jmp_table-L$034PIC_point)+edx]
+        mov     ebp,DWORD [ebp*4+ecx]
+        add     ebp,edx
+        xor     ecx,ecx
+        xor     edx,edx
+        jmp     ebp
+L$036ej7:
+        mov     dh,BYTE [6+esi]
+        shl     edx,8
+L$037ej6:
+        mov     dh,BYTE [5+esi]
+L$038ej5:
+        mov     dl,BYTE [4+esi]
+L$039ej4:
+        mov     ecx,DWORD [esi]
+        jmp     NEAR L$040ejend
+L$041ej3:
+        mov     ch,BYTE [2+esi]
+        shl     ecx,8
+L$042ej2:
+        mov     ch,BYTE [1+esi]
+L$043ej1:
+        mov     cl,BYTE [esi]
+L$040ejend:
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_encrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        jmp     NEAR L$033finish
+L$030decrypt:
+        and     ebp,4294967288
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [28+esp]
+        jz      NEAR L$044decrypt_finish
+L$045decrypt_loop:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_decrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [24+esp],eax
+        mov     DWORD [28+esp],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$045decrypt_loop
+L$044decrypt_finish:
+        mov     ebp,DWORD [60+esp]
+        and     ebp,7
+        jz      NEAR L$033finish
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_decrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+L$046dj7:
+        ror     edx,16
+        mov     BYTE [6+edi],dl
+        shr     edx,16
+L$047dj6:
+        mov     BYTE [5+edi],dh
+L$048dj5:
+        mov     BYTE [4+edi],dl
+L$049dj4:
+        mov     DWORD [edi],ecx
+        jmp     NEAR L$050djend
+L$051dj3:
+        ror     ecx,16
+        mov     BYTE [2+edi],cl
+        shl     ecx,16
+L$052dj2:
+        mov     BYTE [1+esi],ch
+L$053dj1:
+        mov     BYTE [esi],cl
+L$050djend:
+        jmp     NEAR L$033finish
+L$033finish:
+        mov     ecx,DWORD [76+esp]
+        add     esp,32
+        mov     DWORD [ecx],eax
+        mov     DWORD [4+ecx],ebx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$035cbc_enc_jmp_table:
+dd      0
+dd      L$043ej1-L$034PIC_point
+dd      L$042ej2-L$034PIC_point
+dd      L$041ej3-L$034PIC_point
+dd      L$039ej4-L$034PIC_point
+dd      L$038ej5-L$034PIC_point
+dd      L$037ej6-L$034PIC_point
+dd      L$036ej7-L$034PIC_point
+align   64
+align   64
+_DES_SPtrans:
+L$des_sptrans:
+dd      34080768,524288,33554434,34080770
+dd      33554432,526338,524290,33554434
+dd      526338,34080768,34078720,2050
+dd      33556482,33554432,0,524290
+dd      524288,2,33556480,526336
+dd      34080770,34078720,2050,33556480
+dd      2,2048,526336,34078722
+dd      2048,33556482,34078722,0
+dd      0,34080770,33556480,524290
+dd      34080768,524288,2050,33556480
+dd      34078722,2048,526336,33554434
+dd      526338,2,33554434,34078720
+dd      34080770,526336,34078720,33556482
+dd      33554432,2050,524290,0
+dd      524288,33554432,33556482,34080768
+dd      2,34078722,2048,526338
+dd      1074823184,0,1081344,1074790400
+dd      1073741840,32784,1073774592,1081344
+dd      32768,1074790416,16,1073774592
+dd      1048592,1074823168,1074790400,16
+dd      1048576,1073774608,1074790416,32768
+dd      1081360,1073741824,0,1048592
+dd      1073774608,1081360,1074823168,1073741840
+dd      1073741824,1048576,32784,1074823184
+dd      1048592,1074823168,1073774592,1081360
+dd      1074823184,1048592,1073741840,0
+dd      1073741824,32784,1048576,1074790416
+dd      32768,1073741824,1081360,1073774608
+dd      1074823168,32768,0,1073741840
+dd      16,1074823184,1081344,1074790400
+dd      1074790416,1048576,32784,1073774592
+dd      1073774608,16,1074790400,1081344
+dd      67108865,67371264,256,67109121
+dd      262145,67108864,67109121,262400
+dd      67109120,262144,67371008,1
+dd      67371265,257,1,67371009
+dd      0,262145,67371264,256
+dd      257,67371265,262144,67108865
+dd      67371009,67109120,262401,67371008
+dd      262400,0,67108864,262401
+dd      67371264,256,1,262144
+dd      257,262145,67371008,67109121
+dd      0,67371264,262400,67371009
+dd      262145,67108864,67371265,1
+dd      262401,67108865,67108864,67371265
+dd      262144,67109120,67109121,262400
+dd      67109120,0,67371009,257
+dd      67108865,262401,256,67371008
+dd      4198408,268439552,8,272633864
+dd      0,272629760,268439560,4194312
+dd      272633856,268435464,268435456,4104
+dd      268435464,4198408,4194304,268435456
+dd      272629768,4198400,4096,8
+dd      4198400,268439560,272629760,4096
+dd      4104,0,4194312,272633856
+dd      268439552,272629768,272633864,4194304
+dd      272629768,4104,4194304,268435464
+dd      4198400,268439552,8,272629760
+dd      268439560,0,4096,4194312
+dd      0,272629768,272633856,4096
+dd      268435456,272633864,4198408,4194304
+dd      272633864,8,268439552,4198408
+dd      4194312,4198400,272629760,268439560
+dd      4104,268435456,268435464,272633856
+dd      134217728,65536,1024,134284320
+dd      134283296,134218752,66592,134283264
+dd      65536,32,134217760,66560
+dd      134218784,134283296,134284288,0
+dd      66560,134217728,65568,1056
+dd      134218752,66592,0,134217760
+dd      32,134218784,134284320,65568
+dd      134283264,1024,1056,134284288
+dd      134284288,134218784,65568,134283264
+dd      65536,32,134217760,134218752
+dd      134217728,66560,134284320,0
+dd      66592,134217728,1024,65568
+dd      134218784,1024,0,134284320
+dd      134283296,134284288,1056,65536
+dd      66560,134283296,134218752,1056
+dd      32,66592,134283264,134217760
+dd      2147483712,2097216,0,2149588992
+dd      2097216,8192,2147491904,2097152
+dd      8256,2149589056,2105344,2147483648
+dd      2147491840,2147483712,2149580800,2105408
+dd      2097152,2147491904,2149580864,0
+dd      8192,64,2149588992,2149580864
+dd      2149589056,2149580800,2147483648,8256
+dd      64,2105344,2105408,2147491840
+dd      8256,2147483648,2147491840,2105408
+dd      2149588992,2097216,0,2147491840
+dd      2147483648,8192,2149580864,2097152
+dd      2097216,2149589056,2105344,64
+dd      2149589056,2105344,2097152,2147491904
+dd      2147483712,2149580800,2105408,0
+dd      8192,2147483712,2147491904,2149588992
+dd      2149580800,8256,64,2149580864
+dd      16384,512,16777728,16777220
+dd      16794116,16388,16896,0
+dd      16777216,16777732,516,16793600
+dd      4,16794112,16793600,516
+dd      16777732,16384,16388,16794116
+dd      0,16777728,16777220,16896
+dd      16793604,16900,16794112,4
+dd      16900,16793604,512,16777216
+dd      16900,16793600,16793604,516
+dd      16384,512,16777216,16793604
+dd      16777732,16900,16896,0
+dd      512,16777220,4,16777728
+dd      0,16777732,16777728,16896
+dd      516,16384,16794116,16777216
+dd      16794112,4,16388,16794116
+dd      16777220,16794112,16793600,16388
+dd      545259648,545390592,131200,0
+dd      537001984,8388736,545259520,545390720
+dd      128,536870912,8519680,131200
+dd      8519808,537002112,536871040,545259520
+dd      131072,8519808,8388736,537001984
+dd      545390720,536871040,0,8519680
+dd      536870912,8388608,537002112,545259648
+dd      8388608,131072,545390592,128
+dd      8388608,131072,536871040,545390720
+dd      131200,536870912,0,8519680
+dd      545259648,537002112,537001984,8388736
+dd      545390592,128,8388736,537001984
+dd      545390720,8388608,545259520,536871040
+dd      8519680,131200,537002112,545259520
+dd      128,545390592,8519808,0
+dd      536870912,545259648,131072,8519808
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
new file mode 100644
index 0000000000..83e4e77e6a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
@@ -0,0 +1,690 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _md5_block_asm_data_order
+align   16
+_md5_block_asm_data_order:
+L$_md5_block_asm_data_order_begin:
+        push    esi
+        push    edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        push    ebp
+        shl     ecx,6
+        push    ebx
+        add     ecx,esi
+        sub     ecx,64
+        mov     eax,DWORD [edi]
+        push    ecx
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+L$000start:
+        ;
+        ; R0 section
+        mov     edi,ecx
+        mov     ebp,DWORD [esi]
+        ; R0 0
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[3614090360+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [4+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 1
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[3905402710+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [8+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 2
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[606105819+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [12+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 3
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[3250441966+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [16+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ; R0 4
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[4118548399+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [20+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 5
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[1200080426+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [24+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 6
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[2821735955+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [28+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 7
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[4249261313+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [32+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ; R0 8
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[1770035416+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [36+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 9
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[2336552879+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [40+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 10
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[4294925233+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [44+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 11
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[2304563134+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [48+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ; R0 12
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[1804603682+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [52+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 13
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[4254626195+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [56+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 14
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[2792965006+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [60+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 15
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[1236535329+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [4+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ;
+        ; R1 section
+        ; R1 16
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[4129170786+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [24+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 17
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[3225465664+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [44+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 18
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[643717713+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 19
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[3921069994+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [20+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ; R1 20
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[3593408605+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [40+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 21
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[38016083+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [60+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 22
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[3634488961+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [16+esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 23
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[3889429448+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [36+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ; R1 24
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[568446438+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [56+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 25
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[3275163606+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [12+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 26
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[4107603335+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [32+esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 27
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[1163531501+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [52+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ; R1 28
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[2850285829+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [8+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 29
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[4243563512+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [28+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 30
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[1735328473+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [48+esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 31
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[2368359562+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [20+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ;
+        ; R2 section
+        ; R2 32
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[4294588738+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [32+esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 33
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[2272392833+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [44+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 34
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[1839030562+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [56+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 35
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[4259657740+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [4+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,23
+        add     ebx,ecx
+        ; R2 36
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[2763975236+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [16+esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 37
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[1272893353+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [28+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 38
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[4139469664+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [40+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 39
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[3200236656+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [52+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,23
+        add     ebx,ecx
+        ; R2 40
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[681279174+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 41
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[3936430074+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [12+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 42
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[3572445317+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [24+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 43
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[76029189+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [36+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,23
+        add     ebx,ecx
+        ; R2 44
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[3654602809+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [48+esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 45
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[3873151461+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [60+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 46
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[530742520+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [8+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 47
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[3299628645+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,23
+        add     ebx,ecx
+        ;
+        ; R3 section
+        ; R3 48
+        xor     edi,edx
+        or      edi,ebx
+        lea     eax,[4096336452+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [28+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 49
+        or      edi,eax
+        lea     edx,[1126891415+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [56+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 50
+        or      edi,edx
+        lea     ecx,[2878612391+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [20+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 51
+        or      edi,ecx
+        lea     ebx,[4237533241+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [48+esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,21
+        xor     edi,edx
+        add     ebx,ecx
+        ; R3 52
+        or      edi,ebx
+        lea     eax,[1700485571+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [12+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 53
+        or      edi,eax
+        lea     edx,[2399980690+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [40+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 54
+        or      edi,edx
+        lea     ecx,[4293915773+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [4+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 55
+        or      edi,ecx
+        lea     ebx,[2240044497+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [32+esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,21
+        xor     edi,edx
+        add     ebx,ecx
+        ; R3 56
+        or      edi,ebx
+        lea     eax,[1873313359+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [60+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 57
+        or      edi,eax
+        lea     edx,[4264355552+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [24+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 58
+        or      edi,edx
+        lea     ecx,[2734768916+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [52+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 59
+        or      edi,ecx
+        lea     ebx,[1309151649+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [16+esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,21
+        xor     edi,edx
+        add     ebx,ecx
+        ; R3 60
+        or      edi,ebx
+        lea     eax,[4149444226+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [44+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 61
+        or      edi,eax
+        lea     edx,[3174756917+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [8+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 62
+        or      edi,edx
+        lea     ecx,[718787259+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [36+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 63
+        or      edi,ecx
+        lea     ebx,[3951481745+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [24+esp]
+        add     ebx,edi
+        add     esi,64
+        rol     ebx,21
+        mov     edi,DWORD [ebp]
+        add     ebx,ecx
+        add     eax,edi
+        mov     edi,DWORD [4+ebp]
+        add     ebx,edi
+        mov     edi,DWORD [8+ebp]
+        add     ecx,edi
+        mov     edi,DWORD [12+ebp]
+        add     edx,edi
+        mov     DWORD [ebp],eax
+        mov     DWORD [4+ebp],ebx
+        mov     edi,DWORD [esp]
+        mov     DWORD [8+ebp],ecx
+        mov     DWORD [12+ebp],edx
+        cmp     edi,esi
+        jae     NEAR L$000start
+        pop     eax
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
new file mode 100644
index 0000000000..57649ad22b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
@@ -0,0 +1,1264 @@
+; Copyright 2010-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _gcm_gmult_4bit_x86
+align   16
+_gcm_gmult_4bit_x86:
+L$_gcm_gmult_4bit_x86_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        sub     esp,84
+        mov     edi,DWORD [104+esp]
+        mov     esi,DWORD [108+esp]
+        mov     ebp,DWORD [edi]
+        mov     edx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     ebx,DWORD [12+edi]
+        mov     DWORD [16+esp],0
+        mov     DWORD [20+esp],471859200
+        mov     DWORD [24+esp],943718400
+        mov     DWORD [28+esp],610271232
+        mov     DWORD [32+esp],1887436800
+        mov     DWORD [36+esp],1822425088
+        mov     DWORD [40+esp],1220542464
+        mov     DWORD [44+esp],1423966208
+        mov     DWORD [48+esp],3774873600
+        mov     DWORD [52+esp],4246732800
+        mov     DWORD [56+esp],3644850176
+        mov     DWORD [60+esp],3311403008
+        mov     DWORD [64+esp],2441084928
+        mov     DWORD [68+esp],2376073216
+        mov     DWORD [72+esp],2847932416
+        mov     DWORD [76+esp],3051356160
+        mov     DWORD [esp],ebp
+        mov     DWORD [4+esp],edx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],ebx
+        shr     ebx,20
+        and     ebx,240
+        mov     ebp,DWORD [4+ebx*1+esi]
+        mov     edx,DWORD [ebx*1+esi]
+        mov     ecx,DWORD [12+ebx*1+esi]
+        mov     ebx,DWORD [8+ebx*1+esi]
+        xor     eax,eax
+        mov     edi,15
+        jmp     NEAR L$000x86_loop
+align   16
+L$000x86_loop:
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        and     al,240
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        dec     edi
+        js      NEAR L$001x86_break
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        shl     al,4
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        jmp     NEAR L$000x86_loop
+align   16
+L$001x86_break:
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        bswap   ebp
+        mov     edi,DWORD [104+esp]
+        mov     DWORD [12+edi],ebx
+        mov     DWORD [8+edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [edi],ebp
+        add     esp,84
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_ghash_4bit_x86
+align   16
+_gcm_ghash_4bit_x86:
+L$_gcm_ghash_4bit_x86_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        sub     esp,84
+        mov     ebx,DWORD [104+esp]
+        mov     esi,DWORD [108+esp]
+        mov     edi,DWORD [112+esp]
+        mov     ecx,DWORD [116+esp]
+        add     ecx,edi
+        mov     DWORD [116+esp],ecx
+        mov     ebp,DWORD [ebx]
+        mov     edx,DWORD [4+ebx]
+        mov     ecx,DWORD [8+ebx]
+        mov     ebx,DWORD [12+ebx]
+        mov     DWORD [16+esp],0
+        mov     DWORD [20+esp],471859200
+        mov     DWORD [24+esp],943718400
+        mov     DWORD [28+esp],610271232
+        mov     DWORD [32+esp],1887436800
+        mov     DWORD [36+esp],1822425088
+        mov     DWORD [40+esp],1220542464
+        mov     DWORD [44+esp],1423966208
+        mov     DWORD [48+esp],3774873600
+        mov     DWORD [52+esp],4246732800
+        mov     DWORD [56+esp],3644850176
+        mov     DWORD [60+esp],3311403008
+        mov     DWORD [64+esp],2441084928
+        mov     DWORD [68+esp],2376073216
+        mov     DWORD [72+esp],2847932416
+        mov     DWORD [76+esp],3051356160
+align   16
+L$002x86_outer_loop:
+        xor     ebx,DWORD [12+edi]
+        xor     ecx,DWORD [8+edi]
+        xor     edx,DWORD [4+edi]
+        xor     ebp,DWORD [edi]
+        mov     DWORD [12+esp],ebx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [4+esp],edx
+        mov     DWORD [esp],ebp
+        shr     ebx,20
+        and     ebx,240
+        mov     ebp,DWORD [4+ebx*1+esi]
+        mov     edx,DWORD [ebx*1+esi]
+        mov     ecx,DWORD [12+ebx*1+esi]
+        mov     ebx,DWORD [8+ebx*1+esi]
+        xor     eax,eax
+        mov     edi,15
+        jmp     NEAR L$003x86_loop
+align   16
+L$003x86_loop:
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        and     al,240
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        dec     edi
+        js      NEAR L$004x86_break
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        shl     al,4
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        jmp     NEAR L$003x86_loop
+align   16
+L$004x86_break:
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        bswap   ebp
+        mov     edi,DWORD [112+esp]
+        lea     edi,[16+edi]
+        cmp     edi,DWORD [116+esp]
+        mov     DWORD [112+esp],edi
+        jb      NEAR L$002x86_outer_loop
+        mov     edi,DWORD [104+esp]
+        mov     DWORD [12+edi],ebx
+        mov     DWORD [8+edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [edi],ebp
+        add     esp,84
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_gmult_4bit_mmx
+align   16
+_gcm_gmult_4bit_mmx:
+L$_gcm_gmult_4bit_mmx_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        call    L$005pic_point
+L$005pic_point:
+        pop     eax
+        lea     eax,[(L$rem_4bit-L$005pic_point)+eax]
+        movzx   ebx,BYTE [15+edi]
+        xor     ecx,ecx
+        mov     edx,ebx
+        mov     cl,dl
+        mov     ebp,14
+        shl     cl,4
+        and     edx,240
+        movq    mm0,[8+ecx*1+esi]
+        movq    mm1,[ecx*1+esi]
+        movd    ebx,mm0
+        jmp     NEAR L$006mmx_loop
+align   16
+L$006mmx_loop:
+        psrlq   mm0,4
+        and     ebx,15
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+edx*1+esi]
+        mov     cl,BYTE [ebp*1+edi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        dec     ebp
+        movd    ebx,mm0
+        pxor    mm1,[edx*1+esi]
+        mov     edx,ecx
+        pxor    mm0,mm2
+        js      NEAR L$007mmx_break
+        shl     cl,4
+        and     ebx,15
+        psrlq   mm0,4
+        and     edx,240
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+ecx*1+esi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        movd    ebx,mm0
+        pxor    mm1,[ecx*1+esi]
+        pxor    mm0,mm2
+        jmp     NEAR L$006mmx_loop
+align   16
+L$007mmx_break:
+        shl     cl,4
+        and     ebx,15
+        psrlq   mm0,4
+        and     edx,240
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+ecx*1+esi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        movd    ebx,mm0
+        pxor    mm1,[ecx*1+esi]
+        pxor    mm0,mm2
+        psrlq   mm0,4
+        and     ebx,15
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+edx*1+esi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        movd    ebx,mm0
+        pxor    mm1,[edx*1+esi]
+        pxor    mm0,mm2
+        psrlq   mm0,32
+        movd    edx,mm1
+        psrlq   mm1,32
+        movd    ecx,mm0
+        movd    ebp,mm1
+        bswap   ebx
+        bswap   edx
+        bswap   ecx
+        bswap   ebp
+        emms
+        mov     DWORD [12+edi],ebx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [8+edi],ecx
+        mov     DWORD [edi],ebp
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_ghash_4bit_mmx
+align   16
+_gcm_ghash_4bit_mmx:
+L$_gcm_ghash_4bit_mmx_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        mov     ecx,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebp,esp
+        call    L$008pic_point
+L$008pic_point:
+        pop     esi
+        lea     esi,[(L$rem_8bit-L$008pic_point)+esi]
+        sub     esp,544
+        and     esp,-64
+        sub     esp,16
+        add     edx,ecx
+        mov     DWORD [544+esp],eax
+        mov     DWORD [552+esp],edx
+        mov     DWORD [556+esp],ebp
+        add     ebx,128
+        lea     edi,[144+esp]
+        lea     ebp,[400+esp]
+        mov     edx,DWORD [ebx-120]
+        movq    mm0,[ebx-120]
+        movq    mm3,[ebx-128]
+        shl     edx,4
+        mov     BYTE [esp],dl
+        mov     edx,DWORD [ebx-104]
+        movq    mm2,[ebx-104]
+        movq    mm5,[ebx-112]
+        movq    [edi-128],mm0
+        psrlq   mm0,4
+        movq    [edi],mm3
+        movq    mm7,mm3
+        psrlq   mm3,4
+        shl     edx,4
+        mov     BYTE [1+esp],dl
+        mov     edx,DWORD [ebx-88]
+        movq    mm1,[ebx-88]
+        psllq   mm7,60
+        movq    mm4,[ebx-96]
+        por     mm0,mm7
+        movq    [edi-120],mm2
+        psrlq   mm2,4
+        movq    [8+edi],mm5
+        movq    mm6,mm5
+        movq    [ebp-128],mm0
+        psrlq   mm5,4
+        movq    [ebp],mm3
+        shl     edx,4
+        mov     BYTE [2+esp],dl
+        mov     edx,DWORD [ebx-72]
+        movq    mm0,[ebx-72]
+        psllq   mm6,60
+        movq    mm3,[ebx-80]
+        por     mm2,mm6
+        movq    [edi-112],mm1
+        psrlq   mm1,4
+        movq    [16+edi],mm4
+        movq    mm7,mm4
+        movq    [ebp-120],mm2
+        psrlq   mm4,4
+        movq    [8+ebp],mm5
+        shl     edx,4
+        mov     BYTE [3+esp],dl
+        mov     edx,DWORD [ebx-56]
+        movq    mm2,[ebx-56]
+        psllq   mm7,60
+        movq    mm5,[ebx-64]
+        por     mm1,mm7
+        movq    [edi-104],mm0
+        psrlq   mm0,4
+        movq    [24+edi],mm3
+        movq    mm6,mm3
+        movq    [ebp-112],mm1
+        psrlq   mm3,4
+        movq    [16+ebp],mm4
+        shl     edx,4
+        mov     BYTE [4+esp],dl
+        mov     edx,DWORD [ebx-40]
+        movq    mm1,[ebx-40]
+        psllq   mm6,60
+        movq    mm4,[ebx-48]
+        por     mm0,mm6
+        movq    [edi-96],mm2
+        psrlq   mm2,4
+        movq    [32+edi],mm5
+        movq    mm7,mm5
+        movq    [ebp-104],mm0
+        psrlq   mm5,4
+        movq    [24+ebp],mm3
+        shl     edx,4
+        mov     BYTE [5+esp],dl
+        mov     edx,DWORD [ebx-24]
+        movq    mm0,[ebx-24]
+        psllq   mm7,60
+        movq    mm3,[ebx-32]
+        por     mm2,mm7
+        movq    [edi-88],mm1
+        psrlq   mm1,4
+        movq    [40+edi],mm4
+        movq    mm6,mm4
+        movq    [ebp-96],mm2
+        psrlq   mm4,4
+        movq    [32+ebp],mm5
+        shl     edx,4
+        mov     BYTE [6+esp],dl
+        mov     edx,DWORD [ebx-8]
+        movq    mm2,[ebx-8]
+        psllq   mm6,60
+        movq    mm5,[ebx-16]
+        por     mm1,mm6
+        movq    [edi-80],mm0
+        psrlq   mm0,4
+        movq    [48+edi],mm3
+        movq    mm7,mm3
+        movq    [ebp-88],mm1
+        psrlq   mm3,4
+        movq    [40+ebp],mm4
+        shl     edx,4
+        mov     BYTE [7+esp],dl
+        mov     edx,DWORD [8+ebx]
+        movq    mm1,[8+ebx]
+        psllq   mm7,60
+        movq    mm4,[ebx]
+        por     mm0,mm7
+        movq    [edi-72],mm2
+        psrlq   mm2,4
+        movq    [56+edi],mm5
+        movq    mm6,mm5
+        movq    [ebp-80],mm0
+        psrlq   mm5,4
+        movq    [48+ebp],mm3
+        shl     edx,4
+        mov     BYTE [8+esp],dl
+        mov     edx,DWORD [24+ebx]
+        movq    mm0,[24+ebx]
+        psllq   mm6,60
+        movq    mm3,[16+ebx]
+        por     mm2,mm6
+        movq    [edi-64],mm1
+        psrlq   mm1,4
+        movq    [64+edi],mm4
+        movq    mm7,mm4
+        movq    [ebp-72],mm2
+        psrlq   mm4,4
+        movq    [56+ebp],mm5
+        shl     edx,4
+        mov     BYTE [9+esp],dl
+        mov     edx,DWORD [40+ebx]
+        movq    mm2,[40+ebx]
+        psllq   mm7,60
+        movq    mm5,[32+ebx]
+        por     mm1,mm7
+        movq    [edi-56],mm0
+        psrlq   mm0,4
+        movq    [72+edi],mm3
+        movq    mm6,mm3
+        movq    [ebp-64],mm1
+        psrlq   mm3,4
+        movq    [64+ebp],mm4
+        shl     edx,4
+        mov     BYTE [10+esp],dl
+        mov     edx,DWORD [56+ebx]
+        movq    mm1,[56+ebx]
+        psllq   mm6,60
+        movq    mm4,[48+ebx]
+        por     mm0,mm6
+        movq    [edi-48],mm2
+        psrlq   mm2,4
+        movq    [80+edi],mm5
+        movq    mm7,mm5
+        movq    [ebp-56],mm0
+        psrlq   mm5,4
+        movq    [72+ebp],mm3
+        shl     edx,4
+        mov     BYTE [11+esp],dl
+        mov     edx,DWORD [72+ebx]
+        movq    mm0,[72+ebx]
+        psllq   mm7,60
+        movq    mm3,[64+ebx]
+        por     mm2,mm7
+        movq    [edi-40],mm1
+        psrlq   mm1,4
+        movq    [88+edi],mm4
+        movq    mm6,mm4
+        movq    [ebp-48],mm2
+        psrlq   mm4,4
+        movq    [80+ebp],mm5
+        shl     edx,4
+        mov     BYTE [12+esp],dl
+        mov     edx,DWORD [88+ebx]
+        movq    mm2,[88+ebx]
+        psllq   mm6,60
+        movq    mm5,[80+ebx]
+        por     mm1,mm6
+        movq    [edi-32],mm0
+        psrlq   mm0,4
+        movq    [96+edi],mm3
+        movq    mm7,mm3
+        movq    [ebp-40],mm1
+        psrlq   mm3,4
+        movq    [88+ebp],mm4
+        shl     edx,4
+        mov     BYTE [13+esp],dl
+        mov     edx,DWORD [104+ebx]
+        movq    mm1,[104+ebx]
+        psllq   mm7,60
+        movq    mm4,[96+ebx]
+        por     mm0,mm7
+        movq    [edi-24],mm2
+        psrlq   mm2,4
+        movq    [104+edi],mm5
+        movq    mm6,mm5
+        movq    [ebp-32],mm0
+        psrlq   mm5,4
+        movq    [96+ebp],mm3
+        shl     edx,4
+        mov     BYTE [14+esp],dl
+        mov     edx,DWORD [120+ebx]
+        movq    mm0,[120+ebx]
+        psllq   mm6,60
+        movq    mm3,[112+ebx]
+        por     mm2,mm6
+        movq    [edi-16],mm1
+        psrlq   mm1,4
+        movq    [112+edi],mm4
+        movq    mm7,mm4
+        movq    [ebp-24],mm2
+        psrlq   mm4,4
+        movq    [104+ebp],mm5
+        shl     edx,4
+        mov     BYTE [15+esp],dl
+        psllq   mm7,60
+        por     mm1,mm7
+        movq    [edi-8],mm0
+        psrlq   mm0,4
+        movq    [120+edi],mm3
+        movq    mm6,mm3
+        movq    [ebp-16],mm1
+        psrlq   mm3,4
+        movq    [112+ebp],mm4
+        psllq   mm6,60
+        por     mm0,mm6
+        movq    [ebp-8],mm0
+        movq    [120+ebp],mm3
+        movq    mm6,[eax]
+        mov     ebx,DWORD [8+eax]
+        mov     edx,DWORD [12+eax]
+align   16
+L$009outer:
+        xor     edx,DWORD [12+ecx]
+        xor     ebx,DWORD [8+ecx]
+        pxor    mm6,[ecx]
+        lea     ecx,[16+ecx]
+        mov     DWORD [536+esp],ebx
+        movq    [528+esp],mm6
+        mov     DWORD [548+esp],ecx
+        xor     eax,eax
+        rol     edx,8
+        mov     al,dl
+        mov     ebp,eax
+        and     al,15
+        shr     ebp,4
+        pxor    mm0,mm0
+        rol     edx,8
+        pxor    mm1,mm1
+        pxor    mm2,mm2
+        movq    mm7,[16+eax*8+esp]
+        movq    mm6,[144+eax*8+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        shr     edi,4
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        shr     ebp,4
+        pinsrw  mm2,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [536+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     edi,4
+        pinsrw  mm1,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     ebp,4
+        pinsrw  mm0,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     edi,4
+        pinsrw  mm2,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     ebp,4
+        pinsrw  mm1,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [532+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     edi,4
+        pinsrw  mm0,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     ebp,4
+        pinsrw  mm2,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     edi,4
+        pinsrw  mm1,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     ebp,4
+        pinsrw  mm0,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [528+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     edi,4
+        pinsrw  mm2,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     ebp,4
+        pinsrw  mm1,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     edi,4
+        pinsrw  mm0,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     ebp,4
+        pinsrw  mm2,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [524+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     edi,4
+        pinsrw  mm1,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        pxor    mm6,[144+eax*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        movzx   ebx,bl
+        pxor    mm2,mm2
+        psllq   mm1,4
+        movd    ecx,mm7
+        psrlq   mm7,4
+        movq    mm3,mm6
+        psrlq   mm6,4
+        shl     ecx,4
+        pxor    mm7,[16+edi*8+esp]
+        psllq   mm3,60
+        movzx   ecx,cl
+        pxor    mm7,mm3
+        pxor    mm6,[144+edi*8+esp]
+        pinsrw  mm0,WORD [ebx*2+esi],2
+        pxor    mm6,mm1
+        movd    edx,mm7
+        pinsrw  mm2,WORD [ecx*2+esi],3
+        psllq   mm0,12
+        pxor    mm6,mm0
+        psrlq   mm7,32
+        pxor    mm6,mm2
+        mov     ecx,DWORD [548+esp]
+        movd    ebx,mm7
+        movq    mm3,mm6
+        psllw   mm6,8
+        psrlw   mm3,8
+        por     mm6,mm3
+        bswap   edx
+        pshufw  mm6,mm6,27
+        bswap   ebx
+        cmp     ecx,DWORD [552+esp]
+        jne     NEAR L$009outer
+        mov     eax,DWORD [544+esp]
+        mov     DWORD [12+eax],edx
+        mov     DWORD [8+eax],ebx
+        movq    [eax],mm6
+        mov     esp,DWORD [556+esp]
+        emms
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_init_clmul
+align   16
+_gcm_init_clmul:
+L$_gcm_init_clmul_begin:
+        mov     edx,DWORD [4+esp]
+        mov     eax,DWORD [8+esp]
+        call    L$010pic
+L$010pic:
+        pop     ecx
+        lea     ecx,[(L$bswap-L$010pic)+ecx]
+        movdqu  xmm2,[eax]
+        pshufd  xmm2,xmm2,78
+        pshufd  xmm4,xmm2,255
+        movdqa  xmm3,xmm2
+        psllq   xmm2,1
+        pxor    xmm5,xmm5
+        psrlq   xmm3,63
+        pcmpgtd xmm5,xmm4
+        pslldq  xmm3,8
+        por     xmm2,xmm3
+        pand    xmm5,[16+ecx]
+        pxor    xmm2,xmm5
+        movdqa  xmm0,xmm2
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pshufd  xmm4,xmm2,78
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm2
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,220,0
+        xorps   xmm3,xmm0
+        xorps   xmm3,xmm1
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm2,78
+        pshufd  xmm4,xmm0,78
+        pxor    xmm3,xmm2
+        movdqu  [edx],xmm2
+        pxor    xmm4,xmm0
+        movdqu  [16+edx],xmm0
+db      102,15,58,15,227,8
+        movdqu  [32+edx],xmm4
+        ret
+global  _gcm_gmult_clmul
+align   16
+_gcm_gmult_clmul:
+L$_gcm_gmult_clmul_begin:
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        call    L$011pic
+L$011pic:
+        pop     ecx
+        lea     ecx,[(L$bswap-L$011pic)+ecx]
+        movdqu  xmm0,[eax]
+        movdqa  xmm5,[ecx]
+        movups  xmm2,[edx]
+db      102,15,56,0,197
+        movups  xmm4,[32+edx]
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,220,0
+        xorps   xmm3,xmm0
+        xorps   xmm3,xmm1
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+db      102,15,56,0,197
+        movdqu  [eax],xmm0
+        ret
+global  _gcm_ghash_clmul
+align   16
+_gcm_ghash_clmul:
+L$_gcm_ghash_clmul_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        mov     esi,DWORD [28+esp]
+        mov     ebx,DWORD [32+esp]
+        call    L$012pic
+L$012pic:
+        pop     ecx
+        lea     ecx,[(L$bswap-L$012pic)+ecx]
+        movdqu  xmm0,[eax]
+        movdqa  xmm5,[ecx]
+        movdqu  xmm2,[edx]
+db      102,15,56,0,197
+        sub     ebx,16
+        jz      NEAR L$013odd_tail
+        movdqu  xmm3,[esi]
+        movdqu  xmm6,[16+esi]
+db      102,15,56,0,221
+db      102,15,56,0,245
+        movdqu  xmm5,[32+edx]
+        pxor    xmm0,xmm3
+        pshufd  xmm3,xmm6,78
+        movdqa  xmm7,xmm6
+        pxor    xmm3,xmm6
+        lea     esi,[32+esi]
+db      102,15,58,68,242,0
+db      102,15,58,68,250,17
+db      102,15,58,68,221,0
+        movups  xmm2,[16+edx]
+        nop
+        sub     ebx,32
+        jbe     NEAR L$014even_tail
+        jmp     NEAR L$015mod_loop
+align   32
+L$015mod_loop:
+        pshufd  xmm4,xmm0,78
+        movdqa  xmm1,xmm0
+        pxor    xmm4,xmm0
+        nop
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,229,16
+        movups  xmm2,[edx]
+        xorps   xmm0,xmm6
+        movdqa  xmm5,[ecx]
+        xorps   xmm1,xmm7
+        movdqu  xmm7,[esi]
+        pxor    xmm3,xmm0
+        movdqu  xmm6,[16+esi]
+        pxor    xmm3,xmm1
+db      102,15,56,0,253
+        pxor    xmm4,xmm3
+        movdqa  xmm3,xmm4
+        psrldq  xmm4,8
+        pslldq  xmm3,8
+        pxor    xmm1,xmm4
+        pxor    xmm0,xmm3
+db      102,15,56,0,245
+        pxor    xmm1,xmm7
+        movdqa  xmm7,xmm6
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+db      102,15,58,68,242,0
+        movups  xmm5,[32+edx]
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        pshufd  xmm3,xmm7,78
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm3,xmm7
+        pxor    xmm1,xmm4
+db      102,15,58,68,250,17
+        movups  xmm2,[16+edx]
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+db      102,15,58,68,221,0
+        lea     esi,[32+esi]
+        sub     ebx,32
+        ja      NEAR L$015mod_loop
+L$014even_tail:
+        pshufd  xmm4,xmm0,78
+        movdqa  xmm1,xmm0
+        pxor    xmm4,xmm0
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,229,16
+        movdqa  xmm5,[ecx]
+        xorps   xmm0,xmm6
+        xorps   xmm1,xmm7
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+        pxor    xmm4,xmm3
+        movdqa  xmm3,xmm4
+        psrldq  xmm4,8
+        pslldq  xmm3,8
+        pxor    xmm1,xmm4
+        pxor    xmm0,xmm3
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        test    ebx,ebx
+        jnz     NEAR L$016done
+        movups  xmm2,[edx]
+L$013odd_tail:
+        movdqu  xmm3,[esi]
+db      102,15,56,0,221
+        pxor    xmm0,xmm3
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pshufd  xmm4,xmm2,78
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm2
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,220,0
+        xorps   xmm3,xmm0
+        xorps   xmm3,xmm1
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+L$016done:
+db      102,15,56,0,197
+        movdqu  [eax],xmm0
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$bswap:
+db      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db      1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,194
+align   64
+L$rem_8bit:
+dw      0,450,900,582,1800,1738,1164,1358
+dw      3600,4050,3476,3158,2328,2266,2716,2910
+dw      7200,7650,8100,7782,6952,6890,6316,6510
+dw      4656,5106,4532,4214,5432,5370,5820,6014
+dw      14400,14722,15300,14854,16200,16010,15564,15630
+dw      13904,14226,13780,13334,12632,12442,13020,13086
+dw      9312,9634,10212,9766,9064,8874,8428,8494
+dw      10864,11186,10740,10294,11640,11450,12028,12094
+dw      28800,28994,29444,29382,30600,30282,29708,30158
+dw      32400,32594,32020,31958,31128,30810,31260,31710
+dw      27808,28002,28452,28390,27560,27242,26668,27118
+dw      25264,25458,24884,24822,26040,25722,26172,26622
+dw      18624,18690,19268,19078,20424,19978,19532,19854
+dw      18128,18194,17748,17558,16856,16410,16988,17310
+dw      21728,21794,22372,22182,21480,21034,20588,20910
+dw      23280,23346,22900,22710,24056,23610,24188,24510
+dw      57600,57538,57988,58182,58888,59338,58764,58446
+dw      61200,61138,60564,60758,59416,59866,60316,59998
+dw      64800,64738,65188,65382,64040,64490,63916,63598
+dw      62256,62194,61620,61814,62520,62970,63420,63102
+dw      55616,55426,56004,56070,56904,57226,56780,56334
+dw      55120,54930,54484,54550,53336,53658,54236,53790
+dw      50528,50338,50916,50982,49768,50090,49644,49198
+dw      52080,51890,51444,51510,52344,52666,53244,52798
+dw      37248,36930,37380,37830,38536,38730,38156,38094
+dw      40848,40530,39956,40406,39064,39258,39708,39646
+dw      36256,35938,36388,36838,35496,35690,35116,35054
+dw      33712,33394,32820,33270,33976,34170,34620,34558
+dw      43456,43010,43588,43910,44744,44810,44364,44174
+dw      42960,42514,42068,42390,41176,41242,41820,41630
+dw      46560,46114,46692,47014,45800,45866,45420,45230
+dw      48112,47666,47220,47542,48376,48442,49020,48830
+align   64
+L$rem_4bit:
+dd      0,0,0,471859200,0,943718400,0,610271232
+dd      0,1887436800,0,1822425088,0,1220542464,0,1423966208
+dd      0,3774873600,0,4246732800,0,3644850176,0,3311403008
+dd      0,2441084928,0,2376073216,0,2847932416,0,3051356160
+db      71,72,65,83,72,32,102,111,114,32,120,56,54,44,32,67
+db      82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112
+db      112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62
+db      0
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
new file mode 100644
index 0000000000..e78222ee9d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
@@ -0,0 +1,381 @@
+; Copyright 1998-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _RC4
+align   16
+_RC4:
+L$_RC4_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edi,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        mov     esi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        xor     ebx,ebx
+        cmp     edx,0
+        je      NEAR L$000abort
+        mov     al,BYTE [edi]
+        mov     bl,BYTE [4+edi]
+        add     edi,8
+        lea     ecx,[edx*1+esi]
+        sub     ebp,esi
+        mov     DWORD [24+esp],ecx
+        inc     al
+        cmp     DWORD [256+edi],-1
+        je      NEAR L$001RC4_CHAR
+        mov     ecx,DWORD [eax*4+edi]
+        and     edx,-4
+        jz      NEAR L$002loop1
+        mov     DWORD [32+esp],ebp
+        test    edx,-8
+        jz      NEAR L$003go4loop4
+        lea     ebp,[_OPENSSL_ia32cap_P]
+        bt      DWORD [ebp],26
+        jnc     NEAR L$003go4loop4
+        mov     ebp,DWORD [32+esp]
+        and     edx,-8
+        lea     edx,[edx*1+esi-8]
+        mov     DWORD [edi-4],edx
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        movq    mm0,[esi]
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm2,DWORD [edx*4+edi]
+        jmp     NEAR L$004loop_mmx_enter
+align   16
+L$005loop_mmx:
+        add     bl,cl
+        psllq   mm1,56
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        movq    mm0,[esi]
+        movq    [esi*1+ebp-8],mm2
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm2,DWORD [edx*4+edi]
+L$004loop_mmx_enter:
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm0
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,8
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,16
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,24
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,32
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,40
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,48
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        mov     edx,ebx
+        xor     ebx,ebx
+        mov     bl,dl
+        cmp     esi,DWORD [edi-4]
+        lea     esi,[8+esi]
+        jb      NEAR L$005loop_mmx
+        psllq   mm1,56
+        pxor    mm2,mm1
+        movq    [esi*1+ebp-8],mm2
+        emms
+        cmp     esi,DWORD [24+esp]
+        je      NEAR L$006done
+        jmp     NEAR L$002loop1
+align   16
+L$003go4loop4:
+        lea     edx,[edx*1+esi-4]
+        mov     DWORD [28+esp],edx
+L$007loop4:
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        mov     ecx,DWORD [eax*4+edi]
+        mov     ebp,DWORD [edx*4+edi]
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        ror     ebp,8
+        mov     ecx,DWORD [eax*4+edi]
+        or      ebp,DWORD [edx*4+edi]
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        ror     ebp,8
+        mov     ecx,DWORD [eax*4+edi]
+        or      ebp,DWORD [edx*4+edi]
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        ror     ebp,8
+        mov     ecx,DWORD [32+esp]
+        or      ebp,DWORD [edx*4+edi]
+        ror     ebp,8
+        xor     ebp,DWORD [esi]
+        cmp     esi,DWORD [28+esp]
+        mov     DWORD [esi*1+ecx],ebp
+        lea     esi,[4+esi]
+        mov     ecx,DWORD [eax*4+edi]
+        jb      NEAR L$007loop4
+        cmp     esi,DWORD [24+esp]
+        je      NEAR L$006done
+        mov     ebp,DWORD [32+esp]
+align   16
+L$002loop1:
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        mov     edx,DWORD [edx*4+edi]
+        xor     dl,BYTE [esi]
+        lea     esi,[1+esi]
+        mov     ecx,DWORD [eax*4+edi]
+        cmp     esi,DWORD [24+esp]
+        mov     BYTE [esi*1+ebp-1],dl
+        jb      NEAR L$002loop1
+        jmp     NEAR L$006done
+align   16
+L$001RC4_CHAR:
+        movzx   ecx,BYTE [eax*1+edi]
+L$008cloop1:
+        add     bl,cl
+        movzx   edx,BYTE [ebx*1+edi]
+        mov     BYTE [ebx*1+edi],cl
+        mov     BYTE [eax*1+edi],dl
+        add     dl,cl
+        movzx   edx,BYTE [edx*1+edi]
+        add     al,1
+        xor     dl,BYTE [esi]
+        lea     esi,[1+esi]
+        movzx   ecx,BYTE [eax*1+edi]
+        cmp     esi,DWORD [24+esp]
+        mov     BYTE [esi*1+ebp-1],dl
+        jb      NEAR L$008cloop1
+L$006done:
+        dec     al
+        mov     DWORD [edi-4],ebx
+        mov     BYTE [edi-8],al
+L$000abort:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _RC4_set_key
+align   16
+_RC4_set_key:
+L$_RC4_set_key_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edi,DWORD [20+esp]
+        mov     ebp,DWORD [24+esp]
+        mov     esi,DWORD [28+esp]
+        lea     edx,[_OPENSSL_ia32cap_P]
+        lea     edi,[8+edi]
+        lea     esi,[ebp*1+esi]
+        neg     ebp
+        xor     eax,eax
+        mov     DWORD [edi-4],ebp
+        bt      DWORD [edx],20
+        jc      NEAR L$009c1stloop
+align   16
+L$010w1stloop:
+        mov     DWORD [eax*4+edi],eax
+        add     al,1
+        jnc     NEAR L$010w1stloop
+        xor     ecx,ecx
+        xor     edx,edx
+align   16
+L$011w2ndloop:
+        mov     eax,DWORD [ecx*4+edi]
+        add     dl,BYTE [ebp*1+esi]
+        add     dl,al
+        add     ebp,1
+        mov     ebx,DWORD [edx*4+edi]
+        jnz     NEAR L$012wnowrap
+        mov     ebp,DWORD [edi-4]
+L$012wnowrap:
+        mov     DWORD [edx*4+edi],eax
+        mov     DWORD [ecx*4+edi],ebx
+        add     cl,1
+        jnc     NEAR L$011w2ndloop
+        jmp     NEAR L$013exit
+align   16
+L$009c1stloop:
+        mov     BYTE [eax*1+edi],al
+        add     al,1
+        jnc     NEAR L$009c1stloop
+        xor     ecx,ecx
+        xor     edx,edx
+        xor     ebx,ebx
+align   16
+L$014c2ndloop:
+        mov     al,BYTE [ecx*1+edi]
+        add     dl,BYTE [ebp*1+esi]
+        add     dl,al
+        add     ebp,1
+        mov     bl,BYTE [edx*1+edi]
+        jnz     NEAR L$015cnowrap
+        mov     ebp,DWORD [edi-4]
+L$015cnowrap:
+        mov     BYTE [edx*1+edi],al
+        mov     BYTE [ecx*1+edi],bl
+        add     cl,1
+        jnc     NEAR L$014c2ndloop
+        mov     DWORD [256+edi],-1
+L$013exit:
+        xor     eax,eax
+        mov     DWORD [edi-8],eax
+        mov     DWORD [edi-4],eax
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _RC4_options
+align   16
+_RC4_options:
+L$_RC4_options_begin:
+        call    L$016pic_point
+L$016pic_point:
+        pop     eax
+        lea     eax,[(L$017opts-L$016pic_point)+eax]
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     edx,DWORD [edx]
+        bt      edx,20
+        jc      NEAR L$0181xchar
+        bt      edx,26
+        jnc     NEAR L$019ret
+        add     eax,25
+        ret
+L$0181xchar:
+        add     eax,12
+L$019ret:
+        ret
+align   64
+L$017opts:
+db      114,99,52,40,52,120,44,105,110,116,41,0
+db      114,99,52,40,49,120,44,99,104,97,114,41,0
+db      114,99,52,40,56,120,44,109,109,120,41,0
+db      82,67,52,32,102,111,114,32,120,56,54,44,32,67,82,89
+db      80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114
+db      111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+align   64
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
new file mode 100644
index 0000000000..4a893333d8
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
@@ -0,0 +1,3977 @@
+; Copyright 1998-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _sha1_block_data_order
+align   16
+_sha1_block_data_order:
+L$_sha1_block_data_order_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     esi,[_OPENSSL_ia32cap_P]
+        lea     ebp,[(L$K_XX_XX-L$000pic_point)+ebp]
+        mov     eax,DWORD [esi]
+        mov     edx,DWORD [4+esi]
+        test    edx,512
+        jz      NEAR L$001x86
+        mov     ecx,DWORD [8+esi]
+        test    eax,16777216
+        jz      NEAR L$001x86
+        test    ecx,536870912
+        jnz     NEAR L$shaext_shortcut
+        and     edx,268435456
+        and     eax,1073741824
+        or      eax,edx
+        cmp     eax,1342177280
+        je      NEAR L$avx_shortcut
+        jmp     NEAR L$ssse3_shortcut
+align   16
+L$001x86:
+        mov     ebp,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        sub     esp,76
+        shl     eax,6
+        add     eax,esi
+        mov     DWORD [104+esp],eax
+        mov     edi,DWORD [16+ebp]
+        jmp     NEAR L$002loop
+align   16
+L$002loop:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [12+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edx
+        mov     eax,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [28+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],edx
+        mov     eax,DWORD [32+esi]
+        mov     ebx,DWORD [36+esi]
+        mov     ecx,DWORD [40+esi]
+        mov     edx,DWORD [44+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [32+esp],eax
+        mov     DWORD [36+esp],ebx
+        mov     DWORD [40+esp],ecx
+        mov     DWORD [44+esp],edx
+        mov     eax,DWORD [48+esi]
+        mov     ebx,DWORD [52+esi]
+        mov     ecx,DWORD [56+esi]
+        mov     edx,DWORD [60+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [48+esp],eax
+        mov     DWORD [52+esp],ebx
+        mov     DWORD [56+esp],ecx
+        mov     DWORD [60+esp],edx
+        mov     DWORD [100+esp],esi
+        mov     eax,DWORD [ebp]
+        mov     ebx,DWORD [4+ebp]
+        mov     ecx,DWORD [8+ebp]
+        mov     edx,DWORD [12+ebp]
+        ; 00_15 0
+        mov     esi,ecx
+        mov     ebp,eax
+        rol     ebp,5
+        xor     esi,edx
+        add     ebp,edi
+        mov     edi,DWORD [esp]
+        and     esi,ebx
+        ror     ebx,2
+        xor     esi,edx
+        lea     ebp,[1518500249+edi*1+ebp]
+        add     ebp,esi
+        ; 00_15 1
+        mov     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        xor     edi,ecx
+        add     ebp,edx
+        mov     edx,DWORD [4+esp]
+        and     edi,eax
+        ror     eax,2
+        xor     edi,ecx
+        lea     ebp,[1518500249+edx*1+ebp]
+        add     ebp,edi
+        ; 00_15 2
+        mov     edx,eax
+        mov     edi,ebp
+        rol     ebp,5
+        xor     edx,ebx
+        add     ebp,ecx
+        mov     ecx,DWORD [8+esp]
+        and     edx,esi
+        ror     esi,2
+        xor     edx,ebx
+        lea     ebp,[1518500249+ecx*1+ebp]
+        add     ebp,edx
+        ; 00_15 3
+        mov     ecx,esi
+        mov     edx,ebp
+        rol     ebp,5
+        xor     ecx,eax
+        add     ebp,ebx
+        mov     ebx,DWORD [12+esp]
+        and     ecx,edi
+        ror     edi,2
+        xor     ecx,eax
+        lea     ebp,[1518500249+ebx*1+ebp]
+        add     ebp,ecx
+        ; 00_15 4
+        mov     ebx,edi
+        mov     ecx,ebp
+        rol     ebp,5
+        xor     ebx,esi
+        add     ebp,eax
+        mov     eax,DWORD [16+esp]
+        and     ebx,edx
+        ror     edx,2
+        xor     ebx,esi
+        lea     ebp,[1518500249+eax*1+ebp]
+        add     ebp,ebx
+        ; 00_15 5
+        mov     eax,edx
+        mov     ebx,ebp
+        rol     ebp,5
+        xor     eax,edi
+        add     ebp,esi
+        mov     esi,DWORD [20+esp]
+        and     eax,ecx
+        ror     ecx,2
+        xor     eax,edi
+        lea     ebp,[1518500249+esi*1+ebp]
+        add     ebp,eax
+        ; 00_15 6
+        mov     esi,ecx
+        mov     eax,ebp
+        rol     ebp,5
+        xor     esi,edx
+        add     ebp,edi
+        mov     edi,DWORD [24+esp]
+        and     esi,ebx
+        ror     ebx,2
+        xor     esi,edx
+        lea     ebp,[1518500249+edi*1+ebp]
+        add     ebp,esi
+        ; 00_15 7
+        mov     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        xor     edi,ecx
+        add     ebp,edx
+        mov     edx,DWORD [28+esp]
+        and     edi,eax
+        ror     eax,2
+        xor     edi,ecx
+        lea     ebp,[1518500249+edx*1+ebp]
+        add     ebp,edi
+        ; 00_15 8
+        mov     edx,eax
+        mov     edi,ebp
+        rol     ebp,5
+        xor     edx,ebx
+        add     ebp,ecx
+        mov     ecx,DWORD [32+esp]
+        and     edx,esi
+        ror     esi,2
+        xor     edx,ebx
+        lea     ebp,[1518500249+ecx*1+ebp]
+        add     ebp,edx
+        ; 00_15 9
+        mov     ecx,esi
+        mov     edx,ebp
+        rol     ebp,5
+        xor     ecx,eax
+        add     ebp,ebx
+        mov     ebx,DWORD [36+esp]
+        and     ecx,edi
+        ror     edi,2
+        xor     ecx,eax
+        lea     ebp,[1518500249+ebx*1+ebp]
+        add     ebp,ecx
+        ; 00_15 10
+        mov     ebx,edi
+        mov     ecx,ebp
+        rol     ebp,5
+        xor     ebx,esi
+        add     ebp,eax
+        mov     eax,DWORD [40+esp]
+        and     ebx,edx
+        ror     edx,2
+        xor     ebx,esi
+        lea     ebp,[1518500249+eax*1+ebp]
+        add     ebp,ebx
+        ; 00_15 11
+        mov     eax,edx
+        mov     ebx,ebp
+        rol     ebp,5
+        xor     eax,edi
+        add     ebp,esi
+        mov     esi,DWORD [44+esp]
+        and     eax,ecx
+        ror     ecx,2
+        xor     eax,edi
+        lea     ebp,[1518500249+esi*1+ebp]
+        add     ebp,eax
+        ; 00_15 12
+        mov     esi,ecx
+        mov     eax,ebp
+        rol     ebp,5
+        xor     esi,edx
+        add     ebp,edi
+        mov     edi,DWORD [48+esp]
+        and     esi,ebx
+        ror     ebx,2
+        xor     esi,edx
+        lea     ebp,[1518500249+edi*1+ebp]
+        add     ebp,esi
+        ; 00_15 13
+        mov     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        xor     edi,ecx
+        add     ebp,edx
+        mov     edx,DWORD [52+esp]
+        and     edi,eax
+        ror     eax,2
+        xor     edi,ecx
+        lea     ebp,[1518500249+edx*1+ebp]
+        add     ebp,edi
+        ; 00_15 14
+        mov     edx,eax
+        mov     edi,ebp
+        rol     ebp,5
+        xor     edx,ebx
+        add     ebp,ecx
+        mov     ecx,DWORD [56+esp]
+        and     edx,esi
+        ror     esi,2
+        xor     edx,ebx
+        lea     ebp,[1518500249+ecx*1+ebp]
+        add     ebp,edx
+        ; 00_15 15
+        mov     ecx,esi
+        mov     edx,ebp
+        rol     ebp,5
+        xor     ecx,eax
+        add     ebp,ebx
+        mov     ebx,DWORD [60+esp]
+        and     ecx,edi
+        ror     edi,2
+        xor     ecx,eax
+        lea     ebp,[1518500249+ebx*1+ebp]
+        mov     ebx,DWORD [esp]
+        add     ecx,ebp
+        ; 16_19 16
+        mov     ebp,edi
+        xor     ebx,DWORD [8+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [32+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [52+esp]
+        rol     ebx,1
+        xor     ebp,esi
+        add     eax,ebp
+        mov     ebp,ecx
+        ror     edx,2
+        mov     DWORD [esp],ebx
+        rol     ebp,5
+        lea     ebx,[1518500249+eax*1+ebx]
+        mov     eax,DWORD [4+esp]
+        add     ebx,ebp
+        ; 16_19 17
+        mov     ebp,edx
+        xor     eax,DWORD [12+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [36+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [56+esp]
+        rol     eax,1
+        xor     ebp,edi
+        add     esi,ebp
+        mov     ebp,ebx
+        ror     ecx,2
+        mov     DWORD [4+esp],eax
+        rol     ebp,5
+        lea     eax,[1518500249+esi*1+eax]
+        mov     esi,DWORD [8+esp]
+        add     eax,ebp
+        ; 16_19 18
+        mov     ebp,ecx
+        xor     esi,DWORD [16+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [40+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [60+esp]
+        rol     esi,1
+        xor     ebp,edx
+        add     edi,ebp
+        mov     ebp,eax
+        ror     ebx,2
+        mov     DWORD [8+esp],esi
+        rol     ebp,5
+        lea     esi,[1518500249+edi*1+esi]
+        mov     edi,DWORD [12+esp]
+        add     esi,ebp
+        ; 16_19 19
+        mov     ebp,ebx
+        xor     edi,DWORD [20+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [44+esp]
+        and     ebp,eax
+        xor     edi,DWORD [esp]
+        rol     edi,1
+        xor     ebp,ecx
+        add     edx,ebp
+        mov     ebp,esi
+        ror     eax,2
+        mov     DWORD [12+esp],edi
+        rol     ebp,5
+        lea     edi,[1518500249+edx*1+edi]
+        mov     edx,DWORD [16+esp]
+        add     edi,ebp
+        ; 20_39 20
+        mov     ebp,esi
+        xor     edx,DWORD [24+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [48+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [4+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [16+esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [20+esp]
+        add     edx,ebp
+        ; 20_39 21
+        mov     ebp,edi
+        xor     ecx,DWORD [28+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [8+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [20+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [24+esp]
+        add     ecx,ebp
+        ; 20_39 22
+        mov     ebp,edx
+        xor     ebx,DWORD [32+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [12+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [24+esp],ebx
+        lea     ebx,[1859775393+eax*1+ebx]
+        mov     eax,DWORD [28+esp]
+        add     ebx,ebp
+        ; 20_39 23
+        mov     ebp,ecx
+        xor     eax,DWORD [36+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [16+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [28+esp],eax
+        lea     eax,[1859775393+esi*1+eax]
+        mov     esi,DWORD [32+esp]
+        add     eax,ebp
+        ; 20_39 24
+        mov     ebp,ebx
+        xor     esi,DWORD [40+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [esp]
+        xor     ebp,edx
+        xor     esi,DWORD [20+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [32+esp],esi
+        lea     esi,[1859775393+edi*1+esi]
+        mov     edi,DWORD [36+esp]
+        add     esi,ebp
+        ; 20_39 25
+        mov     ebp,eax
+        xor     edi,DWORD [44+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [24+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [36+esp],edi
+        lea     edi,[1859775393+edx*1+edi]
+        mov     edx,DWORD [40+esp]
+        add     edi,ebp
+        ; 20_39 26
+        mov     ebp,esi
+        xor     edx,DWORD [48+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [8+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [28+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [40+esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [44+esp]
+        add     edx,ebp
+        ; 20_39 27
+        mov     ebp,edi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [32+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [44+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [48+esp]
+        add     ecx,ebp
+        ; 20_39 28
+        mov     ebp,edx
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [16+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [36+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [48+esp],ebx
+        lea     ebx,[1859775393+eax*1+ebx]
+        mov     eax,DWORD [52+esp]
+        add     ebx,ebp
+        ; 20_39 29
+        mov     ebp,ecx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [20+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [40+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [52+esp],eax
+        lea     eax,[1859775393+esi*1+eax]
+        mov     esi,DWORD [56+esp]
+        add     eax,ebp
+        ; 20_39 30
+        mov     ebp,ebx
+        xor     esi,DWORD [esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [24+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [44+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [56+esp],esi
+        lea     esi,[1859775393+edi*1+esi]
+        mov     edi,DWORD [60+esp]
+        add     esi,ebp
+        ; 20_39 31
+        mov     ebp,eax
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [28+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [48+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [60+esp],edi
+        lea     edi,[1859775393+edx*1+edi]
+        mov     edx,DWORD [esp]
+        add     edi,ebp
+        ; 20_39 32
+        mov     ebp,esi
+        xor     edx,DWORD [8+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [32+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [52+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [4+esp]
+        add     edx,ebp
+        ; 20_39 33
+        mov     ebp,edi
+        xor     ecx,DWORD [12+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [36+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [56+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [4+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [8+esp]
+        add     ecx,ebp
+        ; 20_39 34
+        mov     ebp,edx
+        xor     ebx,DWORD [16+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [40+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [60+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [8+esp],ebx
+        lea     ebx,[1859775393+eax*1+ebx]
+        mov     eax,DWORD [12+esp]
+        add     ebx,ebp
+        ; 20_39 35
+        mov     ebp,ecx
+        xor     eax,DWORD [20+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [44+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [12+esp],eax
+        lea     eax,[1859775393+esi*1+eax]
+        mov     esi,DWORD [16+esp]
+        add     eax,ebp
+        ; 20_39 36
+        mov     ebp,ebx
+        xor     esi,DWORD [24+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [48+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [4+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [16+esp],esi
+        lea     esi,[1859775393+edi*1+esi]
+        mov     edi,DWORD [20+esp]
+        add     esi,ebp
+        ; 20_39 37
+        mov     ebp,eax
+        xor     edi,DWORD [28+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [52+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [8+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [20+esp],edi
+        lea     edi,[1859775393+edx*1+edi]
+        mov     edx,DWORD [24+esp]
+        add     edi,ebp
+        ; 20_39 38
+        mov     ebp,esi
+        xor     edx,DWORD [32+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [56+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [12+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [24+esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [28+esp]
+        add     edx,ebp
+        ; 20_39 39
+        mov     ebp,edi
+        xor     ecx,DWORD [36+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [60+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [16+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [28+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [32+esp]
+        add     ecx,ebp
+        ; 40_59 40
+        mov     ebp,edi
+        xor     ebx,DWORD [40+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [esp]
+        and     ebp,edx
+        xor     ebx,DWORD [20+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [32+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [36+esp]
+        add     ebx,ebp
+        ; 40_59 41
+        mov     ebp,edx
+        xor     eax,DWORD [44+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [4+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [24+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [36+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [40+esp]
+        add     eax,ebp
+        ; 40_59 42
+        mov     ebp,ecx
+        xor     esi,DWORD [48+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [8+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [28+esp]
+        rol     esi,1
+        add     ebp,edi
+        ror     ebx,2
+        mov     edi,eax
+        rol     edi,5
+        mov     DWORD [40+esp],esi
+        lea     esi,[2400959708+ebp*1+esi]
+        mov     ebp,ecx
+        add     esi,edi
+        and     ebp,edx
+        mov     edi,DWORD [44+esp]
+        add     esi,ebp
+        ; 40_59 43
+        mov     ebp,ebx
+        xor     edi,DWORD [52+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [12+esp]
+        and     ebp,eax
+        xor     edi,DWORD [32+esp]
+        rol     edi,1
+        add     ebp,edx
+        ror     eax,2
+        mov     edx,esi
+        rol     edx,5
+        mov     DWORD [44+esp],edi
+        lea     edi,[2400959708+ebp*1+edi]
+        mov     ebp,ebx
+        add     edi,edx
+        and     ebp,ecx
+        mov     edx,DWORD [48+esp]
+        add     edi,ebp
+        ; 40_59 44
+        mov     ebp,eax
+        xor     edx,DWORD [56+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [16+esp]
+        and     ebp,esi
+        xor     edx,DWORD [36+esp]
+        rol     edx,1
+        add     ebp,ecx
+        ror     esi,2
+        mov     ecx,edi
+        rol     ecx,5
+        mov     DWORD [48+esp],edx
+        lea     edx,[2400959708+ebp*1+edx]
+        mov     ebp,eax
+        add     edx,ecx
+        and     ebp,ebx
+        mov     ecx,DWORD [52+esp]
+        add     edx,ebp
+        ; 40_59 45
+        mov     ebp,esi
+        xor     ecx,DWORD [60+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [20+esp]
+        and     ebp,edi
+        xor     ecx,DWORD [40+esp]
+        rol     ecx,1
+        add     ebp,ebx
+        ror     edi,2
+        mov     ebx,edx
+        rol     ebx,5
+        mov     DWORD [52+esp],ecx
+        lea     ecx,[2400959708+ebp*1+ecx]
+        mov     ebp,esi
+        add     ecx,ebx
+        and     ebp,eax
+        mov     ebx,DWORD [56+esp]
+        add     ecx,ebp
+        ; 40_59 46
+        mov     ebp,edi
+        xor     ebx,DWORD [esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [24+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [44+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [56+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [60+esp]
+        add     ebx,ebp
+        ; 40_59 47
+        mov     ebp,edx
+        xor     eax,DWORD [4+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [28+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [48+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [60+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [esp]
+        add     eax,ebp
+        ; 40_59 48
+        mov     ebp,ecx
+        xor     esi,DWORD [8+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [32+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [52+esp]
+        rol     esi,1
+        add     ebp,edi
+        ror     ebx,2
+        mov     edi,eax
+        rol     edi,5
+        mov     DWORD [esp],esi
+        lea     esi,[2400959708+ebp*1+esi]
+        mov     ebp,ecx
+        add     esi,edi
+        and     ebp,edx
+        mov     edi,DWORD [4+esp]
+        add     esi,ebp
+        ; 40_59 49
+        mov     ebp,ebx
+        xor     edi,DWORD [12+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [36+esp]
+        and     ebp,eax
+        xor     edi,DWORD [56+esp]
+        rol     edi,1
+        add     ebp,edx
+        ror     eax,2
+        mov     edx,esi
+        rol     edx,5
+        mov     DWORD [4+esp],edi
+        lea     edi,[2400959708+ebp*1+edi]
+        mov     ebp,ebx
+        add     edi,edx
+        and     ebp,ecx
+        mov     edx,DWORD [8+esp]
+        add     edi,ebp
+        ; 40_59 50
+        mov     ebp,eax
+        xor     edx,DWORD [16+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [40+esp]
+        and     ebp,esi
+        xor     edx,DWORD [60+esp]
+        rol     edx,1
+        add     ebp,ecx
+        ror     esi,2
+        mov     ecx,edi
+        rol     ecx,5
+        mov     DWORD [8+esp],edx
+        lea     edx,[2400959708+ebp*1+edx]
+        mov     ebp,eax
+        add     edx,ecx
+        and     ebp,ebx
+        mov     ecx,DWORD [12+esp]
+        add     edx,ebp
+        ; 40_59 51
+        mov     ebp,esi
+        xor     ecx,DWORD [20+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [44+esp]
+        and     ebp,edi
+        xor     ecx,DWORD [esp]
+        rol     ecx,1
+        add     ebp,ebx
+        ror     edi,2
+        mov     ebx,edx
+        rol     ebx,5
+        mov     DWORD [12+esp],ecx
+        lea     ecx,[2400959708+ebp*1+ecx]
+        mov     ebp,esi
+        add     ecx,ebx
+        and     ebp,eax
+        mov     ebx,DWORD [16+esp]
+        add     ecx,ebp
+        ; 40_59 52
+        mov     ebp,edi
+        xor     ebx,DWORD [24+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [48+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [4+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [16+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [20+esp]
+        add     ebx,ebp
+        ; 40_59 53
+        mov     ebp,edx
+        xor     eax,DWORD [28+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [52+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [8+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [20+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [24+esp]
+        add     eax,ebp
+        ; 40_59 54
+        mov     ebp,ecx
+        xor     esi,DWORD [32+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [56+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [12+esp]
+        rol     esi,1
+        add     ebp,edi
+        ror     ebx,2
+        mov     edi,eax
+        rol     edi,5
+        mov     DWORD [24+esp],esi
+        lea     esi,[2400959708+ebp*1+esi]
+        mov     ebp,ecx
+        add     esi,edi
+        and     ebp,edx
+        mov     edi,DWORD [28+esp]
+        add     esi,ebp
+        ; 40_59 55
+        mov     ebp,ebx
+        xor     edi,DWORD [36+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [60+esp]
+        and     ebp,eax
+        xor     edi,DWORD [16+esp]
+        rol     edi,1
+        add     ebp,edx
+        ror     eax,2
+        mov     edx,esi
+        rol     edx,5
+        mov     DWORD [28+esp],edi
+        lea     edi,[2400959708+ebp*1+edi]
+        mov     ebp,ebx
+        add     edi,edx
+        and     ebp,ecx
+        mov     edx,DWORD [32+esp]
+        add     edi,ebp
+        ; 40_59 56
+        mov     ebp,eax
+        xor     edx,DWORD [40+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [esp]
+        and     ebp,esi
+        xor     edx,DWORD [20+esp]
+        rol     edx,1
+        add     ebp,ecx
+        ror     esi,2
+        mov     ecx,edi
+        rol     ecx,5
+        mov     DWORD [32+esp],edx
+        lea     edx,[2400959708+ebp*1+edx]
+        mov     ebp,eax
+        add     edx,ecx
+        and     ebp,ebx
+        mov     ecx,DWORD [36+esp]
+        add     edx,ebp
+        ; 40_59 57
+        mov     ebp,esi
+        xor     ecx,DWORD [44+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [4+esp]
+        and     ebp,edi
+        xor     ecx,DWORD [24+esp]
+        rol     ecx,1
+        add     ebp,ebx
+        ror     edi,2
+        mov     ebx,edx
+        rol     ebx,5
+        mov     DWORD [36+esp],ecx
+        lea     ecx,[2400959708+ebp*1+ecx]
+        mov     ebp,esi
+        add     ecx,ebx
+        and     ebp,eax
+        mov     ebx,DWORD [40+esp]
+        add     ecx,ebp
+        ; 40_59 58
+        mov     ebp,edi
+        xor     ebx,DWORD [48+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [8+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [28+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [40+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [44+esp]
+        add     ebx,ebp
+        ; 40_59 59
+        mov     ebp,edx
+        xor     eax,DWORD [52+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [12+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [32+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [44+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [48+esp]
+        add     eax,ebp
+        ; 20_39 60
+        mov     ebp,ebx
+        xor     esi,DWORD [56+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [16+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [36+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [48+esp],esi
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [52+esp]
+        add     esi,ebp
+        ; 20_39 61
+        mov     ebp,eax
+        xor     edi,DWORD [60+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [20+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [40+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [52+esp],edi
+        lea     edi,[3395469782+edx*1+edi]
+        mov     edx,DWORD [56+esp]
+        add     edi,ebp
+        ; 20_39 62
+        mov     ebp,esi
+        xor     edx,DWORD [esp]
+        xor     ebp,eax
+        xor     edx,DWORD [24+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [44+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [56+esp],edx
+        lea     edx,[3395469782+ecx*1+edx]
+        mov     ecx,DWORD [60+esp]
+        add     edx,ebp
+        ; 20_39 63
+        mov     ebp,edi
+        xor     ecx,DWORD [4+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [28+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [48+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [60+esp],ecx
+        lea     ecx,[3395469782+ebx*1+ecx]
+        mov     ebx,DWORD [esp]
+        add     ecx,ebp
+        ; 20_39 64
+        mov     ebp,edx
+        xor     ebx,DWORD [8+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [32+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [52+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [esp],ebx
+        lea     ebx,[3395469782+eax*1+ebx]
+        mov     eax,DWORD [4+esp]
+        add     ebx,ebp
+        ; 20_39 65
+        mov     ebp,ecx
+        xor     eax,DWORD [12+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [36+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [56+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [4+esp],eax
+        lea     eax,[3395469782+esi*1+eax]
+        mov     esi,DWORD [8+esp]
+        add     eax,ebp
+        ; 20_39 66
+        mov     ebp,ebx
+        xor     esi,DWORD [16+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [40+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [60+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [8+esp],esi
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [12+esp]
+        add     esi,ebp
+        ; 20_39 67
+        mov     ebp,eax
+        xor     edi,DWORD [20+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [12+esp],edi
+        lea     edi,[3395469782+edx*1+edi]
+        mov     edx,DWORD [16+esp]
+        add     edi,ebp
+        ; 20_39 68
+        mov     ebp,esi
+        xor     edx,DWORD [24+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [48+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [4+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [16+esp],edx
+        lea     edx,[3395469782+ecx*1+edx]
+        mov     ecx,DWORD [20+esp]
+        add     edx,ebp
+        ; 20_39 69
+        mov     ebp,edi
+        xor     ecx,DWORD [28+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [8+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [20+esp],ecx
+        lea     ecx,[3395469782+ebx*1+ecx]
+        mov     ebx,DWORD [24+esp]
+        add     ecx,ebp
+        ; 20_39 70
+        mov     ebp,edx
+        xor     ebx,DWORD [32+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [12+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [24+esp],ebx
+        lea     ebx,[3395469782+eax*1+ebx]
+        mov     eax,DWORD [28+esp]
+        add     ebx,ebp
+        ; 20_39 71
+        mov     ebp,ecx
+        xor     eax,DWORD [36+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [16+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [28+esp],eax
+        lea     eax,[3395469782+esi*1+eax]
+        mov     esi,DWORD [32+esp]
+        add     eax,ebp
+        ; 20_39 72
+        mov     ebp,ebx
+        xor     esi,DWORD [40+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [esp]
+        xor     ebp,edx
+        xor     esi,DWORD [20+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [32+esp],esi
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [36+esp]
+        add     esi,ebp
+        ; 20_39 73
+        mov     ebp,eax
+        xor     edi,DWORD [44+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [24+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [36+esp],edi
+        lea     edi,[3395469782+edx*1+edi]
+        mov     edx,DWORD [40+esp]
+        add     edi,ebp
+        ; 20_39 74
+        mov     ebp,esi
+        xor     edx,DWORD [48+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [8+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [28+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [40+esp],edx
+        lea     edx,[3395469782+ecx*1+edx]
+        mov     ecx,DWORD [44+esp]
+        add     edx,ebp
+        ; 20_39 75
+        mov     ebp,edi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [32+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [44+esp],ecx
+        lea     ecx,[3395469782+ebx*1+ecx]
+        mov     ebx,DWORD [48+esp]
+        add     ecx,ebp
+        ; 20_39 76
+        mov     ebp,edx
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [16+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [36+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [48+esp],ebx
+        lea     ebx,[3395469782+eax*1+ebx]
+        mov     eax,DWORD [52+esp]
+        add     ebx,ebp
+        ; 20_39 77
+        mov     ebp,ecx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [20+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [40+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        lea     eax,[3395469782+esi*1+eax]
+        mov     esi,DWORD [56+esp]
+        add     eax,ebp
+        ; 20_39 78
+        mov     ebp,ebx
+        xor     esi,DWORD [esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [24+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [44+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [60+esp]
+        add     esi,ebp
+        ; 20_39 79
+        mov     ebp,eax
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [28+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [48+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        lea     edi,[3395469782+edx*1+edi]
+        add     edi,ebp
+        mov     ebp,DWORD [96+esp]
+        mov     edx,DWORD [100+esp]
+        add     edi,DWORD [ebp]
+        add     esi,DWORD [4+ebp]
+        add     eax,DWORD [8+ebp]
+        add     ebx,DWORD [12+ebp]
+        add     ecx,DWORD [16+ebp]
+        mov     DWORD [ebp],edi
+        add     edx,64
+        mov     DWORD [4+ebp],esi
+        cmp     edx,DWORD [104+esp]
+        mov     DWORD [8+ebp],eax
+        mov     edi,ecx
+        mov     DWORD [12+ebp],ebx
+        mov     esi,edx
+        mov     DWORD [16+ebp],ecx
+        jb      NEAR L$002loop
+        add     esp,76
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__sha1_block_data_order_shaext:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$003pic_point
+L$003pic_point:
+        pop     ebp
+        lea     ebp,[(L$K_XX_XX-L$003pic_point)+ebp]
+L$shaext_shortcut:
+        mov     edi,DWORD [20+esp]
+        mov     ebx,esp
+        mov     esi,DWORD [24+esp]
+        mov     ecx,DWORD [28+esp]
+        sub     esp,32
+        movdqu  xmm0,[edi]
+        movd    xmm1,DWORD [16+edi]
+        and     esp,-32
+        movdqa  xmm3,[80+ebp]
+        movdqu  xmm4,[esi]
+        pshufd  xmm0,xmm0,27
+        movdqu  xmm5,[16+esi]
+        pshufd  xmm1,xmm1,27
+        movdqu  xmm6,[32+esi]
+db      102,15,56,0,227
+        movdqu  xmm7,[48+esi]
+db      102,15,56,0,235
+db      102,15,56,0,243
+db      102,15,56,0,251
+        jmp     NEAR L$004loop_shaext
+align   16
+L$004loop_shaext:
+        dec     ecx
+        lea     eax,[64+esi]
+        movdqa  [esp],xmm1
+        paddd   xmm1,xmm4
+        cmovne  esi,eax
+        movdqa  [16+esp],xmm0
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,0
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,0
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,0
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,0
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,0
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,1
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,1
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,1
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,1
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,1
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,2
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,2
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,2
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,2
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,2
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,3
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+        movdqu  xmm4,[esi]
+        movdqa  xmm2,xmm0
+db      15,58,204,193,3
+db      15,56,200,213
+        movdqu  xmm5,[16+esi]
+db      102,15,56,0,227
+        movdqa  xmm1,xmm0
+db      15,58,204,194,3
+db      15,56,200,206
+        movdqu  xmm6,[32+esi]
+db      102,15,56,0,235
+        movdqa  xmm2,xmm0
+db      15,58,204,193,3
+db      15,56,200,215
+        movdqu  xmm7,[48+esi]
+db      102,15,56,0,243
+        movdqa  xmm1,xmm0
+db      15,58,204,194,3
+        movdqa  xmm2,[esp]
+db      102,15,56,0,251
+db      15,56,200,202
+        paddd   xmm0,[16+esp]
+        jnz     NEAR L$004loop_shaext
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm1,xmm1,27
+        movdqu  [edi],xmm0
+        movd    DWORD [16+edi],xmm1
+        mov     esp,ebx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__sha1_block_data_order_ssse3:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$005pic_point
+L$005pic_point:
+        pop     ebp
+        lea     ebp,[(L$K_XX_XX-L$005pic_point)+ebp]
+L$ssse3_shortcut:
+        movdqa  xmm7,[ebp]
+        movdqa  xmm0,[16+ebp]
+        movdqa  xmm1,[32+ebp]
+        movdqa  xmm2,[48+ebp]
+        movdqa  xmm6,[64+ebp]
+        mov     edi,DWORD [20+esp]
+        mov     ebp,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        mov     esi,esp
+        sub     esp,208
+        and     esp,-64
+        movdqa  [112+esp],xmm0
+        movdqa  [128+esp],xmm1
+        movdqa  [144+esp],xmm2
+        shl     edx,6
+        movdqa  [160+esp],xmm7
+        add     edx,ebp
+        movdqa  [176+esp],xmm6
+        add     ebp,64
+        mov     DWORD [192+esp],edi
+        mov     DWORD [196+esp],ebp
+        mov     DWORD [200+esp],edx
+        mov     DWORD [204+esp],esi
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+        mov     edi,DWORD [16+edi]
+        mov     esi,ebx
+        movdqu  xmm0,[ebp-64]
+        movdqu  xmm1,[ebp-48]
+        movdqu  xmm2,[ebp-32]
+        movdqu  xmm3,[ebp-16]
+db      102,15,56,0,198
+db      102,15,56,0,206
+db      102,15,56,0,214
+        movdqa  [96+esp],xmm7
+db      102,15,56,0,222
+        paddd   xmm0,xmm7
+        paddd   xmm1,xmm7
+        paddd   xmm2,xmm7
+        movdqa  [esp],xmm0
+        psubd   xmm0,xmm7
+        movdqa  [16+esp],xmm1
+        psubd   xmm1,xmm7
+        movdqa  [32+esp],xmm2
+        mov     ebp,ecx
+        psubd   xmm2,xmm7
+        xor     ebp,edx
+        pshufd  xmm4,xmm0,238
+        and     esi,ebp
+        jmp     NEAR L$006loop
+align   16
+L$006loop:
+        ror     ebx,2
+        xor     esi,edx
+        mov     ebp,eax
+        punpcklqdq      xmm4,xmm1
+        movdqa  xmm6,xmm3
+        add     edi,DWORD [esp]
+        xor     ebx,ecx
+        paddd   xmm7,xmm3
+        movdqa  [64+esp],xmm0
+        rol     eax,5
+        add     edi,esi
+        psrldq  xmm6,4
+        and     ebp,ebx
+        xor     ebx,ecx
+        pxor    xmm4,xmm0
+        add     edi,eax
+        ror     eax,7
+        pxor    xmm6,xmm2
+        xor     ebp,ecx
+        mov     esi,edi
+        add     edx,DWORD [4+esp]
+        pxor    xmm4,xmm6
+        xor     eax,ebx
+        rol     edi,5
+        movdqa  [48+esp],xmm7
+        add     edx,ebp
+        and     esi,eax
+        movdqa  xmm0,xmm4
+        xor     eax,ebx
+        add     edx,edi
+        ror     edi,7
+        movdqa  xmm6,xmm4
+        xor     esi,ebx
+        pslldq  xmm0,12
+        paddd   xmm4,xmm4
+        mov     ebp,edx
+        add     ecx,DWORD [8+esp]
+        psrld   xmm6,31
+        xor     edi,eax
+        rol     edx,5
+        movdqa  xmm7,xmm0
+        add     ecx,esi
+        and     ebp,edi
+        xor     edi,eax
+        psrld   xmm0,30
+        add     ecx,edx
+        ror     edx,7
+        por     xmm4,xmm6
+        xor     ebp,eax
+        mov     esi,ecx
+        add     ebx,DWORD [12+esp]
+        pslld   xmm7,2
+        xor     edx,edi
+        rol     ecx,5
+        pxor    xmm4,xmm0
+        movdqa  xmm0,[96+esp]
+        add     ebx,ebp
+        and     esi,edx
+        pxor    xmm4,xmm7
+        pshufd  xmm5,xmm1,238
+        xor     edx,edi
+        add     ebx,ecx
+        ror     ecx,7
+        xor     esi,edi
+        mov     ebp,ebx
+        punpcklqdq      xmm5,xmm2
+        movdqa  xmm7,xmm4
+        add     eax,DWORD [16+esp]
+        xor     ecx,edx
+        paddd   xmm0,xmm4
+        movdqa  [80+esp],xmm1
+        rol     ebx,5
+        add     eax,esi
+        psrldq  xmm7,4
+        and     ebp,ecx
+        xor     ecx,edx
+        pxor    xmm5,xmm1
+        add     eax,ebx
+        ror     ebx,7
+        pxor    xmm7,xmm3
+        xor     ebp,edx
+        mov     esi,eax
+        add     edi,DWORD [20+esp]
+        pxor    xmm5,xmm7
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  [esp],xmm0
+        add     edi,ebp
+        and     esi,ebx
+        movdqa  xmm1,xmm5
+        xor     ebx,ecx
+        add     edi,eax
+        ror     eax,7
+        movdqa  xmm7,xmm5
+        xor     esi,ecx
+        pslldq  xmm1,12
+        paddd   xmm5,xmm5
+        mov     ebp,edi
+        add     edx,DWORD [24+esp]
+        psrld   xmm7,31
+        xor     eax,ebx
+        rol     edi,5
+        movdqa  xmm0,xmm1
+        add     edx,esi
+        and     ebp,eax
+        xor     eax,ebx
+        psrld   xmm1,30
+        add     edx,edi
+        ror     edi,7
+        por     xmm5,xmm7
+        xor     ebp,ebx
+        mov     esi,edx
+        add     ecx,DWORD [28+esp]
+        pslld   xmm0,2
+        xor     edi,eax
+        rol     edx,5
+        pxor    xmm5,xmm1
+        movdqa  xmm1,[112+esp]
+        add     ecx,ebp
+        and     esi,edi
+        pxor    xmm5,xmm0
+        pshufd  xmm6,xmm2,238
+        xor     edi,eax
+        add     ecx,edx
+        ror     edx,7
+        xor     esi,eax
+        mov     ebp,ecx
+        punpcklqdq      xmm6,xmm3
+        movdqa  xmm0,xmm5
+        add     ebx,DWORD [32+esp]
+        xor     edx,edi
+        paddd   xmm1,xmm5
+        movdqa  [96+esp],xmm2
+        rol     ecx,5
+        add     ebx,esi
+        psrldq  xmm0,4
+        and     ebp,edx
+        xor     edx,edi
+        pxor    xmm6,xmm2
+        add     ebx,ecx
+        ror     ecx,7
+        pxor    xmm0,xmm4
+        xor     ebp,edi
+        mov     esi,ebx
+        add     eax,DWORD [36+esp]
+        pxor    xmm6,xmm0
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  [16+esp],xmm1
+        add     eax,ebp
+        and     esi,ecx
+        movdqa  xmm2,xmm6
+        xor     ecx,edx
+        add     eax,ebx
+        ror     ebx,7
+        movdqa  xmm0,xmm6
+        xor     esi,edx
+        pslldq  xmm2,12
+        paddd   xmm6,xmm6
+        mov     ebp,eax
+        add     edi,DWORD [40+esp]
+        psrld   xmm0,31
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  xmm1,xmm2
+        add     edi,esi
+        and     ebp,ebx
+        xor     ebx,ecx
+        psrld   xmm2,30
+        add     edi,eax
+        ror     eax,7
+        por     xmm6,xmm0
+        xor     ebp,ecx
+        movdqa  xmm0,[64+esp]
+        mov     esi,edi
+        add     edx,DWORD [44+esp]
+        pslld   xmm1,2
+        xor     eax,ebx
+        rol     edi,5
+        pxor    xmm6,xmm2
+        movdqa  xmm2,[112+esp]
+        add     edx,ebp
+        and     esi,eax
+        pxor    xmm6,xmm1
+        pshufd  xmm7,xmm3,238
+        xor     eax,ebx
+        add     edx,edi
+        ror     edi,7
+        xor     esi,ebx
+        mov     ebp,edx
+        punpcklqdq      xmm7,xmm4
+        movdqa  xmm1,xmm6
+        add     ecx,DWORD [48+esp]
+        xor     edi,eax
+        paddd   xmm2,xmm6
+        movdqa  [64+esp],xmm3
+        rol     edx,5
+        add     ecx,esi
+        psrldq  xmm1,4
+        and     ebp,edi
+        xor     edi,eax
+        pxor    xmm7,xmm3
+        add     ecx,edx
+        ror     edx,7
+        pxor    xmm1,xmm5
+        xor     ebp,eax
+        mov     esi,ecx
+        add     ebx,DWORD [52+esp]
+        pxor    xmm7,xmm1
+        xor     edx,edi
+        rol     ecx,5
+        movdqa  [32+esp],xmm2
+        add     ebx,ebp
+        and     esi,edx
+        movdqa  xmm3,xmm7
+        xor     edx,edi
+        add     ebx,ecx
+        ror     ecx,7
+        movdqa  xmm1,xmm7
+        xor     esi,edi
+        pslldq  xmm3,12
+        paddd   xmm7,xmm7
+        mov     ebp,ebx
+        add     eax,DWORD [56+esp]
+        psrld   xmm1,31
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  xmm2,xmm3
+        add     eax,esi
+        and     ebp,ecx
+        xor     ecx,edx
+        psrld   xmm3,30
+        add     eax,ebx
+        ror     ebx,7
+        por     xmm7,xmm1
+        xor     ebp,edx
+        movdqa  xmm1,[80+esp]
+        mov     esi,eax
+        add     edi,DWORD [60+esp]
+        pslld   xmm2,2
+        xor     ebx,ecx
+        rol     eax,5
+        pxor    xmm7,xmm3
+        movdqa  xmm3,[112+esp]
+        add     edi,ebp
+        and     esi,ebx
+        pxor    xmm7,xmm2
+        pshufd  xmm2,xmm6,238
+        xor     ebx,ecx
+        add     edi,eax
+        ror     eax,7
+        pxor    xmm0,xmm4
+        punpcklqdq      xmm2,xmm7
+        xor     esi,ecx
+        mov     ebp,edi
+        add     edx,DWORD [esp]
+        pxor    xmm0,xmm1
+        movdqa  [80+esp],xmm4
+        xor     eax,ebx
+        rol     edi,5
+        movdqa  xmm4,xmm3
+        add     edx,esi
+        paddd   xmm3,xmm7
+        and     ebp,eax
+        pxor    xmm0,xmm2
+        xor     eax,ebx
+        add     edx,edi
+        ror     edi,7
+        xor     ebp,ebx
+        movdqa  xmm2,xmm0
+        movdqa  [48+esp],xmm3
+        mov     esi,edx
+        add     ecx,DWORD [4+esp]
+        xor     edi,eax
+        rol     edx,5
+        pslld   xmm0,2
+        add     ecx,ebp
+        and     esi,edi
+        psrld   xmm2,30
+        xor     edi,eax
+        add     ecx,edx
+        ror     edx,7
+        xor     esi,eax
+        mov     ebp,ecx
+        add     ebx,DWORD [8+esp]
+        xor     edx,edi
+        rol     ecx,5
+        por     xmm0,xmm2
+        add     ebx,esi
+        and     ebp,edx
+        movdqa  xmm2,[96+esp]
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [12+esp]
+        xor     ebp,edi
+        mov     esi,ebx
+        pshufd  xmm3,xmm7,238
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [16+esp]
+        pxor    xmm1,xmm5
+        punpcklqdq      xmm3,xmm0
+        xor     esi,ecx
+        mov     ebp,eax
+        rol     eax,5
+        pxor    xmm1,xmm2
+        movdqa  [96+esp],xmm5
+        add     edi,esi
+        xor     ebp,ecx
+        movdqa  xmm5,xmm4
+        ror     ebx,7
+        paddd   xmm4,xmm0
+        add     edi,eax
+        pxor    xmm1,xmm3
+        add     edx,DWORD [20+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        movdqa  xmm3,xmm1
+        movdqa  [esp],xmm4
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        pslld   xmm1,2
+        add     ecx,DWORD [24+esp]
+        xor     esi,eax
+        psrld   xmm3,30
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+        add     ecx,edx
+        por     xmm1,xmm3
+        add     ebx,DWORD [28+esp]
+        xor     ebp,edi
+        movdqa  xmm3,[64+esp]
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        pshufd  xmm4,xmm0,238
+        add     ebx,ecx
+        add     eax,DWORD [32+esp]
+        pxor    xmm2,xmm6
+        punpcklqdq      xmm4,xmm1
+        xor     esi,edx
+        mov     ebp,ebx
+        rol     ebx,5
+        pxor    xmm2,xmm3
+        movdqa  [64+esp],xmm6
+        add     eax,esi
+        xor     ebp,edx
+        movdqa  xmm6,[128+esp]
+        ror     ecx,7
+        paddd   xmm5,xmm1
+        add     eax,ebx
+        pxor    xmm2,xmm4
+        add     edi,DWORD [36+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        rol     eax,5
+        movdqa  xmm4,xmm2
+        movdqa  [16+esp],xmm5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        pslld   xmm2,2
+        add     edx,DWORD [40+esp]
+        xor     esi,ebx
+        psrld   xmm4,30
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+        add     edx,edi
+        por     xmm2,xmm4
+        add     ecx,DWORD [44+esp]
+        xor     ebp,eax
+        movdqa  xmm4,[80+esp]
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        pshufd  xmm5,xmm1,238
+        add     ecx,edx
+        add     ebx,DWORD [48+esp]
+        pxor    xmm3,xmm7
+        punpcklqdq      xmm5,xmm2
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        pxor    xmm3,xmm4
+        movdqa  [80+esp],xmm7
+        add     ebx,esi
+        xor     ebp,edi
+        movdqa  xmm7,xmm6
+        ror     edx,7
+        paddd   xmm6,xmm2
+        add     ebx,ecx
+        pxor    xmm3,xmm5
+        add     eax,DWORD [52+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        movdqa  xmm5,xmm3
+        movdqa  [32+esp],xmm6
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        pslld   xmm3,2
+        add     edi,DWORD [56+esp]
+        xor     esi,ecx
+        psrld   xmm5,30
+        mov     ebp,eax
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        ror     ebx,7
+        add     edi,eax
+        por     xmm3,xmm5
+        add     edx,DWORD [60+esp]
+        xor     ebp,ebx
+        movdqa  xmm5,[96+esp]
+        mov     esi,edi
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        pshufd  xmm6,xmm2,238
+        add     edx,edi
+        add     ecx,DWORD [esp]
+        pxor    xmm4,xmm0
+        punpcklqdq      xmm6,xmm3
+        xor     esi,eax
+        mov     ebp,edx
+        rol     edx,5
+        pxor    xmm4,xmm5
+        movdqa  [96+esp],xmm0
+        add     ecx,esi
+        xor     ebp,eax
+        movdqa  xmm0,xmm7
+        ror     edi,7
+        paddd   xmm7,xmm3
+        add     ecx,edx
+        pxor    xmm4,xmm6
+        add     ebx,DWORD [4+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        rol     ecx,5
+        movdqa  xmm6,xmm4
+        movdqa  [48+esp],xmm7
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        add     ebx,ecx
+        pslld   xmm4,2
+        add     eax,DWORD [8+esp]
+        xor     esi,edx
+        psrld   xmm6,30
+        mov     ebp,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        add     eax,ebx
+        por     xmm4,xmm6
+        add     edi,DWORD [12+esp]
+        xor     ebp,ecx
+        movdqa  xmm6,[64+esp]
+        mov     esi,eax
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        pshufd  xmm7,xmm3,238
+        add     edi,eax
+        add     edx,DWORD [16+esp]
+        pxor    xmm5,xmm1
+        punpcklqdq      xmm7,xmm4
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        pxor    xmm5,xmm6
+        movdqa  [64+esp],xmm1
+        add     edx,esi
+        xor     ebp,ebx
+        movdqa  xmm1,xmm0
+        ror     eax,7
+        paddd   xmm0,xmm4
+        add     edx,edi
+        pxor    xmm5,xmm7
+        add     ecx,DWORD [20+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        rol     edx,5
+        movdqa  xmm7,xmm5
+        movdqa  [esp],xmm0
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        add     ecx,edx
+        pslld   xmm5,2
+        add     ebx,DWORD [24+esp]
+        xor     esi,edi
+        psrld   xmm7,30
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        por     xmm5,xmm7
+        add     eax,DWORD [28+esp]
+        movdqa  xmm7,[80+esp]
+        ror     ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        rol     ebx,5
+        pshufd  xmm0,xmm4,238
+        add     eax,ebp
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [32+esp]
+        pxor    xmm6,xmm2
+        punpcklqdq      xmm0,xmm5
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        pxor    xmm6,xmm7
+        movdqa  [80+esp],xmm2
+        mov     ebp,eax
+        xor     esi,ecx
+        rol     eax,5
+        movdqa  xmm2,xmm1
+        add     edi,esi
+        paddd   xmm1,xmm5
+        xor     ebp,ebx
+        pxor    xmm6,xmm0
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [36+esp]
+        and     ebp,ebx
+        movdqa  xmm0,xmm6
+        movdqa  [16+esp],xmm1
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,edi
+        xor     ebp,ebx
+        rol     edi,5
+        pslld   xmm6,2
+        add     edx,ebp
+        xor     esi,eax
+        psrld   xmm0,30
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [40+esp]
+        and     esi,eax
+        xor     eax,ebx
+        ror     edi,7
+        por     xmm6,xmm0
+        mov     ebp,edx
+        xor     esi,eax
+        movdqa  xmm0,[96+esp]
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        pshufd  xmm1,xmm5,238
+        add     ebx,DWORD [44+esp]
+        and     ebp,edi
+        xor     edi,eax
+        ror     edx,7
+        mov     esi,ecx
+        xor     ebp,edi
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [48+esp]
+        pxor    xmm7,xmm3
+        punpcklqdq      xmm1,xmm6
+        and     esi,edx
+        xor     edx,edi
+        ror     ecx,7
+        pxor    xmm7,xmm0
+        movdqa  [96+esp],xmm3
+        mov     ebp,ebx
+        xor     esi,edx
+        rol     ebx,5
+        movdqa  xmm3,[144+esp]
+        add     eax,esi
+        paddd   xmm2,xmm6
+        xor     ebp,ecx
+        pxor    xmm7,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [52+esp]
+        and     ebp,ecx
+        movdqa  xmm1,xmm7
+        movdqa  [32+esp],xmm2
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     ebp,ecx
+        rol     eax,5
+        pslld   xmm7,2
+        add     edi,ebp
+        xor     esi,ebx
+        psrld   xmm1,30
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [56+esp]
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        por     xmm7,xmm1
+        mov     ebp,edi
+        xor     esi,ebx
+        movdqa  xmm1,[64+esp]
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        pshufd  xmm2,xmm6,238
+        add     ecx,DWORD [60+esp]
+        and     ebp,eax
+        xor     eax,ebx
+        ror     edi,7
+        mov     esi,edx
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [esp]
+        pxor    xmm0,xmm4
+        punpcklqdq      xmm2,xmm7
+        and     esi,edi
+        xor     edi,eax
+        ror     edx,7
+        pxor    xmm0,xmm1
+        movdqa  [64+esp],xmm4
+        mov     ebp,ecx
+        xor     esi,edi
+        rol     ecx,5
+        movdqa  xmm4,xmm3
+        add     ebx,esi
+        paddd   xmm3,xmm7
+        xor     ebp,edx
+        pxor    xmm0,xmm2
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [4+esp]
+        and     ebp,edx
+        movdqa  xmm2,xmm0
+        movdqa  [48+esp],xmm3
+        xor     edx,edi
+        ror     ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        rol     ebx,5
+        pslld   xmm0,2
+        add     eax,ebp
+        xor     esi,ecx
+        psrld   xmm2,30
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [8+esp]
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        por     xmm0,xmm2
+        mov     ebp,eax
+        xor     esi,ecx
+        movdqa  xmm2,[80+esp]
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        pshufd  xmm3,xmm7,238
+        add     edx,DWORD [12+esp]
+        and     ebp,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,edi
+        xor     ebp,ebx
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [16+esp]
+        pxor    xmm1,xmm5
+        punpcklqdq      xmm3,xmm0
+        and     esi,eax
+        xor     eax,ebx
+        ror     edi,7
+        pxor    xmm1,xmm2
+        movdqa  [80+esp],xmm5
+        mov     ebp,edx
+        xor     esi,eax
+        rol     edx,5
+        movdqa  xmm5,xmm4
+        add     ecx,esi
+        paddd   xmm4,xmm0
+        xor     ebp,edi
+        pxor    xmm1,xmm3
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [20+esp]
+        and     ebp,edi
+        movdqa  xmm3,xmm1
+        movdqa  [esp],xmm4
+        xor     edi,eax
+        ror     edx,7
+        mov     esi,ecx
+        xor     ebp,edi
+        rol     ecx,5
+        pslld   xmm1,2
+        add     ebx,ebp
+        xor     esi,edx
+        psrld   xmm3,30
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [24+esp]
+        and     esi,edx
+        xor     edx,edi
+        ror     ecx,7
+        por     xmm1,xmm3
+        mov     ebp,ebx
+        xor     esi,edx
+        movdqa  xmm3,[96+esp]
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        pshufd  xmm4,xmm0,238
+        add     edi,DWORD [28+esp]
+        and     ebp,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     ebp,ecx
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [32+esp]
+        pxor    xmm2,xmm6
+        punpcklqdq      xmm4,xmm1
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        pxor    xmm2,xmm3
+        movdqa  [96+esp],xmm6
+        mov     ebp,edi
+        xor     esi,ebx
+        rol     edi,5
+        movdqa  xmm6,xmm5
+        add     edx,esi
+        paddd   xmm5,xmm1
+        xor     ebp,eax
+        pxor    xmm2,xmm4
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [36+esp]
+        and     ebp,eax
+        movdqa  xmm4,xmm2
+        movdqa  [16+esp],xmm5
+        xor     eax,ebx
+        ror     edi,7
+        mov     esi,edx
+        xor     ebp,eax
+        rol     edx,5
+        pslld   xmm2,2
+        add     ecx,ebp
+        xor     esi,edi
+        psrld   xmm4,30
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [40+esp]
+        and     esi,edi
+        xor     edi,eax
+        ror     edx,7
+        por     xmm2,xmm4
+        mov     ebp,ecx
+        xor     esi,edi
+        movdqa  xmm4,[64+esp]
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        pshufd  xmm5,xmm1,238
+        add     eax,DWORD [44+esp]
+        and     ebp,edx
+        xor     edx,edi
+        ror     ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        add     eax,ebx
+        add     edi,DWORD [48+esp]
+        pxor    xmm3,xmm7
+        punpcklqdq      xmm5,xmm2
+        xor     esi,ecx
+        mov     ebp,eax
+        rol     eax,5
+        pxor    xmm3,xmm4
+        movdqa  [64+esp],xmm7
+        add     edi,esi
+        xor     ebp,ecx
+        movdqa  xmm7,xmm6
+        ror     ebx,7
+        paddd   xmm6,xmm2
+        add     edi,eax
+        pxor    xmm3,xmm5
+        add     edx,DWORD [52+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        movdqa  xmm5,xmm3
+        movdqa  [32+esp],xmm6
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        pslld   xmm3,2
+        add     ecx,DWORD [56+esp]
+        xor     esi,eax
+        psrld   xmm5,30
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+        add     ecx,edx
+        por     xmm3,xmm5
+        add     ebx,DWORD [60+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        paddd   xmm7,xmm3
+        add     eax,ebx
+        add     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        movdqa  [48+esp],xmm7
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [8+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        add     ecx,edx
+        mov     ebp,DWORD [196+esp]
+        cmp     ebp,DWORD [200+esp]
+        je      NEAR L$007done
+        movdqa  xmm7,[160+esp]
+        movdqa  xmm6,[176+esp]
+        movdqu  xmm0,[ebp]
+        movdqu  xmm1,[16+ebp]
+        movdqu  xmm2,[32+ebp]
+        movdqu  xmm3,[48+ebp]
+        add     ebp,64
+db      102,15,56,0,198
+        mov     DWORD [196+esp],ebp
+        movdqa  [96+esp],xmm7
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+db      102,15,56,0,206
+        add     ebx,ecx
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        paddd   xmm0,xmm7
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        movdqa  [esp],xmm0
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        psubd   xmm0,xmm7
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+db      102,15,56,0,214
+        add     ecx,edx
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        paddd   xmm1,xmm7
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        movdqa  [16+esp],xmm1
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        psubd   xmm1,xmm7
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+db      102,15,56,0,222
+        add     edx,edi
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        paddd   xmm2,xmm7
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        movdqa  [32+esp],xmm2
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        psubd   xmm2,xmm7
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,ebp
+        ror     ecx,7
+        add     eax,ebx
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     DWORD [8+ebp],ecx
+        mov     ebx,ecx
+        mov     DWORD [12+ebp],edx
+        xor     ebx,edx
+        mov     DWORD [16+ebp],edi
+        mov     ebp,esi
+        pshufd  xmm4,xmm0,238
+        and     esi,ebx
+        mov     ebx,ebp
+        jmp     NEAR L$006loop
+align   16
+L$007done:
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+        add     ecx,edx
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,ebp
+        ror     ecx,7
+        add     eax,ebx
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        mov     esp,DWORD [204+esp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     DWORD [8+ebp],ecx
+        mov     DWORD [12+ebp],edx
+        mov     DWORD [16+ebp],edi
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__sha1_block_data_order_avx:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$008pic_point
+L$008pic_point:
+        pop     ebp
+        lea     ebp,[(L$K_XX_XX-L$008pic_point)+ebp]
+L$avx_shortcut:
+        vzeroall
+        vmovdqa xmm7,[ebp]
+        vmovdqa xmm0,[16+ebp]
+        vmovdqa xmm1,[32+ebp]
+        vmovdqa xmm2,[48+ebp]
+        vmovdqa xmm6,[64+ebp]
+        mov     edi,DWORD [20+esp]
+        mov     ebp,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        mov     esi,esp
+        sub     esp,208
+        and     esp,-64
+        vmovdqa [112+esp],xmm0
+        vmovdqa [128+esp],xmm1
+        vmovdqa [144+esp],xmm2
+        shl     edx,6
+        vmovdqa [160+esp],xmm7
+        add     edx,ebp
+        vmovdqa [176+esp],xmm6
+        add     ebp,64
+        mov     DWORD [192+esp],edi
+        mov     DWORD [196+esp],ebp
+        mov     DWORD [200+esp],edx
+        mov     DWORD [204+esp],esi
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+        mov     edi,DWORD [16+edi]
+        mov     esi,ebx
+        vmovdqu xmm0,[ebp-64]
+        vmovdqu xmm1,[ebp-48]
+        vmovdqu xmm2,[ebp-32]
+        vmovdqu xmm3,[ebp-16]
+        vpshufb xmm0,xmm0,xmm6
+        vpshufb xmm1,xmm1,xmm6
+        vpshufb xmm2,xmm2,xmm6
+        vmovdqa [96+esp],xmm7
+        vpshufb xmm3,xmm3,xmm6
+        vpaddd  xmm4,xmm0,xmm7
+        vpaddd  xmm5,xmm1,xmm7
+        vpaddd  xmm6,xmm2,xmm7
+        vmovdqa [esp],xmm4
+        mov     ebp,ecx
+        vmovdqa [16+esp],xmm5
+        xor     ebp,edx
+        vmovdqa [32+esp],xmm6
+        and     esi,ebp
+        jmp     NEAR L$009loop
+align   16
+L$009loop:
+        shrd    ebx,ebx,2
+        xor     esi,edx
+        vpalignr        xmm4,xmm1,xmm0,8
+        mov     ebp,eax
+        add     edi,DWORD [esp]
+        vpaddd  xmm7,xmm7,xmm3
+        vmovdqa [64+esp],xmm0
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrldq xmm6,xmm3,4
+        add     edi,esi
+        and     ebp,ebx
+        vpxor   xmm4,xmm4,xmm0
+        xor     ebx,ecx
+        add     edi,eax
+        vpxor   xmm6,xmm6,xmm2
+        shrd    eax,eax,7
+        xor     ebp,ecx
+        vmovdqa [48+esp],xmm7
+        mov     esi,edi
+        add     edx,DWORD [4+esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     eax,ebx
+        shld    edi,edi,5
+        add     edx,ebp
+        and     esi,eax
+        vpsrld  xmm6,xmm4,31
+        xor     eax,ebx
+        add     edx,edi
+        shrd    edi,edi,7
+        xor     esi,ebx
+        vpslldq xmm0,xmm4,12
+        vpaddd  xmm4,xmm4,xmm4
+        mov     ebp,edx
+        add     ecx,DWORD [8+esp]
+        xor     edi,eax
+        shld    edx,edx,5
+        vpsrld  xmm7,xmm0,30
+        vpor    xmm4,xmm4,xmm6
+        add     ecx,esi
+        and     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        vpslld  xmm0,xmm0,2
+        shrd    edx,edx,7
+        xor     ebp,eax
+        vpxor   xmm4,xmm4,xmm7
+        mov     esi,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edx,edi
+        shld    ecx,ecx,5
+        vpxor   xmm4,xmm4,xmm0
+        add     ebx,ebp
+        and     esi,edx
+        vmovdqa xmm0,[96+esp]
+        xor     edx,edi
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,edi
+        vpalignr        xmm5,xmm2,xmm1,8
+        mov     ebp,ebx
+        add     eax,DWORD [16+esp]
+        vpaddd  xmm0,xmm0,xmm4
+        vmovdqa [80+esp],xmm1
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrldq xmm7,xmm4,4
+        add     eax,esi
+        and     ebp,ecx
+        vpxor   xmm5,xmm5,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        vpxor   xmm7,xmm7,xmm3
+        shrd    ebx,ebx,7
+        xor     ebp,edx
+        vmovdqa [esp],xmm0
+        mov     esi,eax
+        add     edi,DWORD [20+esp]
+        vpxor   xmm5,xmm5,xmm7
+        xor     ebx,ecx
+        shld    eax,eax,5
+        add     edi,ebp
+        and     esi,ebx
+        vpsrld  xmm7,xmm5,31
+        xor     ebx,ecx
+        add     edi,eax
+        shrd    eax,eax,7
+        xor     esi,ecx
+        vpslldq xmm1,xmm5,12
+        vpaddd  xmm5,xmm5,xmm5
+        mov     ebp,edi
+        add     edx,DWORD [24+esp]
+        xor     eax,ebx
+        shld    edi,edi,5
+        vpsrld  xmm0,xmm1,30
+        vpor    xmm5,xmm5,xmm7
+        add     edx,esi
+        and     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        vpslld  xmm1,xmm1,2
+        shrd    edi,edi,7
+        xor     ebp,ebx
+        vpxor   xmm5,xmm5,xmm0
+        mov     esi,edx
+        add     ecx,DWORD [28+esp]
+        xor     edi,eax
+        shld    edx,edx,5
+        vpxor   xmm5,xmm5,xmm1
+        add     ecx,ebp
+        and     esi,edi
+        vmovdqa xmm1,[112+esp]
+        xor     edi,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        vpalignr        xmm6,xmm3,xmm2,8
+        mov     ebp,ecx
+        add     ebx,DWORD [32+esp]
+        vpaddd  xmm1,xmm1,xmm5
+        vmovdqa [96+esp],xmm2
+        xor     edx,edi
+        shld    ecx,ecx,5
+        vpsrldq xmm0,xmm5,4
+        add     ebx,esi
+        and     ebp,edx
+        vpxor   xmm6,xmm6,xmm2
+        xor     edx,edi
+        add     ebx,ecx
+        vpxor   xmm0,xmm0,xmm4
+        shrd    ecx,ecx,7
+        xor     ebp,edi
+        vmovdqa [16+esp],xmm1
+        mov     esi,ebx
+        add     eax,DWORD [36+esp]
+        vpxor   xmm6,xmm6,xmm0
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        and     esi,ecx
+        vpsrld  xmm0,xmm6,31
+        xor     ecx,edx
+        add     eax,ebx
+        shrd    ebx,ebx,7
+        xor     esi,edx
+        vpslldq xmm2,xmm6,12
+        vpaddd  xmm6,xmm6,xmm6
+        mov     ebp,eax
+        add     edi,DWORD [40+esp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrld  xmm1,xmm2,30
+        vpor    xmm6,xmm6,xmm0
+        add     edi,esi
+        and     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        vpslld  xmm2,xmm2,2
+        vmovdqa xmm0,[64+esp]
+        shrd    eax,eax,7
+        xor     ebp,ecx
+        vpxor   xmm6,xmm6,xmm1
+        mov     esi,edi
+        add     edx,DWORD [44+esp]
+        xor     eax,ebx
+        shld    edi,edi,5
+        vpxor   xmm6,xmm6,xmm2
+        add     edx,ebp
+        and     esi,eax
+        vmovdqa xmm2,[112+esp]
+        xor     eax,ebx
+        add     edx,edi
+        shrd    edi,edi,7
+        xor     esi,ebx
+        vpalignr        xmm7,xmm4,xmm3,8
+        mov     ebp,edx
+        add     ecx,DWORD [48+esp]
+        vpaddd  xmm2,xmm2,xmm6
+        vmovdqa [64+esp],xmm3
+        xor     edi,eax
+        shld    edx,edx,5
+        vpsrldq xmm1,xmm6,4
+        add     ecx,esi
+        and     ebp,edi
+        vpxor   xmm7,xmm7,xmm3
+        xor     edi,eax
+        add     ecx,edx
+        vpxor   xmm1,xmm1,xmm5
+        shrd    edx,edx,7
+        xor     ebp,eax
+        vmovdqa [32+esp],xmm2
+        mov     esi,ecx
+        add     ebx,DWORD [52+esp]
+        vpxor   xmm7,xmm7,xmm1
+        xor     edx,edi
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        and     esi,edx
+        vpsrld  xmm1,xmm7,31
+        xor     edx,edi
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,edi
+        vpslldq xmm3,xmm7,12
+        vpaddd  xmm7,xmm7,xmm7
+        mov     ebp,ebx
+        add     eax,DWORD [56+esp]
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrld  xmm2,xmm3,30
+        vpor    xmm7,xmm7,xmm1
+        add     eax,esi
+        and     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        vmovdqa xmm1,[80+esp]
+        shrd    ebx,ebx,7
+        xor     ebp,edx
+        vpxor   xmm7,xmm7,xmm2
+        mov     esi,eax
+        add     edi,DWORD [60+esp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpxor   xmm7,xmm7,xmm3
+        add     edi,ebp
+        and     esi,ebx
+        vmovdqa xmm3,[112+esp]
+        xor     ebx,ecx
+        add     edi,eax
+        vpalignr        xmm2,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        xor     esi,ecx
+        mov     ebp,edi
+        add     edx,DWORD [esp]
+        vpxor   xmm0,xmm0,xmm1
+        vmovdqa [80+esp],xmm4
+        xor     eax,ebx
+        shld    edi,edi,5
+        vmovdqa xmm4,xmm3
+        vpaddd  xmm3,xmm3,xmm7
+        add     edx,esi
+        and     ebp,eax
+        vpxor   xmm0,xmm0,xmm2
+        xor     eax,ebx
+        add     edx,edi
+        shrd    edi,edi,7
+        xor     ebp,ebx
+        vpsrld  xmm2,xmm0,30
+        vmovdqa [48+esp],xmm3
+        mov     esi,edx
+        add     ecx,DWORD [4+esp]
+        xor     edi,eax
+        shld    edx,edx,5
+        vpslld  xmm0,xmm0,2
+        add     ecx,ebp
+        and     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        mov     ebp,ecx
+        add     ebx,DWORD [8+esp]
+        vpor    xmm0,xmm0,xmm2
+        xor     edx,edi
+        shld    ecx,ecx,5
+        vmovdqa xmm2,[96+esp]
+        add     ebx,esi
+        and     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [12+esp]
+        xor     ebp,edi
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpalignr        xmm3,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     edi,DWORD [16+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        vpxor   xmm1,xmm1,xmm2
+        vmovdqa [96+esp],xmm5
+        add     edi,esi
+        xor     ebp,ecx
+        vmovdqa xmm5,xmm4
+        vpaddd  xmm4,xmm4,xmm0
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpxor   xmm1,xmm1,xmm3
+        add     edx,DWORD [20+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        vpsrld  xmm3,xmm1,30
+        vmovdqa [esp],xmm4
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpslld  xmm1,xmm1,2
+        add     ecx,DWORD [24+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpor    xmm1,xmm1,xmm3
+        add     ebx,DWORD [28+esp]
+        xor     ebp,edi
+        vmovdqa xmm3,[64+esp]
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpalignr        xmm4,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     eax,DWORD [32+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        vpxor   xmm2,xmm2,xmm3
+        vmovdqa [64+esp],xmm6
+        add     eax,esi
+        xor     ebp,edx
+        vmovdqa xmm6,[128+esp]
+        vpaddd  xmm5,xmm5,xmm1
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpxor   xmm2,xmm2,xmm4
+        add     edi,DWORD [36+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        vpsrld  xmm4,xmm2,30
+        vmovdqa [16+esp],xmm5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpslld  xmm2,xmm2,2
+        add     edx,DWORD [40+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpor    xmm2,xmm2,xmm4
+        add     ecx,DWORD [44+esp]
+        xor     ebp,eax
+        vmovdqa xmm4,[80+esp]
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpalignr        xmm5,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebx,DWORD [48+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        vpxor   xmm3,xmm3,xmm4
+        vmovdqa [80+esp],xmm7
+        add     ebx,esi
+        xor     ebp,edi
+        vmovdqa xmm7,xmm6
+        vpaddd  xmm6,xmm6,xmm2
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpxor   xmm3,xmm3,xmm5
+        add     eax,DWORD [52+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        vpsrld  xmm5,xmm3,30
+        vmovdqa [32+esp],xmm6
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        add     edi,DWORD [56+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpor    xmm3,xmm3,xmm5
+        add     edx,DWORD [60+esp]
+        xor     ebp,ebx
+        vmovdqa xmm5,[96+esp]
+        mov     esi,edi
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpalignr        xmm6,xmm3,xmm2,8
+        vpxor   xmm4,xmm4,xmm0
+        add     ecx,DWORD [esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        vpxor   xmm4,xmm4,xmm5
+        vmovdqa [96+esp],xmm0
+        add     ecx,esi
+        xor     ebp,eax
+        vmovdqa xmm0,xmm7
+        vpaddd  xmm7,xmm7,xmm3
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpxor   xmm4,xmm4,xmm6
+        add     ebx,DWORD [4+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        vpsrld  xmm6,xmm4,30
+        vmovdqa [48+esp],xmm7
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpslld  xmm4,xmm4,2
+        add     eax,DWORD [8+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpor    xmm4,xmm4,xmm6
+        add     edi,DWORD [12+esp]
+        xor     ebp,ecx
+        vmovdqa xmm6,[64+esp]
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpalignr        xmm7,xmm4,xmm3,8
+        vpxor   xmm5,xmm5,xmm1
+        add     edx,DWORD [16+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        vpxor   xmm5,xmm5,xmm6
+        vmovdqa [64+esp],xmm1
+        add     edx,esi
+        xor     ebp,ebx
+        vmovdqa xmm1,xmm0
+        vpaddd  xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        add     edx,edi
+        vpxor   xmm5,xmm5,xmm7
+        add     ecx,DWORD [20+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        vpsrld  xmm7,xmm5,30
+        vmovdqa [esp],xmm0
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpslld  xmm5,xmm5,2
+        add     ebx,DWORD [24+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpor    xmm5,xmm5,xmm7
+        add     eax,DWORD [28+esp]
+        vmovdqa xmm7,[80+esp]
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpalignr        xmm0,xmm5,xmm4,8
+        vpxor   xmm6,xmm6,xmm2
+        add     edi,DWORD [32+esp]
+        and     esi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vpxor   xmm6,xmm6,xmm7
+        vmovdqa [80+esp],xmm2
+        mov     ebp,eax
+        xor     esi,ecx
+        vmovdqa xmm2,xmm1
+        vpaddd  xmm1,xmm1,xmm5
+        shld    eax,eax,5
+        add     edi,esi
+        vpxor   xmm6,xmm6,xmm0
+        xor     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [36+esp]
+        vpsrld  xmm0,xmm6,30
+        vmovdqa [16+esp],xmm1
+        and     ebp,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,edi
+        vpslld  xmm6,xmm6,2
+        xor     ebp,ebx
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [40+esp]
+        and     esi,eax
+        vpor    xmm6,xmm6,xmm0
+        xor     eax,ebx
+        shrd    edi,edi,7
+        vmovdqa xmm0,[96+esp]
+        mov     ebp,edx
+        xor     esi,eax
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [44+esp]
+        and     ebp,edi
+        xor     edi,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        xor     ebp,edi
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edx
+        xor     edx,edi
+        add     ebx,ecx
+        vpalignr        xmm1,xmm6,xmm5,8
+        vpxor   xmm7,xmm7,xmm3
+        add     eax,DWORD [48+esp]
+        and     esi,edx
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        vpxor   xmm7,xmm7,xmm0
+        vmovdqa [96+esp],xmm3
+        mov     ebp,ebx
+        xor     esi,edx
+        vmovdqa xmm3,[144+esp]
+        vpaddd  xmm2,xmm2,xmm6
+        shld    ebx,ebx,5
+        add     eax,esi
+        vpxor   xmm7,xmm7,xmm1
+        xor     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [52+esp]
+        vpsrld  xmm1,xmm7,30
+        vmovdqa [32+esp],xmm2
+        and     ebp,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        vpslld  xmm7,xmm7,2
+        xor     ebp,ecx
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [56+esp]
+        and     esi,ebx
+        vpor    xmm7,xmm7,xmm1
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vmovdqa xmm1,[64+esp]
+        mov     ebp,edi
+        xor     esi,ebx
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [60+esp]
+        and     ebp,eax
+        xor     eax,ebx
+        shrd    edi,edi,7
+        mov     esi,edx
+        xor     ebp,eax
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        vpalignr        xmm2,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        add     ebx,DWORD [esp]
+        and     esi,edi
+        xor     edi,eax
+        shrd    edx,edx,7
+        vpxor   xmm0,xmm0,xmm1
+        vmovdqa [64+esp],xmm4
+        mov     ebp,ecx
+        xor     esi,edi
+        vmovdqa xmm4,xmm3
+        vpaddd  xmm3,xmm3,xmm7
+        shld    ecx,ecx,5
+        add     ebx,esi
+        vpxor   xmm0,xmm0,xmm2
+        xor     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [4+esp]
+        vpsrld  xmm2,xmm0,30
+        vmovdqa [48+esp],xmm3
+        and     ebp,edx
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        vpslld  xmm0,xmm0,2
+        xor     ebp,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [8+esp]
+        and     esi,ecx
+        vpor    xmm0,xmm0,xmm2
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vmovdqa xmm2,[80+esp]
+        mov     ebp,eax
+        xor     esi,ecx
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [12+esp]
+        and     ebp,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,edi
+        xor     ebp,ebx
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,edi
+        vpalignr        xmm3,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ecx,DWORD [16+esp]
+        and     esi,eax
+        xor     eax,ebx
+        shrd    edi,edi,7
+        vpxor   xmm1,xmm1,xmm2
+        vmovdqa [80+esp],xmm5
+        mov     ebp,edx
+        xor     esi,eax
+        vmovdqa xmm5,xmm4
+        vpaddd  xmm4,xmm4,xmm0
+        shld    edx,edx,5
+        add     ecx,esi
+        vpxor   xmm1,xmm1,xmm3
+        xor     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [20+esp]
+        vpsrld  xmm3,xmm1,30
+        vmovdqa [esp],xmm4
+        and     ebp,edi
+        xor     edi,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        vpslld  xmm1,xmm1,2
+        xor     ebp,edi
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [24+esp]
+        and     esi,edx
+        vpor    xmm1,xmm1,xmm3
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        vmovdqa xmm3,[96+esp]
+        mov     ebp,ebx
+        xor     esi,edx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [28+esp]
+        and     ebp,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        xor     ebp,ecx
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        vpalignr        xmm4,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     edx,DWORD [32+esp]
+        and     esi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vpxor   xmm2,xmm2,xmm3
+        vmovdqa [96+esp],xmm6
+        mov     ebp,edi
+        xor     esi,ebx
+        vmovdqa xmm6,xmm5
+        vpaddd  xmm5,xmm5,xmm1
+        shld    edi,edi,5
+        add     edx,esi
+        vpxor   xmm2,xmm2,xmm4
+        xor     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [36+esp]
+        vpsrld  xmm4,xmm2,30
+        vmovdqa [16+esp],xmm5
+        and     ebp,eax
+        xor     eax,ebx
+        shrd    edi,edi,7
+        mov     esi,edx
+        vpslld  xmm2,xmm2,2
+        xor     ebp,eax
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [40+esp]
+        and     esi,edi
+        vpor    xmm2,xmm2,xmm4
+        xor     edi,eax
+        shrd    edx,edx,7
+        vmovdqa xmm4,[64+esp]
+        mov     ebp,ecx
+        xor     esi,edi
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [44+esp]
+        and     ebp,edx
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        add     eax,ebx
+        vpalignr        xmm5,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     edi,DWORD [48+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        vpxor   xmm3,xmm3,xmm4
+        vmovdqa [64+esp],xmm7
+        add     edi,esi
+        xor     ebp,ecx
+        vmovdqa xmm7,xmm6
+        vpaddd  xmm6,xmm6,xmm2
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpxor   xmm3,xmm3,xmm5
+        add     edx,DWORD [52+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        vpsrld  xmm5,xmm3,30
+        vmovdqa [32+esp],xmm6
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpslld  xmm3,xmm3,2
+        add     ecx,DWORD [56+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpor    xmm3,xmm3,xmm5
+        add     ebx,DWORD [60+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [esp]
+        vpaddd  xmm7,xmm7,xmm3
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        vmovdqa [48+esp],xmm7
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [8+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        mov     ebp,DWORD [196+esp]
+        cmp     ebp,DWORD [200+esp]
+        je      NEAR L$010done
+        vmovdqa xmm7,[160+esp]
+        vmovdqa xmm6,[176+esp]
+        vmovdqu xmm0,[ebp]
+        vmovdqu xmm1,[16+ebp]
+        vmovdqu xmm2,[32+ebp]
+        vmovdqu xmm3,[48+ebp]
+        add     ebp,64
+        vpshufb xmm0,xmm0,xmm6
+        mov     DWORD [196+esp],ebp
+        vmovdqa [96+esp],xmm7
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        vpshufb xmm1,xmm1,xmm6
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        vpaddd  xmm4,xmm0,xmm7
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vmovdqa [esp],xmm4
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        vpshufb xmm2,xmm2,xmm6
+        mov     ebp,edx
+        shld    edx,edx,5
+        vpaddd  xmm5,xmm1,xmm7
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vmovdqa [16+esp],xmm5
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        vpshufb xmm3,xmm3,xmm6
+        mov     ebp,edi
+        shld    edi,edi,5
+        vpaddd  xmm6,xmm2,xmm7
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vmovdqa [32+esp],xmm6
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     ebx,ecx
+        mov     DWORD [8+ebp],ecx
+        xor     ebx,edx
+        mov     DWORD [12+ebp],edx
+        mov     DWORD [16+ebp],edi
+        mov     ebp,esi
+        and     esi,ebx
+        mov     ebx,ebp
+        jmp     NEAR L$009loop
+align   16
+L$010done:
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vzeroall
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        mov     esp,DWORD [204+esp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     DWORD [8+ebp],ecx
+        mov     DWORD [12+ebp],edx
+        mov     DWORD [16+ebp],edi
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$K_XX_XX:
+dd      1518500249,1518500249,1518500249,1518500249
+dd      1859775393,1859775393,1859775393,1859775393
+dd      2400959708,2400959708,2400959708,2400959708
+dd      3395469782,3395469782,3395469782,3395469782
+dd      66051,67438087,134810123,202182159
+db      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db      83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+db      102,111,114,109,32,102,111,114,32,120,56,54,44,32,67,82
+db      89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+db      114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
new file mode 100644
index 0000000000..0540b0eac7
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
@@ -0,0 +1,6796 @@
+; Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _sha256_block_data_order
+align   16
+_sha256_block_data_order:
+L$_sha256_block_data_order_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     ebx,esp
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     ebp,[(L$001K256-L$000pic_point)+ebp]
+        sub     esp,16
+        and     esp,-64
+        shl     eax,6
+        add     eax,edi
+        mov     DWORD [esp],esi
+        mov     DWORD [4+esp],edi
+        mov     DWORD [8+esp],eax
+        mov     DWORD [12+esp],ebx
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     ecx,DWORD [edx]
+        mov     ebx,DWORD [4+edx]
+        test    ecx,1048576
+        jnz     NEAR L$002loop
+        mov     edx,DWORD [8+edx]
+        test    ecx,16777216
+        jz      NEAR L$003no_xmm
+        and     ecx,1073741824
+        and     ebx,268435968
+        test    edx,536870912
+        jnz     NEAR L$004shaext
+        or      ecx,ebx
+        and     ecx,1342177280
+        cmp     ecx,1342177280
+        je      NEAR L$005AVX
+        test    ebx,512
+        jnz     NEAR L$006SSSE3
+L$003no_xmm:
+        sub     eax,edi
+        cmp     eax,256
+        jae     NEAR L$007unrolled
+        jmp     NEAR L$002loop
+align   16
+L$002loop:
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        bswap   eax
+        mov     edx,DWORD [12+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [16+edi]
+        mov     ebx,DWORD [20+edi]
+        mov     ecx,DWORD [24+edi]
+        bswap   eax
+        mov     edx,DWORD [28+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [32+edi]
+        mov     ebx,DWORD [36+edi]
+        mov     ecx,DWORD [40+edi]
+        bswap   eax
+        mov     edx,DWORD [44+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [48+edi]
+        mov     ebx,DWORD [52+edi]
+        mov     ecx,DWORD [56+edi]
+        bswap   eax
+        mov     edx,DWORD [60+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        add     edi,64
+        lea     esp,[esp-36]
+        mov     DWORD [104+esp],edi
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [8+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [12+esp],ecx
+        mov     DWORD [16+esp],edi
+        mov     DWORD [esp],ebx
+        mov     edx,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ebx
+        mov     DWORD [28+esp],ecx
+        mov     DWORD [32+esp],edi
+align   16
+L$00800_15:
+        mov     ecx,edx
+        mov     esi,DWORD [24+esp]
+        ror     ecx,14
+        mov     edi,DWORD [28+esp]
+        xor     ecx,edx
+        xor     esi,edi
+        mov     ebx,DWORD [96+esp]
+        ror     ecx,5
+        and     esi,edx
+        mov     DWORD [20+esp],edx
+        xor     edx,ecx
+        add     ebx,DWORD [32+esp]
+        xor     esi,edi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,esi
+        ror     ecx,9
+        add     ebx,edx
+        mov     edi,DWORD [8+esp]
+        xor     ecx,eax
+        mov     DWORD [4+esp],eax
+        lea     esp,[esp-4]
+        ror     ecx,11
+        mov     esi,DWORD [ebp]
+        xor     ecx,eax
+        mov     edx,DWORD [20+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     ebx,esi
+        mov     DWORD [esp],eax
+        add     edx,ebx
+        and     eax,DWORD [4+esp]
+        add     ebx,ecx
+        xor     eax,edi
+        add     ebp,4
+        add     eax,ebx
+        cmp     esi,3248222580
+        jne     NEAR L$00800_15
+        mov     ecx,DWORD [156+esp]
+        jmp     NEAR L$00916_63
+align   16
+L$00916_63:
+        mov     ebx,ecx
+        mov     esi,DWORD [104+esp]
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [160+esp]
+        shr     edi,10
+        add     ebx,DWORD [124+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [24+esp]
+        ror     ecx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     ecx,edx
+        xor     esi,edi
+        mov     DWORD [96+esp],ebx
+        ror     ecx,5
+        and     esi,edx
+        mov     DWORD [20+esp],edx
+        xor     edx,ecx
+        add     ebx,DWORD [32+esp]
+        xor     esi,edi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,esi
+        ror     ecx,9
+        add     ebx,edx
+        mov     edi,DWORD [8+esp]
+        xor     ecx,eax
+        mov     DWORD [4+esp],eax
+        lea     esp,[esp-4]
+        ror     ecx,11
+        mov     esi,DWORD [ebp]
+        xor     ecx,eax
+        mov     edx,DWORD [20+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     ebx,esi
+        mov     DWORD [esp],eax
+        add     edx,ebx
+        and     eax,DWORD [4+esp]
+        add     ebx,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [156+esp]
+        add     ebp,4
+        add     eax,ebx
+        cmp     esi,3329325298
+        jne     NEAR L$00916_63
+        mov     esi,DWORD [356+esp]
+        mov     ebx,DWORD [8+esp]
+        mov     ecx,DWORD [16+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [28+esp]
+        mov     ecx,DWORD [32+esp]
+        mov     edi,DWORD [360+esp]
+        add     edx,DWORD [16+esi]
+        add     eax,DWORD [20+esi]
+        add     ebx,DWORD [24+esi]
+        add     ecx,DWORD [28+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],eax
+        mov     DWORD [24+esi],ebx
+        mov     DWORD [28+esi],ecx
+        lea     esp,[356+esp]
+        sub     ebp,256
+        cmp     edi,DWORD [8+esp]
+        jb      NEAR L$002loop
+        mov     esp,DWORD [12+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$001K256:
+dd      1116352408,1899447441,3049323471,3921009573,961987163,1508970993,2453635748,2870763221,3624381080,310598401,607225278,1426881987,1925078388,2162078206,2614888103,3248222580,3835390401,4022224774,264347078,604807628,770255983,1249150122,1555081692,1996064986,2554220882,2821834349,2952996808,3210313671,3336571891,3584528711,113926993,338241895,666307205,773529912,1294757372,1396182291,1695183700,1986661051,2177026350,2456956037,2730485921,2820302411,3259730800,3345764771,3516065817,3600352804,4094571909,275423344,430227734,506948616,659060556,883997877,958139571,1322822218,1537002063,1747873779,1955562222,2024104815,2227730452,2361852424,2428436474,2756734187,3204031479,3329325298
+dd      66051,67438087,134810123,202182159
+db      83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+db      110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db      62,0
+align   16
+L$007unrolled:
+        lea     esp,[esp-96]
+        mov     eax,DWORD [esi]
+        mov     ebp,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     ebx,DWORD [12+esi]
+        mov     DWORD [4+esp],ebp
+        xor     ebp,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],ebx
+        mov     edx,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],ebx
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        jmp     NEAR L$010grand_loop
+align   16
+L$010grand_loop:
+        mov     ebx,DWORD [edi]
+        mov     ecx,DWORD [4+edi]
+        bswap   ebx
+        mov     esi,DWORD [8+edi]
+        bswap   ecx
+        mov     DWORD [32+esp],ebx
+        bswap   esi
+        mov     DWORD [36+esp],ecx
+        mov     DWORD [40+esp],esi
+        mov     ebx,DWORD [12+edi]
+        mov     ecx,DWORD [16+edi]
+        bswap   ebx
+        mov     esi,DWORD [20+edi]
+        bswap   ecx
+        mov     DWORD [44+esp],ebx
+        bswap   esi
+        mov     DWORD [48+esp],ecx
+        mov     DWORD [52+esp],esi
+        mov     ebx,DWORD [24+edi]
+        mov     ecx,DWORD [28+edi]
+        bswap   ebx
+        mov     esi,DWORD [32+edi]
+        bswap   ecx
+        mov     DWORD [56+esp],ebx
+        bswap   esi
+        mov     DWORD [60+esp],ecx
+        mov     DWORD [64+esp],esi
+        mov     ebx,DWORD [36+edi]
+        mov     ecx,DWORD [40+edi]
+        bswap   ebx
+        mov     esi,DWORD [44+edi]
+        bswap   ecx
+        mov     DWORD [68+esp],ebx
+        bswap   esi
+        mov     DWORD [72+esp],ecx
+        mov     DWORD [76+esp],esi
+        mov     ebx,DWORD [48+edi]
+        mov     ecx,DWORD [52+edi]
+        bswap   ebx
+        mov     esi,DWORD [56+edi]
+        bswap   ecx
+        mov     DWORD [80+esp],ebx
+        bswap   esi
+        mov     DWORD [84+esp],ecx
+        mov     DWORD [88+esp],esi
+        mov     ebx,DWORD [60+edi]
+        add     edi,64
+        bswap   ebx
+        mov     DWORD [100+esp],edi
+        mov     DWORD [92+esp],ebx
+        mov     ecx,edx
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [32+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1116352408+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [36+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1899447441+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [40+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3049323471+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [44+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3921009573+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [48+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[961987163+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [52+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1508970993+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [56+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2453635748+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [60+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2870763221+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [64+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3624381080+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [68+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[310598401+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [72+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[607225278+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [76+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1426881987+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [80+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1925078388+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [84+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2162078206+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [88+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2614888103+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [92+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3248222580+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [36+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [88+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [32+esp]
+        shr     edi,10
+        add     ebx,DWORD [68+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [32+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3835390401+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [40+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [92+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [36+esp]
+        shr     edi,10
+        add     ebx,DWORD [72+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [36+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[4022224774+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [44+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [32+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [40+esp]
+        shr     edi,10
+        add     ebx,DWORD [76+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [40+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[264347078+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [48+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [36+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [44+esp]
+        shr     edi,10
+        add     ebx,DWORD [80+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [44+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[604807628+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [52+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [40+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [48+esp]
+        shr     edi,10
+        add     ebx,DWORD [84+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [48+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[770255983+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [56+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [44+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [52+esp]
+        shr     edi,10
+        add     ebx,DWORD [88+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [52+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1249150122+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [60+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [48+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [56+esp]
+        shr     edi,10
+        add     ebx,DWORD [92+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [56+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1555081692+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [64+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [52+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [60+esp]
+        shr     edi,10
+        add     ebx,DWORD [32+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [60+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1996064986+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [68+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [56+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [64+esp]
+        shr     edi,10
+        add     ebx,DWORD [36+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [64+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2554220882+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [72+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [60+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [68+esp]
+        shr     edi,10
+        add     ebx,DWORD [40+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [68+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2821834349+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [76+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [64+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [72+esp]
+        shr     edi,10
+        add     ebx,DWORD [44+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [72+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2952996808+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [80+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [68+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [76+esp]
+        shr     edi,10
+        add     ebx,DWORD [48+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [76+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3210313671+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [84+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [72+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [80+esp]
+        shr     edi,10
+        add     ebx,DWORD [52+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [80+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3336571891+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [88+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [76+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [84+esp]
+        shr     edi,10
+        add     ebx,DWORD [56+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [84+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3584528711+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [92+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [80+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [88+esp]
+        shr     edi,10
+        add     ebx,DWORD [60+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [88+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[113926993+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [32+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [84+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [92+esp]
+        shr     edi,10
+        add     ebx,DWORD [64+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [92+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[338241895+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [36+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [88+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [32+esp]
+        shr     edi,10
+        add     ebx,DWORD [68+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [32+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[666307205+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [40+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [92+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [36+esp]
+        shr     edi,10
+        add     ebx,DWORD [72+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [36+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[773529912+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [44+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [32+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [40+esp]
+        shr     edi,10
+        add     ebx,DWORD [76+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [40+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1294757372+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [48+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [36+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [44+esp]
+        shr     edi,10
+        add     ebx,DWORD [80+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [44+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1396182291+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [52+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [40+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [48+esp]
+        shr     edi,10
+        add     ebx,DWORD [84+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [48+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1695183700+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [56+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [44+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [52+esp]
+        shr     edi,10
+        add     ebx,DWORD [88+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [52+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1986661051+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [60+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [48+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [56+esp]
+        shr     edi,10
+        add     ebx,DWORD [92+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [56+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2177026350+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [64+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [52+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [60+esp]
+        shr     edi,10
+        add     ebx,DWORD [32+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [60+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2456956037+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [68+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [56+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [64+esp]
+        shr     edi,10
+        add     ebx,DWORD [36+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [64+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2730485921+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [72+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [60+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [68+esp]
+        shr     edi,10
+        add     ebx,DWORD [40+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [68+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2820302411+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [76+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [64+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [72+esp]
+        shr     edi,10
+        add     ebx,DWORD [44+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [72+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3259730800+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [80+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [68+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [76+esp]
+        shr     edi,10
+        add     ebx,DWORD [48+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [76+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3345764771+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [84+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [72+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [80+esp]
+        shr     edi,10
+        add     ebx,DWORD [52+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [80+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3516065817+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [88+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [76+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [84+esp]
+        shr     edi,10
+        add     ebx,DWORD [56+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [84+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3600352804+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [92+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [80+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [88+esp]
+        shr     edi,10
+        add     ebx,DWORD [60+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [88+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[4094571909+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [32+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [84+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [92+esp]
+        shr     edi,10
+        add     ebx,DWORD [64+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [92+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[275423344+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [36+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [88+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [32+esp]
+        shr     edi,10
+        add     ebx,DWORD [68+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [32+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[430227734+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [40+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [92+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [36+esp]
+        shr     edi,10
+        add     ebx,DWORD [72+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [36+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[506948616+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [44+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [32+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [40+esp]
+        shr     edi,10
+        add     ebx,DWORD [76+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [40+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[659060556+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [48+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [36+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [44+esp]
+        shr     edi,10
+        add     ebx,DWORD [80+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [44+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[883997877+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [52+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [40+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [48+esp]
+        shr     edi,10
+        add     ebx,DWORD [84+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [48+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[958139571+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [56+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [44+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [52+esp]
+        shr     edi,10
+        add     ebx,DWORD [88+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [52+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1322822218+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [60+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [48+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [56+esp]
+        shr     edi,10
+        add     ebx,DWORD [92+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [56+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1537002063+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [64+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [52+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [60+esp]
+        shr     edi,10
+        add     ebx,DWORD [32+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [60+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1747873779+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [68+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [56+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [64+esp]
+        shr     edi,10
+        add     ebx,DWORD [36+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [64+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1955562222+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [72+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [60+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [68+esp]
+        shr     edi,10
+        add     ebx,DWORD [40+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [68+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2024104815+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [76+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [64+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [72+esp]
+        shr     edi,10
+        add     ebx,DWORD [44+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [72+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2227730452+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [80+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [68+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [76+esp]
+        shr     edi,10
+        add     ebx,DWORD [48+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [76+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2361852424+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [84+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [72+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [80+esp]
+        shr     edi,10
+        add     ebx,DWORD [52+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [80+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2428436474+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [88+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [76+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [84+esp]
+        shr     edi,10
+        add     ebx,DWORD [56+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [84+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2756734187+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [92+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [80+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [88+esp]
+        shr     edi,10
+        add     ebx,DWORD [60+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3204031479+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [32+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [84+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [92+esp]
+        shr     edi,10
+        add     ebx,DWORD [64+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3329325298+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [96+esp]
+        xor     ebp,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebp,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebp
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        mov     ecx,DWORD [28+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ebx,DWORD [24+esi]
+        add     ecx,DWORD [28+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [24+esi],ebx
+        mov     DWORD [28+esi],ecx
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ebx
+        mov     DWORD [28+esp],ecx
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$010grand_loop
+        mov     esp,DWORD [108+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$004shaext:
+        sub     esp,32
+        movdqu  xmm1,[esi]
+        lea     ebp,[128+ebp]
+        movdqu  xmm2,[16+esi]
+        movdqa  xmm7,[128+ebp]
+        pshufd  xmm0,xmm1,27
+        pshufd  xmm1,xmm1,177
+        pshufd  xmm2,xmm2,27
+db      102,15,58,15,202,8
+        punpcklqdq      xmm2,xmm0
+        jmp     NEAR L$011loop_shaext
+align   16
+L$011loop_shaext:
+        movdqu  xmm3,[edi]
+        movdqu  xmm4,[16+edi]
+        movdqu  xmm5,[32+edi]
+db      102,15,56,0,223
+        movdqu  xmm6,[48+edi]
+        movdqa  [16+esp],xmm2
+        movdqa  xmm0,[ebp-128]
+        paddd   xmm0,xmm3
+db      102,15,56,0,231
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        nop
+        movdqa  [esp],xmm1
+db      15,56,203,202
+        movdqa  xmm0,[ebp-112]
+        paddd   xmm0,xmm4
+db      102,15,56,0,239
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        lea     edi,[64+edi]
+db      15,56,204,220
+db      15,56,203,202
+        movdqa  xmm0,[ebp-96]
+        paddd   xmm0,xmm5
+db      102,15,56,0,247
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm6
+db      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+db      15,56,204,229
+db      15,56,203,202
+        movdqa  xmm0,[ebp-80]
+        paddd   xmm0,xmm6
+db      15,56,205,222
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm3
+db      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+db      15,56,204,238
+db      15,56,203,202
+        movdqa  xmm0,[ebp-64]
+        paddd   xmm0,xmm3
+db      15,56,205,227
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm4
+db      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+db      15,56,204,243
+db      15,56,203,202
+        movdqa  xmm0,[ebp-48]
+        paddd   xmm0,xmm4
+db      15,56,205,236
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm5
+db      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+db      15,56,204,220
+db      15,56,203,202
+        movdqa  xmm0,[ebp-32]
+        paddd   xmm0,xmm5
+db      15,56,205,245
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm6
+db      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+db      15,56,204,229
+db      15,56,203,202
+        movdqa  xmm0,[ebp-16]
+        paddd   xmm0,xmm6
+db      15,56,205,222
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm3
+db      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+db      15,56,204,238
+db      15,56,203,202
+        movdqa  xmm0,[ebp]
+        paddd   xmm0,xmm3
+db      15,56,205,227
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm4
+db      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+db      15,56,204,243
+db      15,56,203,202
+        movdqa  xmm0,[16+ebp]
+        paddd   xmm0,xmm4
+db      15,56,205,236
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm5
+db      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+db      15,56,204,220
+db      15,56,203,202
+        movdqa  xmm0,[32+ebp]
+        paddd   xmm0,xmm5
+db      15,56,205,245
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm6
+db      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+db      15,56,204,229
+db      15,56,203,202
+        movdqa  xmm0,[48+ebp]
+        paddd   xmm0,xmm6
+db      15,56,205,222
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm3
+db      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+db      15,56,204,238
+db      15,56,203,202
+        movdqa  xmm0,[64+ebp]
+        paddd   xmm0,xmm3
+db      15,56,205,227
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm4
+db      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+db      15,56,204,243
+db      15,56,203,202
+        movdqa  xmm0,[80+ebp]
+        paddd   xmm0,xmm4
+db      15,56,205,236
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm5
+db      102,15,58,15,252,4
+db      15,56,203,202
+        paddd   xmm6,xmm7
+        movdqa  xmm0,[96+ebp]
+        paddd   xmm0,xmm5
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+db      15,56,205,245
+        movdqa  xmm7,[128+ebp]
+db      15,56,203,202
+        movdqa  xmm0,[112+ebp]
+        paddd   xmm0,xmm6
+        nop
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        cmp     eax,edi
+        nop
+db      15,56,203,202
+        paddd   xmm2,[16+esp]
+        paddd   xmm1,[esp]
+        jnz     NEAR L$011loop_shaext
+        pshufd  xmm2,xmm2,177
+        pshufd  xmm7,xmm1,27
+        pshufd  xmm1,xmm1,177
+        punpckhqdq      xmm1,xmm2
+db      102,15,58,15,215,8
+        mov     esp,DWORD [44+esp]
+        movdqu  [esi],xmm1
+        movdqu  [16+esi],xmm2
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$006SSSE3:
+        lea     esp,[esp-96]
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [4+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edi
+        mov     edx,DWORD [16+esi]
+        mov     edi,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        movdqa  xmm7,[256+ebp]
+        jmp     NEAR L$012grand_ssse3
+align   16
+L$012grand_ssse3:
+        movdqu  xmm0,[edi]
+        movdqu  xmm1,[16+edi]
+        movdqu  xmm2,[32+edi]
+        movdqu  xmm3,[48+edi]
+        add     edi,64
+db      102,15,56,0,199
+        mov     DWORD [100+esp],edi
+db      102,15,56,0,207
+        movdqa  xmm4,[ebp]
+db      102,15,56,0,215
+        movdqa  xmm5,[16+ebp]
+        paddd   xmm4,xmm0
+db      102,15,56,0,223
+        movdqa  xmm6,[32+ebp]
+        paddd   xmm5,xmm1
+        movdqa  xmm7,[48+ebp]
+        movdqa  [32+esp],xmm4
+        paddd   xmm6,xmm2
+        movdqa  [48+esp],xmm5
+        paddd   xmm7,xmm3
+        movdqa  [64+esp],xmm6
+        movdqa  [80+esp],xmm7
+        jmp     NEAR L$013ssse3_00_47
+align   16
+L$013ssse3_00_47:
+        add     ebp,64
+        mov     ecx,edx
+        movdqa  xmm4,xmm1
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        movdqa  xmm7,xmm3
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+db      102,15,58,15,224,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,250,4
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm0,xmm7
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm3,250
+        xor     ecx,esi
+        add     edx,DWORD [32+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm0,xmm4
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm0,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        pshufd  xmm7,xmm0,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[ebp]
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm0,xmm7
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        paddd   xmm6,xmm0
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        movdqa  [32+esp],xmm6
+        mov     ecx,edx
+        movdqa  xmm4,xmm2
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        movdqa  xmm7,xmm0
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+db      102,15,58,15,225,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,251,4
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm1,xmm7
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm0,250
+        xor     ecx,esi
+        add     edx,DWORD [48+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm1,xmm4
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm1,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        pshufd  xmm7,xmm1,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[16+ebp]
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm1,xmm7
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        paddd   xmm6,xmm1
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        movdqa  [48+esp],xmm6
+        mov     ecx,edx
+        movdqa  xmm4,xmm3
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        movdqa  xmm7,xmm1
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+db      102,15,58,15,226,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,248,4
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm2,xmm7
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm1,250
+        xor     ecx,esi
+        add     edx,DWORD [64+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm2,xmm4
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm2,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        pshufd  xmm7,xmm2,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[32+ebp]
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm2,xmm7
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        paddd   xmm6,xmm2
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        movdqa  [64+esp],xmm6
+        mov     ecx,edx
+        movdqa  xmm4,xmm0
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        movdqa  xmm7,xmm2
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+db      102,15,58,15,227,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,249,4
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm3,xmm7
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm2,250
+        xor     ecx,esi
+        add     edx,DWORD [80+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm3,xmm4
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm3,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        pshufd  xmm7,xmm3,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[48+ebp]
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm3,xmm7
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        paddd   xmm6,xmm3
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        movdqa  [80+esp],xmm6
+        cmp     DWORD [64+ebp],66051
+        jne     NEAR L$013ssse3_00_47
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     esi,DWORD [96+esp]
+        xor     ebx,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebx
+        xor     ebx,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ecx,DWORD [24+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [28+esp]
+        mov     DWORD [24+esi],ecx
+        add     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esi],edi
+        mov     DWORD [28+esp],edi
+        mov     edi,DWORD [100+esp]
+        movdqa  xmm7,[64+ebp]
+        sub     ebp,192
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$012grand_ssse3
+        mov     esp,DWORD [108+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$005AVX:
+        and     edx,264
+        cmp     edx,264
+        je      NEAR L$014AVX_BMI
+        lea     esp,[esp-96]
+        vzeroall
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [4+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edi
+        mov     edx,DWORD [16+esi]
+        mov     edi,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        vmovdqa xmm7,[256+ebp]
+        jmp     NEAR L$015grand_avx
+align   32
+L$015grand_avx:
+        vmovdqu xmm0,[edi]
+        vmovdqu xmm1,[16+edi]
+        vmovdqu xmm2,[32+edi]
+        vmovdqu xmm3,[48+edi]
+        add     edi,64
+        vpshufb xmm0,xmm0,xmm7
+        mov     DWORD [100+esp],edi
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,[ebp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,[16+ebp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        vpaddd  xmm7,xmm3,[48+ebp]
+        vmovdqa [32+esp],xmm4
+        vmovdqa [48+esp],xmm5
+        vmovdqa [64+esp],xmm6
+        vmovdqa [80+esp],xmm7
+        jmp     NEAR L$016avx_00_47
+align   16
+L$016avx_00_47:
+        add     ebp,64
+        vpalignr        xmm4,xmm1,xmm0,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        vpalignr        xmm7,xmm3,xmm2,4
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        vpaddd  xmm0,xmm0,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm3,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        vpaddd  xmm0,xmm0,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        vpaddd  xmm0,xmm0,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm0,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        vpaddd  xmm0,xmm0,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        vpaddd  xmm6,xmm0,[ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        vmovdqa [32+esp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        vpalignr        xmm7,xmm0,xmm3,4
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        vpaddd  xmm1,xmm1,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm0,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        vpaddd  xmm1,xmm1,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        vpaddd  xmm1,xmm1,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm1,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        vpaddd  xmm1,xmm1,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        vpaddd  xmm6,xmm1,[16+ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        vmovdqa [48+esp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        vpalignr        xmm7,xmm1,xmm0,4
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        vpaddd  xmm2,xmm2,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm1,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        vpaddd  xmm2,xmm2,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        vpaddd  xmm2,xmm2,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm2,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        vpaddd  xmm2,xmm2,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        vmovdqa [64+esp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        vpalignr        xmm7,xmm2,xmm1,4
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        vpaddd  xmm3,xmm3,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm2,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        vpaddd  xmm3,xmm3,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        vpaddd  xmm3,xmm3,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm3,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        vpaddd  xmm3,xmm3,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        vpaddd  xmm6,xmm3,[48+ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        vmovdqa [80+esp],xmm6
+        cmp     DWORD [64+ebp],66051
+        jne     NEAR L$016avx_00_47
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     esi,DWORD [96+esp]
+        xor     ebx,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebx
+        xor     ebx,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ecx,DWORD [24+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [28+esp]
+        mov     DWORD [24+esi],ecx
+        add     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esi],edi
+        mov     DWORD [28+esp],edi
+        mov     edi,DWORD [100+esp]
+        vmovdqa xmm7,[64+ebp]
+        sub     ebp,192
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$015grand_avx
+        mov     esp,DWORD [108+esp]
+        vzeroall
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$014AVX_BMI:
+        lea     esp,[esp-96]
+        vzeroall
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [4+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edi
+        mov     edx,DWORD [16+esi]
+        mov     edi,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        vmovdqa xmm7,[256+ebp]
+        jmp     NEAR L$017grand_avx_bmi
+align   32
+L$017grand_avx_bmi:
+        vmovdqu xmm0,[edi]
+        vmovdqu xmm1,[16+edi]
+        vmovdqu xmm2,[32+edi]
+        vmovdqu xmm3,[48+edi]
+        add     edi,64
+        vpshufb xmm0,xmm0,xmm7
+        mov     DWORD [100+esp],edi
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,[ebp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,[16+ebp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        vpaddd  xmm7,xmm3,[48+ebp]
+        vmovdqa [32+esp],xmm4
+        vmovdqa [48+esp],xmm5
+        vmovdqa [64+esp],xmm6
+        vmovdqa [80+esp],xmm7
+        jmp     NEAR L$018avx_bmi_00_47
+align   16
+L$018avx_bmi_00_47:
+        add     ebp,64
+        vpalignr        xmm4,xmm1,xmm0,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        vpalignr        xmm7,xmm3,xmm2,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        vpaddd  xmm0,xmm0,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [32+esp]
+        vpshufd xmm7,xmm3,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [36+esp]
+        vpaddd  xmm0,xmm0,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm0,xmm0,xmm7
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm0,80
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [40+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm0,xmm0,xmm7
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [44+esp]
+        vpaddd  xmm6,xmm0,[ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [32+esp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        vpalignr        xmm7,xmm0,xmm3,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        vpaddd  xmm1,xmm1,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [48+esp]
+        vpshufd xmm7,xmm0,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [52+esp]
+        vpaddd  xmm1,xmm1,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm1,xmm1,xmm7
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm1,80
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [56+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm1,xmm1,xmm7
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [60+esp]
+        vpaddd  xmm6,xmm1,[16+ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [48+esp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        vpalignr        xmm7,xmm1,xmm0,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        vpaddd  xmm2,xmm2,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [64+esp]
+        vpshufd xmm7,xmm1,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [68+esp]
+        vpaddd  xmm2,xmm2,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm2,xmm2,xmm7
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm2,80
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [72+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm2,xmm2,xmm7
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [76+esp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [64+esp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        vpalignr        xmm7,xmm2,xmm1,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        vpaddd  xmm3,xmm3,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [80+esp]
+        vpshufd xmm7,xmm2,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [84+esp]
+        vpaddd  xmm3,xmm3,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm3,xmm3,xmm7
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm3,80
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [88+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm3,xmm3,xmm7
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [92+esp]
+        vpaddd  xmm6,xmm3,[48+ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [80+esp],xmm6
+        cmp     DWORD [64+ebp],66051
+        jne     NEAR L$018avx_bmi_00_47
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        mov     esi,DWORD [96+esp]
+        xor     ebx,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebx
+        xor     ebx,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ecx,DWORD [24+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [28+esp]
+        mov     DWORD [24+esi],ecx
+        add     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esi],edi
+        mov     DWORD [28+esp],edi
+        mov     edi,DWORD [100+esp]
+        vmovdqa xmm7,[64+ebp]
+        sub     ebp,192
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$017grand_avx_bmi
+        mov     esp,DWORD [108+esp]
+        vzeroall
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
new file mode 100644
index 0000000000..f80f1cca53
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
@@ -0,0 +1,2842 @@
+; Copyright 2007-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _sha512_block_data_order
+align   16
+_sha512_block_data_order:
+L$_sha512_block_data_order_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     ebx,esp
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     ebp,[(L$001K512-L$000pic_point)+ebp]
+        sub     esp,16
+        and     esp,-64
+        shl     eax,7
+        add     eax,edi
+        mov     DWORD [esp],esi
+        mov     DWORD [4+esp],edi
+        mov     DWORD [8+esp],eax
+        mov     DWORD [12+esp],ebx
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     ecx,DWORD [edx]
+        test    ecx,67108864
+        jz      NEAR L$002loop_x86
+        mov     edx,DWORD [4+edx]
+        movq    mm0,[esi]
+        and     ecx,16777216
+        movq    mm1,[8+esi]
+        and     edx,512
+        movq    mm2,[16+esi]
+        or      ecx,edx
+        movq    mm3,[24+esi]
+        movq    mm4,[32+esi]
+        movq    mm5,[40+esi]
+        movq    mm6,[48+esi]
+        movq    mm7,[56+esi]
+        cmp     ecx,16777728
+        je      NEAR L$003SSSE3
+        sub     esp,80
+        jmp     NEAR L$004loop_sse2
+align   16
+L$004loop_sse2:
+        movq    [8+esp],mm1
+        movq    [16+esp],mm2
+        movq    [24+esp],mm3
+        movq    [40+esp],mm5
+        movq    [48+esp],mm6
+        pxor    mm2,mm1
+        movq    [56+esp],mm7
+        movq    mm3,mm0
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        add     edi,8
+        mov     edx,15
+        bswap   eax
+        bswap   ebx
+        jmp     NEAR L$00500_14_sse2
+align   16
+L$00500_14_sse2:
+        movd    mm1,eax
+        mov     eax,DWORD [edi]
+        movd    mm7,ebx
+        mov     ebx,DWORD [4+edi]
+        add     edi,8
+        bswap   eax
+        bswap   ebx
+        punpckldq       mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    mm0,mm3
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm3,mm2
+        movq    mm2,mm0
+        add     ebp,8
+        paddq   mm3,mm6
+        movq    mm6,[48+esp]
+        dec     edx
+        jnz     NEAR L$00500_14_sse2
+        movd    mm1,eax
+        movd    mm7,ebx
+        punpckldq       mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    mm0,mm3
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm7,[192+esp]
+        paddq   mm3,mm2
+        movq    mm2,mm0
+        add     ebp,8
+        paddq   mm3,mm6
+        pxor    mm0,mm0
+        mov     edx,32
+        jmp     NEAR L$00616_79_sse2
+align   16
+L$00616_79_sse2:
+        movq    mm5,[88+esp]
+        movq    mm1,mm7
+        psrlq   mm7,1
+        movq    mm6,mm5
+        psrlq   mm5,6
+        psllq   mm1,56
+        paddq   mm0,mm3
+        movq    mm3,mm7
+        psrlq   mm7,6
+        pxor    mm3,mm1
+        psllq   mm1,7
+        pxor    mm3,mm7
+        psrlq   mm7,1
+        pxor    mm3,mm1
+        movq    mm1,mm5
+        psrlq   mm5,13
+        pxor    mm7,mm3
+        psllq   mm6,3
+        pxor    mm1,mm5
+        paddq   mm7,[200+esp]
+        pxor    mm1,mm6
+        psrlq   mm5,42
+        paddq   mm7,[128+esp]
+        pxor    mm1,mm5
+        psllq   mm6,42
+        movq    mm5,[40+esp]
+        pxor    mm1,mm6
+        movq    mm6,[48+esp]
+        paddq   mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm7,[192+esp]
+        paddq   mm2,mm6
+        add     ebp,8
+        movq    mm5,[88+esp]
+        movq    mm1,mm7
+        psrlq   mm7,1
+        movq    mm6,mm5
+        psrlq   mm5,6
+        psllq   mm1,56
+        paddq   mm2,mm3
+        movq    mm3,mm7
+        psrlq   mm7,6
+        pxor    mm3,mm1
+        psllq   mm1,7
+        pxor    mm3,mm7
+        psrlq   mm7,1
+        pxor    mm3,mm1
+        movq    mm1,mm5
+        psrlq   mm5,13
+        pxor    mm7,mm3
+        psllq   mm6,3
+        pxor    mm1,mm5
+        paddq   mm7,[200+esp]
+        pxor    mm1,mm6
+        psrlq   mm5,42
+        paddq   mm7,[128+esp]
+        pxor    mm1,mm5
+        psllq   mm6,42
+        movq    mm5,[40+esp]
+        pxor    mm1,mm6
+        movq    mm6,[48+esp]
+        paddq   mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm7,[192+esp]
+        paddq   mm0,mm6
+        add     ebp,8
+        dec     edx
+        jnz     NEAR L$00616_79_sse2
+        paddq   mm0,mm3
+        movq    mm1,[8+esp]
+        movq    mm3,[24+esp]
+        movq    mm5,[40+esp]
+        movq    mm6,[48+esp]
+        movq    mm7,[56+esp]
+        pxor    mm2,mm1
+        paddq   mm0,[esi]
+        paddq   mm1,[8+esi]
+        paddq   mm2,[16+esi]
+        paddq   mm3,[24+esi]
+        paddq   mm4,[32+esi]
+        paddq   mm5,[40+esi]
+        paddq   mm6,[48+esi]
+        paddq   mm7,[56+esi]
+        mov     eax,640
+        movq    [esi],mm0
+        movq    [8+esi],mm1
+        movq    [16+esi],mm2
+        movq    [24+esi],mm3
+        movq    [32+esi],mm4
+        movq    [40+esi],mm5
+        movq    [48+esi],mm6
+        movq    [56+esi],mm7
+        lea     esp,[eax*1+esp]
+        sub     ebp,eax
+        cmp     edi,DWORD [88+esp]
+        jb      NEAR L$004loop_sse2
+        mov     esp,DWORD [92+esp]
+        emms
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$003SSSE3:
+        lea     edx,[esp-64]
+        sub     esp,256
+        movdqa  xmm1,[640+ebp]
+        movdqu  xmm0,[edi]
+db      102,15,56,0,193
+        movdqa  xmm3,[ebp]
+        movdqa  xmm2,xmm1
+        movdqu  xmm1,[16+edi]
+        paddq   xmm3,xmm0
+db      102,15,56,0,202
+        movdqa  [edx-128],xmm3
+        movdqa  xmm4,[16+ebp]
+        movdqa  xmm3,xmm2
+        movdqu  xmm2,[32+edi]
+        paddq   xmm4,xmm1
+db      102,15,56,0,211
+        movdqa  [edx-112],xmm4
+        movdqa  xmm5,[32+ebp]
+        movdqa  xmm4,xmm3
+        movdqu  xmm3,[48+edi]
+        paddq   xmm5,xmm2
+db      102,15,56,0,220
+        movdqa  [edx-96],xmm5
+        movdqa  xmm6,[48+ebp]
+        movdqa  xmm5,xmm4
+        movdqu  xmm4,[64+edi]
+        paddq   xmm6,xmm3
+db      102,15,56,0,229
+        movdqa  [edx-80],xmm6
+        movdqa  xmm7,[64+ebp]
+        movdqa  xmm6,xmm5
+        movdqu  xmm5,[80+edi]
+        paddq   xmm7,xmm4
+db      102,15,56,0,238
+        movdqa  [edx-64],xmm7
+        movdqa  [edx],xmm0
+        movdqa  xmm0,[80+ebp]
+        movdqa  xmm7,xmm6
+        movdqu  xmm6,[96+edi]
+        paddq   xmm0,xmm5
+db      102,15,56,0,247
+        movdqa  [edx-48],xmm0
+        movdqa  [16+edx],xmm1
+        movdqa  xmm1,[96+ebp]
+        movdqa  xmm0,xmm7
+        movdqu  xmm7,[112+edi]
+        paddq   xmm1,xmm6
+db      102,15,56,0,248
+        movdqa  [edx-32],xmm1
+        movdqa  [32+edx],xmm2
+        movdqa  xmm2,[112+ebp]
+        movdqa  xmm0,[edx]
+        paddq   xmm2,xmm7
+        movdqa  [edx-16],xmm2
+        nop
+align   32
+L$007loop_ssse3:
+        movdqa  xmm2,[16+edx]
+        movdqa  [48+edx],xmm3
+        lea     ebp,[128+ebp]
+        movq    [8+esp],mm1
+        mov     ebx,edi
+        movq    [16+esp],mm2
+        lea     edi,[128+edi]
+        movq    [24+esp],mm3
+        cmp     edi,eax
+        movq    [40+esp],mm5
+        cmovb   ebx,edi
+        movq    [48+esp],mm6
+        mov     ecx,4
+        pxor    mm2,mm1
+        movq    [56+esp],mm7
+        pxor    mm3,mm3
+        jmp     NEAR L$00800_47_ssse3
+align   32
+L$00800_47_ssse3:
+        movdqa  xmm3,xmm5
+        movdqa  xmm1,xmm2
+db      102,15,58,15,208,8
+        movdqa  [edx],xmm4
+db      102,15,58,15,220,8
+        movdqa  xmm4,xmm2
+        psrlq   xmm2,7
+        paddq   xmm0,xmm3
+        movdqa  xmm3,xmm4
+        psrlq   xmm4,1
+        psllq   xmm3,56
+        pxor    xmm2,xmm4
+        psrlq   xmm4,7
+        pxor    xmm2,xmm3
+        psllq   xmm3,7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,xmm7
+        pxor    xmm2,xmm3
+        movdqa  xmm3,xmm7
+        psrlq   xmm4,6
+        paddq   xmm0,xmm2
+        movdqa  xmm2,xmm7
+        psrlq   xmm3,19
+        psllq   xmm2,3
+        pxor    xmm4,xmm3
+        psrlq   xmm3,42
+        pxor    xmm4,xmm2
+        psllq   xmm2,42
+        pxor    xmm4,xmm3
+        movdqa  xmm3,[32+edx]
+        pxor    xmm4,xmm2
+        movdqa  xmm2,[ebp]
+        movq    mm1,mm4
+        paddq   xmm0,xmm4
+        movq    mm7,[edx-128]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        paddq   xmm2,xmm0
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-120]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-128],xmm2
+        movdqa  xmm4,xmm6
+        movdqa  xmm2,xmm3
+db      102,15,58,15,217,8
+        movdqa  [16+edx],xmm5
+db      102,15,58,15,229,8
+        movdqa  xmm5,xmm3
+        psrlq   xmm3,7
+        paddq   xmm1,xmm4
+        movdqa  xmm4,xmm5
+        psrlq   xmm5,1
+        psllq   xmm4,56
+        pxor    xmm3,xmm5
+        psrlq   xmm5,7
+        pxor    xmm3,xmm4
+        psllq   xmm4,7
+        pxor    xmm3,xmm5
+        movdqa  xmm5,xmm0
+        pxor    xmm3,xmm4
+        movdqa  xmm4,xmm0
+        psrlq   xmm5,6
+        paddq   xmm1,xmm3
+        movdqa  xmm3,xmm0
+        psrlq   xmm4,19
+        psllq   xmm3,3
+        pxor    xmm5,xmm4
+        psrlq   xmm4,42
+        pxor    xmm5,xmm3
+        psllq   xmm3,42
+        pxor    xmm5,xmm4
+        movdqa  xmm4,[48+edx]
+        pxor    xmm5,xmm3
+        movdqa  xmm3,[16+ebp]
+        movq    mm1,mm4
+        paddq   xmm1,xmm5
+        movq    mm7,[edx-112]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        paddq   xmm3,xmm1
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-104]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-112],xmm3
+        movdqa  xmm5,xmm7
+        movdqa  xmm3,xmm4
+db      102,15,58,15,226,8
+        movdqa  [32+edx],xmm6
+db      102,15,58,15,238,8
+        movdqa  xmm6,xmm4
+        psrlq   xmm4,7
+        paddq   xmm2,xmm5
+        movdqa  xmm5,xmm6
+        psrlq   xmm6,1
+        psllq   xmm5,56
+        pxor    xmm4,xmm6
+        psrlq   xmm6,7
+        pxor    xmm4,xmm5
+        psllq   xmm5,7
+        pxor    xmm4,xmm6
+        movdqa  xmm6,xmm1
+        pxor    xmm4,xmm5
+        movdqa  xmm5,xmm1
+        psrlq   xmm6,6
+        paddq   xmm2,xmm4
+        movdqa  xmm4,xmm1
+        psrlq   xmm5,19
+        psllq   xmm4,3
+        pxor    xmm6,xmm5
+        psrlq   xmm5,42
+        pxor    xmm6,xmm4
+        psllq   xmm4,42
+        pxor    xmm6,xmm5
+        movdqa  xmm5,[edx]
+        pxor    xmm6,xmm4
+        movdqa  xmm4,[32+ebp]
+        movq    mm1,mm4
+        paddq   xmm2,xmm6
+        movq    mm7,[edx-96]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        paddq   xmm4,xmm2
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-88]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-96],xmm4
+        movdqa  xmm6,xmm0
+        movdqa  xmm4,xmm5
+db      102,15,58,15,235,8
+        movdqa  [48+edx],xmm7
+db      102,15,58,15,247,8
+        movdqa  xmm7,xmm5
+        psrlq   xmm5,7
+        paddq   xmm3,xmm6
+        movdqa  xmm6,xmm7
+        psrlq   xmm7,1
+        psllq   xmm6,56
+        pxor    xmm5,xmm7
+        psrlq   xmm7,7
+        pxor    xmm5,xmm6
+        psllq   xmm6,7
+        pxor    xmm5,xmm7
+        movdqa  xmm7,xmm2
+        pxor    xmm5,xmm6
+        movdqa  xmm6,xmm2
+        psrlq   xmm7,6
+        paddq   xmm3,xmm5
+        movdqa  xmm5,xmm2
+        psrlq   xmm6,19
+        psllq   xmm5,3
+        pxor    xmm7,xmm6
+        psrlq   xmm6,42
+        pxor    xmm7,xmm5
+        psllq   xmm5,42
+        pxor    xmm7,xmm6
+        movdqa  xmm6,[16+edx]
+        pxor    xmm7,xmm5
+        movdqa  xmm5,[48+ebp]
+        movq    mm1,mm4
+        paddq   xmm3,xmm7
+        movq    mm7,[edx-80]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        paddq   xmm5,xmm3
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-72]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-80],xmm5
+        movdqa  xmm7,xmm1
+        movdqa  xmm5,xmm6
+db      102,15,58,15,244,8
+        movdqa  [edx],xmm0
+db      102,15,58,15,248,8
+        movdqa  xmm0,xmm6
+        psrlq   xmm6,7
+        paddq   xmm4,xmm7
+        movdqa  xmm7,xmm0
+        psrlq   xmm0,1
+        psllq   xmm7,56
+        pxor    xmm6,xmm0
+        psrlq   xmm0,7
+        pxor    xmm6,xmm7
+        psllq   xmm7,7
+        pxor    xmm6,xmm0
+        movdqa  xmm0,xmm3
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm3
+        psrlq   xmm0,6
+        paddq   xmm4,xmm6
+        movdqa  xmm6,xmm3
+        psrlq   xmm7,19
+        psllq   xmm6,3
+        pxor    xmm0,xmm7
+        psrlq   xmm7,42
+        pxor    xmm0,xmm6
+        psllq   xmm6,42
+        pxor    xmm0,xmm7
+        movdqa  xmm7,[32+edx]
+        pxor    xmm0,xmm6
+        movdqa  xmm6,[64+ebp]
+        movq    mm1,mm4
+        paddq   xmm4,xmm0
+        movq    mm7,[edx-64]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        paddq   xmm6,xmm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-56]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-64],xmm6
+        movdqa  xmm0,xmm2
+        movdqa  xmm6,xmm7
+db      102,15,58,15,253,8
+        movdqa  [16+edx],xmm1
+db      102,15,58,15,193,8
+        movdqa  xmm1,xmm7
+        psrlq   xmm7,7
+        paddq   xmm5,xmm0
+        movdqa  xmm0,xmm1
+        psrlq   xmm1,1
+        psllq   xmm0,56
+        pxor    xmm7,xmm1
+        psrlq   xmm1,7
+        pxor    xmm7,xmm0
+        psllq   xmm0,7
+        pxor    xmm7,xmm1
+        movdqa  xmm1,xmm4
+        pxor    xmm7,xmm0
+        movdqa  xmm0,xmm4
+        psrlq   xmm1,6
+        paddq   xmm5,xmm7
+        movdqa  xmm7,xmm4
+        psrlq   xmm0,19
+        psllq   xmm7,3
+        pxor    xmm1,xmm0
+        psrlq   xmm0,42
+        pxor    xmm1,xmm7
+        psllq   xmm7,42
+        pxor    xmm1,xmm0
+        movdqa  xmm0,[48+edx]
+        pxor    xmm1,xmm7
+        movdqa  xmm7,[80+ebp]
+        movq    mm1,mm4
+        paddq   xmm5,xmm1
+        movq    mm7,[edx-48]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        paddq   xmm7,xmm5
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-40]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-48],xmm7
+        movdqa  xmm1,xmm3
+        movdqa  xmm7,xmm0
+db      102,15,58,15,198,8
+        movdqa  [32+edx],xmm2
+db      102,15,58,15,202,8
+        movdqa  xmm2,xmm0
+        psrlq   xmm0,7
+        paddq   xmm6,xmm1
+        movdqa  xmm1,xmm2
+        psrlq   xmm2,1
+        psllq   xmm1,56
+        pxor    xmm0,xmm2
+        psrlq   xmm2,7
+        pxor    xmm0,xmm1
+        psllq   xmm1,7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,xmm5
+        pxor    xmm0,xmm1
+        movdqa  xmm1,xmm5
+        psrlq   xmm2,6
+        paddq   xmm6,xmm0
+        movdqa  xmm0,xmm5
+        psrlq   xmm1,19
+        psllq   xmm0,3
+        pxor    xmm2,xmm1
+        psrlq   xmm1,42
+        pxor    xmm2,xmm0
+        psllq   xmm0,42
+        pxor    xmm2,xmm1
+        movdqa  xmm1,[edx]
+        pxor    xmm2,xmm0
+        movdqa  xmm0,[96+ebp]
+        movq    mm1,mm4
+        paddq   xmm6,xmm2
+        movq    mm7,[edx-32]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        paddq   xmm0,xmm6
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-24]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-32],xmm0
+        movdqa  xmm2,xmm4
+        movdqa  xmm0,xmm1
+db      102,15,58,15,207,8
+        movdqa  [48+edx],xmm3
+db      102,15,58,15,211,8
+        movdqa  xmm3,xmm1
+        psrlq   xmm1,7
+        paddq   xmm7,xmm2
+        movdqa  xmm2,xmm3
+        psrlq   xmm3,1
+        psllq   xmm2,56
+        pxor    xmm1,xmm3
+        psrlq   xmm3,7
+        pxor    xmm1,xmm2
+        psllq   xmm2,7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,xmm6
+        pxor    xmm1,xmm2
+        movdqa  xmm2,xmm6
+        psrlq   xmm3,6
+        paddq   xmm7,xmm1
+        movdqa  xmm1,xmm6
+        psrlq   xmm2,19
+        psllq   xmm1,3
+        pxor    xmm3,xmm2
+        psrlq   xmm2,42
+        pxor    xmm3,xmm1
+        psllq   xmm1,42
+        pxor    xmm3,xmm2
+        movdqa  xmm2,[16+edx]
+        pxor    xmm3,xmm1
+        movdqa  xmm1,[112+ebp]
+        movq    mm1,mm4
+        paddq   xmm7,xmm3
+        movq    mm7,[edx-16]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        paddq   xmm1,xmm7
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-8]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-16],xmm1
+        lea     ebp,[128+ebp]
+        dec     ecx
+        jnz     NEAR L$00800_47_ssse3
+        movdqa  xmm1,[ebp]
+        lea     ebp,[ebp-640]
+        movdqu  xmm0,[ebx]
+db      102,15,56,0,193
+        movdqa  xmm3,[ebp]
+        movdqa  xmm2,xmm1
+        movdqu  xmm1,[16+ebx]
+        paddq   xmm3,xmm0
+db      102,15,56,0,202
+        movq    mm1,mm4
+        movq    mm7,[edx-128]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-120]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-128],xmm3
+        movdqa  xmm4,[16+ebp]
+        movdqa  xmm3,xmm2
+        movdqu  xmm2,[32+ebx]
+        paddq   xmm4,xmm1
+db      102,15,56,0,211
+        movq    mm1,mm4
+        movq    mm7,[edx-112]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-104]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-112],xmm4
+        movdqa  xmm5,[32+ebp]
+        movdqa  xmm4,xmm3
+        movdqu  xmm3,[48+ebx]
+        paddq   xmm5,xmm2
+db      102,15,56,0,220
+        movq    mm1,mm4
+        movq    mm7,[edx-96]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-88]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-96],xmm5
+        movdqa  xmm6,[48+ebp]
+        movdqa  xmm5,xmm4
+        movdqu  xmm4,[64+ebx]
+        paddq   xmm6,xmm3
+db      102,15,56,0,229
+        movq    mm1,mm4
+        movq    mm7,[edx-80]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-72]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-80],xmm6
+        movdqa  xmm7,[64+ebp]
+        movdqa  xmm6,xmm5
+        movdqu  xmm5,[80+ebx]
+        paddq   xmm7,xmm4
+db      102,15,56,0,238
+        movq    mm1,mm4
+        movq    mm7,[edx-64]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-56]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-64],xmm7
+        movdqa  [edx],xmm0
+        movdqa  xmm0,[80+ebp]
+        movdqa  xmm7,xmm6
+        movdqu  xmm6,[96+ebx]
+        paddq   xmm0,xmm5
+db      102,15,56,0,247
+        movq    mm1,mm4
+        movq    mm7,[edx-48]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-40]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-48],xmm0
+        movdqa  [16+edx],xmm1
+        movdqa  xmm1,[96+ebp]
+        movdqa  xmm0,xmm7
+        movdqu  xmm7,[112+ebx]
+        paddq   xmm1,xmm6
+db      102,15,56,0,248
+        movq    mm1,mm4
+        movq    mm7,[edx-32]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-24]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-32],xmm1
+        movdqa  [32+edx],xmm2
+        movdqa  xmm2,[112+ebp]
+        movdqa  xmm0,[edx]
+        paddq   xmm2,xmm7
+        movq    mm1,mm4
+        movq    mm7,[edx-16]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-8]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-16],xmm2
+        movq    mm1,[8+esp]
+        paddq   mm0,mm3
+        movq    mm3,[24+esp]
+        movq    mm7,[56+esp]
+        pxor    mm2,mm1
+        paddq   mm0,[esi]
+        paddq   mm1,[8+esi]
+        paddq   mm2,[16+esi]
+        paddq   mm3,[24+esi]
+        paddq   mm4,[32+esi]
+        paddq   mm5,[40+esi]
+        paddq   mm6,[48+esi]
+        paddq   mm7,[56+esi]
+        movq    [esi],mm0
+        movq    [8+esi],mm1
+        movq    [16+esi],mm2
+        movq    [24+esi],mm3
+        movq    [32+esi],mm4
+        movq    [40+esi],mm5
+        movq    [48+esi],mm6
+        movq    [56+esi],mm7
+        cmp     edi,eax
+        jb      NEAR L$007loop_ssse3
+        mov     esp,DWORD [76+edx]
+        emms
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+L$002loop_x86:
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [16+edi]
+        mov     ebx,DWORD [20+edi]
+        mov     ecx,DWORD [24+edi]
+        mov     edx,DWORD [28+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [32+edi]
+        mov     ebx,DWORD [36+edi]
+        mov     ecx,DWORD [40+edi]
+        mov     edx,DWORD [44+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [48+edi]
+        mov     ebx,DWORD [52+edi]
+        mov     ecx,DWORD [56+edi]
+        mov     edx,DWORD [60+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [64+edi]
+        mov     ebx,DWORD [68+edi]
+        mov     ecx,DWORD [72+edi]
+        mov     edx,DWORD [76+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [80+edi]
+        mov     ebx,DWORD [84+edi]
+        mov     ecx,DWORD [88+edi]
+        mov     edx,DWORD [92+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [96+edi]
+        mov     ebx,DWORD [100+edi]
+        mov     ecx,DWORD [104+edi]
+        mov     edx,DWORD [108+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [112+edi]
+        mov     ebx,DWORD [116+edi]
+        mov     ecx,DWORD [120+edi]
+        mov     edx,DWORD [124+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        add     edi,128
+        sub     esp,72
+        mov     DWORD [204+esp],edi
+        lea     edi,[8+esp]
+        mov     ecx,16
+dd      2784229001
+align   16
+L$00900_15_x86:
+        mov     ecx,DWORD [40+esp]
+        mov     edx,DWORD [44+esp]
+        mov     esi,ecx
+        shr     ecx,9
+        mov     edi,edx
+        shr     edx,9
+        mov     ebx,ecx
+        shl     esi,14
+        mov     eax,edx
+        shl     edi,14
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     eax,ecx
+        shl     esi,4
+        xor     ebx,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,4
+        xor     eax,edi
+        shr     edx,4
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [48+esp]
+        mov     edx,DWORD [52+esp]
+        mov     esi,DWORD [56+esp]
+        mov     edi,DWORD [60+esp]
+        add     eax,DWORD [64+esp]
+        adc     ebx,DWORD [68+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        and     ecx,DWORD [40+esp]
+        and     edx,DWORD [44+esp]
+        add     eax,DWORD [192+esp]
+        adc     ebx,DWORD [196+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        mov     esi,DWORD [ebp]
+        mov     edi,DWORD [4+ebp]
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [32+esp]
+        mov     edx,DWORD [36+esp]
+        add     eax,esi
+        adc     ebx,edi
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     DWORD [32+esp],eax
+        mov     DWORD [36+esp],ebx
+        mov     esi,ecx
+        shr     ecx,2
+        mov     edi,edx
+        shr     edx,2
+        mov     ebx,ecx
+        shl     esi,4
+        mov     eax,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     ebx,ecx
+        shl     esi,21
+        xor     eax,edx
+        shl     edi,21
+        xor     eax,esi
+        shr     ecx,21
+        xor     ebx,edi
+        shr     edx,21
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     esi,DWORD [16+esp]
+        mov     edi,DWORD [20+esp]
+        add     eax,DWORD [esp]
+        adc     ebx,DWORD [4+esp]
+        or      ecx,esi
+        or      edx,edi
+        and     ecx,DWORD [24+esp]
+        and     edx,DWORD [28+esp]
+        and     esi,DWORD [8+esp]
+        and     edi,DWORD [12+esp]
+        or      ecx,esi
+        or      edx,edi
+        add     eax,ecx
+        adc     ebx,edx
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     dl,BYTE [ebp]
+        sub     esp,8
+        lea     ebp,[8+ebp]
+        cmp     dl,148
+        jne     NEAR L$00900_15_x86
+align   16
+L$01016_79_x86:
+        mov     ecx,DWORD [312+esp]
+        mov     edx,DWORD [316+esp]
+        mov     esi,ecx
+        shr     ecx,1
+        mov     edi,edx
+        shr     edx,1
+        mov     eax,ecx
+        shl     esi,24
+        mov     ebx,edx
+        shl     edi,24
+        xor     ebx,esi
+        shr     ecx,6
+        xor     eax,edi
+        shr     edx,6
+        xor     eax,ecx
+        shl     esi,7
+        xor     ebx,edx
+        shl     edi,1
+        xor     ebx,esi
+        shr     ecx,1
+        xor     eax,edi
+        shr     edx,1
+        xor     eax,ecx
+        shl     edi,6
+        xor     ebx,edx
+        xor     eax,edi
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     ecx,DWORD [208+esp]
+        mov     edx,DWORD [212+esp]
+        mov     esi,ecx
+        shr     ecx,6
+        mov     edi,edx
+        shr     edx,6
+        mov     eax,ecx
+        shl     esi,3
+        mov     ebx,edx
+        shl     edi,3
+        xor     eax,esi
+        shr     ecx,13
+        xor     ebx,edi
+        shr     edx,13
+        xor     eax,ecx
+        shl     esi,10
+        xor     ebx,edx
+        shl     edi,10
+        xor     ebx,esi
+        shr     ecx,10
+        xor     eax,edi
+        shr     edx,10
+        xor     ebx,ecx
+        shl     edi,13
+        xor     eax,edx
+        xor     eax,edi
+        mov     ecx,DWORD [320+esp]
+        mov     edx,DWORD [324+esp]
+        add     eax,DWORD [esp]
+        adc     ebx,DWORD [4+esp]
+        mov     esi,DWORD [248+esp]
+        mov     edi,DWORD [252+esp]
+        add     eax,ecx
+        adc     ebx,edx
+        add     eax,esi
+        adc     ebx,edi
+        mov     DWORD [192+esp],eax
+        mov     DWORD [196+esp],ebx
+        mov     ecx,DWORD [40+esp]
+        mov     edx,DWORD [44+esp]
+        mov     esi,ecx
+        shr     ecx,9
+        mov     edi,edx
+        shr     edx,9
+        mov     ebx,ecx
+        shl     esi,14
+        mov     eax,edx
+        shl     edi,14
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     eax,ecx
+        shl     esi,4
+        xor     ebx,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,4
+        xor     eax,edi
+        shr     edx,4
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [48+esp]
+        mov     edx,DWORD [52+esp]
+        mov     esi,DWORD [56+esp]
+        mov     edi,DWORD [60+esp]
+        add     eax,DWORD [64+esp]
+        adc     ebx,DWORD [68+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        and     ecx,DWORD [40+esp]
+        and     edx,DWORD [44+esp]
+        add     eax,DWORD [192+esp]
+        adc     ebx,DWORD [196+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        mov     esi,DWORD [ebp]
+        mov     edi,DWORD [4+ebp]
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [32+esp]
+        mov     edx,DWORD [36+esp]
+        add     eax,esi
+        adc     ebx,edi
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     DWORD [32+esp],eax
+        mov     DWORD [36+esp],ebx
+        mov     esi,ecx
+        shr     ecx,2
+        mov     edi,edx
+        shr     edx,2
+        mov     ebx,ecx
+        shl     esi,4
+        mov     eax,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     ebx,ecx
+        shl     esi,21
+        xor     eax,edx
+        shl     edi,21
+        xor     eax,esi
+        shr     ecx,21
+        xor     ebx,edi
+        shr     edx,21
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     esi,DWORD [16+esp]
+        mov     edi,DWORD [20+esp]
+        add     eax,DWORD [esp]
+        adc     ebx,DWORD [4+esp]
+        or      ecx,esi
+        or      edx,edi
+        and     ecx,DWORD [24+esp]
+        and     edx,DWORD [28+esp]
+        and     esi,DWORD [8+esp]
+        and     edi,DWORD [12+esp]
+        or      ecx,esi
+        or      edx,edi
+        add     eax,ecx
+        adc     ebx,edx
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     dl,BYTE [ebp]
+        sub     esp,8
+        lea     ebp,[8+ebp]
+        cmp     dl,23
+        jne     NEAR L$01016_79_x86
+        mov     esi,DWORD [840+esp]
+        mov     edi,DWORD [844+esp]
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [12+esi]
+        add     eax,DWORD [8+esp]
+        adc     ebx,DWORD [12+esp]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        add     ecx,DWORD [16+esp]
+        adc     edx,DWORD [20+esp]
+        mov     DWORD [8+esi],ecx
+        mov     DWORD [12+esi],edx
+        mov     eax,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [28+esi]
+        add     eax,DWORD [24+esp]
+        adc     ebx,DWORD [28+esp]
+        mov     DWORD [16+esi],eax
+        mov     DWORD [20+esi],ebx
+        add     ecx,DWORD [32+esp]
+        adc     edx,DWORD [36+esp]
+        mov     DWORD [24+esi],ecx
+        mov     DWORD [28+esi],edx
+        mov     eax,DWORD [32+esi]
+        mov     ebx,DWORD [36+esi]
+        mov     ecx,DWORD [40+esi]
+        mov     edx,DWORD [44+esi]
+        add     eax,DWORD [40+esp]
+        adc     ebx,DWORD [44+esp]
+        mov     DWORD [32+esi],eax
+        mov     DWORD [36+esi],ebx
+        add     ecx,DWORD [48+esp]
+        adc     edx,DWORD [52+esp]
+        mov     DWORD [40+esi],ecx
+        mov     DWORD [44+esi],edx
+        mov     eax,DWORD [48+esi]
+        mov     ebx,DWORD [52+esi]
+        mov     ecx,DWORD [56+esi]
+        mov     edx,DWORD [60+esi]
+        add     eax,DWORD [56+esp]
+        adc     ebx,DWORD [60+esp]
+        mov     DWORD [48+esi],eax
+        mov     DWORD [52+esi],ebx
+        add     ecx,DWORD [64+esp]
+        adc     edx,DWORD [68+esp]
+        mov     DWORD [56+esi],ecx
+        mov     DWORD [60+esi],edx
+        add     esp,840
+        sub     ebp,640
+        cmp     edi,DWORD [8+esp]
+        jb      NEAR L$002loop_x86
+        mov     esp,DWORD [12+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$001K512:
+dd      3609767458,1116352408
+dd      602891725,1899447441
+dd      3964484399,3049323471
+dd      2173295548,3921009573
+dd      4081628472,961987163
+dd      3053834265,1508970993
+dd      2937671579,2453635748
+dd      3664609560,2870763221
+dd      2734883394,3624381080
+dd      1164996542,310598401
+dd      1323610764,607225278
+dd      3590304994,1426881987
+dd      4068182383,1925078388
+dd      991336113,2162078206
+dd      633803317,2614888103
+dd      3479774868,3248222580
+dd      2666613458,3835390401
+dd      944711139,4022224774
+dd      2341262773,264347078
+dd      2007800933,604807628
+dd      1495990901,770255983
+dd      1856431235,1249150122
+dd      3175218132,1555081692
+dd      2198950837,1996064986
+dd      3999719339,2554220882
+dd      766784016,2821834349
+dd      2566594879,2952996808
+dd      3203337956,3210313671
+dd      1034457026,3336571891
+dd      2466948901,3584528711
+dd      3758326383,113926993
+dd      168717936,338241895
+dd      1188179964,666307205
+dd      1546045734,773529912
+dd      1522805485,1294757372
+dd      2643833823,1396182291
+dd      2343527390,1695183700
+dd      1014477480,1986661051
+dd      1206759142,2177026350
+dd      344077627,2456956037
+dd      1290863460,2730485921
+dd      3158454273,2820302411
+dd      3505952657,3259730800
+dd      106217008,3345764771
+dd      3606008344,3516065817
+dd      1432725776,3600352804
+dd      1467031594,4094571909
+dd      851169720,275423344
+dd      3100823752,430227734
+dd      1363258195,506948616
+dd      3750685593,659060556
+dd      3785050280,883997877
+dd      3318307427,958139571
+dd      3812723403,1322822218
+dd      2003034995,1537002063
+dd      3602036899,1747873779
+dd      1575990012,1955562222
+dd      1125592928,2024104815
+dd      2716904306,2227730452
+dd      442776044,2361852424
+dd      593698344,2428436474
+dd      3733110249,2756734187
+dd      2999351573,3204031479
+dd      3815920427,3329325298
+dd      3928383900,3391569614
+dd      566280711,3515267271
+dd      3454069534,3940187606
+dd      4000239992,4118630271
+dd      1914138554,116418474
+dd      2731055270,174292421
+dd      3203993006,289380356
+dd      320620315,460393269
+dd      587496836,685471733
+dd      1086792851,852142971
+dd      365543100,1017036298
+dd      2618297676,1126000580
+dd      3409855158,1288033470
+dd      4234509866,1501505948
+dd      987167468,1607167915
+dd      1246189591,1816402316
+dd      67438087,66051
+dd      202182159,134810123
+db      83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+db      110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db      62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
new file mode 100644
index 0000000000..9d61eedd34
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
@@ -0,0 +1,513 @@
+; Copyright 2004-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _OPENSSL_ia32_cpuid
+align   16
+_OPENSSL_ia32_cpuid:
+L$_OPENSSL_ia32_cpuid_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        xor     edx,edx
+        pushfd
+        pop     eax
+        mov     ecx,eax
+        xor     eax,2097152
+        push    eax
+        popfd
+        pushfd
+        pop     eax
+        xor     ecx,eax
+        xor     eax,eax
+        mov     esi,DWORD [20+esp]
+        mov     DWORD [8+esi],eax
+        bt      ecx,21
+        jnc     NEAR L$000nocpuid
+        cpuid
+        mov     edi,eax
+        xor     eax,eax
+        cmp     ebx,1970169159
+        setne   al
+        mov     ebp,eax
+        cmp     edx,1231384169
+        setne   al
+        or      ebp,eax
+        cmp     ecx,1818588270
+        setne   al
+        or      ebp,eax
+        jz      NEAR L$001intel
+        cmp     ebx,1752462657
+        setne   al
+        mov     esi,eax
+        cmp     edx,1769238117
+        setne   al
+        or      esi,eax
+        cmp     ecx,1145913699
+        setne   al
+        or      esi,eax
+        jnz     NEAR L$001intel
+        mov     eax,2147483648
+        cpuid
+        cmp     eax,2147483649
+        jb      NEAR L$001intel
+        mov     esi,eax
+        mov     eax,2147483649
+        cpuid
+        or      ebp,ecx
+        and     ebp,2049
+        cmp     esi,2147483656
+        jb      NEAR L$001intel
+        mov     eax,2147483656
+        cpuid
+        movzx   esi,cl
+        inc     esi
+        mov     eax,1
+        xor     ecx,ecx
+        cpuid
+        bt      edx,28
+        jnc     NEAR L$002generic
+        shr     ebx,16
+        and     ebx,255
+        cmp     ebx,esi
+        ja      NEAR L$002generic
+        and     edx,4026531839
+        jmp     NEAR L$002generic
+L$001intel:
+        cmp     edi,4
+        mov     esi,-1
+        jb      NEAR L$003nocacheinfo
+        mov     eax,4
+        mov     ecx,0
+        cpuid
+        mov     esi,eax
+        shr     esi,14
+        and     esi,4095
+L$003nocacheinfo:
+        mov     eax,1
+        xor     ecx,ecx
+        cpuid
+        and     edx,3220176895
+        cmp     ebp,0
+        jne     NEAR L$004notintel
+        or      edx,1073741824
+        and     ah,15
+        cmp     ah,15
+        jne     NEAR L$004notintel
+        or      edx,1048576
+L$004notintel:
+        bt      edx,28
+        jnc     NEAR L$002generic
+        and     edx,4026531839
+        cmp     esi,0
+        je      NEAR L$002generic
+        or      edx,268435456
+        shr     ebx,16
+        cmp     bl,1
+        ja      NEAR L$002generic
+        and     edx,4026531839
+L$002generic:
+        and     ebp,2048
+        and     ecx,4294965247
+        mov     esi,edx
+        or      ebp,ecx
+        cmp     edi,7
+        mov     edi,DWORD [20+esp]
+        jb      NEAR L$005no_extended_info
+        mov     eax,7
+        xor     ecx,ecx
+        cpuid
+        mov     DWORD [8+edi],ebx
+L$005no_extended_info:
+        bt      ebp,27
+        jnc     NEAR L$006clear_avx
+        xor     ecx,ecx
+db      15,1,208
+        and     eax,6
+        cmp     eax,6
+        je      NEAR L$007done
+        cmp     eax,2
+        je      NEAR L$006clear_avx
+L$008clear_xmm:
+        and     ebp,4261412861
+        and     esi,4278190079
+L$006clear_avx:
+        and     ebp,4026525695
+        and     DWORD [8+edi],4294967263
+L$007done:
+        mov     eax,esi
+        mov     edx,ebp
+L$000nocpuid:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+;extern _OPENSSL_ia32cap_P
+global  _OPENSSL_rdtsc
+align   16
+_OPENSSL_rdtsc:
+L$_OPENSSL_rdtsc_begin:
+        xor     eax,eax
+        xor     edx,edx
+        lea     ecx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [ecx],4
+        jnc     NEAR L$009notsc
+        rdtsc
+L$009notsc:
+        ret
+global  _OPENSSL_instrument_halt
+align   16
+_OPENSSL_instrument_halt:
+L$_OPENSSL_instrument_halt_begin:
+        lea     ecx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [ecx],4
+        jnc     NEAR L$010nohalt
+dd      2421723150
+        and     eax,3
+        jnz     NEAR L$010nohalt
+        pushfd
+        pop     eax
+        bt      eax,9
+        jnc     NEAR L$010nohalt
+        rdtsc
+        push    edx
+        push    eax
+        hlt
+        rdtsc
+        sub     eax,DWORD [esp]
+        sbb     edx,DWORD [4+esp]
+        add     esp,8
+        ret
+L$010nohalt:
+        xor     eax,eax
+        xor     edx,edx
+        ret
+global  _OPENSSL_far_spin
+align   16
+_OPENSSL_far_spin:
+L$_OPENSSL_far_spin_begin:
+        pushfd
+        pop     eax
+        bt      eax,9
+        jnc     NEAR L$011nospin
+        mov     eax,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+dd      2430111262
+        xor     eax,eax
+        mov     edx,DWORD [ecx]
+        jmp     NEAR L$012spin
+align   16
+L$012spin:
+        inc     eax
+        cmp     edx,DWORD [ecx]
+        je      NEAR L$012spin
+dd      529567888
+        ret
+L$011nospin:
+        xor     eax,eax
+        xor     edx,edx
+        ret
+global  _OPENSSL_wipe_cpu
+align   16
+_OPENSSL_wipe_cpu:
+L$_OPENSSL_wipe_cpu_begin:
+        xor     eax,eax
+        xor     edx,edx
+        lea     ecx,[_OPENSSL_ia32cap_P]
+        mov     ecx,DWORD [ecx]
+        bt      DWORD [ecx],1
+        jnc     NEAR L$013no_x87
+        and     ecx,83886080
+        cmp     ecx,83886080
+        jne     NEAR L$014no_sse2
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+L$014no_sse2:
+dd      4007259865,4007259865,4007259865,4007259865,2430851995
+L$013no_x87:
+        lea     eax,[4+esp]
+        ret
+global  _OPENSSL_atomic_add
+align   16
+_OPENSSL_atomic_add:
+L$_OPENSSL_atomic_add_begin:
+        mov     edx,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        push    ebx
+        nop
+        mov     eax,DWORD [edx]
+L$015spin:
+        lea     ebx,[ecx*1+eax]
+        nop
+dd      447811568
+        jne     NEAR L$015spin
+        mov     eax,ebx
+        pop     ebx
+        ret
+global  _OPENSSL_cleanse
+align   16
+_OPENSSL_cleanse:
+L$_OPENSSL_cleanse_begin:
+        mov     edx,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        xor     eax,eax
+        cmp     ecx,7
+        jae     NEAR L$016lot
+        cmp     ecx,0
+        je      NEAR L$017ret
+L$018little:
+        mov     BYTE [edx],al
+        sub     ecx,1
+        lea     edx,[1+edx]
+        jnz     NEAR L$018little
+L$017ret:
+        ret
+align   16
+L$016lot:
+        test    edx,3
+        jz      NEAR L$019aligned
+        mov     BYTE [edx],al
+        lea     ecx,[ecx-1]
+        lea     edx,[1+edx]
+        jmp     NEAR L$016lot
+L$019aligned:
+        mov     DWORD [edx],eax
+        lea     ecx,[ecx-4]
+        test    ecx,-4
+        lea     edx,[4+edx]
+        jnz     NEAR L$019aligned
+        cmp     ecx,0
+        jne     NEAR L$018little
+        ret
+global  _CRYPTO_memcmp
+align   16
+_CRYPTO_memcmp:
+L$_CRYPTO_memcmp_begin:
+        push    esi
+        push    edi
+        mov     esi,DWORD [12+esp]
+        mov     edi,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        xor     eax,eax
+        xor     edx,edx
+        cmp     ecx,0
+        je      NEAR L$020no_data
+L$021loop:
+        mov     dl,BYTE [esi]
+        lea     esi,[1+esi]
+        xor     dl,BYTE [edi]
+        lea     edi,[1+edi]
+        or      al,dl
+        dec     ecx
+        jnz     NEAR L$021loop
+        neg     eax
+        shr     eax,31
+L$020no_data:
+        pop     edi
+        pop     esi
+        ret
+global  _OPENSSL_instrument_bus
+align   16
+_OPENSSL_instrument_bus:
+L$_OPENSSL_instrument_bus_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,0
+        lea     edx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [edx],4
+        jnc     NEAR L$022nogo
+        bt      DWORD [edx],19
+        jnc     NEAR L$022nogo
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        rdtsc
+        mov     esi,eax
+        mov     ebx,0
+        clflush [edi]
+db      240
+        add     DWORD [edi],ebx
+        jmp     NEAR L$023loop
+align   16
+L$023loop:
+        rdtsc
+        mov     edx,eax
+        sub     eax,esi
+        mov     esi,edx
+        mov     ebx,eax
+        clflush [edi]
+db      240
+        add     DWORD [edi],eax
+        lea     edi,[4+edi]
+        sub     ecx,1
+        jnz     NEAR L$023loop
+        mov     eax,DWORD [24+esp]
+L$022nogo:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _OPENSSL_instrument_bus2
+align   16
+_OPENSSL_instrument_bus2:
+L$_OPENSSL_instrument_bus2_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,0
+        lea     edx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [edx],4
+        jnc     NEAR L$024nogo
+        bt      DWORD [edx],19
+        jnc     NEAR L$024nogo
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        mov     ebp,DWORD [28+esp]
+        rdtsc
+        mov     esi,eax
+        mov     ebx,0
+        clflush [edi]
+db      240
+        add     DWORD [edi],ebx
+        rdtsc
+        mov     edx,eax
+        sub     eax,esi
+        mov     esi,edx
+        mov     ebx,eax
+        jmp     NEAR L$025loop2
+align   16
+L$025loop2:
+        clflush [edi]
+db      240
+        add     DWORD [edi],eax
+        sub     ebp,1
+        jz      NEAR L$026done2
+        rdtsc
+        mov     edx,eax
+        sub     eax,esi
+        mov     esi,edx
+        cmp     eax,ebx
+        mov     ebx,eax
+        mov     edx,0
+        setne   dl
+        sub     ecx,edx
+        lea     edi,[edx*4+edi]
+        jnz     NEAR L$025loop2
+L$026done2:
+        mov     eax,DWORD [24+esp]
+        sub     eax,ecx
+L$024nogo:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _OPENSSL_ia32_rdrand_bytes
+align   16
+_OPENSSL_ia32_rdrand_bytes:
+L$_OPENSSL_ia32_rdrand_bytes_begin:
+        push    edi
+        push    ebx
+        xor     eax,eax
+        mov     edi,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        cmp     ebx,0
+        je      NEAR L$027done
+        mov     ecx,8
+L$028loop:
+db      15,199,242
+        jc      NEAR L$029break
+        loop    L$028loop
+        jmp     NEAR L$027done
+align   16
+L$029break:
+        cmp     ebx,4
+        jb      NEAR L$030tail
+        mov     DWORD [edi],edx
+        lea     edi,[4+edi]
+        add     eax,4
+        sub     ebx,4
+        jz      NEAR L$027done
+        mov     ecx,8
+        jmp     NEAR L$028loop
+align   16
+L$030tail:
+        mov     BYTE [edi],dl
+        lea     edi,[1+edi]
+        inc     eax
+        shr     edx,8
+        dec     ebx
+        jnz     NEAR L$030tail
+L$027done:
+        xor     edx,edx
+        pop     ebx
+        pop     edi
+        ret
+global  _OPENSSL_ia32_rdseed_bytes
+align   16
+_OPENSSL_ia32_rdseed_bytes:
+L$_OPENSSL_ia32_rdseed_bytes_begin:
+        push    edi
+        push    ebx
+        xor     eax,eax
+        mov     edi,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        cmp     ebx,0
+        je      NEAR L$031done
+        mov     ecx,8
+L$032loop:
+db      15,199,250
+        jc      NEAR L$033break
+        loop    L$032loop
+        jmp     NEAR L$031done
+align   16
+L$033break:
+        cmp     ebx,4
+        jb      NEAR L$034tail
+        mov     DWORD [edi],edx
+        lea     edi,[4+edi]
+        add     eax,4
+        sub     ebx,4
+        jz      NEAR L$031done
+        mov     ecx,8
+        jmp     NEAR L$032loop
+align   16
+L$034tail:
+        mov     BYTE [edi],dl
+        lea     edi,[1+edi]
+        inc     eax
+        shr     edx,8
+        dec     ebx
+        jnz     NEAR L$034tail
+L$031done:
+        xor     edx,edx
+        pop     ebx
+        pop     edi
+        ret
+segment .bss
+common  _OPENSSL_ia32cap_P 16
+segment .CRT$XCU data align=4
+extern  _OPENSSL_cpuid_setup
+dd      _OPENSSL_cpuid_setup
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
new file mode 100644
index 0000000000..a90434b21f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
@@ -0,0 +1,1772 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  aesni_multi_cbc_encrypt
+
+ALIGN   32
+aesni_multi_cbc_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        cmp     edx,2
+        jb      NEAR $L$enc_non_avx
+        mov     ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+        test    ecx,268435456
+        jnz     NEAR _avx_cbc_enc_shortcut
+        jmp     NEAR $L$enc_non_avx
+ALIGN   16
+$L$enc_non_avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+        sub     rsp,48
+        and     rsp,-64
+        mov     QWORD[16+rsp],rax
+
+
+$L$enc4x_body:
+        movdqu  xmm12,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[80+rdi]
+
+$L$enc4x_loop_grande:
+        mov     DWORD[24+rsp],edx
+        xor     edx,edx
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r8,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     r12,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm2,XMMWORD[((-56))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r9,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     r13,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm3,XMMWORD[((-16))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        mov     ecx,DWORD[16+rdi]
+        mov     r10,QWORD[rdi]
+        cmp     ecx,edx
+        mov     r14,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm4,XMMWORD[24+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        mov     ecx,DWORD[56+rdi]
+        mov     r11,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     r15,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm5,XMMWORD[64+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        test    edx,edx
+        jz      NEAR $L$enc4x_done
+
+        movups  xmm1,XMMWORD[((16-120))+rsi]
+        pxor    xmm2,xmm12
+        movups  xmm0,XMMWORD[((32-120))+rsi]
+        pxor    xmm3,xmm12
+        mov     eax,DWORD[((240-120))+rsi]
+        pxor    xmm4,xmm12
+        movdqu  xmm6,XMMWORD[r8]
+        pxor    xmm5,xmm12
+        movdqu  xmm7,XMMWORD[r9]
+        pxor    xmm2,xmm6
+        movdqu  xmm8,XMMWORD[r10]
+        pxor    xmm3,xmm7
+        movdqu  xmm9,XMMWORD[r11]
+        pxor    xmm4,xmm8
+        pxor    xmm5,xmm9
+        movdqa  xmm10,XMMWORD[32+rsp]
+        xor     rbx,rbx
+        jmp     NEAR $L$oop_enc4x
+
+ALIGN   32
+$L$oop_enc4x:
+        add     rbx,16
+        lea     rbp,[16+rsp]
+        mov     ecx,1
+        sub     rbp,rbx
+
+DB      102,15,56,220,209
+        prefetcht0      [31+rbx*1+r8]
+        prefetcht0      [31+rbx*1+r9]
+DB      102,15,56,220,217
+        prefetcht0      [31+rbx*1+r10]
+        prefetcht0      [31+rbx*1+r10]
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((48-120))+rsi]
+        cmp     ecx,DWORD[32+rsp]
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        cmovge  r8,rbp
+        cmovg   r12,rbp
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-56))+rsi]
+        cmp     ecx,DWORD[36+rsp]
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        cmovge  r9,rbp
+        cmovg   r13,rbp
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((-40))+rsi]
+        cmp     ecx,DWORD[40+rsp]
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        cmovge  r10,rbp
+        cmovg   r14,rbp
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-24))+rsi]
+        cmp     ecx,DWORD[44+rsp]
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        cmovge  r11,rbp
+        cmovg   r15,rbp
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((-8))+rsi]
+        movdqa  xmm11,xmm10
+DB      102,15,56,220,208
+        prefetcht0      [15+rbx*1+r12]
+        prefetcht0      [15+rbx*1+r13]
+DB      102,15,56,220,216
+        prefetcht0      [15+rbx*1+r14]
+        prefetcht0      [15+rbx*1+r15]
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((128-120))+rsi]
+        pxor    xmm12,xmm12
+
+DB      102,15,56,220,209
+        pcmpgtd xmm11,xmm12
+        movdqu  xmm12,XMMWORD[((-120))+rsi]
+DB      102,15,56,220,217
+        paddd   xmm10,xmm11
+        movdqa  XMMWORD[32+rsp],xmm10
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((144-120))+rsi]
+
+        cmp     eax,11
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((160-120))+rsi]
+
+        jb      NEAR $L$enc4x_tail
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((176-120))+rsi]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((192-120))+rsi]
+
+        je      NEAR $L$enc4x_tail
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((208-120))+rsi]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((224-120))+rsi]
+        jmp     NEAR $L$enc4x_tail
+
+ALIGN   32
+$L$enc4x_tail:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movdqu  xmm6,XMMWORD[rbx*1+r8]
+        movdqu  xmm1,XMMWORD[((16-120))+rsi]
+
+DB      102,15,56,221,208
+        movdqu  xmm7,XMMWORD[rbx*1+r9]
+        pxor    xmm6,xmm12
+DB      102,15,56,221,216
+        movdqu  xmm8,XMMWORD[rbx*1+r10]
+        pxor    xmm7,xmm12
+DB      102,15,56,221,224
+        movdqu  xmm9,XMMWORD[rbx*1+r11]
+        pxor    xmm8,xmm12
+DB      102,15,56,221,232
+        movdqu  xmm0,XMMWORD[((32-120))+rsi]
+        pxor    xmm9,xmm12
+
+        movups  XMMWORD[(-16)+rbx*1+r12],xmm2
+        pxor    xmm2,xmm6
+        movups  XMMWORD[(-16)+rbx*1+r13],xmm3
+        pxor    xmm3,xmm7
+        movups  XMMWORD[(-16)+rbx*1+r14],xmm4
+        pxor    xmm4,xmm8
+        movups  XMMWORD[(-16)+rbx*1+r15],xmm5
+        pxor    xmm5,xmm9
+
+        dec     edx
+        jnz     NEAR $L$oop_enc4x
+
+        mov     rax,QWORD[16+rsp]
+
+        mov     edx,DWORD[24+rsp]
+
+
+
+
+
+
+
+
+
+
+        lea     rdi,[160+rdi]
+        dec     edx
+        jnz     NEAR $L$enc4x_loop_grande
+
+$L$enc4x_done:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+
+
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$enc4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt:
+
+global  aesni_multi_cbc_decrypt
+
+ALIGN   32
+aesni_multi_cbc_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        cmp     edx,2
+        jb      NEAR $L$dec_non_avx
+        mov     ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+        test    ecx,268435456
+        jnz     NEAR _avx_cbc_dec_shortcut
+        jmp     NEAR $L$dec_non_avx
+ALIGN   16
+$L$dec_non_avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+        sub     rsp,48
+        and     rsp,-64
+        mov     QWORD[16+rsp],rax
+
+
+$L$dec4x_body:
+        movdqu  xmm12,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[80+rdi]
+
+$L$dec4x_loop_grande:
+        mov     DWORD[24+rsp],edx
+        xor     edx,edx
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r8,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     r12,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm6,XMMWORD[((-56))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r9,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     r13,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm7,XMMWORD[((-16))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        mov     ecx,DWORD[16+rdi]
+        mov     r10,QWORD[rdi]
+        cmp     ecx,edx
+        mov     r14,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm8,XMMWORD[24+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        mov     ecx,DWORD[56+rdi]
+        mov     r11,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     r15,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm9,XMMWORD[64+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        test    edx,edx
+        jz      NEAR $L$dec4x_done
+
+        movups  xmm1,XMMWORD[((16-120))+rsi]
+        movups  xmm0,XMMWORD[((32-120))+rsi]
+        mov     eax,DWORD[((240-120))+rsi]
+        movdqu  xmm2,XMMWORD[r8]
+        movdqu  xmm3,XMMWORD[r9]
+        pxor    xmm2,xmm12
+        movdqu  xmm4,XMMWORD[r10]
+        pxor    xmm3,xmm12
+        movdqu  xmm5,XMMWORD[r11]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm12
+        movdqa  xmm10,XMMWORD[32+rsp]
+        xor     rbx,rbx
+        jmp     NEAR $L$oop_dec4x
+
+ALIGN   32
+$L$oop_dec4x:
+        add     rbx,16
+        lea     rbp,[16+rsp]
+        mov     ecx,1
+        sub     rbp,rbx
+
+DB      102,15,56,222,209
+        prefetcht0      [31+rbx*1+r8]
+        prefetcht0      [31+rbx*1+r9]
+DB      102,15,56,222,217
+        prefetcht0      [31+rbx*1+r10]
+        prefetcht0      [31+rbx*1+r11]
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((48-120))+rsi]
+        cmp     ecx,DWORD[32+rsp]
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+        cmovge  r8,rbp
+        cmovg   r12,rbp
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-56))+rsi]
+        cmp     ecx,DWORD[36+rsp]
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        cmovge  r9,rbp
+        cmovg   r13,rbp
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((-40))+rsi]
+        cmp     ecx,DWORD[40+rsp]
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+        cmovge  r10,rbp
+        cmovg   r14,rbp
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-24))+rsi]
+        cmp     ecx,DWORD[44+rsp]
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        cmovge  r11,rbp
+        cmovg   r15,rbp
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((-8))+rsi]
+        movdqa  xmm11,xmm10
+DB      102,15,56,222,208
+        prefetcht0      [15+rbx*1+r12]
+        prefetcht0      [15+rbx*1+r13]
+DB      102,15,56,222,216
+        prefetcht0      [15+rbx*1+r14]
+        prefetcht0      [15+rbx*1+r15]
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((128-120))+rsi]
+        pxor    xmm12,xmm12
+
+DB      102,15,56,222,209
+        pcmpgtd xmm11,xmm12
+        movdqu  xmm12,XMMWORD[((-120))+rsi]
+DB      102,15,56,222,217
+        paddd   xmm10,xmm11
+        movdqa  XMMWORD[32+rsp],xmm10
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((144-120))+rsi]
+
+        cmp     eax,11
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((160-120))+rsi]
+
+        jb      NEAR $L$dec4x_tail
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((176-120))+rsi]
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((192-120))+rsi]
+
+        je      NEAR $L$dec4x_tail
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((208-120))+rsi]
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((224-120))+rsi]
+        jmp     NEAR $L$dec4x_tail
+
+ALIGN   32
+$L$dec4x_tail:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+DB      102,15,56,222,233
+        movdqu  xmm1,XMMWORD[((16-120))+rsi]
+        pxor    xmm8,xmm0
+        pxor    xmm9,xmm0
+        movdqu  xmm0,XMMWORD[((32-120))+rsi]
+
+DB      102,15,56,223,214
+DB      102,15,56,223,223
+        movdqu  xmm6,XMMWORD[((-16))+rbx*1+r8]
+        movdqu  xmm7,XMMWORD[((-16))+rbx*1+r9]
+DB      102,65,15,56,223,224
+DB      102,65,15,56,223,233
+        movdqu  xmm8,XMMWORD[((-16))+rbx*1+r10]
+        movdqu  xmm9,XMMWORD[((-16))+rbx*1+r11]
+
+        movups  XMMWORD[(-16)+rbx*1+r12],xmm2
+        movdqu  xmm2,XMMWORD[rbx*1+r8]
+        movups  XMMWORD[(-16)+rbx*1+r13],xmm3
+        movdqu  xmm3,XMMWORD[rbx*1+r9]
+        pxor    xmm2,xmm12
+        movups  XMMWORD[(-16)+rbx*1+r14],xmm4
+        movdqu  xmm4,XMMWORD[rbx*1+r10]
+        pxor    xmm3,xmm12
+        movups  XMMWORD[(-16)+rbx*1+r15],xmm5
+        movdqu  xmm5,XMMWORD[rbx*1+r11]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm12
+
+        dec     edx
+        jnz     NEAR $L$oop_dec4x
+
+        mov     rax,QWORD[16+rsp]
+
+        mov     edx,DWORD[24+rsp]
+
+        lea     rdi,[160+rdi]
+        dec     edx
+        jnz     NEAR $L$dec4x_loop_grande
+
+$L$dec4x_done:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+
+
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$dec4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt:
+
+ALIGN   32
+aesni_multi_cbc_encrypt_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_cbc_enc_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+        sub     rsp,192
+        and     rsp,-128
+        mov     QWORD[16+rsp],rax
+
+
+$L$enc8x_body:
+        vzeroupper
+        vmovdqu xmm15,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[160+rdi]
+        shr     edx,1
+
+$L$enc8x_loop_grande:
+
+        xor     edx,edx
+        mov     ecx,DWORD[((-144))+rdi]
+        mov     r8,QWORD[((-160))+rdi]
+        cmp     ecx,edx
+        mov     rbx,QWORD[((-152))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm2,XMMWORD[((-136))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        sub     rbx,r8
+        mov     QWORD[64+rsp],rbx
+        mov     ecx,DWORD[((-104))+rdi]
+        mov     r9,QWORD[((-120))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-112))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm3,XMMWORD[((-96))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        sub     rbp,r9
+        mov     QWORD[72+rsp],rbp
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r10,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm4,XMMWORD[((-56))+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        sub     rbp,r10
+        mov     QWORD[80+rsp],rbp
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r11,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm5,XMMWORD[((-16))+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        sub     rbp,r11
+        mov     QWORD[88+rsp],rbp
+        mov     ecx,DWORD[16+rdi]
+        mov     r12,QWORD[rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm6,XMMWORD[24+rdi]
+        mov     DWORD[48+rsp],ecx
+        cmovle  r12,rsp
+        sub     rbp,r12
+        mov     QWORD[96+rsp],rbp
+        mov     ecx,DWORD[56+rdi]
+        mov     r13,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm7,XMMWORD[64+rdi]
+        mov     DWORD[52+rsp],ecx
+        cmovle  r13,rsp
+        sub     rbp,r13
+        mov     QWORD[104+rsp],rbp
+        mov     ecx,DWORD[96+rdi]
+        mov     r14,QWORD[80+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[88+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm8,XMMWORD[104+rdi]
+        mov     DWORD[56+rsp],ecx
+        cmovle  r14,rsp
+        sub     rbp,r14
+        mov     QWORD[112+rsp],rbp
+        mov     ecx,DWORD[136+rdi]
+        mov     r15,QWORD[120+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[128+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm9,XMMWORD[144+rdi]
+        mov     DWORD[60+rsp],ecx
+        cmovle  r15,rsp
+        sub     rbp,r15
+        mov     QWORD[120+rsp],rbp
+        test    edx,edx
+        jz      NEAR $L$enc8x_done
+
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+        mov     eax,DWORD[((240-120))+rsi]
+
+        vpxor   xmm10,xmm15,XMMWORD[r8]
+        lea     rbp,[128+rsp]
+        vpxor   xmm11,xmm15,XMMWORD[r9]
+        vpxor   xmm12,xmm15,XMMWORD[r10]
+        vpxor   xmm13,xmm15,XMMWORD[r11]
+        vpxor   xmm2,xmm2,xmm10
+        vpxor   xmm10,xmm15,XMMWORD[r12]
+        vpxor   xmm3,xmm3,xmm11
+        vpxor   xmm11,xmm15,XMMWORD[r13]
+        vpxor   xmm4,xmm4,xmm12
+        vpxor   xmm12,xmm15,XMMWORD[r14]
+        vpxor   xmm5,xmm5,xmm13
+        vpxor   xmm13,xmm15,XMMWORD[r15]
+        vpxor   xmm6,xmm6,xmm10
+        mov     ecx,1
+        vpxor   xmm7,xmm7,xmm11
+        vpxor   xmm8,xmm8,xmm12
+        vpxor   xmm9,xmm9,xmm13
+        jmp     NEAR $L$oop_enc8x
+
+ALIGN   32
+$L$oop_enc8x:
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+0))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r8]
+        vaesenc xmm4,xmm4,xmm1
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r8]
+        cmovge  r8,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r8
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm10,xmm15,XMMWORD[16+r8]
+        mov     QWORD[((64+0))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-72))+rsi]
+        lea     r8,[16+rbx*1+r8]
+        vmovdqu XMMWORD[rbp],xmm10
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+4))+rsp]
+        mov     rbx,QWORD[((64+8))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r9]
+        vaesenc xmm4,xmm4,xmm0
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r9]
+        cmovge  r9,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r9
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm11,xmm15,XMMWORD[16+r9]
+        mov     QWORD[((64+8))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-56))+rsi]
+        lea     r9,[16+rbx*1+r9]
+        vmovdqu XMMWORD[16+rbp],xmm11
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+8))+rsp]
+        mov     rbx,QWORD[((64+16))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r10]
+        vaesenc xmm4,xmm4,xmm1
+        prefetcht0      [15+r8]
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r10]
+        cmovge  r10,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r10
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm12,xmm15,XMMWORD[16+r10]
+        mov     QWORD[((64+16))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-40))+rsi]
+        lea     r10,[16+rbx*1+r10]
+        vmovdqu XMMWORD[32+rbp],xmm12
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+12))+rsp]
+        mov     rbx,QWORD[((64+24))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r11]
+        vaesenc xmm4,xmm4,xmm0
+        prefetcht0      [15+r9]
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r11]
+        cmovge  r11,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r11
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm13,xmm15,XMMWORD[16+r11]
+        mov     QWORD[((64+24))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-24))+rsi]
+        lea     r11,[16+rbx*1+r11]
+        vmovdqu XMMWORD[48+rbp],xmm13
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+16))+rsp]
+        mov     rbx,QWORD[((64+32))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r12]
+        vaesenc xmm4,xmm4,xmm1
+        prefetcht0      [15+r10]
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r12]
+        cmovge  r12,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r12
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm10,xmm15,XMMWORD[16+r12]
+        mov     QWORD[((64+32))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-8))+rsi]
+        lea     r12,[16+rbx*1+r12]
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+20))+rsp]
+        mov     rbx,QWORD[((64+40))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r13]
+        vaesenc xmm4,xmm4,xmm0
+        prefetcht0      [15+r11]
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[r13*1+rbx]
+        cmovge  r13,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r13
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm11,xmm15,XMMWORD[16+r13]
+        mov     QWORD[((64+40))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[8+rsi]
+        lea     r13,[16+rbx*1+r13]
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+24))+rsp]
+        mov     rbx,QWORD[((64+48))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r14]
+        vaesenc xmm4,xmm4,xmm1
+        prefetcht0      [15+r12]
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r14]
+        cmovge  r14,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r14
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm12,xmm15,XMMWORD[16+r14]
+        mov     QWORD[((64+48))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[24+rsi]
+        lea     r14,[16+rbx*1+r14]
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+28))+rsp]
+        mov     rbx,QWORD[((64+56))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r15]
+        vaesenc xmm4,xmm4,xmm0
+        prefetcht0      [15+r13]
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r15]
+        cmovge  r15,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r15
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm13,xmm15,XMMWORD[16+r15]
+        mov     QWORD[((64+56))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[40+rsi]
+        lea     r15,[16+rbx*1+r15]
+        vmovdqu xmm14,XMMWORD[32+rsp]
+        prefetcht0      [15+r14]
+        prefetcht0      [15+r15]
+        cmp     eax,11
+        jb      NEAR $L$enc8x_tail
+
+        vaesenc xmm2,xmm2,xmm1
+        vaesenc xmm3,xmm3,xmm1
+        vaesenc xmm4,xmm4,xmm1
+        vaesenc xmm5,xmm5,xmm1
+        vaesenc xmm6,xmm6,xmm1
+        vaesenc xmm7,xmm7,xmm1
+        vaesenc xmm8,xmm8,xmm1
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+        vaesenc xmm2,xmm2,xmm0
+        vaesenc xmm3,xmm3,xmm0
+        vaesenc xmm4,xmm4,xmm0
+        vaesenc xmm5,xmm5,xmm0
+        vaesenc xmm6,xmm6,xmm0
+        vaesenc xmm7,xmm7,xmm0
+        vaesenc xmm8,xmm8,xmm0
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((192-120))+rsi]
+        je      NEAR $L$enc8x_tail
+
+        vaesenc xmm2,xmm2,xmm1
+        vaesenc xmm3,xmm3,xmm1
+        vaesenc xmm4,xmm4,xmm1
+        vaesenc xmm5,xmm5,xmm1
+        vaesenc xmm6,xmm6,xmm1
+        vaesenc xmm7,xmm7,xmm1
+        vaesenc xmm8,xmm8,xmm1
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+        vaesenc xmm2,xmm2,xmm0
+        vaesenc xmm3,xmm3,xmm0
+        vaesenc xmm4,xmm4,xmm0
+        vaesenc xmm5,xmm5,xmm0
+        vaesenc xmm6,xmm6,xmm0
+        vaesenc xmm7,xmm7,xmm0
+        vaesenc xmm8,xmm8,xmm0
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$enc8x_tail:
+        vaesenc xmm2,xmm2,xmm1
+        vpxor   xmm15,xmm15,xmm15
+        vaesenc xmm3,xmm3,xmm1
+        vaesenc xmm4,xmm4,xmm1
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesenc xmm5,xmm5,xmm1
+        vaesenc xmm6,xmm6,xmm1
+        vpaddd  xmm15,xmm15,xmm14
+        vmovdqu xmm14,XMMWORD[48+rsp]
+        vaesenc xmm7,xmm7,xmm1
+        mov     rbx,QWORD[64+rsp]
+        vaesenc xmm8,xmm8,xmm1
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+        vaesenclast     xmm2,xmm2,xmm0
+        vmovdqa XMMWORD[32+rsp],xmm15
+        vpxor   xmm15,xmm15,xmm15
+        vaesenclast     xmm3,xmm3,xmm0
+        vaesenclast     xmm4,xmm4,xmm0
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesenclast     xmm5,xmm5,xmm0
+        vaesenclast     xmm6,xmm6,xmm0
+        vpaddd  xmm14,xmm14,xmm15
+        vmovdqu xmm15,XMMWORD[((-120))+rsi]
+        vaesenclast     xmm7,xmm7,xmm0
+        vaesenclast     xmm8,xmm8,xmm0
+        vmovdqa XMMWORD[48+rsp],xmm14
+        vaesenclast     xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+        vmovups XMMWORD[(-16)+r8],xmm2
+        sub     r8,rbx
+        vpxor   xmm2,xmm2,XMMWORD[rbp]
+        vmovups XMMWORD[(-16)+r9],xmm3
+        sub     r9,QWORD[72+rsp]
+        vpxor   xmm3,xmm3,XMMWORD[16+rbp]
+        vmovups XMMWORD[(-16)+r10],xmm4
+        sub     r10,QWORD[80+rsp]
+        vpxor   xmm4,xmm4,XMMWORD[32+rbp]
+        vmovups XMMWORD[(-16)+r11],xmm5
+        sub     r11,QWORD[88+rsp]
+        vpxor   xmm5,xmm5,XMMWORD[48+rbp]
+        vmovups XMMWORD[(-16)+r12],xmm6
+        sub     r12,QWORD[96+rsp]
+        vpxor   xmm6,xmm6,xmm10
+        vmovups XMMWORD[(-16)+r13],xmm7
+        sub     r13,QWORD[104+rsp]
+        vpxor   xmm7,xmm7,xmm11
+        vmovups XMMWORD[(-16)+r14],xmm8
+        sub     r14,QWORD[112+rsp]
+        vpxor   xmm8,xmm8,xmm12
+        vmovups XMMWORD[(-16)+r15],xmm9
+        sub     r15,QWORD[120+rsp]
+        vpxor   xmm9,xmm9,xmm13
+
+        dec     edx
+        jnz     NEAR $L$oop_enc8x
+
+        mov     rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$enc8x_done:
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$enc8x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt_avx:
+
+
+ALIGN   32
+aesni_multi_cbc_decrypt_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_cbc_dec_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+
+        sub     rsp,256
+        and     rsp,-256
+        sub     rsp,192
+        mov     QWORD[16+rsp],rax
+
+
+$L$dec8x_body:
+        vzeroupper
+        vmovdqu xmm15,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[160+rdi]
+        shr     edx,1
+
+$L$dec8x_loop_grande:
+
+        xor     edx,edx
+        mov     ecx,DWORD[((-144))+rdi]
+        mov     r8,QWORD[((-160))+rdi]
+        cmp     ecx,edx
+        mov     rbx,QWORD[((-152))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm2,XMMWORD[((-136))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        sub     rbx,r8
+        mov     QWORD[64+rsp],rbx
+        vmovdqu XMMWORD[192+rsp],xmm2
+        mov     ecx,DWORD[((-104))+rdi]
+        mov     r9,QWORD[((-120))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-112))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm3,XMMWORD[((-96))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        sub     rbp,r9
+        mov     QWORD[72+rsp],rbp
+        vmovdqu XMMWORD[208+rsp],xmm3
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r10,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm4,XMMWORD[((-56))+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        sub     rbp,r10
+        mov     QWORD[80+rsp],rbp
+        vmovdqu XMMWORD[224+rsp],xmm4
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r11,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm5,XMMWORD[((-16))+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        sub     rbp,r11
+        mov     QWORD[88+rsp],rbp
+        vmovdqu XMMWORD[240+rsp],xmm5
+        mov     ecx,DWORD[16+rdi]
+        mov     r12,QWORD[rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm6,XMMWORD[24+rdi]
+        mov     DWORD[48+rsp],ecx
+        cmovle  r12,rsp
+        sub     rbp,r12
+        mov     QWORD[96+rsp],rbp
+        vmovdqu XMMWORD[256+rsp],xmm6
+        mov     ecx,DWORD[56+rdi]
+        mov     r13,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm7,XMMWORD[64+rdi]
+        mov     DWORD[52+rsp],ecx
+        cmovle  r13,rsp
+        sub     rbp,r13
+        mov     QWORD[104+rsp],rbp
+        vmovdqu XMMWORD[272+rsp],xmm7
+        mov     ecx,DWORD[96+rdi]
+        mov     r14,QWORD[80+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[88+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm8,XMMWORD[104+rdi]
+        mov     DWORD[56+rsp],ecx
+        cmovle  r14,rsp
+        sub     rbp,r14
+        mov     QWORD[112+rsp],rbp
+        vmovdqu XMMWORD[288+rsp],xmm8
+        mov     ecx,DWORD[136+rdi]
+        mov     r15,QWORD[120+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[128+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm9,XMMWORD[144+rdi]
+        mov     DWORD[60+rsp],ecx
+        cmovle  r15,rsp
+        sub     rbp,r15
+        mov     QWORD[120+rsp],rbp
+        vmovdqu XMMWORD[304+rsp],xmm9
+        test    edx,edx
+        jz      NEAR $L$dec8x_done
+
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+        mov     eax,DWORD[((240-120))+rsi]
+        lea     rbp,[((192+128))+rsp]
+
+        vmovdqu xmm2,XMMWORD[r8]
+        vmovdqu xmm3,XMMWORD[r9]
+        vmovdqu xmm4,XMMWORD[r10]
+        vmovdqu xmm5,XMMWORD[r11]
+        vmovdqu xmm6,XMMWORD[r12]
+        vmovdqu xmm7,XMMWORD[r13]
+        vmovdqu xmm8,XMMWORD[r14]
+        vmovdqu xmm9,XMMWORD[r15]
+        vmovdqu XMMWORD[rbp],xmm2
+        vpxor   xmm2,xmm2,xmm15
+        vmovdqu XMMWORD[16+rbp],xmm3
+        vpxor   xmm3,xmm3,xmm15
+        vmovdqu XMMWORD[32+rbp],xmm4
+        vpxor   xmm4,xmm4,xmm15
+        vmovdqu XMMWORD[48+rbp],xmm5
+        vpxor   xmm5,xmm5,xmm15
+        vmovdqu XMMWORD[64+rbp],xmm6
+        vpxor   xmm6,xmm6,xmm15
+        vmovdqu XMMWORD[80+rbp],xmm7
+        vpxor   xmm7,xmm7,xmm15
+        vmovdqu XMMWORD[96+rbp],xmm8
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu XMMWORD[112+rbp],xmm9
+        vpxor   xmm9,xmm9,xmm15
+        xor     rbp,0x80
+        mov     ecx,1
+        jmp     NEAR $L$oop_dec8x
+
+ALIGN   32
+$L$oop_dec8x:
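+        ; Each iteration decrypts one 16-byte block per lane, interleaving
+        ; the AESDEC rounds of all eight lanes with loading each lane's next
+        ; ciphertext block and advancing its pointers.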
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+0))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r8]
+        vaesdec xmm4,xmm4,xmm1
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r8]
+        cmovge  r8,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r8
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm10,XMMWORD[16+r8]
+        mov     QWORD[((64+0))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-72))+rsi]
+        lea     r8,[16+rbx*1+r8]
+        vmovdqu XMMWORD[128+rsp],xmm10
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+4))+rsp]
+        mov     rbx,QWORD[((64+8))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r9]
+        vaesdec xmm4,xmm4,xmm0
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r9]
+        cmovge  r9,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r9
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm11,XMMWORD[16+r9]
+        mov     QWORD[((64+8))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-56))+rsi]
+        lea     r9,[16+rbx*1+r9]
+        vmovdqu XMMWORD[144+rsp],xmm11
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+8))+rsp]
+        mov     rbx,QWORD[((64+16))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r10]
+        vaesdec xmm4,xmm4,xmm1
+        prefetcht0      [15+r8]
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r10]
+        cmovge  r10,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r10
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm12,XMMWORD[16+r10]
+        mov     QWORD[((64+16))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-40))+rsi]
+        lea     r10,[16+rbx*1+r10]
+        vmovdqu XMMWORD[160+rsp],xmm12
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+12))+rsp]
+        mov     rbx,QWORD[((64+24))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r11]
+        vaesdec xmm4,xmm4,xmm0
+        prefetcht0      [15+r9]
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r11]
+        cmovge  r11,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r11
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm13,XMMWORD[16+r11]
+        mov     QWORD[((64+24))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-24))+rsi]
+        lea     r11,[16+rbx*1+r11]
+        vmovdqu XMMWORD[176+rsp],xmm13
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+16))+rsp]
+        mov     rbx,QWORD[((64+32))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r12]
+        vaesdec xmm4,xmm4,xmm1
+        prefetcht0      [15+r10]
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r12]
+        cmovge  r12,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r12
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm10,XMMWORD[16+r12]
+        mov     QWORD[((64+32))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-8))+rsi]
+        lea     r12,[16+rbx*1+r12]
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+20))+rsp]
+        mov     rbx,QWORD[((64+40))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r13]
+        vaesdec xmm4,xmm4,xmm0
+        prefetcht0      [15+r11]
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[r13*1+rbx]
+        cmovge  r13,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r13
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm11,XMMWORD[16+r13]
+        mov     QWORD[((64+40))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[8+rsi]
+        lea     r13,[16+rbx*1+r13]
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+24))+rsp]
+        mov     rbx,QWORD[((64+48))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r14]
+        vaesdec xmm4,xmm4,xmm1
+        prefetcht0      [15+r12]
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r14]
+        cmovge  r14,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r14
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm12,XMMWORD[16+r14]
+        mov     QWORD[((64+48))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[24+rsi]
+        lea     r14,[16+rbx*1+r14]
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+28))+rsp]
+        mov     rbx,QWORD[((64+56))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r15]
+        vaesdec xmm4,xmm4,xmm0
+        prefetcht0      [15+r13]
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r15]
+        cmovge  r15,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r15
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm13,XMMWORD[16+r15]
+        mov     QWORD[((64+56))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[40+rsi]
+        lea     r15,[16+rbx*1+r15]
+        vmovdqu xmm14,XMMWORD[32+rsp]
+        prefetcht0      [15+r14]
+        prefetcht0      [15+r15]
+        cmp     eax,11
+        jb      NEAR $L$dec8x_tail
+
+        vaesdec xmm2,xmm2,xmm1
+        vaesdec xmm3,xmm3,xmm1
+        vaesdec xmm4,xmm4,xmm1
+        vaesdec xmm5,xmm5,xmm1
+        vaesdec xmm6,xmm6,xmm1
+        vaesdec xmm7,xmm7,xmm1
+        vaesdec xmm8,xmm8,xmm1
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+        vaesdec xmm2,xmm2,xmm0
+        vaesdec xmm3,xmm3,xmm0
+        vaesdec xmm4,xmm4,xmm0
+        vaesdec xmm5,xmm5,xmm0
+        vaesdec xmm6,xmm6,xmm0
+        vaesdec xmm7,xmm7,xmm0
+        vaesdec xmm8,xmm8,xmm0
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((192-120))+rsi]
+        je      NEAR $L$dec8x_tail
+
+        vaesdec xmm2,xmm2,xmm1
+        vaesdec xmm3,xmm3,xmm1
+        vaesdec xmm4,xmm4,xmm1
+        vaesdec xmm5,xmm5,xmm1
+        vaesdec xmm6,xmm6,xmm1
+        vaesdec xmm7,xmm7,xmm1
+        vaesdec xmm8,xmm8,xmm1
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+        vaesdec xmm2,xmm2,xmm0
+        vaesdec xmm3,xmm3,xmm0
+        vaesdec xmm4,xmm4,xmm0
+        vaesdec xmm5,xmm5,xmm0
+        vaesdec xmm6,xmm6,xmm0
+        vaesdec xmm7,xmm7,xmm0
+        vaesdec xmm8,xmm8,xmm0
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$dec8x_tail:
+        vaesdec xmm2,xmm2,xmm1
+        vpxor   xmm15,xmm15,xmm15
+        vaesdec xmm3,xmm3,xmm1
+        vaesdec xmm4,xmm4,xmm1
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesdec xmm5,xmm5,xmm1
+        vaesdec xmm6,xmm6,xmm1
+        vpaddd  xmm15,xmm15,xmm14
+        vmovdqu xmm14,XMMWORD[48+rsp]
+        vaesdec xmm7,xmm7,xmm1
+        mov     rbx,QWORD[64+rsp]
+        vaesdec xmm8,xmm8,xmm1
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+        vaesdeclast     xmm2,xmm2,xmm0
+        vmovdqa XMMWORD[32+rsp],xmm15
+        vpxor   xmm15,xmm15,xmm15
+        vaesdeclast     xmm3,xmm3,xmm0
+        vpxor   xmm2,xmm2,XMMWORD[rbp]
+        vaesdeclast     xmm4,xmm4,xmm0
+        vpxor   xmm3,xmm3,XMMWORD[16+rbp]
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesdeclast     xmm5,xmm5,xmm0
+        vpxor   xmm4,xmm4,XMMWORD[32+rbp]
+        vaesdeclast     xmm6,xmm6,xmm0
+        vpxor   xmm5,xmm5,XMMWORD[48+rbp]
+        vpaddd  xmm14,xmm14,xmm15
+        vmovdqu xmm15,XMMWORD[((-120))+rsi]
+        vaesdeclast     xmm7,xmm7,xmm0
+        vpxor   xmm6,xmm6,XMMWORD[64+rbp]
+        vaesdeclast     xmm8,xmm8,xmm0
+        vpxor   xmm7,xmm7,XMMWORD[80+rbp]
+        vmovdqa XMMWORD[48+rsp],xmm14
+        vaesdeclast     xmm9,xmm9,xmm0
+        vpxor   xmm8,xmm8,XMMWORD[96+rbp]
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+        vmovups XMMWORD[(-16)+r8],xmm2
+        sub     r8,rbx
+        vmovdqu xmm2,XMMWORD[((128+0))+rsp]
+        vpxor   xmm9,xmm9,XMMWORD[112+rbp]
+        vmovups XMMWORD[(-16)+r9],xmm3
+        sub     r9,QWORD[72+rsp]
+        vmovdqu XMMWORD[rbp],xmm2
+        vpxor   xmm2,xmm2,xmm15
+        vmovdqu xmm3,XMMWORD[((128+16))+rsp]
+        vmovups XMMWORD[(-16)+r10],xmm4
+        sub     r10,QWORD[80+rsp]
+        vmovdqu XMMWORD[16+rbp],xmm3
+        vpxor   xmm3,xmm3,xmm15
+        vmovdqu xmm4,XMMWORD[((128+32))+rsp]
+        vmovups XMMWORD[(-16)+r11],xmm5
+        sub     r11,QWORD[88+rsp]
+        vmovdqu XMMWORD[32+rbp],xmm4
+        vpxor   xmm4,xmm4,xmm15
+        vmovdqu xmm5,XMMWORD[((128+48))+rsp]
+        vmovups XMMWORD[(-16)+r12],xmm6
+        sub     r12,QWORD[96+rsp]
+        vmovdqu XMMWORD[48+rbp],xmm5
+        vpxor   xmm5,xmm5,xmm15
+        vmovdqu XMMWORD[64+rbp],xmm10
+        vpxor   xmm6,xmm15,xmm10
+        vmovups XMMWORD[(-16)+r13],xmm7
+        sub     r13,QWORD[104+rsp]
+        vmovdqu XMMWORD[80+rbp],xmm11
+        vpxor   xmm7,xmm15,xmm11
+        vmovups XMMWORD[(-16)+r14],xmm8
+        sub     r14,QWORD[112+rsp]
+        vmovdqu XMMWORD[96+rbp],xmm12
+        vpxor   xmm8,xmm15,xmm12
+        vmovups XMMWORD[(-16)+r15],xmm9
+        sub     r15,QWORD[120+rsp]
+        vmovdqu XMMWORD[112+rbp],xmm13
+        vpxor   xmm9,xmm15,xmm13
+
+        xor     rbp,128
+        dec     edx
+        jnz     NEAR $L$oop_dec8x
+
+        mov     rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$dec8x_done:
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$dec8x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt_avx:
+EXTERN  __imp_RtlVirtualUnwind
+
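+; Win64 structured-exception handler shared by the entry points above: when
+; the fault lies between a function's prologue and epilogue labels it copies
+; the saved non-volatile GPR and XMM registers from the stack frame back into
+; the CONTEXT record, then calls RtlVirtualUnwind to continue the unwind.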
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[16+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((-56-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
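+        ; DD 0xa548f3fc assembles to "cld; rep movsq": copy the ten saved
+        ; XMM registers (20 qwords) into the XMM area of the CONTEXT record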
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
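+        ; "cld; rep movsq" again: copy the full 154-qword CONTEXT record into
+        ; the dispatcher context's ContextRecord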
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
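+; The .pdata entries below register begin/end/unwind-info triplets for each
+; function with the Win64 exception dispatcher; the matching .xdata records
+; name se_handler and the prologue/epilogue labels it checks against.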
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_multi_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_begin_aesni_multi_cbc_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_decrypt wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_decrypt wrt ..imagebase
+        DD      $L$SEH_begin_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+        DD      $L$SEH_begin_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_aesni_multi_cbc_encrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc4x_body wrt ..imagebase,$L$enc4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec4x_body wrt ..imagebase,$L$dec4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_encrypt_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc8x_body wrt ..imagebase,$L$enc8x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec8x_body wrt ..imagebase,$L$dec8x_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
new file mode 100644
index 0000000000..0b706c4e77
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
@@ -0,0 +1,3271 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  aesni_cbc_sha1_enc
+
+ALIGN   32
+aesni_cbc_sha1_enc:
+
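+        ; Dispatch on OPENSSL_ia32cap_P: bit 61 of the qword at offset 4 is
+        ; the SHA-extensions flag; otherwise the AVX path is used only when
+        ; both the AVX bit (1<<28) and the "Intel CPU" bit (1<<30) are set,
+        ; and the SSSE3 path is the fallback.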
+        mov     r10d,DWORD[((OPENSSL_ia32cap_P+0))]
+        mov     r11,QWORD[((OPENSSL_ia32cap_P+4))]
+        bt      r11,61
+        jc      NEAR aesni_cbc_sha1_enc_shaext
+        and     r11d,268435456
+        and     r10d,1073741824
+        or      r10d,r11d
+        cmp     r10d,1342177280
+        je      NEAR aesni_cbc_sha1_enc_avx
+        jmp     NEAR aesni_cbc_sha1_enc_ssse3
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+aesni_cbc_sha1_enc_ssse3:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_ssse3:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r10,QWORD[56+rsp]
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-264))+rsp]
+
+
+
+        movaps  XMMWORD[(96+0)+rsp],xmm6
+        movaps  XMMWORD[(96+16)+rsp],xmm7
+        movaps  XMMWORD[(96+32)+rsp],xmm8
+        movaps  XMMWORD[(96+48)+rsp],xmm9
+        movaps  XMMWORD[(96+64)+rsp],xmm10
+        movaps  XMMWORD[(96+80)+rsp],xmm11
+        movaps  XMMWORD[(96+96)+rsp],xmm12
+        movaps  XMMWORD[(96+112)+rsp],xmm13
+        movaps  XMMWORD[(96+128)+rsp],xmm14
+        movaps  XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_ssse3:
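+        ; Register roles for the stitched loop: r12 = data to encrypt,
+        ; r13 = output minus input delta, r14 = end of the buffer being
+        ; hashed, r15 = AES key schedule (biased by +112), r8d = AES round
+        ; count, r10 = data being hashed, r9 = SHA-1 state, and the IV
+        ; pointer is parked at 88+rsp.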
+        mov     r12,rdi
+        mov     r13,rsi
+        mov     r14,rdx
+        lea     r15,[112+rcx]
+        movdqu  xmm2,XMMWORD[r8]
+        mov     QWORD[88+rsp],r8
+        shl     r14,6
+        sub     r13,r12
+        mov     r8d,DWORD[((240-112))+r15]
+        add     r14,r10
+
+        lea     r11,[K_XX_XX]
+        mov     eax,DWORD[r9]
+        mov     ebx,DWORD[4+r9]
+        mov     ecx,DWORD[8+r9]
+        mov     edx,DWORD[12+r9]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r9]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        movdqa  xmm3,XMMWORD[64+r11]
+        movdqa  xmm13,XMMWORD[r11]
+        movdqu  xmm4,XMMWORD[r10]
+        movdqu  xmm5,XMMWORD[16+r10]
+        movdqu  xmm6,XMMWORD[32+r10]
+        movdqu  xmm7,XMMWORD[48+r10]
+DB      102,15,56,0,227
+DB      102,15,56,0,235
+DB      102,15,56,0,243
+        add     r10,64
+        paddd   xmm4,xmm13
+DB      102,15,56,0,251
+        paddd   xmm5,xmm13
+        paddd   xmm6,xmm13
+        movdqa  XMMWORD[rsp],xmm4
+        psubd   xmm4,xmm13
+        movdqa  XMMWORD[16+rsp],xmm5
+        psubd   xmm5,xmm13
+        movdqa  XMMWORD[32+rsp],xmm6
+        psubd   xmm6,xmm13
+        movups  xmm15,XMMWORD[((-112))+r15]
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        jmp     NEAR $L$oop_ssse3
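+        ; Main stitched loop: each pass schedules and runs the 80 SHA-1
+        ; rounds for one 64-byte block while CBC-encrypting four 16-byte AES
+        ; blocks, with the aesenc byte sequences interleaved into the SHA-1
+        ; round code.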
+ALIGN   32
+$L$oop_ssse3:
+        ror     ebx,2
+        movups  xmm14,XMMWORD[r12]
+        xorps   xmm14,xmm15
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        pshufd  xmm8,xmm4,238
+        xor     esi,edx
+        movdqa  xmm12,xmm7
+        paddd   xmm13,xmm7
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        punpcklqdq      xmm8,xmm5
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        psrldq  xmm12,4
+        and     edi,ebx
+        xor     ebx,ecx
+        pxor    xmm8,xmm4
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm12,xmm6
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        pxor    xmm8,xmm12
+        xor     eax,ebx
+        rol     ebp,5
+        movdqa  XMMWORD[48+rsp],xmm13
+        add     edx,edi
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        and     esi,eax
+        movdqa  xmm3,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        movdqa  xmm12,xmm8
+        xor     esi,ebx
+        pslldq  xmm3,12
+        paddd   xmm8,xmm8
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        psrld   xmm12,31
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm13,xmm3
+        and     edi,ebp
+        xor     ebp,eax
+        psrld   xmm3,30
+        add     ecx,edx
+        ror     edx,7
+        por     xmm8,xmm12
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        pslld   xmm13,2
+        pxor    xmm8,xmm3
+        xor     edx,ebp
+        movdqa  xmm3,XMMWORD[r11]
+        rol     ecx,5
+        add     ebx,edi
+        and     esi,edx
+        pxor    xmm8,xmm13
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        pshufd  xmm9,xmm5,238
+        xor     esi,ebp
+        movdqa  xmm13,xmm8
+        paddd   xmm3,xmm8
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        punpcklqdq      xmm9,xmm6
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        psrldq  xmm13,4
+        and     edi,ecx
+        xor     ecx,edx
+        pxor    xmm9,xmm5
+        add     eax,ebx
+        ror     ebx,7
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        pxor    xmm13,xmm7
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        pxor    xmm9,xmm13
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  XMMWORD[rsp],xmm3
+        add     ebp,edi
+        and     esi,ebx
+        movdqa  xmm12,xmm9
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        movdqa  xmm13,xmm9
+        xor     esi,ecx
+        pslldq  xmm12,12
+        paddd   xmm9,xmm9
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        psrld   xmm13,31
+        xor     eax,ebx
+        rol     ebp,5
+        add     edx,esi
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        movdqa  xmm3,xmm12
+        and     edi,eax
+        xor     eax,ebx
+        psrld   xmm12,30
+        add     edx,ebp
+        ror     ebp,7
+        por     xmm9,xmm13
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        pslld   xmm3,2
+        pxor    xmm9,xmm12
+        xor     ebp,eax
+        movdqa  xmm12,XMMWORD[16+r11]
+        rol     edx,5
+        add     ecx,edi
+        and     esi,ebp
+        pxor    xmm9,xmm3
+        xor     ebp,eax
+        add     ecx,edx
+        ror     edx,7
+        pshufd  xmm10,xmm6,238
+        xor     esi,eax
+        movdqa  xmm3,xmm9
+        paddd   xmm12,xmm9
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        punpcklqdq      xmm10,xmm7
+        xor     edx,ebp
+        rol     ecx,5
+        add     ebx,esi
+        psrldq  xmm3,4
+        and     edi,edx
+        xor     edx,ebp
+        pxor    xmm10,xmm6
+        add     ebx,ecx
+        ror     ecx,7
+        pxor    xmm3,xmm8
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        pxor    xmm10,xmm3
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  XMMWORD[16+rsp],xmm12
+        add     eax,edi
+        and     esi,ecx
+        movdqa  xmm13,xmm10
+        xor     ecx,edx
+        add     eax,ebx
+        ror     ebx,7
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        movdqa  xmm3,xmm10
+        xor     esi,edx
+        pslldq  xmm13,12
+        paddd   xmm10,xmm10
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        psrld   xmm3,31
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm12,xmm13
+        and     edi,ebx
+        xor     ebx,ecx
+        psrld   xmm13,30
+        add     ebp,eax
+        ror     eax,7
+        por     xmm10,xmm3
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        pslld   xmm12,2
+        pxor    xmm10,xmm13
+        xor     eax,ebx
+        movdqa  xmm13,XMMWORD[16+r11]
+        rol     ebp,5
+        add     edx,edi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        and     esi,eax
+        pxor    xmm10,xmm12
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        pshufd  xmm11,xmm7,238
+        xor     esi,ebx
+        movdqa  xmm12,xmm10
+        paddd   xmm13,xmm10
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        punpcklqdq      xmm11,xmm8
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        psrldq  xmm12,4
+        and     edi,ebp
+        xor     ebp,eax
+        pxor    xmm11,xmm7
+        add     ecx,edx
+        ror     edx,7
+        pxor    xmm12,xmm9
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        pxor    xmm11,xmm12
+        xor     edx,ebp
+        rol     ecx,5
+        movdqa  XMMWORD[32+rsp],xmm13
+        add     ebx,edi
+        and     esi,edx
+        movdqa  xmm3,xmm11
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        movdqa  xmm12,xmm11
+        xor     esi,ebp
+        pslldq  xmm3,12
+        paddd   xmm11,xmm11
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        psrld   xmm12,31
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm13,xmm3
+        and     edi,ecx
+        xor     ecx,edx
+        psrld   xmm3,30
+        add     eax,ebx
+        ror     ebx,7
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast1
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast1
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast1:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        por     xmm11,xmm12
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        pslld   xmm13,2
+        pxor    xmm11,xmm3
+        xor     ebx,ecx
+        movdqa  xmm3,XMMWORD[16+r11]
+        rol     eax,5
+        add     ebp,edi
+        and     esi,ebx
+        pxor    xmm11,xmm13
+        pshufd  xmm13,xmm10,238
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm4,xmm8
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        punpcklqdq      xmm13,xmm11
+        xor     eax,ebx
+        rol     ebp,5
+        pxor    xmm4,xmm5
+        add     edx,esi
+        movups  xmm14,XMMWORD[16+r12]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[r13*1+r12],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        and     edi,eax
+        movdqa  xmm12,xmm3
+        xor     eax,ebx
+        paddd   xmm3,xmm11
+        add     edx,ebp
+        pxor    xmm4,xmm13
+        ror     ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        movdqa  xmm13,xmm4
+        xor     ebp,eax
+        rol     edx,5
+        movdqa  XMMWORD[48+rsp],xmm3
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        pslld   xmm4,2
+        add     ecx,edx
+        ror     edx,7
+        psrld   xmm13,30
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        por     xmm4,xmm13
+        xor     edx,ebp
+        rol     ecx,5
+        pshufd  xmm3,xmm11,238
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        pxor    xmm5,xmm9
+        add     ebp,DWORD[16+rsp]
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        punpcklqdq      xmm3,xmm4
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm5,xmm6
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm13,xmm12
+        ror     ebx,7
+        paddd   xmm12,xmm4
+        add     ebp,eax
+        pxor    xmm5,xmm3
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm3,xmm5
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[rsp],xmm12
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[24+rsp]
+        pslld   xmm5,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm3,30
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm5,xmm3
+        add     ecx,edx
+        add     ebx,DWORD[28+rsp]
+        pshufd  xmm12,xmm4,238
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        pxor    xmm6,xmm10
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        punpcklqdq      xmm12,xmm5
+        mov     edi,ebx
+        rol     ebx,5
+        pxor    xmm6,xmm7
+        add     eax,esi
+        xor     edi,edx
+        movdqa  xmm3,XMMWORD[32+r11]
+        ror     ecx,7
+        paddd   xmm13,xmm5
+        add     eax,ebx
+        pxor    xmm6,xmm12
+        add     ebp,DWORD[36+rsp]
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        movdqa  xmm12,xmm6
+        add     ebp,edi
+        xor     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm13
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[40+rsp]
+        pslld   xmm6,2
+        xor     esi,ebx
+        mov     edi,ebp
+        psrld   xmm12,30
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        por     xmm6,xmm12
+        add     edx,ebp
+        add     ecx,DWORD[44+rsp]
+        pshufd  xmm13,xmm5,238
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        pxor    xmm7,xmm11
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        punpcklqdq      xmm13,xmm6
+        mov     edi,ecx
+        rol     ecx,5
+        pxor    xmm7,xmm8
+        add     ebx,esi
+        xor     edi,ebp
+        movdqa  xmm12,xmm3
+        ror     edx,7
+        paddd   xmm3,xmm6
+        add     ebx,ecx
+        pxor    xmm7,xmm13
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        movdqa  xmm13,xmm7
+        add     eax,edi
+        xor     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm3
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[56+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        pslld   xmm7,2
+        xor     esi,ecx
+        mov     edi,eax
+        psrld   xmm13,30
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        por     xmm7,xmm13
+        add     ebp,eax
+        add     edx,DWORD[60+rsp]
+        pshufd  xmm3,xmm6,238
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        pxor    xmm8,xmm4
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        punpcklqdq      xmm3,xmm7
+        mov     edi,edx
+        rol     edx,5
+        pxor    xmm8,xmm9
+        add     ecx,esi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        movdqa  xmm13,xmm12
+        ror     ebp,7
+        paddd   xmm12,xmm7
+        add     ecx,edx
+        pxor    xmm8,xmm3
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        movdqa  xmm3,xmm8
+        add     ebx,edi
+        xor     esi,ebp
+        movdqa  XMMWORD[48+rsp],xmm12
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[8+rsp]
+        pslld   xmm8,2
+        xor     esi,edx
+        mov     edi,ebx
+        psrld   xmm3,30
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        por     xmm8,xmm3
+        add     eax,ebx
+        add     ebp,DWORD[12+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        pshufd  xmm12,xmm7,238
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        pxor    xmm9,xmm5
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        punpcklqdq      xmm12,xmm8
+        mov     edi,ebp
+        rol     ebp,5
+        pxor    xmm9,xmm10
+        add     edx,esi
+        xor     edi,ebx
+        movdqa  xmm3,xmm13
+        ror     eax,7
+        paddd   xmm13,xmm8
+        add     edx,ebp
+        pxor    xmm9,xmm12
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        movdqa  xmm12,xmm9
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast2
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast2
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast2:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        movdqa  XMMWORD[rsp],xmm13
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[24+rsp]
+        pslld   xmm9,2
+        xor     esi,ebp
+        mov     edi,ecx
+        psrld   xmm12,30
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        por     xmm9,xmm12
+        add     ebx,ecx
+        add     eax,DWORD[28+rsp]
+        pshufd  xmm13,xmm8,238
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        pxor    xmm10,xmm6
+        add     ebp,DWORD[32+rsp]
+        movups  xmm14,XMMWORD[32+r12]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[16+r12*1+r13],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        punpcklqdq      xmm13,xmm9
+        mov     edi,eax
+        xor     esi,ecx
+        pxor    xmm10,xmm11
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm12,xmm3
+        xor     edi,ebx
+        paddd   xmm3,xmm9
+        xor     ebx,ecx
+        pxor    xmm10,xmm13
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        movdqa  xmm13,xmm10
+        mov     esi,ebp
+        xor     edi,ebx
+        movdqa  XMMWORD[16+rsp],xmm3
+        rol     ebp,5
+        add     edx,edi
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        pslld   xmm10,2
+        xor     eax,ebx
+        add     edx,ebp
+        psrld   xmm13,30
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        por     xmm10,xmm13
+        ror     ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        rol     edx,5
+        pshufd  xmm3,xmm9,238
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        mov     esi,ecx
+        xor     edi,ebp
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        pxor    xmm11,xmm7
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        punpcklqdq      xmm3,xmm10
+        mov     edi,ebx
+        xor     esi,edx
+        pxor    xmm11,xmm4
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm13,XMMWORD[48+r11]
+        xor     edi,ecx
+        paddd   xmm12,xmm10
+        xor     ecx,edx
+        pxor    xmm11,xmm3
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        movdqa  xmm3,xmm11
+        mov     esi,eax
+        xor     edi,ecx
+        movdqa  XMMWORD[32+rsp],xmm12
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        pslld   xmm11,2
+        xor     ebx,ecx
+        add     ebp,eax
+        psrld   xmm3,30
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        por     xmm11,xmm3
+        ror     eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        rol     ebp,5
+        pshufd  xmm12,xmm10,238
+        add     edx,esi
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        pxor    xmm4,xmm8
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        punpcklqdq      xmm12,xmm11
+        mov     edi,ecx
+        xor     esi,ebp
+        pxor    xmm4,xmm5
+        rol     ecx,5
+        add     ebx,esi
+        movdqa  xmm3,xmm13
+        xor     edi,edx
+        paddd   xmm13,xmm11
+        xor     edx,ebp
+        pxor    xmm4,xmm12
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        movdqa  xmm12,xmm4
+        mov     esi,ebx
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm13
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        pslld   xmm4,2
+        xor     ecx,edx
+        add     eax,ebx
+        psrld   xmm12,30
+        add     ebp,DWORD[8+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        and     esi,ecx
+        xor     ecx,edx
+        por     xmm4,xmm12
+        ror     ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        rol     eax,5
+        pshufd  xmm13,xmm11,238
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        rol     ebp,5
+        add     edx,edi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        pxor    xmm5,xmm9
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        punpcklqdq      xmm13,xmm4
+        mov     edi,edx
+        xor     esi,eax
+        pxor    xmm5,xmm6
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm12,xmm3
+        xor     edi,ebp
+        paddd   xmm3,xmm4
+        xor     ebp,eax
+        pxor    xmm5,xmm13
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        movdqa  xmm13,xmm5
+        mov     esi,ecx
+        xor     edi,ebp
+        movdqa  XMMWORD[rsp],xmm3
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        pslld   xmm5,2
+        xor     edx,ebp
+        add     ebx,ecx
+        psrld   xmm13,30
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        por     xmm5,xmm13
+        ror     ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        rol     ebx,5
+        pshufd  xmm3,xmm4,238
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast3
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast3
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast3:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        pxor    xmm6,xmm10
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        punpcklqdq      xmm3,xmm5
+        mov     edi,ebp
+        xor     esi,ebx
+        pxor    xmm6,xmm7
+        rol     ebp,5
+        add     edx,esi
+        movups  xmm14,XMMWORD[48+r12]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[32+r12*1+r13],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        movdqa  xmm13,xmm12
+        xor     edi,eax
+        paddd   xmm12,xmm5
+        xor     eax,ebx
+        pxor    xmm6,xmm3
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        movdqa  xmm3,xmm6
+        mov     esi,edx
+        xor     edi,eax
+        movdqa  XMMWORD[16+rsp],xmm12
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        pslld   xmm6,2
+        xor     ebp,eax
+        add     ecx,edx
+        psrld   xmm3,30
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        por     xmm6,xmm3
+        ror     edx,7
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        mov     edi,ecx
+        xor     esi,ebp
+        rol     ecx,5
+        pshufd  xmm12,xmm5,238
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        pxor    xmm7,xmm11
+        add     ebp,DWORD[48+rsp]
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        punpcklqdq      xmm12,xmm6
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm7,xmm8
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm3,xmm13
+        ror     ebx,7
+        paddd   xmm13,xmm6
+        add     ebp,eax
+        pxor    xmm7,xmm12
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm12,xmm7
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[32+rsp],xmm13
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[56+rsp]
+        pslld   xmm7,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm12,30
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm7,xmm12
+        add     ecx,edx
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        paddd   xmm3,xmm7
+        add     eax,esi
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm3
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        cmp     r10,r14
+        je      NEAR $L$done_ssse3
+        movdqa  xmm3,XMMWORD[64+r11]
+        movdqa  xmm13,XMMWORD[r11]
+        movdqu  xmm4,XMMWORD[r10]
+        movdqu  xmm5,XMMWORD[16+r10]
+        movdqu  xmm6,XMMWORD[32+r10]
+        movdqu  xmm7,XMMWORD[48+r10]
+DB      102,15,56,0,227
+        add     r10,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+DB      102,15,56,0,235
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        paddd   xmm4,xmm13
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        movdqa  XMMWORD[rsp],xmm4
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        psubd   xmm4,xmm13
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+DB      102,15,56,0,243
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        paddd   xmm5,xmm13
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm5
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        psubd   xmm5,xmm13
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+DB      102,15,56,0,251
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        paddd   xmm6,xmm13
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm6
+        rol     edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast4
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast4
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast4:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        ror     ebp,7
+        psubd   xmm6,xmm13
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        movups  XMMWORD[48+r12*1+r13],xmm2
+        lea     r12,[64+r12]
+
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        add     edx,DWORD[12+r9]
+        mov     DWORD[r9],eax
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[4+r9],esi
+        mov     ebx,esi
+        mov     DWORD[8+r9],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r9],edx
+        xor     edi,edx
+        mov     DWORD[16+r9],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_ssse3
+
+$L$done_ssse3:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast5
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast5
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast5:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        movups  XMMWORD[48+r12*1+r13],xmm2
+        mov     r8,QWORD[88+rsp]
+
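+        ; Fold the working variables back into the SHA-1 state at r9 and
+        ; store the final CBC chaining value through the saved IV pointer.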
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        mov     DWORD[r9],eax
+        add     edx,DWORD[12+r9]
+        mov     DWORD[4+r9],esi
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[8+r9],ecx
+        mov     DWORD[12+r9],edx
+        mov     DWORD[16+r9],ebp
+        movups  XMMWORD[r8],xmm2
+        movaps  xmm6,XMMWORD[((96+0))+rsp]
+        movaps  xmm7,XMMWORD[((96+16))+rsp]
+        movaps  xmm8,XMMWORD[((96+32))+rsp]
+        movaps  xmm9,XMMWORD[((96+48))+rsp]
+        movaps  xmm10,XMMWORD[((96+64))+rsp]
+        movaps  xmm11,XMMWORD[((96+80))+rsp]
+        movaps  xmm12,XMMWORD[((96+96))+rsp]
+        movaps  xmm13,XMMWORD[((96+112))+rsp]
+        movaps  xmm14,XMMWORD[((96+128))+rsp]
+        movaps  xmm15,XMMWORD[((96+144))+rsp]
+        lea     rsi,[264+rsp]
+
+        mov     r15,QWORD[rsi]
+
+        mov     r14,QWORD[8+rsi]
+
+        mov     r13,QWORD[16+rsi]
+
+        mov     r12,QWORD[24+rsi]
+
+        mov     rbp,QWORD[32+rsi]
+
+        mov     rbx,QWORD[40+rsi]
+
+        lea     rsp,[48+rsi]
+
+$L$epilogue_ssse3:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_ssse3:
+
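+; AVX code path of the stitched AES-NI CBC-encrypt + SHA-1 routine
+; (Win64 calling convention; vaesenc steps are interleaved with the SHA-1 rounds).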
+ALIGN   32
+aesni_cbc_sha1_enc_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r10,QWORD[56+rsp]
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-264))+rsp]
+
+
+
+        movaps  XMMWORD[(96+0)+rsp],xmm6
+        movaps  XMMWORD[(96+16)+rsp],xmm7
+        movaps  XMMWORD[(96+32)+rsp],xmm8
+        movaps  XMMWORD[(96+48)+rsp],xmm9
+        movaps  XMMWORD[(96+64)+rsp],xmm10
+        movaps  XMMWORD[(96+80)+rsp],xmm11
+        movaps  XMMWORD[(96+96)+rsp],xmm12
+        movaps  XMMWORD[(96+112)+rsp],xmm13
+        movaps  XMMWORD[(96+128)+rsp],xmm14
+        movaps  XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_avx:
+        vzeroall
+        mov     r12,rdi
+        mov     r13,rsi
+        mov     r14,rdx
+        lea     r15,[112+rcx]
+        vmovdqu xmm12,XMMWORD[r8]
+        mov     QWORD[88+rsp],r8
+        shl     r14,6
+        sub     r13,r12
+        mov     r8d,DWORD[((240-112))+r15]
+        add     r14,r10
+
+        lea     r11,[K_XX_XX]
+        mov     eax,DWORD[r9]
+        mov     ebx,DWORD[4+r9]
+        mov     ecx,DWORD[8+r9]
+        mov     edx,DWORD[12+r9]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r9]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        vmovdqa xmm6,XMMWORD[64+r11]
+        vmovdqa xmm10,XMMWORD[r11]
+        vmovdqu xmm0,XMMWORD[r10]
+        vmovdqu xmm1,XMMWORD[16+r10]
+        vmovdqu xmm2,XMMWORD[32+r10]
+        vmovdqu xmm3,XMMWORD[48+r10]
+        vpshufb xmm0,xmm0,xmm6
+        add     r10,64
+        vpshufb xmm1,xmm1,xmm6
+        vpshufb xmm2,xmm2,xmm6
+        vpshufb xmm3,xmm3,xmm6
+        vpaddd  xmm4,xmm0,xmm10
+        vpaddd  xmm5,xmm1,xmm10
+        vpaddd  xmm6,xmm2,xmm10
+        vmovdqa XMMWORD[rsp],xmm4
+        vmovdqa XMMWORD[16+rsp],xmm5
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        jmp     NEAR $L$oop_avx
+ALIGN   32
+$L$oop_avx:
+        shrd    ebx,ebx,2
+        vmovdqu xmm13,XMMWORD[r12]
+        vpxor   xmm13,xmm13,xmm15
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        xor     esi,edx
+        vpalignr        xmm4,xmm1,xmm0,8
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        vpaddd  xmm9,xmm10,xmm3
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrldq xmm8,xmm3,4
+        add     ebp,esi
+        and     edi,ebx
+        vpxor   xmm4,xmm4,xmm0
+        xor     ebx,ecx
+        add     ebp,eax
+        vpxor   xmm8,xmm8,xmm2
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        vpxor   xmm4,xmm4,xmm8
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        and     esi,eax
+        vpsrld  xmm8,xmm4,31
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpslldq xmm9,xmm4,12
+        vpaddd  xmm4,xmm4,xmm4
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpor    xmm4,xmm4,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     ecx,esi
+        and     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm4,xmm4,xmm8
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        vpxor   xmm4,xmm4,xmm9
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        and     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpalignr        xmm5,xmm2,xmm1,8
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        vpaddd  xmm9,xmm10,xmm4
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrldq xmm8,xmm4,4
+        add     eax,esi
+        and     edi,ecx
+        vpxor   xmm5,xmm5,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        vpxor   xmm8,xmm8,xmm3
+        shrd    ebx,ebx,7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        vpxor   xmm5,xmm5,xmm8
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ebp,edi
+        and     esi,ebx
+        vpsrld  xmm8,xmm5,31
+        xor     ebx,ecx
+        add     ebp,eax
+        shrd    eax,eax,7
+        xor     esi,ecx
+        vpslldq xmm9,xmm5,12
+        vpaddd  xmm5,xmm5,xmm5
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpor    xmm5,xmm5,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     edx,esi
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        and     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm5,xmm5,xmm8
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        vpxor   xmm5,xmm5,xmm9
+        xor     ebp,eax
+        shld    edx,edx,5
+        vmovdqa xmm10,XMMWORD[16+r11]
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        vpalignr        xmm6,xmm3,xmm2,8
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        vpaddd  xmm9,xmm10,xmm5
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vpsrldq xmm8,xmm5,4
+        add     ebx,esi
+        and     edi,edx
+        vpxor   xmm6,xmm6,xmm2
+        xor     edx,ebp
+        add     ebx,ecx
+        vpxor   xmm8,xmm8,xmm4
+        shrd    ecx,ecx,7
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        vpxor   xmm6,xmm6,xmm8
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     eax,edi
+        and     esi,ecx
+        vpsrld  xmm8,xmm6,31
+        xor     ecx,edx
+        add     eax,ebx
+        shrd    ebx,ebx,7
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,edx
+        vpslldq xmm9,xmm6,12
+        vpaddd  xmm6,xmm6,xmm6
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpor    xmm6,xmm6,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     ebp,esi
+        and     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm6,xmm6,xmm8
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        vpxor   xmm6,xmm6,xmm9
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        and     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpalignr        xmm7,xmm4,xmm3,8
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        vpaddd  xmm9,xmm10,xmm6
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpsrldq xmm8,xmm6,4
+        add     ecx,esi
+        and     edi,ebp
+        vpxor   xmm7,xmm7,xmm3
+        xor     ebp,eax
+        add     ecx,edx
+        vpxor   xmm8,xmm8,xmm5
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        vpxor   xmm7,xmm7,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     ebx,edi
+        and     esi,edx
+        vpsrld  xmm8,xmm7,31
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpslldq xmm9,xmm7,12
+        vpaddd  xmm7,xmm7,xmm7
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpor    xmm7,xmm7,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     eax,esi
+        and     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm7,xmm7,xmm8
+        shrd    ebx,ebx,7
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast6
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast6
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast6:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        vpxor   xmm7,xmm7,xmm9
+        xor     ebx,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        and     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        vpxor   xmm0,xmm0,xmm1
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpaddd  xmm9,xmm10,xmm7
+        add     edx,esi
+        vmovdqu xmm13,XMMWORD[16+r12]
+        vpxor   xmm13,xmm13,xmm15
+        vmovups XMMWORD[r13*1+r12],xmm12
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        and     edi,eax
+        vpxor   xmm0,xmm0,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpslld  xmm0,xmm0,2
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        vpor    xmm0,xmm0,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ebp,DWORD[16+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm1,xmm1,xmm2
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm10,xmm0
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm1,xmm1,xmm8
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm1,xmm1,2
+        add     ecx,DWORD[24+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm1,xmm1,xmm8
+        add     ebx,DWORD[28+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        vpxor   xmm2,xmm2,xmm3
+        add     eax,esi
+        xor     edi,edx
+        vpaddd  xmm9,xmm10,xmm1
+        vmovdqa xmm10,XMMWORD[32+r11]
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpxor   xmm2,xmm2,xmm8
+        add     ebp,DWORD[36+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpslld  xmm2,xmm2,2
+        add     edx,DWORD[40+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpor    xmm2,xmm2,xmm8
+        add     ecx,DWORD[44+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebx,esi
+        xor     edi,ebp
+        vpaddd  xmm9,xmm10,xmm2
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpxor   xmm3,xmm3,xmm8
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        add     ebp,DWORD[56+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpor    xmm3,xmm3,xmm8
+        add     edx,DWORD[60+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpalignr        xmm8,xmm3,xmm2,8
+        vpxor   xmm4,xmm4,xmm0
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        vpxor   xmm4,xmm4,xmm5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     edi,eax
+        vpaddd  xmm9,xmm10,xmm3
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpxor   xmm4,xmm4,xmm8
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        vpsrld  xmm8,xmm4,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpslld  xmm4,xmm4,2
+        add     eax,DWORD[8+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpor    xmm4,xmm4,xmm8
+        add     ebp,DWORD[12+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpalignr        xmm8,xmm4,xmm3,8
+        vpxor   xmm5,xmm5,xmm1
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpxor   xmm5,xmm5,xmm6
+        add     edx,esi
+        xor     edi,ebx
+        vpaddd  xmm9,xmm10,xmm4
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpxor   xmm5,xmm5,xmm8
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        vpsrld  xmm8,xmm5,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast7:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpslld  xmm5,xmm5,2
+        add     ebx,DWORD[24+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpor    xmm5,xmm5,xmm8
+        add     eax,DWORD[28+rsp]
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm5,xmm4,8
+        vpxor   xmm6,xmm6,xmm2
+        add     ebp,DWORD[32+rsp]
+        vmovdqu xmm13,XMMWORD[32+r12]
+        vpxor   xmm13,xmm13,xmm15
+        vmovups XMMWORD[16+r12*1+r13],xmm12
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        and     esi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vpxor   xmm6,xmm6,xmm7
+        mov     edi,eax
+        xor     esi,ecx
+        vpaddd  xmm9,xmm10,xmm5
+        shld    eax,eax,5
+        add     ebp,esi
+        vpxor   xmm6,xmm6,xmm8
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm6,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        vpslld  xmm6,xmm6,2
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        vpor    xmm6,xmm6,xmm8
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        mov     esi,ecx
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        vpalignr        xmm8,xmm6,xmm5,8
+        vpxor   xmm7,xmm7,xmm3
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        vpxor   xmm7,xmm7,xmm0
+        mov     edi,ebx
+        xor     esi,edx
+        vpaddd  xmm9,xmm10,xmm6
+        vmovdqa xmm10,XMMWORD[48+r11]
+        shld    ebx,ebx,5
+        add     eax,esi
+        vpxor   xmm7,xmm7,xmm8
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        vpsrld  xmm8,xmm7,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        vpslld  xmm7,xmm7,2
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        vpor    xmm7,xmm7,xmm8
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        shld    ebp,ebp,5
+        add     edx,esi
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        vpxor   xmm0,xmm0,xmm1
+        mov     edi,ecx
+        xor     esi,ebp
+        vpaddd  xmm9,xmm10,xmm7
+        shld    ecx,ecx,5
+        add     ebx,esi
+        vpxor   xmm0,xmm0,xmm8
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        vpslld  xmm0,xmm0,2
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[8+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        and     esi,ecx
+        vpor    xmm0,xmm0,xmm8
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        vpxor   xmm1,xmm1,xmm2
+        mov     edi,edx
+        xor     esi,eax
+        vpaddd  xmm9,xmm10,xmm0
+        shld    edx,edx,5
+        add     ecx,esi
+        vpxor   xmm1,xmm1,xmm8
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        mov     esi,ecx
+        vpslld  xmm1,xmm1,2
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        vpor    xmm1,xmm1,xmm8
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast8
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast8
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast8:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vpxor   xmm2,xmm2,xmm3
+        mov     edi,ebp
+        xor     esi,ebx
+        vpaddd  xmm9,xmm10,xmm1
+        shld    ebp,ebp,5
+        add     edx,esi
+        vmovdqu xmm13,XMMWORD[48+r12]
+        vpxor   xmm13,xmm13,xmm15
+        vmovups XMMWORD[32+r12*1+r13],xmm12
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        vpxor   xmm2,xmm2,xmm8
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        vpslld  xmm2,xmm2,2
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        vpor    xmm2,xmm2,xmm8
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        mov     edi,ecx
+        xor     esi,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebp,DWORD[48+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm10,xmm2
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm3,xmm3,xmm8
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm3,xmm3,2
+        add     ecx,DWORD[56+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm3,xmm3,xmm8
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        vpaddd  xmm9,xmm10,xmm3
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        vmovdqa XMMWORD[48+rsp],xmm9
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        cmp     r10,r14
+        je      NEAR $L$done_avx
+        vmovdqa xmm9,XMMWORD[64+r11]
+        vmovdqa xmm10,XMMWORD[r11]
+        vmovdqu xmm0,XMMWORD[r10]
+        vmovdqu xmm1,XMMWORD[16+r10]
+        vmovdqu xmm2,XMMWORD[32+r10]
+        vmovdqu xmm3,XMMWORD[48+r10]
+        vpshufb xmm0,xmm0,xmm9
+        add     r10,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        vpshufb xmm1,xmm1,xmm9
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpaddd  xmm8,xmm0,xmm10
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vmovdqa XMMWORD[rsp],xmm8
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        vpshufb xmm2,xmm2,xmm9
+        mov     edi,edx
+        shld    edx,edx,5
+        vpaddd  xmm8,xmm1,xmm10
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vmovdqa XMMWORD[16+rsp],xmm8
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        vpshufb xmm3,xmm3,xmm9
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpaddd  xmm8,xmm2,xmm10
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vmovdqa XMMWORD[32+rsp],xmm8
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast9
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast9
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast9:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vmovups XMMWORD[48+r12*1+r13],xmm12
+        lea     r12,[64+r12]
+
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        add     edx,DWORD[12+r9]
+        mov     DWORD[r9],eax
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[4+r9],esi
+        mov     ebx,esi
+        mov     DWORD[8+r9],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r9],edx
+        xor     edi,edx
+        mov     DWORD[16+r9],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_avx
+
+$L$done_avx:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast10
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast10
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast10:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vmovups XMMWORD[48+r12*1+r13],xmm12
+        mov     r8,QWORD[88+rsp]
+
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        mov     DWORD[r9],eax
+        add     edx,DWORD[12+r9]
+        mov     DWORD[4+r9],esi
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[8+r9],ecx
+        mov     DWORD[12+r9],edx
+        mov     DWORD[16+r9],ebp
+        vmovups XMMWORD[r8],xmm12
+        vzeroall
+        movaps  xmm6,XMMWORD[((96+0))+rsp]
+        movaps  xmm7,XMMWORD[((96+16))+rsp]
+        movaps  xmm8,XMMWORD[((96+32))+rsp]
+        movaps  xmm9,XMMWORD[((96+48))+rsp]
+        movaps  xmm10,XMMWORD[((96+64))+rsp]
+        movaps  xmm11,XMMWORD[((96+80))+rsp]
+        movaps  xmm12,XMMWORD[((96+96))+rsp]
+        movaps  xmm13,XMMWORD[((96+112))+rsp]
+        movaps  xmm14,XMMWORD[((96+128))+rsp]
+        movaps  xmm15,XMMWORD[((96+144))+rsp]
+        lea     rsi,[264+rsp]
+
+        mov     r15,QWORD[rsi]
+
+        mov     r14,QWORD[8+rsi]
+
+        mov     r13,QWORD[16+rsi]
+
+        mov     r12,QWORD[24+rsi]
+
+        mov     rbp,QWORD[32+rsi]
+
+        mov     rbx,QWORD[40+rsi]
+
+        lea     rsp,[48+rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_avx:
+ALIGN   64
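+; SHA-1 round constants (one per 20-round stage) followed by the byte-order
+; shuffle masks used to load big-endian message words.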
+K_XX_XX:
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB      0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+
+DB      65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115
+DB      116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52
+DB      44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB      60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB      114,103,62,0
+ALIGN   64
+
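+; SHA-NI code path: the DB byte sequences below are hand-encoded
+; sha1rnds4/sha1nexte/sha1msg1/sha1msg2 and aesenc/aesenclast instructions.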
+ALIGN   32
+aesni_cbc_sha1_enc_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        mov     r10,QWORD[56+rsp]
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-8-160)+rax],xmm6
+        movaps  XMMWORD[(-8-144)+rax],xmm7
+        movaps  XMMWORD[(-8-128)+rax],xmm8
+        movaps  XMMWORD[(-8-112)+rax],xmm9
+        movaps  XMMWORD[(-8-96)+rax],xmm10
+        movaps  XMMWORD[(-8-80)+rax],xmm11
+        movaps  XMMWORD[(-8-64)+rax],xmm12
+        movaps  XMMWORD[(-8-48)+rax],xmm13
+        movaps  XMMWORD[(-8-32)+rax],xmm14
+        movaps  XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+        movdqu  xmm8,XMMWORD[r9]
+        movd    xmm9,DWORD[16+r9]
+        movdqa  xmm7,XMMWORD[((K_XX_XX+80))]
+
+        mov     r11d,DWORD[240+rcx]
+        sub     rsi,rdi
+        movups  xmm15,XMMWORD[rcx]
+        movups  xmm2,XMMWORD[r8]
+        movups  xmm0,XMMWORD[16+rcx]
+        lea     rcx,[112+rcx]
+
+        pshufd  xmm8,xmm8,27
+        pshufd  xmm9,xmm9,27
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
+        movups  xmm14,XMMWORD[rdi]
+        xorps   xmm14,xmm15
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+        movdqu  xmm3,XMMWORD[r10]
+        movdqa  xmm12,xmm9
+DB      102,15,56,0,223
+        movdqu  xmm4,XMMWORD[16+r10]
+        movdqa  xmm11,xmm8
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+DB      102,15,56,0,231
+
+        paddd   xmm9,xmm3
+        movdqu  xmm5,XMMWORD[32+r10]
+        lea     r10,[64+r10]
+        pxor    xmm3,xmm12
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+        pxor    xmm3,xmm12
+        movdqa  xmm10,xmm8
+DB      102,15,56,0,239
+DB      69,15,58,204,193,0
+DB      68,15,56,200,212
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+DB      15,56,201,220
+        movdqu  xmm6,XMMWORD[((-16))+r10]
+        movdqa  xmm9,xmm8
+DB      102,15,56,0,247
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+DB      69,15,58,204,194,0
+DB      68,15,56,200,205
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,0
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,0
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast11
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast11
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast11:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,0
+DB      68,15,56,200,212
+        movups  xmm14,XMMWORD[16+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[rdi*1+rsi],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+DB      15,56,201,220
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,1
+DB      68,15,56,200,205
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,245
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,1
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,1
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,1
+DB      68,15,56,200,212
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+DB      15,56,201,220
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast12
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast12
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast12:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,1
+DB      68,15,56,200,205
+        movups  xmm14,XMMWORD[32+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[16+rdi*1+rsi],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,245
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,2
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,2
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,2
+DB      68,15,56,200,212
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+DB      15,56,201,220
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,2
+DB      68,15,56,200,205
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,245
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast13
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast13
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast13:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,2
+DB      68,15,56,200,214
+        movups  xmm14,XMMWORD[48+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[32+rdi*1+rsi],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,3
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,3
+DB      68,15,56,200,212
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,3
+DB      68,15,56,200,205
+DB      15,56,202,245
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm5,xmm12
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,3
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,3
+DB      68,15,56,200,205
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast14
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast14
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast14:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        dec     rdx
+
+        paddd   xmm8,xmm11
+        movups  XMMWORD[48+rdi*1+rsi],xmm2
+        lea     rdi,[64+rdi]
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm8,xmm8,27
+        pshufd  xmm9,xmm9,27
+        movups  XMMWORD[r8],xmm2
+        movdqu  XMMWORD[r9],xmm8
+        movd    DWORD[16+r9],xmm9
+        movaps  xmm6,XMMWORD[((-8-160))+rax]
+        movaps  xmm7,XMMWORD[((-8-144))+rax]
+        movaps  xmm8,XMMWORD[((-8-128))+rax]
+        movaps  xmm9,XMMWORD[((-8-112))+rax]
+        movaps  xmm10,XMMWORD[((-8-96))+rax]
+        movaps  xmm11,XMMWORD[((-8-80))+rax]
+        movaps  xmm12,XMMWORD[((-8-64))+rax]
+        movaps  xmm13,XMMWORD[((-8-48))+rax]
+        movaps  xmm14,XMMWORD[((-8-32))+rax]
+        movaps  xmm15,XMMWORD[((-8-16))+rax]
+        mov     rsp,rax
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_cbc_sha1_enc_shaext:
+EXTERN  __imp_RtlVirtualUnwind
+
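+; Win64 SEH unwind handler shared by the SSSE3, AVX and SHA-NI entry points;
+; it restores the saved XMM and non-volatile integer registers into the
+; CONTEXT record and then calls RtlVirtualUnwind.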
+ALIGN   16
+ssse3_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+        lea     r10,[aesni_cbc_sha1_enc_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$seh_no_shaext
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[168+rax]
+        jmp     NEAR $L$common_seh_tail
+$L$seh_no_shaext:
+        lea     rsi,[96+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[264+rax]
+
+        mov     r15,QWORD[rax]
+        mov     r14,QWORD[8+rax]
+        mov     r13,QWORD[16+rax]
+        mov     r12,QWORD[24+rax]
+        mov     rbp,QWORD[32+rax]
+        mov     rbx,QWORD[40+rax]
+        lea     rax,[48+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
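+; SEH tables: .pdata maps each function's code range to its unwind
+; information entry in .xdata.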
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha1_enc_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha1_enc_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha1_enc_avx wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_aesni_cbc_sha1_enc_ssse3:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_avx:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_shaext:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
new file mode 100644
index 0000000000..0dba3d7f67
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
@@ -0,0 +1,4709 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  aesni_cbc_sha256_enc
+
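+; Dispatcher: reads the capability vector OPENSSL_ia32cap_P and branches to
+; the SHA-NI, XOP, AVX2 or AVX implementation of AES-CBC + SHA-256.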
+ALIGN   16
+aesni_cbc_sha256_enc:
+        lea     r11,[OPENSSL_ia32cap_P]
+        mov     eax,1
+        cmp     rcx,0
+        je      NEAR $L$probe
+        mov     eax,DWORD[r11]
+        mov     r10,QWORD[4+r11]
+        bt      r10,61
+        jc      NEAR aesni_cbc_sha256_enc_shaext
+        mov     r11,r10
+        shr     r11,32
+
+        test    r10d,2048
+        jnz     NEAR aesni_cbc_sha256_enc_xop
+        and     r11d,296
+        cmp     r11d,296
+        je      NEAR aesni_cbc_sha256_enc_avx2
+        and     r10d,268435456
+        jnz     NEAR aesni_cbc_sha256_enc_avx
+        ud2
+        xor     eax,eax
+        cmp     rcx,0
+        je      NEAR $L$probe
+        ud2
+$L$probe:
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+
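+; SHA-256 round constants (each row of four repeated twice), followed by the
+; byte-order shuffle and mask constants used by the SIMD paths.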
+K256:
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0,0,0,0,0,0,0,0,-1,-1,-1,-1
+        DD      0,0,0,0,0,0,0,0
+DB      65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54
+DB      32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95
+DB      54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98
+DB      121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108
+DB      46,111,114,103,62,0
+ALIGN   64
+
+ALIGN   64
+aesni_cbc_sha256_enc_xop:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_xop:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+$L$xop_shortcut:
+        mov     r10,QWORD[56+rsp]
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,288
+        and     rsp,-64
+
+        shl     rdx,6
+        sub     rsi,rdi
+        sub     r10,rdi
+        add     rdx,rdi
+
+
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+
+        mov     QWORD[((64+32))+rsp],r8
+        mov     QWORD[((64+40))+rsp],r9
+        mov     QWORD[((64+48))+rsp],r10
+        mov     QWORD[120+rsp],rax
+
+        movaps  XMMWORD[128+rsp],xmm6
+        movaps  XMMWORD[144+rsp],xmm7
+        movaps  XMMWORD[160+rsp],xmm8
+        movaps  XMMWORD[176+rsp],xmm9
+        movaps  XMMWORD[192+rsp],xmm10
+        movaps  XMMWORD[208+rsp],xmm11
+        movaps  XMMWORD[224+rsp],xmm12
+        movaps  XMMWORD[240+rsp],xmm13
+        movaps  XMMWORD[256+rsp],xmm14
+        movaps  XMMWORD[272+rsp],xmm15
+$L$prologue_xop:
+        vzeroall
+
+        mov     r12,rdi
+        lea     rdi,[128+rcx]
+        lea     r13,[((K256+544))]
+        mov     r14d,DWORD[((240-128))+rdi]
+        mov     r15,r9
+        mov     rsi,r10
+        vmovdqu xmm8,XMMWORD[r8]
+        sub     r14,9
+
+        mov     eax,DWORD[r15]
+        mov     ebx,DWORD[4+r15]
+        mov     ecx,DWORD[8+r15]
+        mov     edx,DWORD[12+r15]
+        mov     r8d,DWORD[16+r15]
+        mov     r9d,DWORD[20+r15]
+        mov     r10d,DWORD[24+r15]
+        mov     r11d,DWORD[28+r15]
+
+        vmovdqa xmm14,XMMWORD[r14*8+r13]
+        vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+        vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        jmp     NEAR $L$loop_xop
+ALIGN   16
+$L$loop_xop:
+        vmovdqa xmm7,XMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[r12*1+rsi]
+        vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+        vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+        vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+        vpshufb xmm0,xmm0,xmm7
+        lea     rbp,[K256]
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,XMMWORD[rbp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,XMMWORD[32+rbp]
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        vpaddd  xmm7,xmm3,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm5
+        mov     esi,ebx
+        vmovdqa XMMWORD[32+rsp],xmm6
+        xor     esi,ecx
+        vmovdqa XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$xop_00_47
+
+ALIGN   16
+$L$xop_00_47:
+        sub     rbp,-16*2*4
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        vpalignr        xmm4,xmm1,xmm0,4
+        ror     r13d,14
+        mov     eax,r14d
+        vpalignr        xmm7,xmm3,xmm2,4
+        mov     r12d,r9d
+        xor     r13d,r8d
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,eax
+        vpaddd  xmm0,xmm0,xmm7
+        and     r12d,r8d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,r10d
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+DB      143,232,120,194,251,13
+        xor     r14d,eax
+        add     r11d,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,ebx
+        add     edx,r11d
+        vpsrld  xmm6,xmm3,10
+        ror     r14d,2
+        add     r11d,esi
+        vpaddd  xmm0,xmm0,xmm4
+        mov     r13d,edx
+        add     r14d,r11d
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     r11d,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        vpsrldq xmm7,xmm7,8
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        vpaddd  xmm0,xmm0,xmm7
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+DB      143,232,120,194,248,13
+        xor     r14d,r11d
+        add     r10d,r13d
+        vpsrld  xmm6,xmm0,10
+        xor     r15d,eax
+        add     ecx,r10d
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     r10d,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        vpaddd  xmm0,xmm0,xmm7
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        vpaddd  xmm6,xmm0,XMMWORD[rbp]
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[rsp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        ror     r13d,14
+        mov     r8d,r14d
+        vpalignr        xmm7,xmm0,xmm3,4
+        mov     r12d,ebx
+        xor     r13d,eax
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,r8d
+        vpaddd  xmm1,xmm1,xmm7
+        and     r12d,eax
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+DB      143,232,120,194,248,13
+        xor     r14d,r8d
+        add     edx,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,r9d
+        add     r11d,edx
+        vpsrld  xmm6,xmm0,10
+        ror     r14d,2
+        add     edx,esi
+        vpaddd  xmm1,xmm1,xmm4
+        mov     r13d,r11d
+        add     r14d,edx
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     edx,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        vpsrldq xmm7,xmm7,8
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        vpaddd  xmm1,xmm1,xmm7
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+DB      143,232,120,194,249,13
+        xor     r14d,edx
+        add     ecx,r13d
+        vpsrld  xmm6,xmm1,10
+        xor     r15d,r8d
+        add     r10d,ecx
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     ecx,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        vpaddd  xmm1,xmm1,xmm7
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        vpaddd  xmm6,xmm1,XMMWORD[32+rbp]
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        ror     r13d,14
+        mov     eax,r14d
+        vpalignr        xmm7,xmm1,xmm0,4
+        mov     r12d,r9d
+        xor     r13d,r8d
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,eax
+        vpaddd  xmm2,xmm2,xmm7
+        and     r12d,r8d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,r10d
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+DB      143,232,120,194,249,13
+        xor     r14d,eax
+        add     r11d,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,ebx
+        add     edx,r11d
+        vpsrld  xmm6,xmm1,10
+        ror     r14d,2
+        add     r11d,esi
+        vpaddd  xmm2,xmm2,xmm4
+        mov     r13d,edx
+        add     r14d,r11d
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     r11d,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        vpsrldq xmm7,xmm7,8
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        vpaddd  xmm2,xmm2,xmm7
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+DB      143,232,120,194,250,13
+        xor     r14d,r11d
+        add     r10d,r13d
+        vpsrld  xmm6,xmm2,10
+        xor     r15d,eax
+        add     ecx,r10d
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     r10d,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        vpaddd  xmm2,xmm2,xmm7
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        ror     r13d,14
+        mov     r8d,r14d
+        vpalignr        xmm7,xmm2,xmm1,4
+        mov     r12d,ebx
+        xor     r13d,eax
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,r8d
+        vpaddd  xmm3,xmm3,xmm7
+        and     r12d,eax
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+DB      143,232,120,194,250,13
+        xor     r14d,r8d
+        add     edx,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,r9d
+        add     r11d,edx
+        vpsrld  xmm6,xmm2,10
+        ror     r14d,2
+        add     edx,esi
+        vpaddd  xmm3,xmm3,xmm4
+        mov     r13d,r11d
+        add     r14d,edx
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     edx,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        vpsrldq xmm7,xmm7,8
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        vpaddd  xmm3,xmm3,xmm7
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+DB      143,232,120,194,251,13
+        xor     r14d,edx
+        add     ecx,r13d
+        vpsrld  xmm6,xmm3,10
+        xor     r15d,r8d
+        add     r10d,ecx
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     ecx,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        vpaddd  xmm3,xmm3,xmm7
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        vpaddd  xmm6,xmm3,XMMWORD[96+rbp]
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[48+rsp],xmm6
+        mov     r12,QWORD[((64+0))+rsp]
+        vpand   xmm11,xmm11,xmm14
+        mov     r15,QWORD[((64+8))+rsp]
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r12*1+r15],xmm8
+        lea     r12,[16+r12]
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$xop_00_47
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        ror     r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        ror     r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        ror     r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        ror     r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        ror     r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        ror     r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        ror     r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        ror     r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        ror     r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        ror     r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        ror     r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        ror     r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     r12,QWORD[((64+0))+rsp]
+        mov     r13,QWORD[((64+8))+rsp]
+        mov     r15,QWORD[((64+40))+rsp]
+        mov     rsi,QWORD[((64+48))+rsp]
+
+        vpand   xmm11,xmm11,xmm14
+        mov     eax,r14d
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r12],xmm8
+        lea     r12,[16+r12]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        add     r11d,DWORD[28+r15]
+
+        cmp     r12,QWORD[((64+16))+rsp]
+
+        mov     DWORD[r15],eax
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+
+        jb      NEAR $L$loop_xop
+
+        mov     r8,QWORD[((64+32))+rsp]
+        mov     rsi,QWORD[120+rsp]
+
+        vmovdqu XMMWORD[r8],xmm8
+        vzeroall
+        movaps  xmm6,XMMWORD[128+rsp]
+        movaps  xmm7,XMMWORD[144+rsp]
+        movaps  xmm8,XMMWORD[160+rsp]
+        movaps  xmm9,XMMWORD[176+rsp]
+        movaps  xmm10,XMMWORD[192+rsp]
+        movaps  xmm11,XMMWORD[208+rsp]
+        movaps  xmm12,XMMWORD[224+rsp]
+        movaps  xmm13,XMMWORD[240+rsp]
+        movaps  xmm14,XMMWORD[256+rsp]
+        movaps  xmm15,XMMWORD[272+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_xop:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_xop:
+
+ALIGN   64
+aesni_cbc_sha256_enc_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+$L$avx_shortcut:
+        mov     r10,QWORD[56+rsp]
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,288
+        and     rsp,-64
+
+        shl     rdx,6
+        sub     rsi,rdi
+        sub     r10,rdi
+        add     rdx,rdi
+
+
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+
+        mov     QWORD[((64+32))+rsp],r8
+        mov     QWORD[((64+40))+rsp],r9
+        mov     QWORD[((64+48))+rsp],r10
+        mov     QWORD[120+rsp],rax
+
+        movaps  XMMWORD[128+rsp],xmm6
+        movaps  XMMWORD[144+rsp],xmm7
+        movaps  XMMWORD[160+rsp],xmm8
+        movaps  XMMWORD[176+rsp],xmm9
+        movaps  XMMWORD[192+rsp],xmm10
+        movaps  XMMWORD[208+rsp],xmm11
+        movaps  XMMWORD[224+rsp],xmm12
+        movaps  XMMWORD[240+rsp],xmm13
+        movaps  XMMWORD[256+rsp],xmm14
+        movaps  XMMWORD[272+rsp],xmm15
+$L$prologue_avx:
+        vzeroall
+
+        mov     r12,rdi
+        lea     rdi,[128+rcx]
+        lea     r13,[((K256+544))]
+        mov     r14d,DWORD[((240-128))+rdi]
+        mov     r15,r9
+        mov     rsi,r10
+        vmovdqu xmm8,XMMWORD[r8]
+        sub     r14,9
+
+        mov     eax,DWORD[r15]
+        mov     ebx,DWORD[4+r15]
+        mov     ecx,DWORD[8+r15]
+        mov     edx,DWORD[12+r15]
+        mov     r8d,DWORD[16+r15]
+        mov     r9d,DWORD[20+r15]
+        mov     r10d,DWORD[24+r15]
+        mov     r11d,DWORD[28+r15]
+
+        vmovdqa xmm14,XMMWORD[r14*8+r13]
+        vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+        vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        jmp     NEAR $L$loop_avx
+ALIGN   16
+$L$loop_avx:
+        vmovdqa xmm7,XMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[r12*1+rsi]
+        vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+        vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+        vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+        vpshufb xmm0,xmm0,xmm7
+        lea     rbp,[K256]
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,XMMWORD[rbp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,XMMWORD[32+rbp]
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        vpaddd  xmm7,xmm3,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm5
+        mov     esi,ebx
+        vmovdqa XMMWORD[32+rsp],xmm6
+        xor     esi,ecx
+        vmovdqa XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$avx_00_47
+
+ALIGN   16
+$L$avx_00_47:
+        sub     rbp,-16*2*4
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        vpalignr        xmm4,xmm1,xmm0,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm3,xmm2,4
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm0,xmm0,xmm7
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        vpshufd xmm7,xmm3,250
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        vpaddd  xmm0,xmm0,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpsrldq xmm6,xmm6,8
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        vpaddd  xmm0,xmm0,xmm6
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        vpshufd xmm7,xmm0,80
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpslldq xmm6,xmm6,8
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        vpaddd  xmm0,xmm0,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        vpaddd  xmm6,xmm0,XMMWORD[rbp]
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[rsp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm0,xmm3,4
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm1,xmm1,xmm7
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        vpshufd xmm7,xmm0,250
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        vpaddd  xmm1,xmm1,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpsrldq xmm6,xmm6,8
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        vpaddd  xmm1,xmm1,xmm6
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        vpshufd xmm7,xmm1,80
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpslldq xmm6,xmm6,8
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        vpaddd  xmm1,xmm1,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        vpaddd  xmm6,xmm1,XMMWORD[32+rbp]
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm1,xmm0,4
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm2,xmm2,xmm7
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        vpshufd xmm7,xmm1,250
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        vpaddd  xmm2,xmm2,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpsrldq xmm6,xmm6,8
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        vpaddd  xmm2,xmm2,xmm6
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        vpshufd xmm7,xmm2,80
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpslldq xmm6,xmm6,8
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        vpaddd  xmm2,xmm2,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm2,xmm1,4
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm3,xmm3,xmm7
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        vpshufd xmm7,xmm2,250
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        vpaddd  xmm3,xmm3,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpsrldq xmm6,xmm6,8
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        vpaddd  xmm3,xmm3,xmm6
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        vpshufd xmm7,xmm3,80
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpslldq xmm6,xmm6,8
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        vpaddd  xmm3,xmm3,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        vpaddd  xmm6,xmm3,XMMWORD[96+rbp]
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[48+rsp],xmm6
+        mov     r12,QWORD[((64+0))+rsp]
+        vpand   xmm11,xmm11,xmm14
+        mov     r15,QWORD[((64+8))+rsp]
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r12*1+r15],xmm8
+        lea     r12,[16+r12]
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$avx_00_47
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     r12,QWORD[((64+0))+rsp]
+        mov     r13,QWORD[((64+8))+rsp]
+        mov     r15,QWORD[((64+40))+rsp]
+        mov     rsi,QWORD[((64+48))+rsp]
+
+        vpand   xmm11,xmm11,xmm14
+        mov     eax,r14d
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r12],xmm8
+        lea     r12,[16+r12]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        add     r11d,DWORD[28+r15]
+
+        cmp     r12,QWORD[((64+16))+rsp]
+
+        mov     DWORD[r15],eax
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+        jb      NEAR $L$loop_avx
+
+        mov     r8,QWORD[((64+32))+rsp]
+        mov     rsi,QWORD[120+rsp]
+
+        vmovdqu XMMWORD[r8],xmm8
+        vzeroall
+        movaps  xmm6,XMMWORD[128+rsp]
+        movaps  xmm7,XMMWORD[144+rsp]
+        movaps  xmm8,XMMWORD[160+rsp]
+        movaps  xmm9,XMMWORD[176+rsp]
+        movaps  xmm10,XMMWORD[192+rsp]
+        movaps  xmm11,XMMWORD[208+rsp]
+        movaps  xmm12,XMMWORD[224+rsp]
+        movaps  xmm13,XMMWORD[240+rsp]
+        movaps  xmm14,XMMWORD[256+rsp]
+        movaps  xmm15,XMMWORD[272+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx:
+
+ALIGN   64
+aesni_cbc_sha256_enc_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+$L$avx2_shortcut:
+        mov     r10,QWORD[56+rsp]
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,736
+        and     rsp,-256*4
+        add     rsp,448
+
+        shl     rdx,6
+        sub     rsi,rdi
+        sub     r10,rdi
+        add     rdx,rdi
+
+
+
+        mov     QWORD[((64+16))+rsp],rdx
+
+        mov     QWORD[((64+32))+rsp],r8
+        mov     QWORD[((64+40))+rsp],r9
+        mov     QWORD[((64+48))+rsp],r10
+        mov     QWORD[120+rsp],rax
+
+        movaps  XMMWORD[128+rsp],xmm6
+        movaps  XMMWORD[144+rsp],xmm7
+        movaps  XMMWORD[160+rsp],xmm8
+        movaps  XMMWORD[176+rsp],xmm9
+        movaps  XMMWORD[192+rsp],xmm10
+        movaps  XMMWORD[208+rsp],xmm11
+        movaps  XMMWORD[224+rsp],xmm12
+        movaps  XMMWORD[240+rsp],xmm13
+        movaps  XMMWORD[256+rsp],xmm14
+        movaps  XMMWORD[272+rsp],xmm15
+$L$prologue_avx2:
+        vzeroall
+
+        mov     r13,rdi
+        vpinsrq xmm15,xmm15,rsi,1
+        lea     rdi,[128+rcx]
+        lea     r12,[((K256+544))]
+        mov     r14d,DWORD[((240-128))+rdi]
+        mov     r15,r9
+        mov     rsi,r10
+        vmovdqu xmm8,XMMWORD[r8]
+        lea     r14,[((-9))+r14]
+
+        vmovdqa xmm14,XMMWORD[r14*8+r12]
+        vmovdqa xmm13,XMMWORD[16+r14*8+r12]
+        vmovdqa xmm12,XMMWORD[32+r14*8+r12]
+
+        sub     r13,-16*4
+        mov     eax,DWORD[r15]
+        lea     r12,[r13*1+rsi]
+        mov     ebx,DWORD[4+r15]
+        cmp     r13,rdx
+        mov     ecx,DWORD[8+r15]
+        cmove   r12,rsp
+        mov     edx,DWORD[12+r15]
+        mov     r8d,DWORD[16+r15]
+        mov     r9d,DWORD[20+r15]
+        mov     r10d,DWORD[24+r15]
+        mov     r11d,DWORD[28+r15]
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        jmp     NEAR $L$oop_avx2
+ALIGN   16
+$L$oop_avx2:
+        vmovdqa ymm7,YMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[((-64+0))+r13*1+rsi]
+        vmovdqu xmm1,XMMWORD[((-64+16))+r13*1+rsi]
+        vmovdqu xmm2,XMMWORD[((-64+32))+r13*1+rsi]
+        vmovdqu xmm3,XMMWORD[((-64+48))+r13*1+rsi]
+
+        vinserti128     ymm0,ymm0,XMMWORD[r12],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r12],1
+        vpshufb ymm0,ymm0,ymm7
+        vinserti128     ymm2,ymm2,XMMWORD[32+r12],1
+        vpshufb ymm1,ymm1,ymm7
+        vinserti128     ymm3,ymm3,XMMWORD[48+r12],1
+
+        lea     rbp,[K256]
+        vpshufb ymm2,ymm2,ymm7
+        lea     r13,[((-64))+r13]
+        vpaddd  ymm4,ymm0,YMMWORD[rbp]
+        vpshufb ymm3,ymm3,ymm7
+        vpaddd  ymm5,ymm1,YMMWORD[32+rbp]
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        vpaddd  ymm7,ymm3,YMMWORD[96+rbp]
+        vmovdqa YMMWORD[rsp],ymm4
+        xor     r14d,r14d
+        vmovdqa YMMWORD[32+rsp],ymm5
+        lea     rsp,[((-64))+rsp]
+        mov     esi,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        xor     esi,ecx
+        vmovdqa YMMWORD[32+rsp],ymm7
+        mov     r12d,r9d
+        sub     rbp,-16*2*4
+        jmp     NEAR $L$avx2_00_47
+
+ALIGN   16
+$L$avx2_00_47:
+        vmovdqu xmm9,XMMWORD[r13]
+        vpinsrq xmm15,xmm15,r13,0
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm1,ymm0,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm3,ymm2,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm0,ymm0,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        vpshufd ymm7,ymm3,250
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vpxor   xmm9,xmm9,xmm8
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm0,ymm0,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufd ymm6,ymm6,132
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpaddd  ymm0,ymm0,ymm6
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpshufd ymm7,ymm0,80
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        vpsrlq  ymm7,ymm7,17
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        vpaddd  ymm0,ymm0,ymm6
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        vpaddd  ymm6,ymm0,YMMWORD[rbp]
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm2,ymm1,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm0,ymm3,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm1,ymm1,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        vpshufd ymm7,ymm0,250
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm1,ymm1,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufd ymm6,ymm6,132
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpaddd  ymm1,ymm1,ymm6
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpshufd ymm7,ymm1,80
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        vpsrlq  ymm7,ymm7,17
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        vpaddd  ymm1,ymm1,ymm6
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        vpaddd  ymm6,ymm1,YMMWORD[32+rbp]
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm3,ymm2,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm1,ymm0,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm2,ymm2,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        vpshufd ymm7,ymm1,250
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm2,ymm2,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufd ymm6,ymm6,132
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpaddd  ymm2,ymm2,ymm6
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpshufd ymm7,ymm2,80
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        vpsrlq  ymm7,ymm7,17
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        vpaddd  ymm2,ymm2,ymm6
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm0,ymm3,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm2,ymm1,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm3,ymm3,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        vpshufd ymm7,ymm2,250
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm3,ymm3,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufd ymm6,ymm6,132
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpaddd  ymm3,ymm3,ymm6
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpshufd ymm7,ymm3,80
+        and     esi,r15d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        vpsrlq  ymm7,ymm7,17
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        vpaddd  ymm3,ymm3,ymm6
+        and     r15d,esi
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        vpaddd  ymm6,ymm3,YMMWORD[96+rbp]
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        vmovq   r13,xmm15
+        vpextrq r15,xmm15,1
+        vpand   xmm11,xmm11,xmm14
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r15],xmm8
+        lea     r13,[16+r13]
+        lea     rbp,[128+rbp]
+        cmp     BYTE[3+rbp],0
+        jne     NEAR $L$avx2_00_47
+        vmovdqu xmm9,XMMWORD[r13]
+        vpinsrq xmm15,xmm15,r13,0
+        add     r11d,DWORD[((0+64))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+64))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vpxor   xmm9,xmm9,xmm8
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+64))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+64))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+64))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+64))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+64))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+64))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        add     r11d,DWORD[rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[4+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[8+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[12+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[32+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[36+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[40+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[44+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vpextrq r12,xmm15,1
+        vmovq   r13,xmm15
+        mov     r15,QWORD[552+rsp]
+        add     eax,r14d
+        lea     rbp,[448+rsp]
+
+        vpand   xmm11,xmm11,xmm14
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r12],xmm8
+        lea     r13,[16+r13]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        add     r11d,DWORD[28+r15]
+
+        mov     DWORD[r15],eax
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+
+        cmp     r13,QWORD[80+rbp]
+        je      NEAR $L$done_avx2
+
+        xor     r14d,r14d
+        mov     esi,ebx
+        mov     r12d,r9d
+        xor     esi,ecx
+        jmp     NEAR $L$ower_avx2
+ALIGN   16
+$L$ower_avx2:
+        vmovdqu xmm9,XMMWORD[r13]
+        vpinsrq xmm15,xmm15,r13,0
+        add     r11d,DWORD[((0+16))+rbp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+16))+rbp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vpxor   xmm9,xmm9,xmm8
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+16))+rbp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+16))+rbp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+16))+rbp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+16))+rbp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+16))+rbp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+16))+rbp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        lea     rbp,[((-64))+rbp]
+        add     r11d,DWORD[((0+16))+rbp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+16))+rbp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+16))+rbp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+16))+rbp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+16))+rbp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+16))+rbp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+16))+rbp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+16))+rbp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovq   r13,xmm15
+        vpextrq r15,xmm15,1
+        vpand   xmm11,xmm11,xmm14
+        vpor    xmm8,xmm8,xmm11
+        lea     rbp,[((-64))+rbp]
+        vmovdqu XMMWORD[r13*1+r15],xmm8
+        lea     r13,[16+r13]
+        cmp     rbp,rsp
+        jae     NEAR $L$ower_avx2
+
+        mov     r15,QWORD[552+rsp]
+        lea     r13,[64+r13]
+        mov     rsi,QWORD[560+rsp]
+        add     eax,r14d
+        lea     rsp,[448+rsp]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        lea     r12,[r13*1+rsi]
+        add     r11d,DWORD[28+r15]
+
+        cmp     r13,QWORD[((64+16))+rsp]
+
+        mov     DWORD[r15],eax
+        cmove   r12,rsp
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+
+        jbe     NEAR $L$oop_avx2
+        lea     rbp,[rsp]
+
+$L$done_avx2:
+        lea     rsp,[rbp]
+        mov     r8,QWORD[((64+32))+rsp]
+        mov     rsi,QWORD[120+rsp]
+
+        vmovdqu XMMWORD[r8],xmm8
+        vzeroall
+        movaps  xmm6,XMMWORD[128+rsp]
+        movaps  xmm7,XMMWORD[144+rsp]
+        movaps  xmm8,XMMWORD[160+rsp]
+        movaps  xmm9,XMMWORD[176+rsp]
+        movaps  xmm10,XMMWORD[192+rsp]
+        movaps  xmm11,XMMWORD[208+rsp]
+        movaps  xmm12,XMMWORD[224+rsp]
+        movaps  xmm13,XMMWORD[240+rsp]
+        movaps  xmm14,XMMWORD[256+rsp]
+        movaps  xmm15,XMMWORD[272+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx2:
+
+ALIGN   32
+aesni_cbc_sha256_enc_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        mov     r10,QWORD[56+rsp]
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-8-160)+rax],xmm6
+        movaps  XMMWORD[(-8-144)+rax],xmm7
+        movaps  XMMWORD[(-8-128)+rax],xmm8
+        movaps  XMMWORD[(-8-112)+rax],xmm9
+        movaps  XMMWORD[(-8-96)+rax],xmm10
+        movaps  XMMWORD[(-8-80)+rax],xmm11
+        movaps  XMMWORD[(-8-64)+rax],xmm12
+        movaps  XMMWORD[(-8-48)+rax],xmm13
+        movaps  XMMWORD[(-8-32)+rax],xmm14
+        movaps  XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+        lea     rax,[((K256+128))]
+        movdqu  xmm1,XMMWORD[r9]
+        movdqu  xmm2,XMMWORD[16+r9]
+        movdqa  xmm3,XMMWORD[((512-128))+rax]
+
+        mov     r11d,DWORD[240+rcx]
+        sub     rsi,rdi
+        movups  xmm15,XMMWORD[rcx]
+        movups  xmm6,XMMWORD[r8]
+        movups  xmm4,XMMWORD[16+rcx]
+        lea     rcx,[112+rcx]
+
+        pshufd  xmm0,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        pshufd  xmm2,xmm2,0x1b
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,202,8
+        punpcklqdq      xmm2,xmm0
+
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
+        movdqu  xmm10,XMMWORD[r10]
+        movdqu  xmm11,XMMWORD[16+r10]
+        movdqu  xmm12,XMMWORD[32+r10]
+DB      102,68,15,56,0,211
+        movdqu  xmm13,XMMWORD[48+r10]
+
+        movdqa  xmm0,XMMWORD[((0-128))+rax]
+        paddd   xmm0,xmm10
+DB      102,68,15,56,0,219
+        movdqa  xmm9,xmm2
+        movdqa  xmm8,xmm1
+        movups  xmm14,XMMWORD[rdi]
+        xorps   xmm14,xmm15
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((32-128))+rax]
+        paddd   xmm0,xmm11
+DB      102,68,15,56,0,227
+        lea     r10,[64+r10]
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((64-128))+rax]
+        paddd   xmm0,xmm12
+DB      102,68,15,56,0,235
+DB      69,15,56,204,211
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm13
+DB      102,65,15,58,15,220,4
+        paddd   xmm10,xmm3
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((96-128))+rax]
+        paddd   xmm0,xmm13
+DB      69,15,56,205,213
+DB      69,15,56,204,220
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,221,4
+        paddd   xmm11,xmm3
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((128-128))+rax]
+        paddd   xmm0,xmm10
+DB      69,15,56,205,218
+DB      69,15,56,204,229
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+        paddd   xmm12,xmm3
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast1
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast1
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast1:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+DB      15,56,203,202
+        movups  xmm14,XMMWORD[16+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[rdi*1+rsi],xmm6
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+        movdqa  xmm0,XMMWORD[((160-128))+rax]
+        paddd   xmm0,xmm11
+DB      69,15,56,205,227
+DB      69,15,56,204,234
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm12
+DB      102,65,15,58,15,219,4
+        paddd   xmm13,xmm3
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((192-128))+rax]
+        paddd   xmm0,xmm12
+DB      69,15,56,205,236
+DB      69,15,56,204,211
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm13
+DB      102,65,15,58,15,220,4
+        paddd   xmm10,xmm3
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((224-128))+rax]
+        paddd   xmm0,xmm13
+DB      69,15,56,205,213
+DB      69,15,56,204,220
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,221,4
+        paddd   xmm11,xmm3
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((256-128))+rax]
+        paddd   xmm0,xmm10
+DB      69,15,56,205,218
+DB      69,15,56,204,229
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+        paddd   xmm12,xmm3
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast2
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast2
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast2:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+DB      15,56,203,202
+        movups  xmm14,XMMWORD[32+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[16+rdi*1+rsi],xmm6
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+        movdqa  xmm0,XMMWORD[((288-128))+rax]
+        paddd   xmm0,xmm11
+DB      69,15,56,205,227
+DB      69,15,56,204,234
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm12
+DB      102,65,15,58,15,219,4
+        paddd   xmm13,xmm3
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((320-128))+rax]
+        paddd   xmm0,xmm12
+DB      69,15,56,205,236
+DB      69,15,56,204,211
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm13
+DB      102,65,15,58,15,220,4
+        paddd   xmm10,xmm3
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((352-128))+rax]
+        paddd   xmm0,xmm13
+DB      69,15,56,205,213
+DB      69,15,56,204,220
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,221,4
+        paddd   xmm11,xmm3
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((384-128))+rax]
+        paddd   xmm0,xmm10
+DB      69,15,56,205,218
+DB      69,15,56,204,229
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+        paddd   xmm12,xmm3
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((416-128))+rax]
+        paddd   xmm0,xmm11
+DB      69,15,56,205,227
+DB      69,15,56,204,234
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast3
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast3
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast3:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm12
+DB      102,65,15,58,15,219,4
+        paddd   xmm13,xmm3
+        movups  xmm14,XMMWORD[48+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[32+rdi*1+rsi],xmm6
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((448-128))+rax]
+        paddd   xmm0,xmm12
+DB      69,15,56,205,236
+        movdqa  xmm3,xmm7
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((480-128))+rax]
+        paddd   xmm0,xmm13
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast4
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast4
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast4:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+
+        paddd   xmm2,xmm9
+        paddd   xmm1,xmm8
+
+        dec     rdx
+        movups  XMMWORD[48+rdi*1+rsi],xmm6
+        lea     rdi,[64+rdi]
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm2,xmm2,0xb1
+        pshufd  xmm3,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        punpckhqdq      xmm1,xmm2
+DB      102,15,58,15,211,8
+
+        movups  XMMWORD[r8],xmm6
+        movdqu  XMMWORD[r9],xmm1
+        movdqu  XMMWORD[16+r9],xmm2
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        lea     rsp,[((8+160))+rsp]
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_cbc_sha256_enc_shaext:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+        lea     r10,[aesni_cbc_sha256_enc_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_shaext
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[168+rax]
+        jmp     NEAR $L$in_prologue
+$L$not_in_shaext:
+        lea     r10,[$L$avx2_shortcut]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_avx2
+
+        and     rax,-256*4
+        add     rax,448
+$L$not_in_avx2:
+        mov     rsi,rax
+        mov     rax,QWORD[((64+56))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((64+64))+rsi]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_xop wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_xop wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_xop wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_avx wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_aesni_cbc_sha256_enc_xop:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+
+$L$SEH_info_aesni_cbc_sha256_enc_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_avx2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_shaext:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
new file mode 100644
index 0000000000..2705ece3e2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
@@ -0,0 +1,5084 @@
+; Copyright 2009-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+global  aesni_encrypt
+
+ALIGN   16
+aesni_encrypt:
+
+        movups  xmm2,XMMWORD[rcx]
+        mov     eax,DWORD[240+r8]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_enc1_1:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_enc1_1
+DB      102,15,56,221,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  XMMWORD[rdx],xmm2
+        pxor    xmm2,xmm2
+        DB      0F3h,0C3h               ;repret
+
+
+
+global  aesni_decrypt
+
+ALIGN   16
+aesni_decrypt:
+
+        movups  xmm2,XMMWORD[rcx]
+        mov     eax,DWORD[240+r8]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_dec1_2:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_dec1_2
+DB      102,15,56,223,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  XMMWORD[rdx],xmm2
+        pxor    xmm2,xmm2
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt2:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$enc_loop2:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop2
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt2:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$dec_loop2:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop2
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt3:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$enc_loop3:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop3
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt3:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$dec_loop3:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop3
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt4:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        xorps   xmm5,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      0x0f,0x1f,0x00
+        add     rax,16
+
+$L$enc_loop4:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop4
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+DB      102,15,56,221,232
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt4:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        xorps   xmm5,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      0x0f,0x1f,0x00
+        add     rax,16
+
+$L$dec_loop4:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop4
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+DB      102,15,56,223,232
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt6:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+DB      102,15,56,220,209
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,220,217
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+DB      102,15,56,220,225
+        pxor    xmm7,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$enc_loop6_enter
+ALIGN   16
+$L$enc_loop6:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+$L$enc_loop6_enter:
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop6
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+DB      102,15,56,221,232
+DB      102,15,56,221,240
+DB      102,15,56,221,248
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt6:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+DB      102,15,56,222,209
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,222,217
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+DB      102,15,56,222,225
+        pxor    xmm7,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$dec_loop6_enter
+ALIGN   16
+$L$dec_loop6:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+$L$dec_loop6_enter:
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop6
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+DB      102,15,56,223,232
+DB      102,15,56,223,240
+DB      102,15,56,223,248
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt8:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,220,209
+        pxor    xmm7,xmm0
+        pxor    xmm8,xmm0
+DB      102,15,56,220,217
+        pxor    xmm9,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$enc_loop8_inner
+ALIGN   16
+$L$enc_loop8:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+$L$enc_loop8_inner:
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+$L$enc_loop8_enter:
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop8
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+DB      102,15,56,221,232
+DB      102,15,56,221,240
+DB      102,15,56,221,248
+DB      102,68,15,56,221,192
+DB      102,68,15,56,221,200
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt8:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,222,209
+        pxor    xmm7,xmm0
+        pxor    xmm8,xmm0
+DB      102,15,56,222,217
+        pxor    xmm9,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$dec_loop8_inner
+ALIGN   16
+$L$dec_loop8:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+$L$dec_loop8_inner:
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+$L$dec_loop8_enter:
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop8
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+DB      102,15,56,223,232
+DB      102,15,56,223,240
+DB      102,15,56,223,248
+DB      102,68,15,56,223,192
+DB      102,68,15,56,223,200
+        DB      0F3h,0C3h               ;repret
+
+
+global  aesni_ecb_encrypt
+
+ALIGN   16
+aesni_ecb_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ecb_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+$L$ecb_enc_body:
+        and     rdx,-16
+        jz      NEAR $L$ecb_ret
+
+        mov     eax,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[rcx]
+        mov     r11,rcx
+        mov     r10d,eax
+        test    r8d,r8d
+        jz      NEAR $L$ecb_decrypt
+
+        cmp     rdx,0x80
+        jb      NEAR $L$ecb_enc_tail
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+        sub     rdx,0x80
+        jmp     NEAR $L$ecb_enc_loop8_enter
+ALIGN   16
+$L$ecb_enc_loop8:
+        movups  XMMWORD[rsi],xmm2
+        mov     rcx,r11
+        movdqu  xmm2,XMMWORD[rdi]
+        mov     eax,r10d
+        movups  XMMWORD[16+rsi],xmm3
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movups  XMMWORD[32+rsi],xmm4
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movups  XMMWORD[48+rsi],xmm5
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movups  XMMWORD[64+rsi],xmm6
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movups  XMMWORD[80+rsi],xmm7
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movups  XMMWORD[96+rsi],xmm8
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+$L$ecb_enc_loop8_enter:
+
+        call    _aesni_encrypt8
+
+        sub     rdx,0x80
+        jnc     NEAR $L$ecb_enc_loop8
+
+        movups  XMMWORD[rsi],xmm2
+        mov     rcx,r11
+        movups  XMMWORD[16+rsi],xmm3
+        mov     eax,r10d
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+        add     rdx,0x80
+        jz      NEAR $L$ecb_ret
+
+$L$ecb_enc_tail:
+        movups  xmm2,XMMWORD[rdi]
+        cmp     rdx,0x20
+        jb      NEAR $L$ecb_enc_one
+        movups  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ecb_enc_two
+        movups  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,0x40
+        jb      NEAR $L$ecb_enc_three
+        movups  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ecb_enc_four
+        movups  xmm6,XMMWORD[64+rdi]
+        cmp     rdx,0x60
+        jb      NEAR $L$ecb_enc_five
+        movups  xmm7,XMMWORD[80+rdi]
+        je      NEAR $L$ecb_enc_six
+        movdqu  xmm8,XMMWORD[96+rdi]
+        xorps   xmm9,xmm9
+        call    _aesni_encrypt8
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_one:
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_3:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_3
+DB      102,15,56,221,209
+        movups  XMMWORD[rsi],xmm2
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_two:
+        call    _aesni_encrypt2
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_three:
+        call    _aesni_encrypt3
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_four:
+        call    _aesni_encrypt4
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_five:
+        xorps   xmm7,xmm7
+        call    _aesni_encrypt6
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_six:
+        call    _aesni_encrypt6
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        jmp     NEAR $L$ecb_ret
+
+ALIGN   16
+$L$ecb_decrypt:
+        cmp     rdx,0x80
+        jb      NEAR $L$ecb_dec_tail
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+        sub     rdx,0x80
+        jmp     NEAR $L$ecb_dec_loop8_enter
+ALIGN   16
+$L$ecb_dec_loop8:
+        movups  XMMWORD[rsi],xmm2
+        mov     rcx,r11
+        movdqu  xmm2,XMMWORD[rdi]
+        mov     eax,r10d
+        movups  XMMWORD[16+rsi],xmm3
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movups  XMMWORD[32+rsi],xmm4
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movups  XMMWORD[48+rsi],xmm5
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movups  XMMWORD[64+rsi],xmm6
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movups  XMMWORD[80+rsi],xmm7
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movups  XMMWORD[96+rsi],xmm8
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+$L$ecb_dec_loop8_enter:
+
+        call    _aesni_decrypt8
+
+        movups  xmm0,XMMWORD[r11]
+        sub     rdx,0x80
+        jnc     NEAR $L$ecb_dec_loop8
+
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        mov     rcx,r11
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        mov     eax,r10d
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        pxor    xmm8,xmm8
+        movups  XMMWORD[112+rsi],xmm9
+        pxor    xmm9,xmm9
+        lea     rsi,[128+rsi]
+        add     rdx,0x80
+        jz      NEAR $L$ecb_ret
+
+$L$ecb_dec_tail:
+        movups  xmm2,XMMWORD[rdi]
+        cmp     rdx,0x20
+        jb      NEAR $L$ecb_dec_one
+        movups  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ecb_dec_two
+        movups  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,0x40
+        jb      NEAR $L$ecb_dec_three
+        movups  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ecb_dec_four
+        movups  xmm6,XMMWORD[64+rdi]
+        cmp     rdx,0x60
+        jb      NEAR $L$ecb_dec_five
+        movups  xmm7,XMMWORD[80+rdi]
+        je      NEAR $L$ecb_dec_six
+        movups  xmm8,XMMWORD[96+rdi]
+        movups  xmm0,XMMWORD[rcx]
+        xorps   xmm9,xmm9
+        call    _aesni_decrypt8
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        pxor    xmm8,xmm8
+        pxor    xmm9,xmm9
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_one:
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_4:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_4
+DB      102,15,56,223,209
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_two:
+        call    _aesni_decrypt2
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_three:
+        call    _aesni_decrypt3
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_four:
+        call    _aesni_decrypt4
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_five:
+        xorps   xmm7,xmm7
+        call    _aesni_decrypt6
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_six:
+        call    _aesni_decrypt6
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+
+$L$ecb_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        lea     rsp,[88+rsp]
+$L$ecb_enc_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ecb_encrypt:
+global  aesni_ccm64_encrypt_blocks
+
+ALIGN   16
+aesni_ccm64_encrypt_blocks:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ccm64_encrypt_blocks:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+$L$ccm64_enc_body:
+        mov     eax,DWORD[240+rcx]
+        movdqu  xmm6,XMMWORD[r8]
+        movdqa  xmm9,XMMWORD[$L$increment64]
+        movdqa  xmm7,XMMWORD[$L$bswap_mask]
+
+        shl     eax,4
+        mov     r10d,16
+        lea     r11,[rcx]
+        movdqu  xmm3,XMMWORD[r9]
+        movdqa  xmm2,xmm6
+        lea     rcx,[32+rax*1+rcx]
+DB      102,15,56,0,247
+        sub     r10,rax
+        jmp     NEAR $L$ccm64_enc_outer
+ALIGN   16
+$L$ccm64_enc_outer:
+        movups  xmm0,XMMWORD[r11]
+        mov     rax,r10
+        movups  xmm8,XMMWORD[rdi]
+
+        xorps   xmm2,xmm0
+        movups  xmm1,XMMWORD[16+r11]
+        xorps   xmm0,xmm8
+        xorps   xmm3,xmm0
+        movups  xmm0,XMMWORD[32+r11]
+
+$L$ccm64_enc2_loop:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ccm64_enc2_loop
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        paddq   xmm6,xmm9
+        dec     rdx
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+
+        lea     rdi,[16+rdi]
+        xorps   xmm8,xmm2
+        movdqa  xmm2,xmm6
+        movups  XMMWORD[rsi],xmm8
+DB      102,15,56,0,215
+        lea     rsi,[16+rsi]
+        jnz     NEAR $L$ccm64_enc_outer
+
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movups  XMMWORD[r9],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm8,xmm8
+        pxor    xmm6,xmm6
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        lea     rsp,[88+rsp]
+$L$ccm64_enc_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_ccm64_encrypt_blocks:
+global  aesni_ccm64_decrypt_blocks
+
+ALIGN   16
+aesni_ccm64_decrypt_blocks:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ccm64_decrypt_blocks:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+$L$ccm64_dec_body:
+        mov     eax,DWORD[240+rcx]
+        movups  xmm6,XMMWORD[r8]
+        movdqu  xmm3,XMMWORD[r9]
+        movdqa  xmm9,XMMWORD[$L$increment64]
+        movdqa  xmm7,XMMWORD[$L$bswap_mask]
+
+        movaps  xmm2,xmm6
+        mov     r10d,eax
+        mov     r11,rcx
+DB      102,15,56,0,247
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_5:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_5
+DB      102,15,56,221,209
+        shl     r10d,4
+        mov     eax,16
+        movups  xmm8,XMMWORD[rdi]
+        paddq   xmm6,xmm9
+        lea     rdi,[16+rdi]
+        sub     rax,r10
+        lea     rcx,[32+r10*1+r11]
+        mov     r10,rax
+        jmp     NEAR $L$ccm64_dec_outer
+ALIGN   16
+$L$ccm64_dec_outer:
+        xorps   xmm8,xmm2
+        movdqa  xmm2,xmm6
+        movups  XMMWORD[rsi],xmm8
+        lea     rsi,[16+rsi]
+DB      102,15,56,0,215
+
+        sub     rdx,1
+        jz      NEAR $L$ccm64_dec_break
+
+        movups  xmm0,XMMWORD[r11]
+        mov     rax,r10
+        movups  xmm1,XMMWORD[16+r11]
+        xorps   xmm8,xmm0
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm8
+        movups  xmm0,XMMWORD[32+r11]
+        jmp     NEAR $L$ccm64_dec2_loop
+ALIGN   16
+$L$ccm64_dec2_loop:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ccm64_dec2_loop
+        movups  xmm8,XMMWORD[rdi]
+        paddq   xmm6,xmm9
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+        lea     rdi,[16+rdi]
+        jmp     NEAR $L$ccm64_dec_outer
+
+ALIGN   16
+$L$ccm64_dec_break:
+
+        mov     eax,DWORD[240+r11]
+        movups  xmm0,XMMWORD[r11]
+        movups  xmm1,XMMWORD[16+r11]
+        xorps   xmm8,xmm0
+        lea     r11,[32+r11]
+        xorps   xmm3,xmm8
+$L$oop_enc1_6:
+DB      102,15,56,220,217
+        dec     eax
+        movups  xmm1,XMMWORD[r11]
+        lea     r11,[16+r11]
+        jnz     NEAR $L$oop_enc1_6
+DB      102,15,56,221,217
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movups  XMMWORD[r9],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm8,xmm8
+        pxor    xmm6,xmm6
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        lea     rsp,[88+rsp]
+$L$ccm64_dec_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_ccm64_decrypt_blocks:
+global  aesni_ctr32_encrypt_blocks
+
+ALIGN   16
+aesni_ctr32_encrypt_blocks:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ctr32_encrypt_blocks:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        cmp     rdx,1
+        jne     NEAR $L$ctr32_bulk
+
+
+
+        movups  xmm2,XMMWORD[r8]
+        movups  xmm3,XMMWORD[rdi]
+        mov     edx,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_7:
+DB      102,15,56,220,209
+        dec     edx
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_7
+DB      102,15,56,221,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        xorps   xmm2,xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm2,xmm2
+        jmp     NEAR $L$ctr32_epilogue
+
+ALIGN   16
+$L$ctr32_bulk:
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,288
+        and     rsp,-16
+        movaps  XMMWORD[(-168)+r11],xmm6
+        movaps  XMMWORD[(-152)+r11],xmm7
+        movaps  XMMWORD[(-136)+r11],xmm8
+        movaps  XMMWORD[(-120)+r11],xmm9
+        movaps  XMMWORD[(-104)+r11],xmm10
+        movaps  XMMWORD[(-88)+r11],xmm11
+        movaps  XMMWORD[(-72)+r11],xmm12
+        movaps  XMMWORD[(-56)+r11],xmm13
+        movaps  XMMWORD[(-40)+r11],xmm14
+        movaps  XMMWORD[(-24)+r11],xmm15
+$L$ctr32_body:
+
+
+
+
+        movdqu  xmm2,XMMWORD[r8]
+        movdqu  xmm0,XMMWORD[rcx]
+        mov     r8d,DWORD[12+r8]
+        pxor    xmm2,xmm0
+        mov     ebp,DWORD[12+rcx]
+        movdqa  XMMWORD[rsp],xmm2
+        bswap   r8d
+        movdqa  xmm3,xmm2
+        movdqa  xmm4,xmm2
+        movdqa  xmm5,xmm2
+        movdqa  XMMWORD[64+rsp],xmm2
+        movdqa  XMMWORD[80+rsp],xmm2
+        movdqa  XMMWORD[96+rsp],xmm2
+        mov     r10,rdx
+        movdqa  XMMWORD[112+rsp],xmm2
+
+        lea     rax,[1+r8]
+        lea     rdx,[2+r8]
+        bswap   eax
+        bswap   edx
+        xor     eax,ebp
+        xor     edx,ebp
+DB      102,15,58,34,216,3
+        lea     rax,[3+r8]
+        movdqa  XMMWORD[16+rsp],xmm3
+DB      102,15,58,34,226,3
+        bswap   eax
+        mov     rdx,r10
+        lea     r10,[4+r8]
+        movdqa  XMMWORD[32+rsp],xmm4
+        xor     eax,ebp
+        bswap   r10d
+DB      102,15,58,34,232,3
+        xor     r10d,ebp
+        movdqa  XMMWORD[48+rsp],xmm5
+        lea     r9,[5+r8]
+        mov     DWORD[((64+12))+rsp],r10d
+        bswap   r9d
+        lea     r10,[6+r8]
+        mov     eax,DWORD[240+rcx]
+        xor     r9d,ebp
+        bswap   r10d
+        mov     DWORD[((80+12))+rsp],r9d
+        xor     r10d,ebp
+        lea     r9,[7+r8]
+        mov     DWORD[((96+12))+rsp],r10d
+        bswap   r9d
+        mov     r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+        xor     r9d,ebp
+        and     r10d,71303168
+        mov     DWORD[((112+12))+rsp],r9d
+
+        movups  xmm1,XMMWORD[16+rcx]
+
+        movdqa  xmm6,XMMWORD[64+rsp]
+        movdqa  xmm7,XMMWORD[80+rsp]
+
+        cmp     rdx,8
+        jb      NEAR $L$ctr32_tail
+
+        sub     rdx,6
+        cmp     r10d,4194304
+        je      NEAR $L$ctr32_6x
+
+        lea     rcx,[128+rcx]
+        sub     rdx,2
+        jmp     NEAR $L$ctr32_loop8
+
+ALIGN   16
+$L$ctr32_6x:
+        shl     eax,4
+        mov     r10d,48
+        bswap   ebp
+        lea     rcx,[32+rax*1+rcx]
+        sub     r10,rax
+        jmp     NEAR $L$ctr32_loop6
+
+ALIGN   16
+$L$ctr32_loop6:
+        add     r8d,6
+        movups  xmm0,XMMWORD[((-48))+r10*1+rcx]
+DB      102,15,56,220,209
+        mov     eax,r8d
+        xor     eax,ebp
+DB      102,15,56,220,217
+DB      0x0f,0x38,0xf1,0x44,0x24,12
+        lea     eax,[1+r8]
+DB      102,15,56,220,225
+        xor     eax,ebp
+DB      0x0f,0x38,0xf1,0x44,0x24,28
+DB      102,15,56,220,233
+        lea     eax,[2+r8]
+        xor     eax,ebp
+DB      102,15,56,220,241
+DB      0x0f,0x38,0xf1,0x44,0x24,44
+        lea     eax,[3+r8]
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-32))+r10*1+rcx]
+        xor     eax,ebp
+
+DB      102,15,56,220,208
+DB      0x0f,0x38,0xf1,0x44,0x24,60
+        lea     eax,[4+r8]
+DB      102,15,56,220,216
+        xor     eax,ebp
+DB      0x0f,0x38,0xf1,0x44,0x24,76
+DB      102,15,56,220,224
+        lea     eax,[5+r8]
+        xor     eax,ebp
+DB      102,15,56,220,232
+DB      0x0f,0x38,0xf1,0x44,0x24,92
+        mov     rax,r10
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-16))+r10*1+rcx]
+
+        call    $L$enc_loop6
+
+        movdqu  xmm8,XMMWORD[rdi]
+        movdqu  xmm9,XMMWORD[16+rdi]
+        movdqu  xmm10,XMMWORD[32+rdi]
+        movdqu  xmm11,XMMWORD[48+rdi]
+        movdqu  xmm12,XMMWORD[64+rdi]
+        movdqu  xmm13,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+        movups  xmm1,XMMWORD[((-64))+r10*1+rcx]
+        pxor    xmm8,xmm2
+        movaps  xmm2,XMMWORD[rsp]
+        pxor    xmm9,xmm3
+        movaps  xmm3,XMMWORD[16+rsp]
+        pxor    xmm10,xmm4
+        movaps  xmm4,XMMWORD[32+rsp]
+        pxor    xmm11,xmm5
+        movaps  xmm5,XMMWORD[48+rsp]
+        pxor    xmm12,xmm6
+        movaps  xmm6,XMMWORD[64+rsp]
+        pxor    xmm13,xmm7
+        movaps  xmm7,XMMWORD[80+rsp]
+        movdqu  XMMWORD[rsi],xmm8
+        movdqu  XMMWORD[16+rsi],xmm9
+        movdqu  XMMWORD[32+rsi],xmm10
+        movdqu  XMMWORD[48+rsi],xmm11
+        movdqu  XMMWORD[64+rsi],xmm12
+        movdqu  XMMWORD[80+rsi],xmm13
+        lea     rsi,[96+rsi]
+
+        sub     rdx,6
+        jnc     NEAR $L$ctr32_loop6
+
+        add     rdx,6
+        jz      NEAR $L$ctr32_done
+
+        lea     eax,[((-48))+r10]
+        lea     rcx,[((-80))+r10*1+rcx]
+        neg     eax
+        shr     eax,4
+        jmp     NEAR $L$ctr32_tail
+
+ALIGN   32
+$L$ctr32_loop8:
+        add     r8d,8
+        movdqa  xmm8,XMMWORD[96+rsp]
+DB      102,15,56,220,209
+        mov     r9d,r8d
+        movdqa  xmm9,XMMWORD[112+rsp]
+DB      102,15,56,220,217
+        bswap   r9d
+        movups  xmm0,XMMWORD[((32-128))+rcx]
+DB      102,15,56,220,225
+        xor     r9d,ebp
+        nop
+DB      102,15,56,220,233
+        mov     DWORD[((0+12))+rsp],r9d
+        lea     r9,[1+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((48-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        mov     DWORD[((16+12))+rsp],r9d
+        lea     r9,[2+r8]
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((64-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        mov     DWORD[((32+12))+rsp],r9d
+        lea     r9,[3+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((80-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        mov     DWORD[((48+12))+rsp],r9d
+        lea     r9,[4+r8]
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((96-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        mov     DWORD[((64+12))+rsp],r9d
+        lea     r9,[5+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((112-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        mov     DWORD[((80+12))+rsp],r9d
+        lea     r9,[6+r8]
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((128-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        mov     DWORD[((96+12))+rsp],r9d
+        lea     r9,[7+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((144-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        xor     r9d,ebp
+        movdqu  xmm10,XMMWORD[rdi]
+DB      102,15,56,220,232
+        mov     DWORD[((112+12))+rsp],r9d
+        cmp     eax,11
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((160-128))+rcx]
+
+        jb      NEAR $L$ctr32_enc_done
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((176-128))+rcx]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((192-128))+rcx]
+        je      NEAR $L$ctr32_enc_done
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((208-128))+rcx]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((224-128))+rcx]
+        jmp     NEAR $L$ctr32_enc_done
+
+ALIGN   16
+$L$ctr32_enc_done:
+        movdqu  xmm11,XMMWORD[16+rdi]
+        pxor    xmm10,xmm0
+        movdqu  xmm12,XMMWORD[32+rdi]
+        pxor    xmm11,xmm0
+        movdqu  xmm13,XMMWORD[48+rdi]
+        pxor    xmm12,xmm0
+        movdqu  xmm14,XMMWORD[64+rdi]
+        pxor    xmm13,xmm0
+        movdqu  xmm15,XMMWORD[80+rdi]
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm0
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movdqu  xmm1,XMMWORD[96+rdi]
+        lea     rdi,[128+rdi]
+
+DB      102,65,15,56,221,210
+        pxor    xmm1,xmm0
+        movdqu  xmm10,XMMWORD[((112-128))+rdi]
+DB      102,65,15,56,221,219
+        pxor    xmm10,xmm0
+        movdqa  xmm11,XMMWORD[rsp]
+DB      102,65,15,56,221,228
+DB      102,65,15,56,221,237
+        movdqa  xmm12,XMMWORD[16+rsp]
+        movdqa  xmm13,XMMWORD[32+rsp]
+DB      102,65,15,56,221,246
+DB      102,65,15,56,221,255
+        movdqa  xmm14,XMMWORD[48+rsp]
+        movdqa  xmm15,XMMWORD[64+rsp]
+DB      102,68,15,56,221,193
+        movdqa  xmm0,XMMWORD[80+rsp]
+        movups  xmm1,XMMWORD[((16-128))+rcx]
+DB      102,69,15,56,221,202
+
+        movups  XMMWORD[rsi],xmm2
+        movdqa  xmm2,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        movdqa  xmm3,xmm12
+        movups  XMMWORD[32+rsi],xmm4
+        movdqa  xmm4,xmm13
+        movups  XMMWORD[48+rsi],xmm5
+        movdqa  xmm5,xmm14
+        movups  XMMWORD[64+rsi],xmm6
+        movdqa  xmm6,xmm15
+        movups  XMMWORD[80+rsi],xmm7
+        movdqa  xmm7,xmm0
+        movups  XMMWORD[96+rsi],xmm8
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+
+        sub     rdx,8
+        jnc     NEAR $L$ctr32_loop8
+
+        add     rdx,8
+        jz      NEAR $L$ctr32_done
+        lea     rcx,[((-128))+rcx]
+
+$L$ctr32_tail:
+
+
+        lea     rcx,[16+rcx]
+        cmp     rdx,4
+        jb      NEAR $L$ctr32_loop3
+        je      NEAR $L$ctr32_loop4
+
+
+        shl     eax,4
+        movdqa  xmm8,XMMWORD[96+rsp]
+        pxor    xmm9,xmm9
+
+        movups  xmm0,XMMWORD[16+rcx]
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        lea     rcx,[((32-16))+rax*1+rcx]
+        neg     rax
+DB      102,15,56,220,225
+        add     rax,16
+        movups  xmm10,XMMWORD[rdi]
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+        movups  xmm11,XMMWORD[16+rdi]
+        movups  xmm12,XMMWORD[32+rdi]
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+
+        call    $L$enc_loop8_enter
+
+        movdqu  xmm13,XMMWORD[48+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm10,XMMWORD[64+rdi]
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm6,xmm10
+        movdqu  XMMWORD[48+rsi],xmm5
+        movdqu  XMMWORD[64+rsi],xmm6
+        cmp     rdx,6
+        jb      NEAR $L$ctr32_done
+
+        movups  xmm11,XMMWORD[80+rdi]
+        xorps   xmm7,xmm11
+        movups  XMMWORD[80+rsi],xmm7
+        je      NEAR $L$ctr32_done
+
+        movups  xmm12,XMMWORD[96+rdi]
+        xorps   xmm8,xmm12
+        movups  XMMWORD[96+rsi],xmm8
+        jmp     NEAR $L$ctr32_done
+
+ALIGN   32
+$L$ctr32_loop4:
+DB      102,15,56,220,209
+        lea     rcx,[16+rcx]
+        dec     eax
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[rcx]
+        jnz     NEAR $L$ctr32_loop4
+DB      102,15,56,221,209
+DB      102,15,56,221,217
+        movups  xmm10,XMMWORD[rdi]
+        movups  xmm11,XMMWORD[16+rdi]
+DB      102,15,56,221,225
+DB      102,15,56,221,233
+        movups  xmm12,XMMWORD[32+rdi]
+        movups  xmm13,XMMWORD[48+rdi]
+
+        xorps   xmm2,xmm10
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm3,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[48+rsi],xmm5
+        jmp     NEAR $L$ctr32_done
+
+ALIGN   32
+$L$ctr32_loop3:
+DB      102,15,56,220,209
+        lea     rcx,[16+rcx]
+        dec     eax
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        movups  xmm1,XMMWORD[rcx]
+        jnz     NEAR $L$ctr32_loop3
+DB      102,15,56,221,209
+DB      102,15,56,221,217
+DB      102,15,56,221,225
+
+        movups  xmm10,XMMWORD[rdi]
+        xorps   xmm2,xmm10
+        movups  XMMWORD[rsi],xmm2
+        cmp     rdx,2
+        jb      NEAR $L$ctr32_done
+
+        movups  xmm11,XMMWORD[16+rdi]
+        xorps   xmm3,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        je      NEAR $L$ctr32_done
+
+        movups  xmm12,XMMWORD[32+rdi]
+        xorps   xmm4,xmm12
+        movups  XMMWORD[32+rsi],xmm4
+
+$L$ctr32_done:
+        xorps   xmm0,xmm0
+        xor     ebp,ebp
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  XMMWORD[(-168)+r11],xmm0
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  XMMWORD[(-152)+r11],xmm0
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  XMMWORD[(-136)+r11],xmm0
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  XMMWORD[(-120)+r11],xmm0
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  XMMWORD[(-104)+r11],xmm0
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  XMMWORD[(-88)+r11],xmm0
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  XMMWORD[(-72)+r11],xmm0
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  XMMWORD[(-56)+r11],xmm0
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  XMMWORD[(-40)+r11],xmm0
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        movaps  XMMWORD[(-24)+r11],xmm0
+        movaps  XMMWORD[rsp],xmm0
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  XMMWORD[112+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$ctr32_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ctr32_encrypt_blocks:
+global  aesni_xts_encrypt
+
+ALIGN   16
+aesni_xts_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_xts_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,272
+        and     rsp,-16
+        movaps  XMMWORD[(-168)+r11],xmm6
+        movaps  XMMWORD[(-152)+r11],xmm7
+        movaps  XMMWORD[(-136)+r11],xmm8
+        movaps  XMMWORD[(-120)+r11],xmm9
+        movaps  XMMWORD[(-104)+r11],xmm10
+        movaps  XMMWORD[(-88)+r11],xmm11
+        movaps  XMMWORD[(-72)+r11],xmm12
+        movaps  XMMWORD[(-56)+r11],xmm13
+        movaps  XMMWORD[(-40)+r11],xmm14
+        movaps  XMMWORD[(-24)+r11],xmm15
+$L$xts_enc_body:
+        movups  xmm2,XMMWORD[r9]
+        mov     eax,DWORD[240+r8]
+        mov     r10d,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_enc1_8:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_enc1_8
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[rcx]
+        mov     rbp,rcx
+        mov     eax,r10d
+        shl     r10d,4
+        mov     r9,rdx
+        and     rdx,-16
+
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqa  xmm8,XMMWORD[$L$xts_magic]
+        movdqa  xmm15,xmm2
+        pshufd  xmm9,xmm2,0x5f
+        pxor    xmm1,xmm0
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm10,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm10,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm11,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm11,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm12,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm12,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm13,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm13,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm15
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+        pand    xmm9,xmm8
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm9
+        movaps  XMMWORD[96+rsp],xmm1
+
+        sub     rdx,16*6
+        jc      NEAR $L$xts_enc_short
+
+        mov     eax,16+96
+        lea     rcx,[32+r10*1+rbp]
+        sub     rax,r10
+        movups  xmm1,XMMWORD[16+rbp]
+        mov     r10,rax
+        lea     r8,[$L$xts_magic]
+        jmp     NEAR $L$xts_enc_grandloop
+
+ALIGN   32
+$L$xts_enc_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqa  xmm8,xmm0
+        movdqu  xmm3,XMMWORD[16+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm3,xmm11
+DB      102,15,56,220,209
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm4,xmm12
+DB      102,15,56,220,217
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm5,xmm13
+DB      102,15,56,220,225
+        movdqu  xmm7,XMMWORD[80+rdi]
+        pxor    xmm8,xmm15
+        movdqa  xmm9,XMMWORD[96+rsp]
+        pxor    xmm6,xmm14
+DB      102,15,56,220,233
+        movups  xmm0,XMMWORD[32+rbp]
+        lea     rdi,[96+rdi]
+        pxor    xmm7,xmm8
+
+        pxor    xmm10,xmm9
+DB      102,15,56,220,241
+        pxor    xmm11,xmm9
+        movdqa  XMMWORD[rsp],xmm10
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[48+rbp]
+        pxor    xmm12,xmm9
+
+DB      102,15,56,220,208
+        pxor    xmm13,xmm9
+        movdqa  XMMWORD[16+rsp],xmm11
+DB      102,15,56,220,216
+        pxor    xmm14,xmm9
+        movdqa  XMMWORD[32+rsp],xmm12
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        pxor    xmm8,xmm9
+        movdqa  XMMWORD[64+rsp],xmm14
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[64+rbp]
+        movdqa  XMMWORD[80+rsp],xmm8
+        pshufd  xmm9,xmm15,0x5f
+        jmp     NEAR $L$xts_enc_loop6
+ALIGN   32
+$L$xts_enc_loop6:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-64))+rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-80))+rax*1+rcx]
+        jnz     NEAR $L$xts_enc_loop6
+
+        movdqa  xmm8,XMMWORD[r8]
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,220,209
+        paddq   xmm15,xmm15
+        psrad   xmm14,31
+DB      102,15,56,220,217
+        pand    xmm14,xmm8
+        movups  xmm10,XMMWORD[rbp]
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+        pxor    xmm15,xmm14
+        movaps  xmm11,xmm10
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-64))+rcx]
+
+        movdqa  xmm14,xmm9
+DB      102,15,56,220,208
+        paddd   xmm9,xmm9
+        pxor    xmm10,xmm15
+DB      102,15,56,220,216
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        pand    xmm14,xmm8
+        movaps  xmm12,xmm11
+DB      102,15,56,220,240
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-48))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,220,209
+        pxor    xmm11,xmm15
+        psrad   xmm14,31
+DB      102,15,56,220,217
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movdqa  XMMWORD[48+rsp],xmm13
+        pxor    xmm15,xmm14
+DB      102,15,56,220,241
+        movaps  xmm13,xmm12
+        movdqa  xmm14,xmm9
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-32))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,220,208
+        pxor    xmm12,xmm15
+        psrad   xmm14,31
+DB      102,15,56,220,216
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+        pxor    xmm15,xmm14
+        movaps  xmm14,xmm13
+DB      102,15,56,220,248
+
+        movdqa  xmm0,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,220,209
+        pxor    xmm13,xmm15
+        psrad   xmm0,31
+DB      102,15,56,220,217
+        paddq   xmm15,xmm15
+        pand    xmm0,xmm8
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        pxor    xmm15,xmm0
+        movups  xmm0,XMMWORD[rbp]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[16+rbp]
+
+        pxor    xmm14,xmm15
+DB      102,15,56,221,84,36,0
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+DB      102,15,56,221,92,36,16
+DB      102,15,56,221,100,36,32
+        pand    xmm9,xmm8
+        mov     rax,r10
+DB      102,15,56,221,108,36,48
+DB      102,15,56,221,116,36,64
+DB      102,15,56,221,124,36,80
+        pxor    xmm15,xmm9
+
+        lea     rsi,[96+rsi]
+        movups  XMMWORD[(-96)+rsi],xmm2
+        movups  XMMWORD[(-80)+rsi],xmm3
+        movups  XMMWORD[(-64)+rsi],xmm4
+        movups  XMMWORD[(-48)+rsi],xmm5
+        movups  XMMWORD[(-32)+rsi],xmm6
+        movups  XMMWORD[(-16)+rsi],xmm7
+        sub     rdx,16*6
+        jnc     NEAR $L$xts_enc_grandloop
+
+        mov     eax,16+96
+        sub     eax,r10d
+        mov     rcx,rbp
+        shr     eax,4
+
+$L$xts_enc_short:
+
+        mov     r10d,eax
+        pxor    xmm10,xmm0
+        add     rdx,16*6
+        jz      NEAR $L$xts_enc_done
+
+        pxor    xmm11,xmm0
+        cmp     rdx,0x20
+        jb      NEAR $L$xts_enc_one
+        pxor    xmm12,xmm0
+        je      NEAR $L$xts_enc_two
+
+        pxor    xmm13,xmm0
+        cmp     rdx,0x40
+        jb      NEAR $L$xts_enc_three
+        pxor    xmm14,xmm0
+        je      NEAR $L$xts_enc_four
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm3,xmm11
+        movdqu  xmm6,XMMWORD[64+rdi]
+        lea     rdi,[80+rdi]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm13
+        pxor    xmm6,xmm14
+        pxor    xmm7,xmm7
+
+        call    _aesni_encrypt6
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm15
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        xorps   xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        xorps   xmm6,xmm14
+        movdqu  XMMWORD[32+rsi],xmm4
+        movdqu  XMMWORD[48+rsi],xmm5
+        movdqu  XMMWORD[64+rsi],xmm6
+        lea     rsi,[80+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_one:
+        movups  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_9:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_9
+DB      102,15,56,221,209
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm11
+        movups  XMMWORD[rsi],xmm2
+        lea     rsi,[16+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_two:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+
+        call    _aesni_encrypt2
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm12
+        xorps   xmm3,xmm11
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        lea     rsi,[32+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_three:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        lea     rdi,[48+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+
+        call    _aesni_encrypt3
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm13
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        lea     rsi,[48+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_four:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm5,XMMWORD[48+rdi]
+        lea     rdi,[64+rdi]
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        xorps   xmm5,xmm13
+
+        call    _aesni_encrypt4
+
+        pxor    xmm2,xmm10
+        movdqa  xmm10,xmm14
+        pxor    xmm3,xmm11
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        movdqu  XMMWORD[32+rsi],xmm4
+        movdqu  XMMWORD[48+rsi],xmm5
+        lea     rsi,[64+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_done:
+        and     r9,15
+        jz      NEAR $L$xts_enc_ret
+        mov     rdx,r9
+
+$L$xts_enc_steal:
+        movzx   eax,BYTE[rdi]
+        movzx   ecx,BYTE[((-16))+rsi]
+        lea     rdi,[1+rdi]
+        mov     BYTE[((-16))+rsi],al
+        mov     BYTE[rsi],cl
+        lea     rsi,[1+rsi]
+        sub     rdx,1
+        jnz     NEAR $L$xts_enc_steal
+
+        sub     rsi,r9
+        mov     rcx,rbp
+        mov     eax,r10d
+
+        movups  xmm2,XMMWORD[((-16))+rsi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_10:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_10
+DB      102,15,56,221,209
+        xorps   xmm2,xmm10
+        movups  XMMWORD[(-16)+rsi],xmm2
+
+$L$xts_enc_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  XMMWORD[(-168)+r11],xmm0
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  XMMWORD[(-152)+r11],xmm0
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  XMMWORD[(-136)+r11],xmm0
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  XMMWORD[(-120)+r11],xmm0
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  XMMWORD[(-104)+r11],xmm0
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  XMMWORD[(-88)+r11],xmm0
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  XMMWORD[(-72)+r11],xmm0
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  XMMWORD[(-56)+r11],xmm0
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  XMMWORD[(-40)+r11],xmm0
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        movaps  XMMWORD[(-24)+r11],xmm0
+        movaps  XMMWORD[rsp],xmm0
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  XMMWORD[96+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$xts_enc_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_xts_encrypt:
+global  aesni_xts_decrypt
+
+ALIGN   16
+aesni_xts_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_xts_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,272
+        and     rsp,-16
+        movaps  XMMWORD[(-168)+r11],xmm6
+        movaps  XMMWORD[(-152)+r11],xmm7
+        movaps  XMMWORD[(-136)+r11],xmm8
+        movaps  XMMWORD[(-120)+r11],xmm9
+        movaps  XMMWORD[(-104)+r11],xmm10
+        movaps  XMMWORD[(-88)+r11],xmm11
+        movaps  XMMWORD[(-72)+r11],xmm12
+        movaps  XMMWORD[(-56)+r11],xmm13
+        movaps  XMMWORD[(-40)+r11],xmm14
+        movaps  XMMWORD[(-24)+r11],xmm15
+$L$xts_dec_body:
+        movups  xmm2,XMMWORD[r9]
+        mov     eax,DWORD[240+r8]
+        mov     r10d,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_enc1_11:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_enc1_11
+DB      102,15,56,221,209
+        xor     eax,eax
+        test    rdx,15
+        setnz   al
+        shl     rax,4
+        sub     rdx,rax
+
+        movups  xmm0,XMMWORD[rcx]
+        mov     rbp,rcx
+        mov     eax,r10d
+        shl     r10d,4
+        mov     r9,rdx
+        and     rdx,-16
+
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqa  xmm8,XMMWORD[$L$xts_magic]
+        movdqa  xmm15,xmm2
+        pshufd  xmm9,xmm2,0x5f
+        pxor    xmm1,xmm0
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm10,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm10,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm11,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm11,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm12,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm12,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm13,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm13,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm15
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+        pand    xmm9,xmm8
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm9
+        movaps  XMMWORD[96+rsp],xmm1
+
+        sub     rdx,16*6
+        jc      NEAR $L$xts_dec_short
+
+        mov     eax,16+96
+        lea     rcx,[32+r10*1+rbp]
+        sub     rax,r10
+        movups  xmm1,XMMWORD[16+rbp]
+        mov     r10,rax
+        lea     r8,[$L$xts_magic]
+        jmp     NEAR $L$xts_dec_grandloop
+
+ALIGN   32
+$L$xts_dec_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqa  xmm8,xmm0
+        movdqu  xmm3,XMMWORD[16+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm3,xmm11
+DB      102,15,56,222,209
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm4,xmm12
+DB      102,15,56,222,217
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm5,xmm13
+DB      102,15,56,222,225
+        movdqu  xmm7,XMMWORD[80+rdi]
+        pxor    xmm8,xmm15
+        movdqa  xmm9,XMMWORD[96+rsp]
+        pxor    xmm6,xmm14
+DB      102,15,56,222,233
+        movups  xmm0,XMMWORD[32+rbp]
+        lea     rdi,[96+rdi]
+        pxor    xmm7,xmm8
+
+        pxor    xmm10,xmm9
+DB      102,15,56,222,241
+        pxor    xmm11,xmm9
+        movdqa  XMMWORD[rsp],xmm10
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[48+rbp]
+        pxor    xmm12,xmm9
+
+DB      102,15,56,222,208
+        pxor    xmm13,xmm9
+        movdqa  XMMWORD[16+rsp],xmm11
+DB      102,15,56,222,216
+        pxor    xmm14,xmm9
+        movdqa  XMMWORD[32+rsp],xmm12
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        pxor    xmm8,xmm9
+        movdqa  XMMWORD[64+rsp],xmm14
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[64+rbp]
+        movdqa  XMMWORD[80+rsp],xmm8
+        pshufd  xmm9,xmm15,0x5f
+        jmp     NEAR $L$xts_dec_loop6
+ALIGN   32
+$L$xts_dec_loop6:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[((-64))+rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-80))+rax*1+rcx]
+        jnz     NEAR $L$xts_dec_loop6
+
+        movdqa  xmm8,XMMWORD[r8]
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,222,209
+        paddq   xmm15,xmm15
+        psrad   xmm14,31
+DB      102,15,56,222,217
+        pand    xmm14,xmm8
+        movups  xmm10,XMMWORD[rbp]
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+        pxor    xmm15,xmm14
+        movaps  xmm11,xmm10
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[((-64))+rcx]
+
+        movdqa  xmm14,xmm9
+DB      102,15,56,222,208
+        paddd   xmm9,xmm9
+        pxor    xmm10,xmm15
+DB      102,15,56,222,216
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        pand    xmm14,xmm8
+        movaps  xmm12,xmm11
+DB      102,15,56,222,240
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-48))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,222,209
+        pxor    xmm11,xmm15
+        psrad   xmm14,31
+DB      102,15,56,222,217
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movdqa  XMMWORD[48+rsp],xmm13
+        pxor    xmm15,xmm14
+DB      102,15,56,222,241
+        movaps  xmm13,xmm12
+        movdqa  xmm14,xmm9
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[((-32))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,222,208
+        pxor    xmm12,xmm15
+        psrad   xmm14,31
+DB      102,15,56,222,216
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+        pxor    xmm15,xmm14
+        movaps  xmm14,xmm13
+DB      102,15,56,222,248
+
+        movdqa  xmm0,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,222,209
+        pxor    xmm13,xmm15
+        psrad   xmm0,31
+DB      102,15,56,222,217
+        paddq   xmm15,xmm15
+        pand    xmm0,xmm8
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        pxor    xmm15,xmm0
+        movups  xmm0,XMMWORD[rbp]
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[16+rbp]
+
+        pxor    xmm14,xmm15
+DB      102,15,56,223,84,36,0
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+DB      102,15,56,223,92,36,16
+DB      102,15,56,223,100,36,32
+        pand    xmm9,xmm8
+        mov     rax,r10
+DB      102,15,56,223,108,36,48
+DB      102,15,56,223,116,36,64
+DB      102,15,56,223,124,36,80
+        pxor    xmm15,xmm9
+
+        lea     rsi,[96+rsi]
+        movups  XMMWORD[(-96)+rsi],xmm2
+        movups  XMMWORD[(-80)+rsi],xmm3
+        movups  XMMWORD[(-64)+rsi],xmm4
+        movups  XMMWORD[(-48)+rsi],xmm5
+        movups  XMMWORD[(-32)+rsi],xmm6
+        movups  XMMWORD[(-16)+rsi],xmm7
+        sub     rdx,16*6
+        jnc     NEAR $L$xts_dec_grandloop
+
+        mov     eax,16+96
+        sub     eax,r10d
+        mov     rcx,rbp
+        shr     eax,4
+
+$L$xts_dec_short:
+
+        mov     r10d,eax
+        pxor    xmm10,xmm0
+        pxor    xmm11,xmm0
+        add     rdx,16*6
+        jz      NEAR $L$xts_dec_done
+
+        pxor    xmm12,xmm0
+        cmp     rdx,0x20
+        jb      NEAR $L$xts_dec_one
+        pxor    xmm13,xmm0
+        je      NEAR $L$xts_dec_two
+
+        pxor    xmm14,xmm0
+        cmp     rdx,0x40
+        jb      NEAR $L$xts_dec_three
+        je      NEAR $L$xts_dec_four
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm3,xmm11
+        movdqu  xmm6,XMMWORD[64+rdi]
+        lea     rdi,[80+rdi]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm13
+        pxor    xmm6,xmm14
+
+        call    _aesni_decrypt6
+
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        xorps   xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        xorps   xmm6,xmm14
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm14,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pcmpgtd xmm14,xmm15
+        movdqu  XMMWORD[64+rsi],xmm6
+        lea     rsi,[80+rsi]
+        pshufd  xmm11,xmm14,0x13
+        and     r9,15
+        jz      NEAR $L$xts_dec_ret
+
+        movdqa  xmm10,xmm15
+        paddq   xmm15,xmm15
+        pand    xmm11,xmm8
+        pxor    xmm11,xmm15
+        jmp     NEAR $L$xts_dec_done2
+
+ALIGN   16
+$L$xts_dec_one:
+        movups  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_12:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_12
+DB      102,15,56,223,209
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm11
+        movups  XMMWORD[rsi],xmm2
+        movdqa  xmm11,xmm12
+        lea     rsi,[16+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_two:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+
+        call    _aesni_decrypt2
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm12
+        xorps   xmm3,xmm11
+        movdqa  xmm11,xmm13
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        lea     rsi,[32+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_three:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        lea     rdi,[48+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+
+        call    _aesni_decrypt3
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm13
+        xorps   xmm3,xmm11
+        movdqa  xmm11,xmm14
+        xorps   xmm4,xmm12
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        lea     rsi,[48+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_four:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm5,XMMWORD[48+rdi]
+        lea     rdi,[64+rdi]
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        xorps   xmm5,xmm13
+
+        call    _aesni_decrypt4
+
+        pxor    xmm2,xmm10
+        movdqa  xmm10,xmm14
+        pxor    xmm3,xmm11
+        movdqa  xmm11,xmm15
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        movdqu  XMMWORD[32+rsi],xmm4
+        movdqu  XMMWORD[48+rsi],xmm5
+        lea     rsi,[64+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_done:
+        and     r9,15
+        jz      NEAR $L$xts_dec_ret
+$L$xts_dec_done2:
+        mov     rdx,r9
+        mov     rcx,rbp
+        mov     eax,r10d
+
+        movups  xmm2,XMMWORD[rdi]
+        xorps   xmm2,xmm11
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_13:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_13
+DB      102,15,56,223,209
+        xorps   xmm2,xmm11
+        movups  XMMWORD[rsi],xmm2
+
+$L$xts_dec_steal:
+        movzx   eax,BYTE[16+rdi]
+        movzx   ecx,BYTE[rsi]
+        lea     rdi,[1+rdi]
+        mov     BYTE[rsi],al
+        mov     BYTE[16+rsi],cl
+        lea     rsi,[1+rsi]
+        sub     rdx,1
+        jnz     NEAR $L$xts_dec_steal
+
+        sub     rsi,r9
+        mov     rcx,rbp
+        mov     eax,r10d
+
+        movups  xmm2,XMMWORD[rsi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_14:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_14
+DB      102,15,56,223,209
+        xorps   xmm2,xmm10
+        movups  XMMWORD[rsi],xmm2
+
+$L$xts_dec_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  XMMWORD[(-168)+r11],xmm0
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  XMMWORD[(-152)+r11],xmm0
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  XMMWORD[(-136)+r11],xmm0
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  XMMWORD[(-120)+r11],xmm0
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  XMMWORD[(-104)+r11],xmm0
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  XMMWORD[(-88)+r11],xmm0
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  XMMWORD[(-72)+r11],xmm0
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  XMMWORD[(-56)+r11],xmm0
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  XMMWORD[(-40)+r11],xmm0
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        movaps  XMMWORD[(-24)+r11],xmm0
+        movaps  XMMWORD[rsp],xmm0
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  XMMWORD[96+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$xts_dec_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_xts_decrypt:
+global  aesni_ocb_encrypt
+
+ALIGN   32
+aesni_ocb_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ocb_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     rax,[rsp]
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[112+rsp],xmm13
+        movaps  XMMWORD[128+rsp],xmm14
+        movaps  XMMWORD[144+rsp],xmm15
+$L$ocb_enc_body:
+        mov     rbx,QWORD[56+rax]
+        mov     rbp,QWORD[((56+8))+rax]
+
+        mov     r10d,DWORD[240+rcx]
+        mov     r11,rcx
+        shl     r10d,4
+        movups  xmm9,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqu  xmm15,XMMWORD[r9]
+        pxor    xmm9,xmm1
+        pxor    xmm15,xmm1
+
+        mov     eax,16+32
+        lea     rcx,[32+r10*1+r11]
+        movups  xmm1,XMMWORD[16+r11]
+        sub     rax,r10
+        mov     r10,rax
+
+        movdqu  xmm10,XMMWORD[rbx]
+        movdqu  xmm8,XMMWORD[rbp]
+
+        test    r8,1
+        jnz     NEAR $L$ocb_enc_odd
+
+        bsf     r12,r8
+        add     r8,1
+        shl     r12,4
+        movdqu  xmm7,XMMWORD[r12*1+rbx]
+        movdqu  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+
+        call    __ocb_encrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        lea     rsi,[16+rsi]
+        sub     rdx,1
+        jz      NEAR $L$ocb_enc_done
+
+$L$ocb_enc_odd:
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        lea     r8,[6+r8]
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+        shl     r12,4
+        shl     r13,4
+        shl     r14,4
+
+        sub     rdx,6
+        jc      NEAR $L$ocb_enc_short
+        jmp     NEAR $L$ocb_enc_grandloop
+
+ALIGN   32
+$L$ocb_enc_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+
+        call    __ocb_encrypt6
+
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        lea     rsi,[96+rsi]
+        sub     rdx,6
+        jnc     NEAR $L$ocb_enc_grandloop
+
+$L$ocb_enc_short:
+        add     rdx,6
+        jz      NEAR $L$ocb_enc_done
+
+        movdqu  xmm2,XMMWORD[rdi]
+        cmp     rdx,2
+        jb      NEAR $L$ocb_enc_one
+        movdqu  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ocb_enc_two
+
+        movdqu  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,4
+        jb      NEAR $L$ocb_enc_three
+        movdqu  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ocb_enc_four
+
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm7,xmm7
+
+        call    __ocb_encrypt6
+
+        movdqa  xmm15,xmm14
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_one:
+        movdqa  xmm7,xmm10
+
+        call    __ocb_encrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_two:
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+
+        call    __ocb_encrypt4
+
+        movdqa  xmm15,xmm11
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_three:
+        pxor    xmm5,xmm5
+
+        call    __ocb_encrypt4
+
+        movdqa  xmm15,xmm12
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_four:
+        call    __ocb_encrypt4
+
+        movdqa  xmm15,xmm13
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+
+$L$ocb_enc_done:
+        pxor    xmm15,xmm0
+        movdqu  XMMWORD[rbp],xmm8
+        movdqu  XMMWORD[r9],xmm15
+
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  XMMWORD[112+rsp],xmm0
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  XMMWORD[128+rsp],xmm0
+        movaps  xmm15,XMMWORD[144+rsp]
+        movaps  XMMWORD[144+rsp],xmm0
+        lea     rax,[((160+40))+rsp]
+$L$ocb_enc_pop:
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$ocb_enc_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ocb_encrypt:
+
+
+ALIGN   32
+__ocb_encrypt6:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        movdqa  xmm14,xmm10
+        pxor    xmm10,xmm15
+        movdqu  xmm15,XMMWORD[r14*1+rbx]
+        pxor    xmm11,xmm10
+        pxor    xmm8,xmm2
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm8,xmm3
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm8,xmm4
+        pxor    xmm4,xmm12
+        pxor    xmm14,xmm13
+        pxor    xmm8,xmm5
+        pxor    xmm5,xmm13
+        pxor    xmm15,xmm14
+        pxor    xmm8,xmm6
+        pxor    xmm6,xmm14
+        pxor    xmm8,xmm7
+        pxor    xmm7,xmm15
+        movups  xmm0,XMMWORD[32+r11]
+
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        add     r8,6
+        pxor    xmm10,xmm9
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+DB      102,15,56,220,241
+        pxor    xmm13,xmm9
+        pxor    xmm14,xmm9
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm15,xmm9
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[64+r11]
+        shl     r12,4
+        shl     r13,4
+        jmp     NEAR $L$ocb_enc_loop6
+
+ALIGN   32
+$L$ocb_enc_loop6:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_enc_loop6
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[16+r11]
+        shl     r14,4
+
+DB      102,65,15,56,221,210
+        movdqu  xmm10,XMMWORD[rbx]
+        mov     rax,r10
+DB      102,65,15,56,221,219
+DB      102,65,15,56,221,228
+DB      102,65,15,56,221,237
+DB      102,65,15,56,221,246
+DB      102,65,15,56,221,255
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_encrypt4:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        pxor    xmm10,xmm15
+        pxor    xmm11,xmm10
+        pxor    xmm8,xmm2
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm8,xmm3
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm8,xmm4
+        pxor    xmm4,xmm12
+        pxor    xmm8,xmm5
+        pxor    xmm5,xmm13
+        movups  xmm0,XMMWORD[32+r11]
+
+        pxor    xmm10,xmm9
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+        pxor    xmm13,xmm9
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[48+r11]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_enc_loop4
+
+ALIGN   32
+$L$ocb_enc_loop4:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_enc_loop4
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,65,15,56,221,210
+DB      102,65,15,56,221,219
+DB      102,65,15,56,221,228
+DB      102,65,15,56,221,237
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_encrypt1:
+        pxor    xmm7,xmm15
+        pxor    xmm7,xmm9
+        pxor    xmm8,xmm2
+        pxor    xmm2,xmm7
+        movups  xmm0,XMMWORD[32+r11]
+
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm7,xmm9
+
+DB      102,15,56,220,208
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_enc_loop1
+
+ALIGN   32
+$L$ocb_enc_loop1:
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_enc_loop1
+
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,15,56,221,215
+        DB      0F3h,0C3h               ;repret
+
+
+global  aesni_ocb_decrypt
+
+ALIGN   32
+aesni_ocb_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ocb_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     rax,[rsp]
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[112+rsp],xmm13
+        movaps  XMMWORD[128+rsp],xmm14
+        movaps  XMMWORD[144+rsp],xmm15
+$L$ocb_dec_body:
+        mov     rbx,QWORD[56+rax]
+        mov     rbp,QWORD[((56+8))+rax]
+
+        mov     r10d,DWORD[240+rcx]
+        mov     r11,rcx
+        shl     r10d,4
+        movups  xmm9,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqu  xmm15,XMMWORD[r9]
+        pxor    xmm9,xmm1
+        pxor    xmm15,xmm1
+
+        mov     eax,16+32
+        lea     rcx,[32+r10*1+r11]
+        movups  xmm1,XMMWORD[16+r11]
+        sub     rax,r10
+        mov     r10,rax
+
+        movdqu  xmm10,XMMWORD[rbx]
+        movdqu  xmm8,XMMWORD[rbp]
+
+        test    r8,1
+        jnz     NEAR $L$ocb_dec_odd
+
+        bsf     r12,r8
+        add     r8,1
+        shl     r12,4
+        movdqu  xmm7,XMMWORD[r12*1+rbx]
+        movdqu  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+
+        call    __ocb_decrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        lea     rsi,[16+rsi]
+        sub     rdx,1
+        jz      NEAR $L$ocb_dec_done
+
+$L$ocb_dec_odd:
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        lea     r8,[6+r8]
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+        shl     r12,4
+        shl     r13,4
+        shl     r14,4
+
+        sub     rdx,6
+        jc      NEAR $L$ocb_dec_short
+        jmp     NEAR $L$ocb_dec_grandloop
+
+ALIGN   32
+$L$ocb_dec_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+
+        call    __ocb_decrypt6
+
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm8,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm8,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm8,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm8,xmm7
+        lea     rsi,[96+rsi]
+        sub     rdx,6
+        jnc     NEAR $L$ocb_dec_grandloop
+
+$L$ocb_dec_short:
+        add     rdx,6
+        jz      NEAR $L$ocb_dec_done
+
+        movdqu  xmm2,XMMWORD[rdi]
+        cmp     rdx,2
+        jb      NEAR $L$ocb_dec_one
+        movdqu  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ocb_dec_two
+
+        movdqu  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,4
+        jb      NEAR $L$ocb_dec_three
+        movdqu  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ocb_dec_four
+
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm7,xmm7
+
+        call    __ocb_decrypt6
+
+        movdqa  xmm15,xmm14
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm8,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm8,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm8,xmm6
+
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_one:
+        movdqa  xmm7,xmm10
+
+        call    __ocb_decrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_two:
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+
+        call    __ocb_decrypt4
+
+        movdqa  xmm15,xmm11
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        xorps   xmm8,xmm3
+
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_three:
+        pxor    xmm5,xmm5
+
+        call    __ocb_decrypt4
+
+        movdqa  xmm15,xmm12
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        xorps   xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        xorps   xmm8,xmm4
+
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_four:
+        call    __ocb_decrypt4
+
+        movdqa  xmm15,xmm13
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm8,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm8,xmm5
+
+$L$ocb_dec_done:
+        pxor    xmm15,xmm0
+        movdqu  XMMWORD[rbp],xmm8
+        movdqu  XMMWORD[r9],xmm15
+
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  XMMWORD[112+rsp],xmm0
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  XMMWORD[128+rsp],xmm0
+        movaps  xmm15,XMMWORD[144+rsp]
+        movaps  XMMWORD[144+rsp],xmm0
+        lea     rax,[((160+40))+rsp]
+$L$ocb_dec_pop:
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$ocb_dec_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ocb_decrypt:
+
+
+ALIGN   32
+__ocb_decrypt6:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        movdqa  xmm14,xmm10
+        pxor    xmm10,xmm15
+        movdqu  xmm15,XMMWORD[r14*1+rbx]
+        pxor    xmm11,xmm10
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm4,xmm12
+        pxor    xmm14,xmm13
+        pxor    xmm5,xmm13
+        pxor    xmm15,xmm14
+        pxor    xmm6,xmm14
+        pxor    xmm7,xmm15
+        movups  xmm0,XMMWORD[32+r11]
+
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        add     r8,6
+        pxor    xmm10,xmm9
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+DB      102,15,56,222,241
+        pxor    xmm13,xmm9
+        pxor    xmm14,xmm9
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm15,xmm9
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[64+r11]
+        shl     r12,4
+        shl     r13,4
+        jmp     NEAR $L$ocb_dec_loop6
+
+ALIGN   32
+$L$ocb_dec_loop6:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_dec_loop6
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[16+r11]
+        shl     r14,4
+
+DB      102,65,15,56,223,210
+        movdqu  xmm10,XMMWORD[rbx]
+        mov     rax,r10
+DB      102,65,15,56,223,219
+DB      102,65,15,56,223,228
+DB      102,65,15,56,223,237
+DB      102,65,15,56,223,246
+DB      102,65,15,56,223,255
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_decrypt4:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        pxor    xmm10,xmm15
+        pxor    xmm11,xmm10
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm13
+        movups  xmm0,XMMWORD[32+r11]
+
+        pxor    xmm10,xmm9
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+        pxor    xmm13,xmm9
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[48+r11]
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_dec_loop4
+
+ALIGN   32
+$L$ocb_dec_loop4:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_dec_loop4
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,65,15,56,223,210
+DB      102,65,15,56,223,219
+DB      102,65,15,56,223,228
+DB      102,65,15,56,223,237
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_decrypt1:
+        pxor    xmm7,xmm15
+        pxor    xmm7,xmm9
+        pxor    xmm2,xmm7
+        movups  xmm0,XMMWORD[32+r11]
+
+DB      102,15,56,222,209
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm7,xmm9
+
+DB      102,15,56,222,208
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_dec_loop1
+
+ALIGN   32
+$L$ocb_dec_loop1:
+DB      102,15,56,222,209
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_dec_loop1
+
+DB      102,15,56,222,209
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,15,56,223,215
+        DB      0F3h,0C3h               ;repret
+
+global  aesni_cbc_encrypt
+
+ALIGN   16
+aesni_cbc_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        test    rdx,rdx
+        jz      NEAR $L$cbc_ret
+
+        mov     r10d,DWORD[240+rcx]
+        mov     r11,rcx
+        test    r9d,r9d
+        jz      NEAR $L$cbc_decrypt
+
+        movups  xmm2,XMMWORD[r8]
+        mov     eax,r10d
+        cmp     rdx,16
+        jb      NEAR $L$cbc_enc_tail
+        sub     rdx,16
+        jmp     NEAR $L$cbc_enc_loop
+ALIGN   16
+$L$cbc_enc_loop:
+        movups  xmm3,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm3,xmm0
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm3
+$L$oop_enc1_15:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_15
+DB      102,15,56,221,209
+        mov     eax,r10d
+        mov     rcx,r11
+        movups  XMMWORD[rsi],xmm2
+        lea     rsi,[16+rsi]
+        sub     rdx,16
+        jnc     NEAR $L$cbc_enc_loop
+        add     rdx,16
+        jnz     NEAR $L$cbc_enc_tail
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  XMMWORD[r8],xmm2
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        jmp     NEAR $L$cbc_ret
+
+$L$cbc_enc_tail:
+        mov     rcx,rdx
+        xchg    rsi,rdi
+        DD      0x9066A4F3
+        mov     ecx,16
+        sub     rcx,rdx
+        xor     eax,eax
+        DD      0x9066AAF3
+        lea     rdi,[((-16))+rdi]
+        mov     eax,r10d
+        mov     rsi,rdi
+        mov     rcx,r11
+        xor     rdx,rdx
+        jmp     NEAR $L$cbc_enc_loop
+
+ALIGN   16
+$L$cbc_decrypt:
+        cmp     rdx,16
+        jne     NEAR $L$cbc_decrypt_bulk
+
+
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[r8]
+        movdqa  xmm4,xmm2
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_16:
+DB      102,15,56,222,209
+        dec     r10d
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_16
+DB      102,15,56,223,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movdqu  XMMWORD[r8],xmm4
+        xorps   xmm2,xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR $L$cbc_ret
+ALIGN   16
+$L$cbc_decrypt_bulk:
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,176
+        and     rsp,-16
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$cbc_decrypt_body:
+        mov     rbp,rcx
+        movups  xmm10,XMMWORD[r8]
+        mov     eax,r10d
+        cmp     rdx,0x50
+        jbe     NEAR $L$cbc_dec_tail
+
+        movups  xmm0,XMMWORD[rcx]
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqa  xmm11,xmm2
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqa  xmm12,xmm3
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqa  xmm13,xmm4
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqa  xmm14,xmm5
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqa  xmm15,xmm6
+        mov     r9d,DWORD[((OPENSSL_ia32cap_P+4))]
+        cmp     rdx,0x70
+        jbe     NEAR $L$cbc_dec_six_or_seven
+
+        and     r9d,71303168
+        sub     rdx,0x50
+        cmp     r9d,4194304
+        je      NEAR $L$cbc_dec_loop6_enter
+        sub     rdx,0x20
+        lea     rcx,[112+rcx]
+        jmp     NEAR $L$cbc_dec_loop8_enter
+ALIGN   16
+$L$cbc_dec_loop8:
+        movups  XMMWORD[rsi],xmm9
+        lea     rsi,[16+rsi]
+$L$cbc_dec_loop8_enter:
+        movdqu  xmm8,XMMWORD[96+rdi]
+        pxor    xmm2,xmm0
+        movdqu  xmm9,XMMWORD[112+rdi]
+        pxor    xmm3,xmm0
+        movups  xmm1,XMMWORD[((16-112))+rcx]
+        pxor    xmm4,xmm0
+        mov     rbp,-1
+        cmp     rdx,0x70
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+        pxor    xmm8,xmm0
+
+DB      102,15,56,222,209
+        pxor    xmm9,xmm0
+        movups  xmm0,XMMWORD[((32-112))+rcx]
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+        adc     rbp,0
+        and     rbp,128
+DB      102,68,15,56,222,201
+        add     rbp,rdi
+        movups  xmm1,XMMWORD[((48-112))+rcx]
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((64-112))+rcx]
+        nop
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((80-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((96-112))+rcx]
+        nop
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((112-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((128-112))+rcx]
+        nop
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((144-112))+rcx]
+        cmp     eax,11
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((160-112))+rcx]
+        jb      NEAR $L$cbc_dec_done
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((176-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((192-112))+rcx]
+        je      NEAR $L$cbc_dec_done
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((208-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((224-112))+rcx]
+        jmp     NEAR $L$cbc_dec_done
+ALIGN   16
+$L$cbc_dec_done:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+        pxor    xmm10,xmm0
+        pxor    xmm11,xmm0
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        pxor    xmm12,xmm0
+        pxor    xmm13,xmm0
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm0
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movdqu  xmm1,XMMWORD[80+rdi]
+
+DB      102,65,15,56,223,210
+        movdqu  xmm10,XMMWORD[96+rdi]
+        pxor    xmm1,xmm0
+DB      102,65,15,56,223,219
+        pxor    xmm10,xmm0
+        movdqu  xmm0,XMMWORD[112+rdi]
+DB      102,65,15,56,223,228
+        lea     rdi,[128+rdi]
+        movdqu  xmm11,XMMWORD[rbp]
+DB      102,65,15,56,223,237
+DB      102,65,15,56,223,246
+        movdqu  xmm12,XMMWORD[16+rbp]
+        movdqu  xmm13,XMMWORD[32+rbp]
+DB      102,65,15,56,223,255
+DB      102,68,15,56,223,193
+        movdqu  xmm14,XMMWORD[48+rbp]
+        movdqu  xmm15,XMMWORD[64+rbp]
+DB      102,69,15,56,223,202
+        movdqa  xmm10,xmm0
+        movdqu  xmm1,XMMWORD[80+rbp]
+        movups  xmm0,XMMWORD[((-112))+rcx]
+
+        movups  XMMWORD[rsi],xmm2
+        movdqa  xmm2,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        movdqa  xmm3,xmm12
+        movups  XMMWORD[32+rsi],xmm4
+        movdqa  xmm4,xmm13
+        movups  XMMWORD[48+rsi],xmm5
+        movdqa  xmm5,xmm14
+        movups  XMMWORD[64+rsi],xmm6
+        movdqa  xmm6,xmm15
+        movups  XMMWORD[80+rsi],xmm7
+        movdqa  xmm7,xmm1
+        movups  XMMWORD[96+rsi],xmm8
+        lea     rsi,[112+rsi]
+
+        sub     rdx,0x80
+        ja      NEAR $L$cbc_dec_loop8
+
+        movaps  xmm2,xmm9
+        lea     rcx,[((-112))+rcx]
+        add     rdx,0x70
+        jle     NEAR $L$cbc_dec_clear_tail_collected
+        movups  XMMWORD[rsi],xmm9
+        lea     rsi,[16+rsi]
+        cmp     rdx,0x50
+        jbe     NEAR $L$cbc_dec_tail
+
+        movaps  xmm2,xmm11
+$L$cbc_dec_six_or_seven:
+        cmp     rdx,0x60
+        ja      NEAR $L$cbc_dec_seven
+
+        movaps  xmm8,xmm7
+        call    _aesni_decrypt6
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm8
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        pxor    xmm6,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        pxor    xmm7,xmm15
+        movdqu  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        lea     rsi,[80+rsi]
+        movdqa  xmm2,xmm7
+        pxor    xmm7,xmm7
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_seven:
+        movups  xmm8,XMMWORD[96+rdi]
+        xorps   xmm9,xmm9
+        call    _aesni_decrypt8
+        movups  xmm9,XMMWORD[80+rdi]
+        pxor    xmm2,xmm10
+        movups  xmm10,XMMWORD[96+rdi]
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        pxor    xmm6,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        pxor    xmm7,xmm15
+        movdqu  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        pxor    xmm8,xmm9
+        movdqu  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+        lea     rsi,[96+rsi]
+        movdqa  xmm2,xmm8
+        pxor    xmm8,xmm8
+        pxor    xmm9,xmm9
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_loop6:
+        movups  XMMWORD[rsi],xmm7
+        lea     rsi,[16+rsi]
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqa  xmm11,xmm2
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqa  xmm12,xmm3
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqa  xmm13,xmm4
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqa  xmm14,xmm5
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqa  xmm15,xmm6
+$L$cbc_dec_loop6_enter:
+        lea     rdi,[96+rdi]
+        movdqa  xmm8,xmm7
+
+        call    _aesni_decrypt6
+
+        pxor    xmm2,xmm10
+        movdqa  xmm10,xmm8
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm6,xmm14
+        mov     rcx,rbp
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm7,xmm15
+        mov     eax,r10d
+        movdqu  XMMWORD[64+rsi],xmm6
+        lea     rsi,[80+rsi]
+        sub     rdx,0x60
+        ja      NEAR $L$cbc_dec_loop6
+
+        movdqa  xmm2,xmm7
+        add     rdx,0x50
+        jle     NEAR $L$cbc_dec_clear_tail_collected
+        movups  XMMWORD[rsi],xmm7
+        lea     rsi,[16+rsi]
+
+$L$cbc_dec_tail:
+        movups  xmm2,XMMWORD[rdi]
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_one
+
+        movups  xmm3,XMMWORD[16+rdi]
+        movaps  xmm11,xmm2
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_two
+
+        movups  xmm4,XMMWORD[32+rdi]
+        movaps  xmm12,xmm3
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_three
+
+        movups  xmm5,XMMWORD[48+rdi]
+        movaps  xmm13,xmm4
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_four
+
+        movups  xmm6,XMMWORD[64+rdi]
+        movaps  xmm14,xmm5
+        movaps  xmm15,xmm6
+        xorps   xmm7,xmm7
+        call    _aesni_decrypt6
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm15
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        pxor    xmm6,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        lea     rsi,[64+rsi]
+        movdqa  xmm2,xmm6
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        sub     rdx,0x10
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_one:
+        movaps  xmm11,xmm2
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_17:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_17
+DB      102,15,56,223,209
+        xorps   xmm2,xmm10
+        movaps  xmm10,xmm11
+        jmp     NEAR $L$cbc_dec_tail_collected
+ALIGN   16
+$L$cbc_dec_two:
+        movaps  xmm12,xmm3
+        call    _aesni_decrypt2
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm12
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        movdqa  xmm2,xmm3
+        pxor    xmm3,xmm3
+        lea     rsi,[16+rsi]
+        jmp     NEAR $L$cbc_dec_tail_collected
+ALIGN   16
+$L$cbc_dec_three:
+        movaps  xmm13,xmm4
+        call    _aesni_decrypt3
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm13
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movdqa  xmm2,xmm4
+        pxor    xmm4,xmm4
+        lea     rsi,[32+rsi]
+        jmp     NEAR $L$cbc_dec_tail_collected
+ALIGN   16
+$L$cbc_dec_four:
+        movaps  xmm14,xmm5
+        call    _aesni_decrypt4
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm14
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movdqa  xmm2,xmm5
+        pxor    xmm5,xmm5
+        lea     rsi,[48+rsi]
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_clear_tail_collected:
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+$L$cbc_dec_tail_collected:
+        movups  XMMWORD[r8],xmm10
+        and     rdx,15
+        jnz     NEAR $L$cbc_dec_tail_partial
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR $L$cbc_dec_ret
+ALIGN   16
+$L$cbc_dec_tail_partial:
+        movaps  XMMWORD[rsp],xmm2
+        pxor    xmm2,xmm2
+        mov     rcx,16
+        mov     rdi,rsi
+        sub     rcx,rdx
+        lea     rsi,[rsp]
+        DD      0x9066A4F3
+        movdqa  XMMWORD[rsp],xmm2
+
+$L$cbc_dec_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  XMMWORD[112+rsp],xmm0
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  XMMWORD[128+rsp],xmm0
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  XMMWORD[144+rsp],xmm0
+        movaps  xmm15,XMMWORD[160+rsp]
+        movaps  XMMWORD[160+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$cbc_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_encrypt:
+global  aesni_set_decrypt_key
+
+ALIGN   16
+aesni_set_decrypt_key:
+
+DB      0x48,0x83,0xEC,0x08
+
+        call    __aesni_set_encrypt_key
+        shl     edx,4
+        test    eax,eax
+        jnz     NEAR $L$dec_key_ret
+        lea     rcx,[16+rdx*1+r8]
+
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[rcx]
+        movups  XMMWORD[rcx],xmm0
+        movups  XMMWORD[r8],xmm1
+        lea     r8,[16+r8]
+        lea     rcx,[((-16))+rcx]
+
+$L$dec_key_inverse:
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[rcx]
+DB      102,15,56,219,192
+DB      102,15,56,219,201
+        lea     r8,[16+r8]
+        lea     rcx,[((-16))+rcx]
+        movups  XMMWORD[16+rcx],xmm0
+        movups  XMMWORD[(-16)+r8],xmm1
+        cmp     rcx,r8
+        ja      NEAR $L$dec_key_inverse
+
+        movups  xmm0,XMMWORD[r8]
+DB      102,15,56,219,192
+        pxor    xmm1,xmm1
+        movups  XMMWORD[rcx],xmm0
+        pxor    xmm0,xmm0
+$L$dec_key_ret:
+        add     rsp,8
+
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_set_decrypt_key:
+
+global  aesni_set_encrypt_key
+
+ALIGN   16
+aesni_set_encrypt_key:
+__aesni_set_encrypt_key:
+
+DB      0x48,0x83,0xEC,0x08
+
+        mov     rax,-1
+        test    rcx,rcx
+        jz      NEAR $L$enc_key_ret
+        test    r8,r8
+        jz      NEAR $L$enc_key_ret
+
+        mov     r10d,268437504
+        movups  xmm0,XMMWORD[rcx]
+        xorps   xmm4,xmm4
+        and     r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+        lea     rax,[16+r8]
+        cmp     edx,256
+        je      NEAR $L$14rounds
+        cmp     edx,192
+        je      NEAR $L$12rounds
+        cmp     edx,128
+        jne     NEAR $L$bad_keybits
+
+$L$10rounds:
+        mov     edx,9
+        cmp     r10d,268435456
+        je      NEAR $L$10rounds_alt
+
+        movups  XMMWORD[r8],xmm0
+DB      102,15,58,223,200,1
+        call    $L$key_expansion_128_cold
+DB      102,15,58,223,200,2
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,4
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,8
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,16
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,32
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,64
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,128
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,27
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,54
+        call    $L$key_expansion_128
+        movups  XMMWORD[rax],xmm0
+        mov     DWORD[80+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$10rounds_alt:
+        movdqa  xmm5,XMMWORD[$L$key_rotate]
+        mov     r10d,8
+        movdqa  xmm4,XMMWORD[$L$key_rcon1]
+        movdqa  xmm2,xmm0
+        movdqu  XMMWORD[r8],xmm0
+        jmp     NEAR $L$oop_key128
+
+ALIGN   16
+$L$oop_key128:
+DB      102,15,56,0,197
+DB      102,15,56,221,196
+        pslld   xmm4,1
+        lea     rax,[16+rax]
+
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[(-16)+rax],xmm0
+        movdqa  xmm2,xmm0
+
+        dec     r10d
+        jnz     NEAR $L$oop_key128
+
+        movdqa  xmm4,XMMWORD[$L$key_rcon1b]
+
+DB      102,15,56,0,197
+DB      102,15,56,221,196
+        pslld   xmm4,1
+
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[rax],xmm0
+
+        movdqa  xmm2,xmm0
+DB      102,15,56,0,197
+DB      102,15,56,221,196
+
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[16+rax],xmm0
+
+        mov     DWORD[96+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$12rounds:
+        movq    xmm2,QWORD[16+rcx]
+        mov     edx,11
+        cmp     r10d,268435456
+        je      NEAR $L$12rounds_alt
+
+        movups  XMMWORD[r8],xmm0
+DB      102,15,58,223,202,1
+        call    $L$key_expansion_192a_cold
+DB      102,15,58,223,202,2
+        call    $L$key_expansion_192b
+DB      102,15,58,223,202,4
+        call    $L$key_expansion_192a
+DB      102,15,58,223,202,8
+        call    $L$key_expansion_192b
+DB      102,15,58,223,202,16
+        call    $L$key_expansion_192a
+DB      102,15,58,223,202,32
+        call    $L$key_expansion_192b
+DB      102,15,58,223,202,64
+        call    $L$key_expansion_192a
+DB      102,15,58,223,202,128
+        call    $L$key_expansion_192b
+        movups  XMMWORD[rax],xmm0
+        mov     DWORD[48+rax],edx
+        xor     rax,rax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$12rounds_alt:
+        movdqa  xmm5,XMMWORD[$L$key_rotate192]
+        movdqa  xmm4,XMMWORD[$L$key_rcon1]
+        mov     r10d,8
+        movdqu  XMMWORD[r8],xmm0
+        jmp     NEAR $L$oop_key192
+
+ALIGN   16
+$L$oop_key192:
+        movq    QWORD[rax],xmm2
+        movdqa  xmm1,xmm2
+DB      102,15,56,0,213
+DB      102,15,56,221,212
+        pslld   xmm4,1
+        lea     rax,[24+rax]
+
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+
+        pshufd  xmm3,xmm0,0xff
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+
+        pxor    xmm0,xmm2
+        pxor    xmm2,xmm3
+        movdqu  XMMWORD[(-16)+rax],xmm0
+
+        dec     r10d
+        jnz     NEAR $L$oop_key192
+
+        mov     DWORD[32+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$14rounds:
+        movups  xmm2,XMMWORD[16+rcx]
+        mov     edx,13
+        lea     rax,[16+rax]
+        cmp     r10d,268435456
+        je      NEAR $L$14rounds_alt
+
+        movups  XMMWORD[r8],xmm0
+        movups  XMMWORD[16+r8],xmm2
+DB      102,15,58,223,202,1
+        call    $L$key_expansion_256a_cold
+DB      102,15,58,223,200,1
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,2
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,2
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,4
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,4
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,8
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,8
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,16
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,16
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,32
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,32
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,64
+        call    $L$key_expansion_256a
+        movups  XMMWORD[rax],xmm0
+        mov     DWORD[16+rax],edx
+        xor     rax,rax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$14rounds_alt:
+        movdqa  xmm5,XMMWORD[$L$key_rotate]
+        movdqa  xmm4,XMMWORD[$L$key_rcon1]
+        mov     r10d,7
+        movdqu  XMMWORD[r8],xmm0
+        movdqa  xmm1,xmm2
+        movdqu  XMMWORD[16+r8],xmm2
+        jmp     NEAR $L$oop_key256
+
+ALIGN   16
+$L$oop_key256:
+DB      102,15,56,0,213
+DB      102,15,56,221,212
+
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+        pslld   xmm4,1
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[rax],xmm0
+
+        dec     r10d
+        jz      NEAR $L$done_key256
+
+        pshufd  xmm2,xmm0,0xff
+        pxor    xmm3,xmm3
+DB      102,15,56,221,211
+
+        movdqa  xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm1,xmm3
+
+        pxor    xmm2,xmm1
+        movdqu  XMMWORD[16+rax],xmm2
+        lea     rax,[32+rax]
+        movdqa  xmm1,xmm2
+
+        jmp     NEAR $L$oop_key256
+
+$L$done_key256:
+        mov     DWORD[16+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$bad_keybits:
+        mov     rax,-2
+$L$enc_key_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        add     rsp,8
+
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_set_encrypt_key:
+
+ALIGN   16
+$L$key_expansion_128:
+        movups  XMMWORD[rax],xmm0
+        lea     rax,[16+rax]
+$L$key_expansion_128_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$key_expansion_192a:
+        movups  XMMWORD[rax],xmm0
+        lea     rax,[16+rax]
+$L$key_expansion_192a_cold:
+        movaps  xmm5,xmm2
+$L$key_expansion_192b_warm:
+        shufps  xmm4,xmm0,16
+        movdqa  xmm3,xmm2
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        pslldq  xmm3,4
+        xorps   xmm0,xmm4
+        pshufd  xmm1,xmm1,85
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm0,255
+        pxor    xmm2,xmm3
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$key_expansion_192b:
+        movaps  xmm3,xmm0
+        shufps  xmm5,xmm0,68
+        movups  XMMWORD[rax],xmm5
+        shufps  xmm3,xmm2,78
+        movups  XMMWORD[16+rax],xmm3
+        lea     rax,[32+rax]
+        jmp     NEAR $L$key_expansion_192b_warm
+
+ALIGN   16
+$L$key_expansion_256a:
+        movups  XMMWORD[rax],xmm2
+        lea     rax,[16+rax]
+$L$key_expansion_256a_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$key_expansion_256b:
+        movups  XMMWORD[rax],xmm0
+        lea     rax,[16+rax]
+
+        shufps  xmm4,xmm2,16
+        xorps   xmm2,xmm4
+        shufps  xmm4,xmm2,140
+        xorps   xmm2,xmm4
+        shufps  xmm1,xmm1,170
+        xorps   xmm2,xmm1
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+$L$bswap_mask:
+DB      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$increment32:
+        DD      6,6,6,0
+$L$increment64:
+        DD      1,0,0,0
+$L$xts_magic:
+        DD      0x87,0,1,0
+$L$increment1:
+DB      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$key_rotate:
+        DD      0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d
+$L$key_rotate192:
+        DD      0x04070605,0x04070605,0x04070605,0x04070605
+$L$key_rcon1:
+        DD      1,1,1,1
+$L$key_rcon1b:
+        DD      0x1b,0x1b,0x1b,0x1b
+
+DB      65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+DB      83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+DB      32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB      115,108,46,111,114,103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+ecb_ccm64_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,8
+        DD      0xa548f3fc
+        lea     rax,[88+rax]
+
+        jmp     NEAR $L$common_seh_tail
+
+
+
+ALIGN   16
+ctr_xts_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[208+r8]
+
+        lea     rsi,[((-168))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        mov     rbp,QWORD[((-8))+rax]
+        mov     QWORD[160+r8],rbp
+        jmp     NEAR $L$common_seh_tail
+
+
+
+ALIGN   16
+ocb_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$ocb_no_xmm
+
+        mov     rax,QWORD[152+r8]
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[((160+40))+rax]
+
+$L$ocb_no_xmm:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+        jmp     NEAR $L$common_seh_tail
+
+
+ALIGN   16
+cbc_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[152+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$cbc_decrypt_bulk]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[120+r8]
+
+        lea     r10,[$L$cbc_decrypt_body]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$cbc_ret]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[16+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        mov     rax,QWORD[208+r8]
+
+        mov     rbp,QWORD[((-8))+rax]
+        mov     QWORD[160+r8],rbp
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_ecb_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_ecb_encrypt wrt ..imagebase
+        DD      $L$SEH_info_ecb wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ccm64_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_end_aesni_ccm64_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_info_ccm64_enc wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ccm64_decrypt_blocks wrt ..imagebase
+        DD      $L$SEH_end_aesni_ccm64_decrypt_blocks wrt ..imagebase
+        DD      $L$SEH_info_ccm64_dec wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ctr32_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_end_aesni_ctr32_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_info_ctr32 wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_xts_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_xts_encrypt wrt ..imagebase
+        DD      $L$SEH_info_xts_enc wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_xts_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_xts_decrypt wrt ..imagebase
+        DD      $L$SEH_info_xts_dec wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ocb_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_ocb_encrypt wrt ..imagebase
+        DD      $L$SEH_info_ocb_enc wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ocb_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_ocb_decrypt wrt ..imagebase
+        DD      $L$SEH_info_ocb_dec wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_info_cbc wrt ..imagebase
+
+        DD      aesni_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_end_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_info_key wrt ..imagebase
+
+        DD      aesni_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_end_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_info_key wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_ecb:
+DB      9,0,0,0
+        DD      ecb_ccm64_se_handler wrt ..imagebase
+        DD      $L$ecb_enc_body wrt ..imagebase,$L$ecb_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_enc:
+DB      9,0,0,0
+        DD      ecb_ccm64_se_handler wrt ..imagebase
+        DD      $L$ccm64_enc_body wrt ..imagebase,$L$ccm64_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_dec:
+DB      9,0,0,0
+        DD      ecb_ccm64_se_handler wrt ..imagebase
+        DD      $L$ccm64_dec_body wrt ..imagebase,$L$ccm64_dec_ret wrt ..imagebase
+$L$SEH_info_ctr32:
+DB      9,0,0,0
+        DD      ctr_xts_se_handler wrt ..imagebase
+        DD      $L$ctr32_body wrt ..imagebase,$L$ctr32_epilogue wrt ..imagebase
+$L$SEH_info_xts_enc:
+DB      9,0,0,0
+        DD      ctr_xts_se_handler wrt ..imagebase
+        DD      $L$xts_enc_body wrt ..imagebase,$L$xts_enc_epilogue wrt ..imagebase
+$L$SEH_info_xts_dec:
+DB      9,0,0,0
+        DD      ctr_xts_se_handler wrt ..imagebase
+        DD      $L$xts_dec_body wrt ..imagebase,$L$xts_dec_epilogue wrt ..imagebase
+$L$SEH_info_ocb_enc:
+DB      9,0,0,0
+        DD      ocb_se_handler wrt ..imagebase
+        DD      $L$ocb_enc_body wrt ..imagebase,$L$ocb_enc_epilogue wrt ..imagebase
+        DD      $L$ocb_enc_pop wrt ..imagebase
+        DD      0
+$L$SEH_info_ocb_dec:
+DB      9,0,0,0
+        DD      ocb_se_handler wrt ..imagebase
+        DD      $L$ocb_dec_body wrt ..imagebase,$L$ocb_dec_epilogue wrt ..imagebase
+        DD      $L$ocb_dec_pop wrt ..imagebase
+        DD      0
+$L$SEH_info_cbc:
+DB      9,0,0,0
+        DD      cbc_se_handler wrt ..imagebase
+$L$SEH_info_key:
+DB      0x01,0x04,0x01,0x00
+DB      0x04,0x02,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
new file mode 100644
index 0000000000..e6a5733924
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
@@ -0,0 +1,1170 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_encrypt_core:
+
+        mov     r9,rdx
+        mov     r11,16
+        mov     eax,DWORD[240+rdx]
+        movdqa  xmm1,xmm9
+        movdqa  xmm2,XMMWORD[$L$k_ipt]
+        pandn   xmm1,xmm0
+        movdqu  xmm5,XMMWORD[r9]
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,208
+        movdqa  xmm0,XMMWORD[(($L$k_ipt+16))]
+DB      102,15,56,0,193
+        pxor    xmm2,xmm5
+        add     r9,16
+        pxor    xmm0,xmm2
+        lea     r10,[$L$k_mc_backward]
+        jmp     NEAR $L$enc_entry
+
+ALIGN   16
+$L$enc_loop:
+
+        movdqa  xmm4,xmm13
+        movdqa  xmm0,xmm12
+DB      102,15,56,0,226
+DB      102,15,56,0,195
+        pxor    xmm4,xmm5
+        movdqa  xmm5,xmm15
+        pxor    xmm0,xmm4
+        movdqa  xmm1,XMMWORD[((-64))+r10*1+r11]
+DB      102,15,56,0,234
+        movdqa  xmm4,XMMWORD[r10*1+r11]
+        movdqa  xmm2,xmm14
+DB      102,15,56,0,211
+        movdqa  xmm3,xmm0
+        pxor    xmm2,xmm5
+DB      102,15,56,0,193
+        add     r9,16
+        pxor    xmm0,xmm2
+DB      102,15,56,0,220
+        add     r11,16
+        pxor    xmm3,xmm0
+DB      102,15,56,0,193
+        and     r11,0x30
+        sub     rax,1
+        pxor    xmm0,xmm3
+
+$L$enc_entry:
+
+        movdqa  xmm1,xmm9
+        movdqa  xmm5,xmm11
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,232
+        movdqa  xmm3,xmm10
+        pxor    xmm0,xmm1
+DB      102,15,56,0,217
+        movdqa  xmm4,xmm10
+        pxor    xmm3,xmm5
+DB      102,15,56,0,224
+        movdqa  xmm2,xmm10
+        pxor    xmm4,xmm5
+DB      102,15,56,0,211
+        movdqa  xmm3,xmm10
+        pxor    xmm2,xmm0
+DB      102,15,56,0,220
+        movdqu  xmm5,XMMWORD[r9]
+        pxor    xmm3,xmm1
+        jnz     NEAR $L$enc_loop
+
+
+        movdqa  xmm4,XMMWORD[((-96))+r10]
+        movdqa  xmm0,XMMWORD[((-80))+r10]
+DB      102,15,56,0,226
+        pxor    xmm4,xmm5
+DB      102,15,56,0,195
+        movdqa  xmm1,XMMWORD[64+r10*1+r11]
+        pxor    xmm0,xmm4
+DB      102,15,56,0,193
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_decrypt_core:
+
+        mov     r9,rdx
+        mov     eax,DWORD[240+rdx]
+        movdqa  xmm1,xmm9
+        movdqa  xmm2,XMMWORD[$L$k_dipt]
+        pandn   xmm1,xmm0
+        mov     r11,rax
+        psrld   xmm1,4
+        movdqu  xmm5,XMMWORD[r9]
+        shl     r11,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,208
+        movdqa  xmm0,XMMWORD[(($L$k_dipt+16))]
+        xor     r11,0x30
+        lea     r10,[$L$k_dsbd]
+DB      102,15,56,0,193
+        and     r11,0x30
+        pxor    xmm2,xmm5
+        movdqa  xmm5,XMMWORD[(($L$k_mc_forward+48))]
+        pxor    xmm0,xmm2
+        add     r9,16
+        add     r11,r10
+        jmp     NEAR $L$dec_entry
+
+ALIGN   16
+$L$dec_loop:
+
+
+
+        movdqa  xmm4,XMMWORD[((-32))+r10]
+        movdqa  xmm1,XMMWORD[((-16))+r10]
+DB      102,15,56,0,226
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,XMMWORD[r10]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,XMMWORD[16+r10]
+
+DB      102,15,56,0,226
+DB      102,15,56,0,197
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,XMMWORD[32+r10]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,XMMWORD[48+r10]
+
+DB      102,15,56,0,226
+DB      102,15,56,0,197
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,XMMWORD[64+r10]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,XMMWORD[80+r10]
+
+DB      102,15,56,0,226
+DB      102,15,56,0,197
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        add     r9,16
+DB      102,15,58,15,237,12
+        pxor    xmm0,xmm1
+        sub     rax,1
+
+$L$dec_entry:
+
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm0
+        movdqa  xmm2,xmm11
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,208
+        movdqa  xmm3,xmm10
+        pxor    xmm0,xmm1
+DB      102,15,56,0,217
+        movdqa  xmm4,xmm10
+        pxor    xmm3,xmm2
+DB      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm10
+DB      102,15,56,0,211
+        movdqa  xmm3,xmm10
+        pxor    xmm2,xmm0
+DB      102,15,56,0,220
+        movdqu  xmm0,XMMWORD[r9]
+        pxor    xmm3,xmm1
+        jnz     NEAR $L$dec_loop
+
+
+        movdqa  xmm4,XMMWORD[96+r10]
+DB      102,15,56,0,226
+        pxor    xmm4,xmm0
+        movdqa  xmm0,XMMWORD[112+r10]
+        movdqa  xmm2,XMMWORD[((-352))+r11]
+DB      102,15,56,0,195
+        pxor    xmm0,xmm4
+DB      102,15,56,0,194
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_core:
+
+
+
+
+
+
+        call    _vpaes_preheat
+        movdqa  xmm8,XMMWORD[$L$k_rcon]
+        movdqu  xmm0,XMMWORD[rdi]
+
+
+        movdqa  xmm3,xmm0
+        lea     r11,[$L$k_ipt]
+        call    _vpaes_schedule_transform
+        movdqa  xmm7,xmm0
+
+        lea     r10,[$L$k_sr]
+        test    rcx,rcx
+        jnz     NEAR $L$schedule_am_decrypting
+
+
+        movdqu  XMMWORD[rdx],xmm0
+        jmp     NEAR $L$schedule_go
+
+$L$schedule_am_decrypting:
+
+        movdqa  xmm1,XMMWORD[r10*1+r8]
+DB      102,15,56,0,217
+        movdqu  XMMWORD[rdx],xmm3
+        xor     r8,0x30
+
+$L$schedule_go:
+        cmp     esi,192
+        ja      NEAR $L$schedule_256
+        je      NEAR $L$schedule_192
+
+
+
+
+
+
+
+
+
+
+$L$schedule_128:
+        mov     esi,10
+
+$L$oop_schedule_128:
+        call    _vpaes_schedule_round
+        dec     rsi
+        jz      NEAR $L$schedule_mangle_last
+        call    _vpaes_schedule_mangle
+        jmp     NEAR $L$oop_schedule_128
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+$L$schedule_192:
+        movdqu  xmm0,XMMWORD[8+rdi]
+        call    _vpaes_schedule_transform
+        movdqa  xmm6,xmm0
+        pxor    xmm4,xmm4
+        movhlps xmm6,xmm4
+        mov     esi,4
+
+$L$oop_schedule_192:
+        call    _vpaes_schedule_round
+DB      102,15,58,15,198,8
+        call    _vpaes_schedule_mangle
+        call    _vpaes_schedule_192_smear
+        call    _vpaes_schedule_mangle
+        call    _vpaes_schedule_round
+        dec     rsi
+        jz      NEAR $L$schedule_mangle_last
+        call    _vpaes_schedule_mangle
+        call    _vpaes_schedule_192_smear
+        jmp     NEAR $L$oop_schedule_192
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+$L$schedule_256:
+        movdqu  xmm0,XMMWORD[16+rdi]
+        call    _vpaes_schedule_transform
+        mov     esi,7
+
+$L$oop_schedule_256:
+        call    _vpaes_schedule_mangle
+        movdqa  xmm6,xmm0
+
+
+        call    _vpaes_schedule_round
+        dec     rsi
+        jz      NEAR $L$schedule_mangle_last
+        call    _vpaes_schedule_mangle
+
+
+        pshufd  xmm0,xmm0,0xFF
+        movdqa  xmm5,xmm7
+        movdqa  xmm7,xmm6
+        call    _vpaes_schedule_low_round
+        movdqa  xmm7,xmm5
+
+        jmp     NEAR $L$oop_schedule_256
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+$L$schedule_mangle_last:
+
+        lea     r11,[$L$k_deskew]
+        test    rcx,rcx
+        jnz     NEAR $L$schedule_mangle_last_dec
+
+
+        movdqa  xmm1,XMMWORD[r10*1+r8]
+DB      102,15,56,0,193
+        lea     r11,[$L$k_opt]
+        add     rdx,32
+
+$L$schedule_mangle_last_dec:
+        add     rdx,-16
+        pxor    xmm0,XMMWORD[$L$k_s63]
+        call    _vpaes_schedule_transform
+        movdqu  XMMWORD[rdx],xmm0
+
+
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_192_smear:
+
+        pshufd  xmm1,xmm6,0x80
+        pshufd  xmm0,xmm7,0xFE
+        pxor    xmm6,xmm1
+        pxor    xmm1,xmm1
+        pxor    xmm6,xmm0
+        movdqa  xmm0,xmm6
+        movhlps xmm6,xmm1
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_round:
+
+
+        pxor    xmm1,xmm1
+DB      102,65,15,58,15,200,15
+DB      102,69,15,58,15,192,15
+        pxor    xmm7,xmm1
+
+
+        pshufd  xmm0,xmm0,0xFF
+DB      102,15,58,15,192,1
+
+
+
+
+_vpaes_schedule_low_round:
+
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,4
+        pxor    xmm7,xmm1
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,8
+        pxor    xmm7,xmm1
+        pxor    xmm7,XMMWORD[$L$k_s63]
+
+
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+        movdqa  xmm2,xmm11
+DB      102,15,56,0,208
+        pxor    xmm0,xmm1
+        movdqa  xmm3,xmm10
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+        movdqa  xmm4,xmm10
+DB      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm10
+DB      102,15,56,0,211
+        pxor    xmm2,xmm0
+        movdqa  xmm3,xmm10
+DB      102,15,56,0,220
+        pxor    xmm3,xmm1
+        movdqa  xmm4,xmm13
+DB      102,15,56,0,226
+        movdqa  xmm0,xmm12
+DB      102,15,56,0,195
+        pxor    xmm0,xmm4
+
+
+        pxor    xmm0,xmm7
+        movdqa  xmm7,xmm0
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_transform:
+
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+        movdqa  xmm2,XMMWORD[r11]
+DB      102,15,56,0,208
+        movdqa  xmm0,XMMWORD[16+r11]
+DB      102,15,56,0,193
+        pxor    xmm0,xmm2
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_mangle:
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm5,XMMWORD[$L$k_mc_forward]
+        test    rcx,rcx
+        jnz     NEAR $L$schedule_mangle_dec
+
+
+        add     rdx,16
+        pxor    xmm4,XMMWORD[$L$k_s63]
+DB      102,15,56,0,229
+        movdqa  xmm3,xmm4
+DB      102,15,56,0,229
+        pxor    xmm3,xmm4
+DB      102,15,56,0,229
+        pxor    xmm3,xmm4
+
+        jmp     NEAR $L$schedule_mangle_both
+ALIGN   16
+$L$schedule_mangle_dec:
+
+        lea     r11,[$L$k_dksd]
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm4
+        psrld   xmm1,4
+        pand    xmm4,xmm9
+
+        movdqa  xmm2,XMMWORD[r11]
+DB      102,15,56,0,212
+        movdqa  xmm3,XMMWORD[16+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+DB      102,15,56,0,221
+
+        movdqa  xmm2,XMMWORD[32+r11]
+DB      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,XMMWORD[48+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+DB      102,15,56,0,221
+
+        movdqa  xmm2,XMMWORD[64+r11]
+DB      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,XMMWORD[80+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+DB      102,15,56,0,221
+
+        movdqa  xmm2,XMMWORD[96+r11]
+DB      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,XMMWORD[112+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+
+        add     rdx,-16
+
+$L$schedule_mangle_both:
+        movdqa  xmm1,XMMWORD[r10*1+r8]
+DB      102,15,56,0,217
+        add     r8,-16
+        and     r8,0x30
+        movdqu  XMMWORD[rdx],xmm3
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+global  vpaes_set_encrypt_key
+
+ALIGN   16
+vpaes_set_encrypt_key:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_set_encrypt_key:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$enc_key_body:
+        mov     eax,esi
+        shr     eax,5
+        add     eax,5
+        mov     DWORD[240+rdx],eax
+
+        mov     ecx,0
+        mov     r8d,0x30
+        call    _vpaes_schedule_core
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$enc_key_epilogue:
+        xor     eax,eax
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_set_encrypt_key:
+
+global  vpaes_set_decrypt_key
+
+ALIGN   16
+vpaes_set_decrypt_key:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_set_decrypt_key:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$dec_key_body:
+        mov     eax,esi
+        shr     eax,5
+        add     eax,5
+        mov     DWORD[240+rdx],eax
+        shl     eax,4
+        lea     rdx,[16+rax*1+rdx]
+
+        mov     ecx,1
+        mov     r8d,esi
+        shr     r8d,1
+        and     r8d,32
+        xor     r8d,32
+        call    _vpaes_schedule_core
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$dec_key_epilogue:
+        xor     eax,eax
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_set_decrypt_key:
+
+global  vpaes_encrypt
+
+ALIGN   16
+vpaes_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$enc_body:
+        movdqu  xmm0,XMMWORD[rdi]
+        call    _vpaes_preheat
+        call    _vpaes_encrypt_core
+        movdqu  XMMWORD[rsi],xmm0
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$enc_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_encrypt:
+
+global  vpaes_decrypt
+
+ALIGN   16
+vpaes_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$dec_body:
+        movdqu  xmm0,XMMWORD[rdi]
+        call    _vpaes_preheat
+        call    _vpaes_decrypt_core
+        movdqu  XMMWORD[rsi],xmm0
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$dec_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_decrypt:
+global  vpaes_cbc_encrypt
+
+ALIGN   16
+vpaes_cbc_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_cbc_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        xchg    rdx,rcx
+        sub     rcx,16
+        jc      NEAR $L$cbc_abort
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$cbc_body:
+        movdqu  xmm6,XMMWORD[r8]
+        sub     rsi,rdi
+        call    _vpaes_preheat
+        cmp     r9d,0
+        je      NEAR $L$cbc_dec_loop
+        jmp     NEAR $L$cbc_enc_loop
+ALIGN   16
+$L$cbc_enc_loop:
+        movdqu  xmm0,XMMWORD[rdi]
+        pxor    xmm0,xmm6
+        call    _vpaes_encrypt_core
+        movdqa  xmm6,xmm0
+        movdqu  XMMWORD[rdi*1+rsi],xmm0
+        lea     rdi,[16+rdi]
+        sub     rcx,16
+        jnc     NEAR $L$cbc_enc_loop
+        jmp     NEAR $L$cbc_done
+ALIGN   16
+$L$cbc_dec_loop:
+        movdqu  xmm0,XMMWORD[rdi]
+        movdqa  xmm7,xmm0
+        call    _vpaes_decrypt_core
+        pxor    xmm0,xmm6
+        movdqa  xmm6,xmm7
+        movdqu  XMMWORD[rdi*1+rsi],xmm0
+        lea     rdi,[16+rdi]
+        sub     rcx,16
+        jnc     NEAR $L$cbc_dec_loop
+$L$cbc_done:
+        movdqu  XMMWORD[r8],xmm6
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$cbc_epilogue:
+$L$cbc_abort:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_cbc_encrypt:
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_preheat:
+
+        lea     r10,[$L$k_s0F]
+        movdqa  xmm10,XMMWORD[((-32))+r10]
+        movdqa  xmm11,XMMWORD[((-16))+r10]
+        movdqa  xmm9,XMMWORD[r10]
+        movdqa  xmm13,XMMWORD[48+r10]
+        movdqa  xmm12,XMMWORD[64+r10]
+        movdqa  xmm15,XMMWORD[80+r10]
+        movdqa  xmm14,XMMWORD[96+r10]
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+ALIGN   64
+_vpaes_consts:
+$L$k_inv:
+        DQ      0x0E05060F0D080180,0x040703090A0B0C02
+        DQ      0x01040A060F0B0780,0x030D0E0C02050809
+
+$L$k_s0F:
+        DQ      0x0F0F0F0F0F0F0F0F,0x0F0F0F0F0F0F0F0F
+
+$L$k_ipt:
+        DQ      0xC2B2E8985A2A7000,0xCABAE09052227808
+        DQ      0x4C01307D317C4D00,0xCD80B1FCB0FDCC81
+
+$L$k_sb1:
+        DQ      0xB19BE18FCB503E00,0xA5DF7A6E142AF544
+        DQ      0x3618D415FAE22300,0x3BF7CCC10D2ED9EF
+$L$k_sb2:
+        DQ      0xE27A93C60B712400,0x5EB7E955BC982FCD
+        DQ      0x69EB88400AE12900,0xC2A163C8AB82234A
+$L$k_sbo:
+        DQ      0xD0D26D176FBDC700,0x15AABF7AC502A878
+        DQ      0xCFE474A55FBB6A00,0x8E1E90D1412B35FA
+
+$L$k_mc_forward:
+        DQ      0x0407060500030201,0x0C0F0E0D080B0A09
+        DQ      0x080B0A0904070605,0x000302010C0F0E0D
+        DQ      0x0C0F0E0D080B0A09,0x0407060500030201
+        DQ      0x000302010C0F0E0D,0x080B0A0904070605
+
+$L$k_mc_backward:
+        DQ      0x0605040702010003,0x0E0D0C0F0A09080B
+        DQ      0x020100030E0D0C0F,0x0A09080B06050407
+        DQ      0x0E0D0C0F0A09080B,0x0605040702010003
+        DQ      0x0A09080B06050407,0x020100030E0D0C0F
+
+$L$k_sr:
+        DQ      0x0706050403020100,0x0F0E0D0C0B0A0908
+        DQ      0x030E09040F0A0500,0x0B06010C07020D08
+        DQ      0x0F060D040B020900,0x070E050C030A0108
+        DQ      0x0B0E0104070A0D00,0x0306090C0F020508
+
+$L$k_rcon:
+        DQ      0x1F8391B9AF9DEEB6,0x702A98084D7C7D81
+
+$L$k_s63:
+        DQ      0x5B5B5B5B5B5B5B5B,0x5B5B5B5B5B5B5B5B
+
+$L$k_opt:
+        DQ      0xFF9F4929D6B66000,0xF7974121DEBE6808
+        DQ      0x01EDBD5150BCEC00,0xE10D5DB1B05C0CE0
+
+$L$k_deskew:
+        DQ      0x07E4A34047A4E300,0x1DFEB95A5DBEF91A
+        DQ      0x5F36B5DC83EA6900,0x2841C2ABF49D1E77
+
+
+
+
+
+$L$k_dksd:
+        DQ      0xFEB91A5DA3E44700,0x0740E3A45A1DBEF9
+        DQ      0x41C277F4B5368300,0x5FDC69EAAB289D1E
+$L$k_dksb:
+        DQ      0x9A4FCA1F8550D500,0x03D653861CC94C99
+        DQ      0x115BEDA7B6FC4A00,0xD993256F7E3482C8
+$L$k_dkse:
+        DQ      0xD5031CCA1FC9D600,0x53859A4C994F5086
+        DQ      0xA23196054FDC7BE8,0xCD5EF96A20B31487
+$L$k_dks9:
+        DQ      0xB6116FC87ED9A700,0x4AED933482255BFC
+        DQ      0x4576516227143300,0x8BB89FACE9DAFDCE
+
+
+
+
+
+$L$k_dipt:
+        DQ      0x0F505B040B545F00,0x154A411E114E451A
+        DQ      0x86E383E660056500,0x12771772F491F194
+
+$L$k_dsb9:
+        DQ      0x851C03539A86D600,0xCAD51F504F994CC9
+        DQ      0xC03B1789ECD74900,0x725E2C9EB2FBA565
+$L$k_dsbd:
+        DQ      0x7D57CCDFE6B1A200,0xF56E9B13882A4439
+        DQ      0x3CE2FAF724C6CB00,0x2931180D15DEEFD3
+$L$k_dsbb:
+        DQ      0xD022649296B44200,0x602646F6B0F2D404
+        DQ      0xC19498A6CD596700,0xF3FF0C3E3255AA6B
+$L$k_dsbe:
+        DQ      0x46F2929626D4D000,0x2242600464B4F6B0
+        DQ      0x0C55A6CDFFAAC100,0x9467F36B98593E32
+$L$k_dsbo:
+        DQ      0x1387EA537EF94000,0xC7AA6DB9D4943E2D
+        DQ      0x12D7560F93441D00,0xCA4B8159D8C58E9C
+DB      86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+DB      111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54
+DB      52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97
+DB      109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32
+DB      85,110,105,118,101,114,115,105,116,121,41,0
+ALIGN   64
+
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rsi,[16+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[184+rax]
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_vpaes_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_end_vpaes_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_info_vpaes_set_encrypt_key wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_end_vpaes_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_info_vpaes_set_decrypt_key wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_encrypt wrt ..imagebase
+        DD      $L$SEH_end_vpaes_encrypt wrt ..imagebase
+        DD      $L$SEH_info_vpaes_encrypt wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_decrypt wrt ..imagebase
+        DD      $L$SEH_end_vpaes_decrypt wrt ..imagebase
+        DD      $L$SEH_info_vpaes_decrypt wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_end_vpaes_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_info_vpaes_cbc_encrypt wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_vpaes_set_encrypt_key:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc_key_body wrt ..imagebase,$L$enc_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_set_decrypt_key:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec_key_body wrt ..imagebase,$L$dec_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_encrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc_body wrt ..imagebase,$L$enc_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_decrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec_body wrt ..imagebase,$L$dec_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_cbc_encrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$cbc_body wrt ..imagebase,$L$cbc_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
new file mode 100644
index 0000000000..69443b7261
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
@@ -0,0 +1,1989 @@
+; Copyright 2013-2019 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+global  rsaz_1024_sqr_avx2
+
+ALIGN   64
+rsaz_1024_sqr_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_1024_sqr_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        vzeroupper
+        lea     rsp,[((-168))+rsp]
+        vmovaps XMMWORD[(-216)+rax],xmm6
+        vmovaps XMMWORD[(-200)+rax],xmm7
+        vmovaps XMMWORD[(-184)+rax],xmm8
+        vmovaps XMMWORD[(-168)+rax],xmm9
+        vmovaps XMMWORD[(-152)+rax],xmm10
+        vmovaps XMMWORD[(-136)+rax],xmm11
+        vmovaps XMMWORD[(-120)+rax],xmm12
+        vmovaps XMMWORD[(-104)+rax],xmm13
+        vmovaps XMMWORD[(-88)+rax],xmm14
+        vmovaps XMMWORD[(-72)+rax],xmm15
+$L$sqr_1024_body:
+        mov     rbp,rax
+
+        mov     r13,rdx
+        sub     rsp,832
+        mov     r15,r13
+        sub     rdi,-128
+        sub     rsi,-128
+        sub     r13,-128
+
+        and     r15,4095
+        add     r15,32*10
+        shr     r15,12
+        vpxor   ymm9,ymm9,ymm9
+        jz      NEAR $L$sqr_1024_no_n_copy
+
+
+
+
+
+        sub     rsp,32*10
+        vmovdqu ymm0,YMMWORD[((0-128))+r13]
+        and     rsp,-2048
+        vmovdqu ymm1,YMMWORD[((32-128))+r13]
+        vmovdqu ymm2,YMMWORD[((64-128))+r13]
+        vmovdqu ymm3,YMMWORD[((96-128))+r13]
+        vmovdqu ymm4,YMMWORD[((128-128))+r13]
+        vmovdqu ymm5,YMMWORD[((160-128))+r13]
+        vmovdqu ymm6,YMMWORD[((192-128))+r13]
+        vmovdqu ymm7,YMMWORD[((224-128))+r13]
+        vmovdqu ymm8,YMMWORD[((256-128))+r13]
+        lea     r13,[((832+128))+rsp]
+        vmovdqu YMMWORD[(0-128)+r13],ymm0
+        vmovdqu YMMWORD[(32-128)+r13],ymm1
+        vmovdqu YMMWORD[(64-128)+r13],ymm2
+        vmovdqu YMMWORD[(96-128)+r13],ymm3
+        vmovdqu YMMWORD[(128-128)+r13],ymm4
+        vmovdqu YMMWORD[(160-128)+r13],ymm5
+        vmovdqu YMMWORD[(192-128)+r13],ymm6
+        vmovdqu YMMWORD[(224-128)+r13],ymm7
+        vmovdqu YMMWORD[(256-128)+r13],ymm8
+        vmovdqu YMMWORD[(288-128)+r13],ymm9
+
+$L$sqr_1024_no_n_copy:
+        and     rsp,-1024
+
+        vmovdqu ymm1,YMMWORD[((32-128))+rsi]
+        vmovdqu ymm2,YMMWORD[((64-128))+rsi]
+        vmovdqu ymm3,YMMWORD[((96-128))+rsi]
+        vmovdqu ymm4,YMMWORD[((128-128))+rsi]
+        vmovdqu ymm5,YMMWORD[((160-128))+rsi]
+        vmovdqu ymm6,YMMWORD[((192-128))+rsi]
+        vmovdqu ymm7,YMMWORD[((224-128))+rsi]
+        vmovdqu ymm8,YMMWORD[((256-128))+rsi]
+
+        lea     rbx,[192+rsp]
+        vmovdqu ymm15,YMMWORD[$L$and_mask]
+        jmp     NEAR $L$OOP_GRANDE_SQR_1024
+
+ALIGN   32
+$L$OOP_GRANDE_SQR_1024:
+        lea     r9,[((576+128))+rsp]
+        lea     r12,[448+rsp]
+
+
+
+
+        vpaddq  ymm1,ymm1,ymm1
+        vpbroadcastq    ymm10,QWORD[((0-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm2
+        vmovdqa YMMWORD[(0-128)+r9],ymm1
+        vpaddq  ymm3,ymm3,ymm3
+        vmovdqa YMMWORD[(32-128)+r9],ymm2
+        vpaddq  ymm4,ymm4,ymm4
+        vmovdqa YMMWORD[(64-128)+r9],ymm3
+        vpaddq  ymm5,ymm5,ymm5
+        vmovdqa YMMWORD[(96-128)+r9],ymm4
+        vpaddq  ymm6,ymm6,ymm6
+        vmovdqa YMMWORD[(128-128)+r9],ymm5
+        vpaddq  ymm7,ymm7,ymm7
+        vmovdqa YMMWORD[(160-128)+r9],ymm6
+        vpaddq  ymm8,ymm8,ymm8
+        vmovdqa YMMWORD[(192-128)+r9],ymm7
+        vpxor   ymm9,ymm9,ymm9
+        vmovdqa YMMWORD[(224-128)+r9],ymm8
+
+        vpmuludq        ymm0,ymm10,YMMWORD[((0-128))+rsi]
+        vpbroadcastq    ymm11,QWORD[((32-128))+rsi]
+        vmovdqu YMMWORD[(288-192)+rbx],ymm9
+        vpmuludq        ymm1,ymm1,ymm10
+        vmovdqu YMMWORD[(320-448)+r12],ymm9
+        vpmuludq        ymm2,ymm2,ymm10
+        vmovdqu YMMWORD[(352-448)+r12],ymm9
+        vpmuludq        ymm3,ymm3,ymm10
+        vmovdqu YMMWORD[(384-448)+r12],ymm9
+        vpmuludq        ymm4,ymm4,ymm10
+        vmovdqu YMMWORD[(416-448)+r12],ymm9
+        vpmuludq        ymm5,ymm5,ymm10
+        vmovdqu YMMWORD[(448-448)+r12],ymm9
+        vpmuludq        ymm6,ymm6,ymm10
+        vmovdqu YMMWORD[(480-448)+r12],ymm9
+        vpmuludq        ymm7,ymm7,ymm10
+        vmovdqu YMMWORD[(512-448)+r12],ymm9
+        vpmuludq        ymm8,ymm8,ymm10
+        vpbroadcastq    ymm10,QWORD[((64-128))+rsi]
+        vmovdqu YMMWORD[(544-448)+r12],ymm9
+
+        mov     r15,rsi
+        mov     r14d,4
+        jmp     NEAR $L$sqr_entry_1024
+ALIGN   32
+$L$OOP_SQR_1024:
+        vpbroadcastq    ymm11,QWORD[((32-128))+r15]
+        vpmuludq        ymm0,ymm10,YMMWORD[((0-128))+rsi]
+        vpaddq  ymm0,ymm0,YMMWORD[((0-192))+rbx]
+        vpmuludq        ymm1,ymm10,YMMWORD[((0-128))+r9]
+        vpaddq  ymm1,ymm1,YMMWORD[((32-192))+rbx]
+        vpmuludq        ymm2,ymm10,YMMWORD[((32-128))+r9]
+        vpaddq  ymm2,ymm2,YMMWORD[((64-192))+rbx]
+        vpmuludq        ymm3,ymm10,YMMWORD[((64-128))+r9]
+        vpaddq  ymm3,ymm3,YMMWORD[((96-192))+rbx]
+        vpmuludq        ymm4,ymm10,YMMWORD[((96-128))+r9]
+        vpaddq  ymm4,ymm4,YMMWORD[((128-192))+rbx]
+        vpmuludq        ymm5,ymm10,YMMWORD[((128-128))+r9]
+        vpaddq  ymm5,ymm5,YMMWORD[((160-192))+rbx]
+        vpmuludq        ymm6,ymm10,YMMWORD[((160-128))+r9]
+        vpaddq  ymm6,ymm6,YMMWORD[((192-192))+rbx]
+        vpmuludq        ymm7,ymm10,YMMWORD[((192-128))+r9]
+        vpaddq  ymm7,ymm7,YMMWORD[((224-192))+rbx]
+        vpmuludq        ymm8,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((64-128))+r15]
+        vpaddq  ymm8,ymm8,YMMWORD[((256-192))+rbx]
+$L$sqr_entry_1024:
+        vmovdqu YMMWORD[(0-192)+rbx],ymm0
+        vmovdqu YMMWORD[(32-192)+rbx],ymm1
+
+        vpmuludq        ymm12,ymm11,YMMWORD[((32-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((32-128))+r9]
+        vpaddq  ymm3,ymm3,ymm14
+        vpmuludq        ymm13,ymm11,YMMWORD[((64-128))+r9]
+        vpaddq  ymm4,ymm4,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((96-128))+r9]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((128-128))+r9]
+        vpaddq  ymm6,ymm6,ymm14
+        vpmuludq        ymm13,ymm11,YMMWORD[((160-128))+r9]
+        vpaddq  ymm7,ymm7,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((192-128))+r9]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm0,ymm11,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm11,QWORD[((96-128))+r15]
+        vpaddq  ymm0,ymm0,YMMWORD[((288-192))+rbx]
+
+        vmovdqu YMMWORD[(64-192)+rbx],ymm2
+        vmovdqu YMMWORD[(96-192)+rbx],ymm3
+
+        vpmuludq        ymm13,ymm10,YMMWORD[((64-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm13
+        vpmuludq        ymm12,ymm10,YMMWORD[((64-128))+r9]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((96-128))+r9]
+        vpaddq  ymm6,ymm6,ymm14
+        vpmuludq        ymm13,ymm10,YMMWORD[((128-128))+r9]
+        vpaddq  ymm7,ymm7,ymm13
+        vpmuludq        ymm12,ymm10,YMMWORD[((160-128))+r9]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((192-128))+r9]
+        vpaddq  ymm0,ymm0,ymm14
+        vpmuludq        ymm1,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((128-128))+r15]
+        vpaddq  ymm1,ymm1,YMMWORD[((320-448))+r12]
+
+        vmovdqu YMMWORD[(128-192)+rbx],ymm4
+        vmovdqu YMMWORD[(160-192)+rbx],ymm5
+
+        vpmuludq        ymm12,ymm11,YMMWORD[((96-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((96-128))+r9]
+        vpaddq  ymm7,ymm7,ymm14
+        vpmuludq        ymm13,ymm11,YMMWORD[((128-128))+r9]
+        vpaddq  ymm8,ymm8,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((160-128))+r9]
+        vpaddq  ymm0,ymm0,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((192-128))+r9]
+        vpaddq  ymm1,ymm1,ymm14
+        vpmuludq        ymm2,ymm11,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm11,QWORD[((160-128))+r15]
+        vpaddq  ymm2,ymm2,YMMWORD[((352-448))+r12]
+
+        vmovdqu YMMWORD[(192-192)+rbx],ymm6
+        vmovdqu YMMWORD[(224-192)+rbx],ymm7
+
+        vpmuludq        ymm12,ymm10,YMMWORD[((128-128))+rsi]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((128-128))+r9]
+        vpaddq  ymm0,ymm0,ymm14
+        vpmuludq        ymm13,ymm10,YMMWORD[((160-128))+r9]
+        vpaddq  ymm1,ymm1,ymm13
+        vpmuludq        ymm12,ymm10,YMMWORD[((192-128))+r9]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm3,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((192-128))+r15]
+        vpaddq  ymm3,ymm3,YMMWORD[((384-448))+r12]
+
+        vmovdqu YMMWORD[(256-192)+rbx],ymm8
+        vmovdqu YMMWORD[(288-192)+rbx],ymm0
+        lea     rbx,[8+rbx]
+
+        vpmuludq        ymm13,ymm11,YMMWORD[((160-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((160-128))+r9]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((192-128))+r9]
+        vpaddq  ymm3,ymm3,ymm14
+        vpmuludq        ymm4,ymm11,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm11,QWORD[((224-128))+r15]
+        vpaddq  ymm4,ymm4,YMMWORD[((416-448))+r12]
+
+        vmovdqu YMMWORD[(320-448)+r12],ymm1
+        vmovdqu YMMWORD[(352-448)+r12],ymm2
+
+        vpmuludq        ymm12,ymm10,YMMWORD[((192-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((192-128))+r9]
+        vpbroadcastq    ymm0,QWORD[((256-128))+r15]
+        vpaddq  ymm4,ymm4,ymm14
+        vpmuludq        ymm5,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((0+8-128))+r15]
+        vpaddq  ymm5,ymm5,YMMWORD[((448-448))+r12]
+
+        vmovdqu YMMWORD[(384-448)+r12],ymm3
+        vmovdqu YMMWORD[(416-448)+r12],ymm4
+        lea     r15,[8+r15]
+
+        vpmuludq        ymm12,ymm11,YMMWORD[((224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm6,ymm11,YMMWORD[((224-128))+r9]
+        vpaddq  ymm6,ymm6,YMMWORD[((480-448))+r12]
+
+        vpmuludq        ymm7,ymm0,YMMWORD[((256-128))+rsi]
+        vmovdqu YMMWORD[(448-448)+r12],ymm5
+        vpaddq  ymm7,ymm7,YMMWORD[((512-448))+r12]
+        vmovdqu YMMWORD[(480-448)+r12],ymm6
+        vmovdqu YMMWORD[(512-448)+r12],ymm7
+        lea     r12,[8+r12]
+
+        dec     r14d
+        jnz     NEAR $L$OOP_SQR_1024
+
+        vmovdqu ymm8,YMMWORD[256+rsp]
+        vmovdqu ymm1,YMMWORD[288+rsp]
+        vmovdqu ymm2,YMMWORD[320+rsp]
+        lea     rbx,[192+rsp]
+
+        vpsrlq  ymm14,ymm8,29
+        vpand   ymm8,ymm8,ymm15
+        vpsrlq  ymm11,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+
+        vpermq  ymm14,ymm14,0x93
+        vpxor   ymm9,ymm9,ymm9
+        vpermq  ymm11,ymm11,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm8,ymm8,ymm10
+        vpblendd        ymm11,ymm9,ymm11,3
+        vpaddq  ymm1,ymm1,ymm14
+        vpaddq  ymm2,ymm2,ymm11
+        vmovdqu YMMWORD[(288-192)+rbx],ymm1
+        vmovdqu YMMWORD[(320-192)+rbx],ymm2
+
+        mov     rax,QWORD[rsp]
+        mov     r10,QWORD[8+rsp]
+        mov     r11,QWORD[16+rsp]
+        mov     r12,QWORD[24+rsp]
+        vmovdqu ymm1,YMMWORD[32+rsp]
+        vmovdqu ymm2,YMMWORD[((64-192))+rbx]
+        vmovdqu ymm3,YMMWORD[((96-192))+rbx]
+        vmovdqu ymm4,YMMWORD[((128-192))+rbx]
+        vmovdqu ymm5,YMMWORD[((160-192))+rbx]
+        vmovdqu ymm6,YMMWORD[((192-192))+rbx]
+        vmovdqu ymm7,YMMWORD[((224-192))+rbx]
+
+        mov     r9,rax
+        imul    eax,ecx
+        and     eax,0x1fffffff
+        vmovd   xmm12,eax
+
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        vpbroadcastq    ymm12,xmm12
+        add     r9,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+        shr     r9,29
+        add     r10,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+r13]
+        add     r10,r9
+        add     r11,rax
+        imul    rdx,QWORD[((24-128))+r13]
+        add     r12,rdx
+
+        mov     rax,r10
+        imul    eax,ecx
+        and     eax,0x1fffffff
+
+        mov     r14d,9
+        jmp     NEAR $L$OOP_REDUCE_1024
+
+ALIGN   32
+$L$OOP_REDUCE_1024:
+        vmovd   xmm13,eax
+        vpbroadcastq    ymm13,xmm13
+
+        vpmuludq        ymm10,ymm12,YMMWORD[((32-128))+r13]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        vpaddq  ymm1,ymm1,ymm10
+        add     r10,rax
+        vpmuludq        ymm14,ymm12,YMMWORD[((64-128))+r13]
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+        vpaddq  ymm2,ymm2,ymm14
+        vpmuludq        ymm11,ymm12,YMMWORD[((96-128))+r13]
+DB      0x67
+        add     r11,rax
+DB      0x67
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+r13]
+        shr     r10,29
+        vpaddq  ymm3,ymm3,ymm11
+        vpmuludq        ymm10,ymm12,YMMWORD[((128-128))+r13]
+        add     r12,rax
+        add     r11,r10
+        vpaddq  ymm4,ymm4,ymm10
+        vpmuludq        ymm14,ymm12,YMMWORD[((160-128))+r13]
+        mov     rax,r11
+        imul    eax,ecx
+        vpaddq  ymm5,ymm5,ymm14
+        vpmuludq        ymm11,ymm12,YMMWORD[((192-128))+r13]
+        and     eax,0x1fffffff
+        vpaddq  ymm6,ymm6,ymm11
+        vpmuludq        ymm10,ymm12,YMMWORD[((224-128))+r13]
+        vpaddq  ymm7,ymm7,ymm10
+        vpmuludq        ymm14,ymm12,YMMWORD[((256-128))+r13]
+        vmovd   xmm12,eax
+
+        vpaddq  ymm8,ymm8,ymm14
+
+        vpbroadcastq    ymm12,xmm12
+
+        vpmuludq        ymm11,ymm13,YMMWORD[((32-8-128))+r13]
+        vmovdqu ymm14,YMMWORD[((96-8-128))+r13]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        vpaddq  ymm1,ymm1,ymm11
+        vpmuludq        ymm10,ymm13,YMMWORD[((64-8-128))+r13]
+        vmovdqu ymm11,YMMWORD[((128-8-128))+r13]
+        add     r11,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+        vpaddq  ymm2,ymm2,ymm10
+        add     rax,r12
+        shr     r11,29
+        vpmuludq        ymm14,ymm14,ymm13
+        vmovdqu ymm10,YMMWORD[((160-8-128))+r13]
+        add     rax,r11
+        vpaddq  ymm3,ymm3,ymm14
+        vpmuludq        ymm11,ymm11,ymm13
+        vmovdqu ymm14,YMMWORD[((192-8-128))+r13]
+DB      0x67
+        mov     r12,rax
+        imul    eax,ecx
+        vpaddq  ymm4,ymm4,ymm11
+        vpmuludq        ymm10,ymm10,ymm13
+DB      0xc4,0x41,0x7e,0x6f,0x9d,0x58,0x00,0x00,0x00
+        and     eax,0x1fffffff
+        vpaddq  ymm5,ymm5,ymm10
+        vpmuludq        ymm14,ymm14,ymm13
+        vmovdqu ymm10,YMMWORD[((256-8-128))+r13]
+        vpaddq  ymm6,ymm6,ymm14
+        vpmuludq        ymm11,ymm11,ymm13
+        vmovdqu ymm9,YMMWORD[((288-8-128))+r13]
+        vmovd   xmm0,eax
+        imul    rax,QWORD[((-128))+r13]
+        vpaddq  ymm7,ymm7,ymm11
+        vpmuludq        ymm10,ymm10,ymm13
+        vmovdqu ymm14,YMMWORD[((32-16-128))+r13]
+        vpbroadcastq    ymm0,xmm0
+        vpaddq  ymm8,ymm8,ymm10
+        vpmuludq        ymm9,ymm9,ymm13
+        vmovdqu ymm11,YMMWORD[((64-16-128))+r13]
+        add     r12,rax
+
+        vmovdqu ymm13,YMMWORD[((32-24-128))+r13]
+        vpmuludq        ymm14,ymm14,ymm12
+        vmovdqu ymm10,YMMWORD[((96-16-128))+r13]
+        vpaddq  ymm1,ymm1,ymm14
+        vpmuludq        ymm13,ymm13,ymm0
+        vpmuludq        ymm11,ymm11,ymm12
+DB      0xc4,0x41,0x7e,0x6f,0xb5,0xf0,0xff,0xff,0xff
+        vpaddq  ymm13,ymm13,ymm1
+        vpaddq  ymm2,ymm2,ymm11
+        vpmuludq        ymm10,ymm10,ymm12
+        vmovdqu ymm11,YMMWORD[((160-16-128))+r13]
+DB      0x67
+        vmovq   rax,xmm13
+        vmovdqu YMMWORD[rsp],ymm13
+        vpaddq  ymm3,ymm3,ymm10
+        vpmuludq        ymm14,ymm14,ymm12
+        vmovdqu ymm10,YMMWORD[((192-16-128))+r13]
+        vpaddq  ymm4,ymm4,ymm14
+        vpmuludq        ymm11,ymm11,ymm12
+        vmovdqu ymm14,YMMWORD[((224-16-128))+r13]
+        vpaddq  ymm5,ymm5,ymm11
+        vpmuludq        ymm10,ymm10,ymm12
+        vmovdqu ymm11,YMMWORD[((256-16-128))+r13]
+        vpaddq  ymm6,ymm6,ymm10
+        vpmuludq        ymm14,ymm14,ymm12
+        shr     r12,29
+        vmovdqu ymm10,YMMWORD[((288-16-128))+r13]
+        add     rax,r12
+        vpaddq  ymm7,ymm7,ymm14
+        vpmuludq        ymm11,ymm11,ymm12
+
+        mov     r9,rax
+        imul    eax,ecx
+        vpaddq  ymm8,ymm8,ymm11
+        vpmuludq        ymm10,ymm10,ymm12
+        and     eax,0x1fffffff
+        vmovd   xmm12,eax
+        vmovdqu ymm11,YMMWORD[((96-24-128))+r13]
+DB      0x67
+        vpaddq  ymm9,ymm9,ymm10
+        vpbroadcastq    ymm12,xmm12
+
+        vpmuludq        ymm14,ymm0,YMMWORD[((64-24-128))+r13]
+        vmovdqu ymm10,YMMWORD[((128-24-128))+r13]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        mov     r10,QWORD[8+rsp]
+        vpaddq  ymm1,ymm2,ymm14
+        vpmuludq        ymm11,ymm11,ymm0
+        vmovdqu ymm14,YMMWORD[((160-24-128))+r13]
+        add     r9,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+DB      0x67
+        shr     r9,29
+        mov     r11,QWORD[16+rsp]
+        vpaddq  ymm2,ymm3,ymm11
+        vpmuludq        ymm10,ymm10,ymm0
+        vmovdqu ymm11,YMMWORD[((192-24-128))+r13]
+        add     r10,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+r13]
+        vpaddq  ymm3,ymm4,ymm10
+        vpmuludq        ymm14,ymm14,ymm0
+        vmovdqu ymm10,YMMWORD[((224-24-128))+r13]
+        imul    rdx,QWORD[((24-128))+r13]
+        add     r11,rax
+        lea     rax,[r10*1+r9]
+        vpaddq  ymm4,ymm5,ymm14
+        vpmuludq        ymm11,ymm11,ymm0
+        vmovdqu ymm14,YMMWORD[((256-24-128))+r13]
+        mov     r10,rax
+        imul    eax,ecx
+        vpmuludq        ymm10,ymm10,ymm0
+        vpaddq  ymm5,ymm6,ymm11
+        vmovdqu ymm11,YMMWORD[((288-24-128))+r13]
+        and     eax,0x1fffffff
+        vpaddq  ymm6,ymm7,ymm10
+        vpmuludq        ymm14,ymm14,ymm0
+        add     rdx,QWORD[24+rsp]
+        vpaddq  ymm7,ymm8,ymm14
+        vpmuludq        ymm11,ymm11,ymm0
+        vpaddq  ymm8,ymm9,ymm11
+        vmovq   xmm9,r12
+        mov     r12,rdx
+
+        dec     r14d
+        jnz     NEAR $L$OOP_REDUCE_1024
+        lea     r12,[448+rsp]
+        vpaddq  ymm0,ymm13,ymm9
+        vpxor   ymm9,ymm9,ymm9
+
+        vpaddq  ymm0,ymm0,YMMWORD[((288-192))+rbx]
+        vpaddq  ymm1,ymm1,YMMWORD[((320-448))+r12]
+        vpaddq  ymm2,ymm2,YMMWORD[((352-448))+r12]
+        vpaddq  ymm3,ymm3,YMMWORD[((384-448))+r12]
+        vpaddq  ymm4,ymm4,YMMWORD[((416-448))+r12]
+        vpaddq  ymm5,ymm5,YMMWORD[((448-448))+r12]
+        vpaddq  ymm6,ymm6,YMMWORD[((480-448))+r12]
+        vpaddq  ymm7,ymm7,YMMWORD[((512-448))+r12]
+        vpaddq  ymm8,ymm8,YMMWORD[((544-448))+r12]
+
+        vpsrlq  ymm14,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm11,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm12,ymm2,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm13,ymm3,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm3,ymm3,ymm15
+        vpermq  ymm12,ymm12,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm13,ymm13,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm0,ymm0,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm1,ymm1,ymm14
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm2,ymm2,ymm11
+        vpblendd        ymm13,ymm9,ymm13,3
+        vpaddq  ymm3,ymm3,ymm12
+        vpaddq  ymm4,ymm4,ymm13
+
+        vpsrlq  ymm14,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm11,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm12,ymm2,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm13,ymm3,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm3,ymm3,ymm15
+        vpermq  ymm12,ymm12,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm13,ymm13,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm0,ymm0,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm1,ymm1,ymm14
+        vmovdqu YMMWORD[(0-128)+rdi],ymm0
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm2,ymm2,ymm11
+        vmovdqu YMMWORD[(32-128)+rdi],ymm1
+        vpblendd        ymm13,ymm9,ymm13,3
+        vpaddq  ymm3,ymm3,ymm12
+        vmovdqu YMMWORD[(64-128)+rdi],ymm2
+        vpaddq  ymm4,ymm4,ymm13
+        vmovdqu YMMWORD[(96-128)+rdi],ymm3
+        vpsrlq  ymm14,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm11,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm12,ymm6,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm13,ymm7,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm13,ymm13,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm4,ymm4,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm5,ymm5,ymm14
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm6,ymm6,ymm11
+        vpblendd        ymm13,ymm0,ymm13,3
+        vpaddq  ymm7,ymm7,ymm12
+        vpaddq  ymm8,ymm8,ymm13
+
+        vpsrlq  ymm14,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm11,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm12,ymm6,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm13,ymm7,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm13,ymm13,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm4,ymm4,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm5,ymm5,ymm14
+        vmovdqu YMMWORD[(128-128)+rdi],ymm4
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm6,ymm6,ymm11
+        vmovdqu YMMWORD[(160-128)+rdi],ymm5
+        vpblendd        ymm13,ymm0,ymm13,3
+        vpaddq  ymm7,ymm7,ymm12
+        vmovdqu YMMWORD[(192-128)+rdi],ymm6
+        vpaddq  ymm8,ymm8,ymm13
+        vmovdqu YMMWORD[(224-128)+rdi],ymm7
+        vmovdqu YMMWORD[(256-128)+rdi],ymm8
+
+        mov     rsi,rdi
+        dec     r8d
+        jne     NEAR $L$OOP_GRANDE_SQR_1024
+
+        vzeroall
+        mov     rax,rbp
+
+$L$sqr_1024_in_tail:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$sqr_1024_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_1024_sqr_avx2:
+global  rsaz_1024_mul_avx2
+
+ALIGN   64
+rsaz_1024_mul_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_1024_mul_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        vzeroupper
+        lea     rsp,[((-168))+rsp]
+        vmovaps XMMWORD[(-216)+rax],xmm6
+        vmovaps XMMWORD[(-200)+rax],xmm7
+        vmovaps XMMWORD[(-184)+rax],xmm8
+        vmovaps XMMWORD[(-168)+rax],xmm9
+        vmovaps XMMWORD[(-152)+rax],xmm10
+        vmovaps XMMWORD[(-136)+rax],xmm11
+        vmovaps XMMWORD[(-120)+rax],xmm12
+        vmovaps XMMWORD[(-104)+rax],xmm13
+        vmovaps XMMWORD[(-88)+rax],xmm14
+        vmovaps XMMWORD[(-72)+rax],xmm15
+$L$mul_1024_body:
+        mov     rbp,rax
+
+        vzeroall
+        mov     r13,rdx
+        sub     rsp,64
+
+
+
+
+
+
+DB      0x67,0x67
+        mov     r15,rsi
+        and     r15,4095
+        add     r15,32*10
+        shr     r15,12
+        mov     r15,rsi
+        cmovnz  rsi,r13
+        cmovnz  r13,r15
+
+        mov     r15,rcx
+        sub     rsi,-128
+        sub     rcx,-128
+        sub     rdi,-128
+
+        and     r15,4095
+        add     r15,32*10
+DB      0x67,0x67
+        shr     r15,12
+        jz      NEAR $L$mul_1024_no_n_copy
+
+
+
+
+
+        sub     rsp,32*10
+        vmovdqu ymm0,YMMWORD[((0-128))+rcx]
+        and     rsp,-512
+        vmovdqu ymm1,YMMWORD[((32-128))+rcx]
+        vmovdqu ymm2,YMMWORD[((64-128))+rcx]
+        vmovdqu ymm3,YMMWORD[((96-128))+rcx]
+        vmovdqu ymm4,YMMWORD[((128-128))+rcx]
+        vmovdqu ymm5,YMMWORD[((160-128))+rcx]
+        vmovdqu ymm6,YMMWORD[((192-128))+rcx]
+        vmovdqu ymm7,YMMWORD[((224-128))+rcx]
+        vmovdqu ymm8,YMMWORD[((256-128))+rcx]
+        lea     rcx,[((64+128))+rsp]
+        vmovdqu YMMWORD[(0-128)+rcx],ymm0
+        vpxor   ymm0,ymm0,ymm0
+        vmovdqu YMMWORD[(32-128)+rcx],ymm1
+        vpxor   ymm1,ymm1,ymm1
+        vmovdqu YMMWORD[(64-128)+rcx],ymm2
+        vpxor   ymm2,ymm2,ymm2
+        vmovdqu YMMWORD[(96-128)+rcx],ymm3
+        vpxor   ymm3,ymm3,ymm3
+        vmovdqu YMMWORD[(128-128)+rcx],ymm4
+        vpxor   ymm4,ymm4,ymm4
+        vmovdqu YMMWORD[(160-128)+rcx],ymm5
+        vpxor   ymm5,ymm5,ymm5
+        vmovdqu YMMWORD[(192-128)+rcx],ymm6
+        vpxor   ymm6,ymm6,ymm6
+        vmovdqu YMMWORD[(224-128)+rcx],ymm7
+        vpxor   ymm7,ymm7,ymm7
+        vmovdqu YMMWORD[(256-128)+rcx],ymm8
+        vmovdqa ymm8,ymm0
+        vmovdqu YMMWORD[(288-128)+rcx],ymm9
+$L$mul_1024_no_n_copy:
+        and     rsp,-64
+
+        mov     rbx,QWORD[r13]
+        vpbroadcastq    ymm10,QWORD[r13]
+        vmovdqu YMMWORD[rsp],ymm0
+        xor     r9,r9
+DB      0x67
+        xor     r10,r10
+        xor     r11,r11
+        xor     r12,r12
+
+        vmovdqu ymm15,YMMWORD[$L$and_mask]
+        mov     r14d,9
+        vmovdqu YMMWORD[(288-128)+rdi],ymm9
+        jmp     NEAR $L$oop_mul_1024
+
+ALIGN   32
+$L$oop_mul_1024:
+        vpsrlq  ymm9,ymm3,29
+        mov     rax,rbx
+        imul    rax,QWORD[((-128))+rsi]
+        add     rax,r9
+        mov     r10,rbx
+        imul    r10,QWORD[((8-128))+rsi]
+        add     r10,QWORD[8+rsp]
+
+        mov     r9,rax
+        imul    eax,r8d
+        and     eax,0x1fffffff
+
+        mov     r11,rbx
+        imul    r11,QWORD[((16-128))+rsi]
+        add     r11,QWORD[16+rsp]
+
+        mov     r12,rbx
+        imul    r12,QWORD[((24-128))+rsi]
+        add     r12,QWORD[24+rsp]
+        vpmuludq        ymm0,ymm10,YMMWORD[((32-128))+rsi]
+        vmovd   xmm11,eax
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm10,YMMWORD[((64-128))+rsi]
+        vpbroadcastq    ymm11,xmm11
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm10,YMMWORD[((96-128))+rsi]
+        vpand   ymm3,ymm3,ymm15
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm10,YMMWORD[((128-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm10,YMMWORD[((160-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm10,YMMWORD[((192-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm10,YMMWORD[((224-128))+rsi]
+        vpermq  ymm9,ymm9,0x93
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm10,YMMWORD[((256-128))+rsi]
+        vpbroadcastq    ymm10,QWORD[8+r13]
+        vpaddq  ymm8,ymm8,ymm12
+
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+rcx]
+        add     r9,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+rcx]
+        add     r10,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+rcx]
+        add     r11,rax
+        shr     r9,29
+        imul    rdx,QWORD[((24-128))+rcx]
+        add     r12,rdx
+        add     r10,r9
+
+        vpmuludq        ymm13,ymm11,YMMWORD[((32-128))+rcx]
+        vmovq   rbx,xmm10
+        vpaddq  ymm1,ymm1,ymm13
+        vpmuludq        ymm0,ymm11,YMMWORD[((64-128))+rcx]
+        vpaddq  ymm2,ymm2,ymm0
+        vpmuludq        ymm12,ymm11,YMMWORD[((96-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm12
+        vpmuludq        ymm13,ymm11,YMMWORD[((128-128))+rcx]
+        vpaddq  ymm4,ymm4,ymm13
+        vpmuludq        ymm0,ymm11,YMMWORD[((160-128))+rcx]
+        vpaddq  ymm5,ymm5,ymm0
+        vpmuludq        ymm12,ymm11,YMMWORD[((192-128))+rcx]
+        vpaddq  ymm6,ymm6,ymm12
+        vpmuludq        ymm13,ymm11,YMMWORD[((224-128))+rcx]
+        vpblendd        ymm12,ymm9,ymm14,3
+        vpaddq  ymm7,ymm7,ymm13
+        vpmuludq        ymm0,ymm11,YMMWORD[((256-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm12
+        vpaddq  ymm8,ymm8,ymm0
+
+        mov     rax,rbx
+        imul    rax,QWORD[((-128))+rsi]
+        add     r10,rax
+        vmovdqu ymm12,YMMWORD[((-8+32-128))+rsi]
+        mov     rax,rbx
+        imul    rax,QWORD[((8-128))+rsi]
+        add     r11,rax
+        vmovdqu ymm13,YMMWORD[((-8+64-128))+rsi]
+
+        mov     rax,r10
+        vpblendd        ymm9,ymm9,ymm14,0xfc
+        imul    eax,r8d
+        vpaddq  ymm4,ymm4,ymm9
+        and     eax,0x1fffffff
+
+        imul    rbx,QWORD[((16-128))+rsi]
+        add     r12,rbx
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovd   xmm11,eax
+        vmovdqu ymm0,YMMWORD[((-8+96-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpbroadcastq    ymm11,xmm11
+        vmovdqu ymm12,YMMWORD[((-8+128-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-8+160-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm0,YMMWORD[((-8+192-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-8+224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-8+256-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm9,YMMWORD[((-8+288-128))+rsi]
+        vpaddq  ymm7,ymm7,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpaddq  ymm8,ymm8,ymm13
+        vpmuludq        ymm9,ymm9,ymm10
+        vpbroadcastq    ymm10,QWORD[16+r13]
+
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+rcx]
+        add     r10,rax
+        vmovdqu ymm0,YMMWORD[((-8+32-128))+rcx]
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+rcx]
+        add     r11,rax
+        vmovdqu ymm12,YMMWORD[((-8+64-128))+rcx]
+        shr     r10,29
+        imul    rdx,QWORD[((16-128))+rcx]
+        add     r12,rdx
+        add     r11,r10
+
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovq   rbx,xmm10
+        vmovdqu ymm13,YMMWORD[((-8+96-128))+rcx]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-8+128-128))+rcx]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-8+160-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-8+192-128))+rcx]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-8+224-128))+rcx]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-8+256-128))+rcx]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-8+288-128))+rcx]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vpaddq  ymm9,ymm9,ymm13
+
+        vmovdqu ymm0,YMMWORD[((-16+32-128))+rsi]
+        mov     rax,rbx
+        imul    rax,QWORD[((-128))+rsi]
+        add     rax,r11
+
+        vmovdqu ymm12,YMMWORD[((-16+64-128))+rsi]
+        mov     r11,rax
+        imul    eax,r8d
+        and     eax,0x1fffffff
+
+        imul    rbx,QWORD[((8-128))+rsi]
+        add     r12,rbx
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovd   xmm11,eax
+        vmovdqu ymm13,YMMWORD[((-16+96-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpbroadcastq    ymm11,xmm11
+        vmovdqu ymm0,YMMWORD[((-16+128-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-16+160-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-16+192-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm0,YMMWORD[((-16+224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-16+256-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-16+288-128))+rsi]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpbroadcastq    ymm10,QWORD[24+r13]
+        vpaddq  ymm9,ymm9,ymm13
+
+        vmovdqu ymm0,YMMWORD[((-16+32-128))+rcx]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+rcx]
+        add     r11,rax
+        vmovdqu ymm12,YMMWORD[((-16+64-128))+rcx]
+        imul    rdx,QWORD[((8-128))+rcx]
+        add     r12,rdx
+        shr     r11,29
+
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovq   rbx,xmm10
+        vmovdqu ymm13,YMMWORD[((-16+96-128))+rcx]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-16+128-128))+rcx]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-16+160-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-16+192-128))+rcx]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-16+224-128))+rcx]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-16+256-128))+rcx]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-16+288-128))+rcx]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-24+32-128))+rsi]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-24+64-128))+rsi]
+        vpaddq  ymm9,ymm9,ymm13
+
+        add     r12,r11
+        imul    rbx,QWORD[((-128))+rsi]
+        add     r12,rbx
+
+        mov     rax,r12
+        imul    eax,r8d
+        and     eax,0x1fffffff
+
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovd   xmm11,eax
+        vmovdqu ymm13,YMMWORD[((-24+96-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpbroadcastq    ymm11,xmm11
+        vmovdqu ymm0,YMMWORD[((-24+128-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-24+160-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-24+192-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm0,YMMWORD[((-24+224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-24+256-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-24+288-128))+rsi]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpbroadcastq    ymm10,QWORD[32+r13]
+        vpaddq  ymm9,ymm9,ymm13
+        add     r13,32
+
+        vmovdqu ymm0,YMMWORD[((-24+32-128))+rcx]
+        imul    rax,QWORD[((-128))+rcx]
+        add     r12,rax
+        shr     r12,29
+
+        vmovdqu ymm12,YMMWORD[((-24+64-128))+rcx]
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovq   rbx,xmm10
+        vmovdqu ymm13,YMMWORD[((-24+96-128))+rcx]
+        vpaddq  ymm0,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu YMMWORD[rsp],ymm0
+        vpaddq  ymm1,ymm2,ymm12
+        vmovdqu ymm0,YMMWORD[((-24+128-128))+rcx]
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-24+160-128))+rcx]
+        vpaddq  ymm2,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-24+192-128))+rcx]
+        vpaddq  ymm3,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-24+224-128))+rcx]
+        vpaddq  ymm4,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-24+256-128))+rcx]
+        vpaddq  ymm5,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-24+288-128))+rcx]
+        mov     r9,r12
+        vpaddq  ymm6,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        add     r9,QWORD[rsp]
+        vpaddq  ymm7,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovq   xmm12,r12
+        vpaddq  ymm8,ymm9,ymm13
+
+        dec     r14d
+        jnz     NEAR $L$oop_mul_1024
+        vpaddq  ymm0,ymm12,YMMWORD[rsp]
+
+        vpsrlq  ymm12,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm13,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm10,ymm2,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm11,ymm3,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm3,ymm3,ymm15
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm10,ymm10,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpermq  ymm11,ymm11,0x93
+        vpaddq  ymm0,ymm0,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm1,ymm1,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm2,ymm2,ymm13
+        vpblendd        ymm11,ymm14,ymm11,3
+        vpaddq  ymm3,ymm3,ymm10
+        vpaddq  ymm4,ymm4,ymm11
+
+        vpsrlq  ymm12,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm13,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm10,ymm2,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm11,ymm3,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm3,ymm3,ymm15
+        vpermq  ymm10,ymm10,0x93
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm11,ymm11,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm0,ymm0,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm1,ymm1,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm2,ymm2,ymm13
+        vpblendd        ymm11,ymm14,ymm11,3
+        vpaddq  ymm3,ymm3,ymm10
+        vpaddq  ymm4,ymm4,ymm11
+
+        vmovdqu YMMWORD[(0-128)+rdi],ymm0
+        vmovdqu YMMWORD[(32-128)+rdi],ymm1
+        vmovdqu YMMWORD[(64-128)+rdi],ymm2
+        vmovdqu YMMWORD[(96-128)+rdi],ymm3
+        vpsrlq  ymm12,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm13,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm10,ymm6,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm11,ymm7,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm10,ymm10,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm11,ymm11,0x93
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm4,ymm4,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm5,ymm5,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm6,ymm6,ymm13
+        vpblendd        ymm11,ymm0,ymm11,3
+        vpaddq  ymm7,ymm7,ymm10
+        vpaddq  ymm8,ymm8,ymm11
+
+        vpsrlq  ymm12,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm13,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm10,ymm6,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm11,ymm7,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm10,ymm10,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm11,ymm11,0x93
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm4,ymm4,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm5,ymm5,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm6,ymm6,ymm13
+        vpblendd        ymm11,ymm0,ymm11,3
+        vpaddq  ymm7,ymm7,ymm10
+        vpaddq  ymm8,ymm8,ymm11
+
+        vmovdqu YMMWORD[(128-128)+rdi],ymm4
+        vmovdqu YMMWORD[(160-128)+rdi],ymm5
+        vmovdqu YMMWORD[(192-128)+rdi],ymm6
+        vmovdqu YMMWORD[(224-128)+rdi],ymm7
+        vmovdqu YMMWORD[(256-128)+rdi],ymm8
+        vzeroupper
+
+        mov     rax,rbp
+
+$L$mul_1024_in_tail:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_1024_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_1024_mul_avx2:
+global  rsaz_1024_red2norm_avx2
+
+ALIGN   32
+rsaz_1024_red2norm_avx2:
+
+        sub     rdx,-128
+        xor     rax,rax
+        mov     r8,QWORD[((-128))+rdx]
+        mov     r9,QWORD[((-120))+rdx]
+        mov     r10,QWORD[((-112))+rdx]
+        shl     r8,0
+        shl     r9,29
+        mov     r11,r10
+        shl     r10,58
+        shr     r11,6
+        add     rax,r8
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[rcx],rax
+        mov     rax,r11
+        mov     r8,QWORD[((-104))+rdx]
+        mov     r9,QWORD[((-96))+rdx]
+        shl     r8,23
+        mov     r10,r9
+        shl     r9,52
+        shr     r10,12
+        add     rax,r8
+        add     rax,r9
+        adc     r10,0
+        mov     QWORD[8+rcx],rax
+        mov     rax,r10
+        mov     r11,QWORD[((-88))+rdx]
+        mov     r8,QWORD[((-80))+rdx]
+        shl     r11,17
+        mov     r9,r8
+        shl     r8,46
+        shr     r9,18
+        add     rax,r11
+        add     rax,r8
+        adc     r9,0
+        mov     QWORD[16+rcx],rax
+        mov     rax,r9
+        mov     r10,QWORD[((-72))+rdx]
+        mov     r11,QWORD[((-64))+rdx]
+        shl     r10,11
+        mov     r8,r11
+        shl     r11,40
+        shr     r8,24
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[24+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[((-56))+rdx]
+        mov     r10,QWORD[((-48))+rdx]
+        mov     r11,QWORD[((-40))+rdx]
+        shl     r9,5
+        shl     r10,34
+        mov     r8,r11
+        shl     r11,63
+        shr     r8,1
+        add     rax,r9
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[32+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[((-32))+rdx]
+        mov     r10,QWORD[((-24))+rdx]
+        shl     r9,28
+        mov     r11,r10
+        shl     r10,57
+        shr     r11,7
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[40+rcx],rax
+        mov     rax,r11
+        mov     r8,QWORD[((-16))+rdx]
+        mov     r9,QWORD[((-8))+rdx]
+        shl     r8,22
+        mov     r10,r9
+        shl     r9,51
+        shr     r10,13
+        add     rax,r8
+        add     rax,r9
+        adc     r10,0
+        mov     QWORD[48+rcx],rax
+        mov     rax,r10
+        mov     r11,QWORD[rdx]
+        mov     r8,QWORD[8+rdx]
+        shl     r11,16
+        mov     r9,r8
+        shl     r8,45
+        shr     r9,19
+        add     rax,r11
+        add     rax,r8
+        adc     r9,0
+        mov     QWORD[56+rcx],rax
+        mov     rax,r9
+        mov     r10,QWORD[16+rdx]
+        mov     r11,QWORD[24+rdx]
+        shl     r10,10
+        mov     r8,r11
+        shl     r11,39
+        shr     r8,25
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[64+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[32+rdx]
+        mov     r10,QWORD[40+rdx]
+        mov     r11,QWORD[48+rdx]
+        shl     r9,4
+        shl     r10,33
+        mov     r8,r11
+        shl     r11,62
+        shr     r8,2
+        add     rax,r9
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[72+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[56+rdx]
+        mov     r10,QWORD[64+rdx]
+        shl     r9,27
+        mov     r11,r10
+        shl     r10,56
+        shr     r11,8
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[80+rcx],rax
+        mov     rax,r11
+        mov     r8,QWORD[72+rdx]
+        mov     r9,QWORD[80+rdx]
+        shl     r8,21
+        mov     r10,r9
+        shl     r9,50
+        shr     r10,14
+        add     rax,r8
+        add     rax,r9
+        adc     r10,0
+        mov     QWORD[88+rcx],rax
+        mov     rax,r10
+        mov     r11,QWORD[88+rdx]
+        mov     r8,QWORD[96+rdx]
+        shl     r11,15
+        mov     r9,r8
+        shl     r8,44
+        shr     r9,20
+        add     rax,r11
+        add     rax,r8
+        adc     r9,0
+        mov     QWORD[96+rcx],rax
+        mov     rax,r9
+        mov     r10,QWORD[104+rdx]
+        mov     r11,QWORD[112+rdx]
+        shl     r10,9
+        mov     r8,r11
+        shl     r11,38
+        shr     r8,26
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[104+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[120+rdx]
+        mov     r10,QWORD[128+rdx]
+        mov     r11,QWORD[136+rdx]
+        shl     r9,3
+        shl     r10,32
+        mov     r8,r11
+        shl     r11,61
+        shr     r8,3
+        add     rax,r9
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[112+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[144+rdx]
+        mov     r10,QWORD[152+rdx]
+        shl     r9,26
+        mov     r11,r10
+        shl     r10,55
+        shr     r11,9
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[120+rcx],rax
+        mov     rax,r11
+        DB      0F3h,0C3h               ;repret
+
+
+
+global  rsaz_1024_norm2red_avx2
+
+ALIGN   32
+rsaz_1024_norm2red_avx2:
+
+        sub     rcx,-128
+        mov     r8,QWORD[rdx]
+        mov     eax,0x1fffffff
+        mov     r9,QWORD[8+rdx]
+        mov     r11,r8
+        shr     r11,0
+        and     r11,rax
+        mov     QWORD[((-128))+rcx],r11
+        mov     r10,r8
+        shr     r10,29
+        and     r10,rax
+        mov     QWORD[((-120))+rcx],r10
+        shrd    r8,r9,58
+        and     r8,rax
+        mov     QWORD[((-112))+rcx],r8
+        mov     r10,QWORD[16+rdx]
+        mov     r8,r9
+        shr     r8,23
+        and     r8,rax
+        mov     QWORD[((-104))+rcx],r8
+        shrd    r9,r10,52
+        and     r9,rax
+        mov     QWORD[((-96))+rcx],r9
+        mov     r11,QWORD[24+rdx]
+        mov     r9,r10
+        shr     r9,17
+        and     r9,rax
+        mov     QWORD[((-88))+rcx],r9
+        shrd    r10,r11,46
+        and     r10,rax
+        mov     QWORD[((-80))+rcx],r10
+        mov     r8,QWORD[32+rdx]
+        mov     r10,r11
+        shr     r10,11
+        and     r10,rax
+        mov     QWORD[((-72))+rcx],r10
+        shrd    r11,r8,40
+        and     r11,rax
+        mov     QWORD[((-64))+rcx],r11
+        mov     r9,QWORD[40+rdx]
+        mov     r11,r8
+        shr     r11,5
+        and     r11,rax
+        mov     QWORD[((-56))+rcx],r11
+        mov     r10,r8
+        shr     r10,34
+        and     r10,rax
+        mov     QWORD[((-48))+rcx],r10
+        shrd    r8,r9,63
+        and     r8,rax
+        mov     QWORD[((-40))+rcx],r8
+        mov     r10,QWORD[48+rdx]
+        mov     r8,r9
+        shr     r8,28
+        and     r8,rax
+        mov     QWORD[((-32))+rcx],r8
+        shrd    r9,r10,57
+        and     r9,rax
+        mov     QWORD[((-24))+rcx],r9
+        mov     r11,QWORD[56+rdx]
+        mov     r9,r10
+        shr     r9,22
+        and     r9,rax
+        mov     QWORD[((-16))+rcx],r9
+        shrd    r10,r11,51
+        and     r10,rax
+        mov     QWORD[((-8))+rcx],r10
+        mov     r8,QWORD[64+rdx]
+        mov     r10,r11
+        shr     r10,16
+        and     r10,rax
+        mov     QWORD[rcx],r10
+        shrd    r11,r8,45
+        and     r11,rax
+        mov     QWORD[8+rcx],r11
+        mov     r9,QWORD[72+rdx]
+        mov     r11,r8
+        shr     r11,10
+        and     r11,rax
+        mov     QWORD[16+rcx],r11
+        shrd    r8,r9,39
+        and     r8,rax
+        mov     QWORD[24+rcx],r8
+        mov     r10,QWORD[80+rdx]
+        mov     r8,r9
+        shr     r8,4
+        and     r8,rax
+        mov     QWORD[32+rcx],r8
+        mov     r11,r9
+        shr     r11,33
+        and     r11,rax
+        mov     QWORD[40+rcx],r11
+        shrd    r9,r10,62
+        and     r9,rax
+        mov     QWORD[48+rcx],r9
+        mov     r11,QWORD[88+rdx]
+        mov     r9,r10
+        shr     r9,27
+        and     r9,rax
+        mov     QWORD[56+rcx],r9
+        shrd    r10,r11,56
+        and     r10,rax
+        mov     QWORD[64+rcx],r10
+        mov     r8,QWORD[96+rdx]
+        mov     r10,r11
+        shr     r10,21
+        and     r10,rax
+        mov     QWORD[72+rcx],r10
+        shrd    r11,r8,50
+        and     r11,rax
+        mov     QWORD[80+rcx],r11
+        mov     r9,QWORD[104+rdx]
+        mov     r11,r8
+        shr     r11,15
+        and     r11,rax
+        mov     QWORD[88+rcx],r11
+        shrd    r8,r9,44
+        and     r8,rax
+        mov     QWORD[96+rcx],r8
+        mov     r10,QWORD[112+rdx]
+        mov     r8,r9
+        shr     r8,9
+        and     r8,rax
+        mov     QWORD[104+rcx],r8
+        shrd    r9,r10,38
+        and     r9,rax
+        mov     QWORD[112+rcx],r9
+        mov     r11,QWORD[120+rdx]
+        mov     r9,r10
+        shr     r9,3
+        and     r9,rax
+        mov     QWORD[120+rcx],r9
+        mov     r8,r10
+        shr     r8,32
+        and     r8,rax
+        mov     QWORD[128+rcx],r8
+        shrd    r10,r11,61
+        and     r10,rax
+        mov     QWORD[136+rcx],r10
+        xor     r8,r8
+        mov     r10,r11
+        shr     r10,26
+        and     r10,rax
+        mov     QWORD[144+rcx],r10
+        shrd    r11,r8,55
+        and     r11,rax
+        mov     QWORD[152+rcx],r11
+        mov     QWORD[160+rcx],r8
+        mov     QWORD[168+rcx],r8
+        mov     QWORD[176+rcx],r8
+        mov     QWORD[184+rcx],r8
+        DB      0F3h,0C3h               ;repret
+
+
+global  rsaz_1024_scatter5_avx2
+
+ALIGN   32
+rsaz_1024_scatter5_avx2:
+
+        vzeroupper
+        vmovdqu ymm5,YMMWORD[$L$scatter_permd]
+        shl     r8d,4
+        lea     rcx,[r8*1+rcx]
+        mov     eax,9
+        jmp     NEAR $L$oop_scatter_1024
+
+ALIGN   32
+$L$oop_scatter_1024:
+        vmovdqu ymm0,YMMWORD[rdx]
+        lea     rdx,[32+rdx]
+        vpermd  ymm0,ymm5,ymm0
+        vmovdqu XMMWORD[rcx],xmm0
+        lea     rcx,[512+rcx]
+        dec     eax
+        jnz     NEAR $L$oop_scatter_1024
+
+        vzeroupper
+        DB      0F3h,0C3h               ;repret
+
+
+
+global  rsaz_1024_gather5_avx2
+
+ALIGN   32
+rsaz_1024_gather5_avx2:
+
+        vzeroupper
+        mov     r11,rsp
+
+        lea     rax,[((-136))+rsp]
+$L$SEH_begin_rsaz_1024_gather5:
+
+DB      0x48,0x8d,0x60,0xe0
+DB      0xc5,0xf8,0x29,0x70,0xe0
+DB      0xc5,0xf8,0x29,0x78,0xf0
+DB      0xc5,0x78,0x29,0x40,0x00
+DB      0xc5,0x78,0x29,0x48,0x10
+DB      0xc5,0x78,0x29,0x50,0x20
+DB      0xc5,0x78,0x29,0x58,0x30
+DB      0xc5,0x78,0x29,0x60,0x40
+DB      0xc5,0x78,0x29,0x68,0x50
+DB      0xc5,0x78,0x29,0x70,0x60
+DB      0xc5,0x78,0x29,0x78,0x70
+        lea     rsp,[((-256))+rsp]
+        and     rsp,-32
+        lea     r10,[$L$inc]
+        lea     rax,[((-128))+rsp]
+
+        vmovd   xmm4,r8d
+        vmovdqa ymm0,YMMWORD[r10]
+        vmovdqa ymm1,YMMWORD[32+r10]
+        vmovdqa ymm5,YMMWORD[64+r10]
+        vpbroadcastd    ymm4,xmm4
+
+        vpaddd  ymm2,ymm0,ymm5
+        vpcmpeqd        ymm0,ymm0,ymm4
+        vpaddd  ymm3,ymm1,ymm5
+        vpcmpeqd        ymm1,ymm1,ymm4
+        vmovdqa YMMWORD[(0+128)+rax],ymm0
+        vpaddd  ymm0,ymm2,ymm5
+        vpcmpeqd        ymm2,ymm2,ymm4
+        vmovdqa YMMWORD[(32+128)+rax],ymm1
+        vpaddd  ymm1,ymm3,ymm5
+        vpcmpeqd        ymm3,ymm3,ymm4
+        vmovdqa YMMWORD[(64+128)+rax],ymm2
+        vpaddd  ymm2,ymm0,ymm5
+        vpcmpeqd        ymm0,ymm0,ymm4
+        vmovdqa YMMWORD[(96+128)+rax],ymm3
+        vpaddd  ymm3,ymm1,ymm5
+        vpcmpeqd        ymm1,ymm1,ymm4
+        vmovdqa YMMWORD[(128+128)+rax],ymm0
+        vpaddd  ymm8,ymm2,ymm5
+        vpcmpeqd        ymm2,ymm2,ymm4
+        vmovdqa YMMWORD[(160+128)+rax],ymm1
+        vpaddd  ymm9,ymm3,ymm5
+        vpcmpeqd        ymm3,ymm3,ymm4
+        vmovdqa YMMWORD[(192+128)+rax],ymm2
+        vpaddd  ymm10,ymm8,ymm5
+        vpcmpeqd        ymm8,ymm8,ymm4
+        vmovdqa YMMWORD[(224+128)+rax],ymm3
+        vpaddd  ymm11,ymm9,ymm5
+        vpcmpeqd        ymm9,ymm9,ymm4
+        vpaddd  ymm12,ymm10,ymm5
+        vpcmpeqd        ymm10,ymm10,ymm4
+        vpaddd  ymm13,ymm11,ymm5
+        vpcmpeqd        ymm11,ymm11,ymm4
+        vpaddd  ymm14,ymm12,ymm5
+        vpcmpeqd        ymm12,ymm12,ymm4
+        vpaddd  ymm15,ymm13,ymm5
+        vpcmpeqd        ymm13,ymm13,ymm4
+        vpcmpeqd        ymm14,ymm14,ymm4
+        vpcmpeqd        ymm15,ymm15,ymm4
+
+        vmovdqa ymm7,YMMWORD[((-32))+r10]
+        lea     rdx,[128+rdx]
+        mov     r8d,9
+
+$L$oop_gather_1024:
+        vmovdqa ymm0,YMMWORD[((0-128))+rdx]
+        vmovdqa ymm1,YMMWORD[((32-128))+rdx]
+        vmovdqa ymm2,YMMWORD[((64-128))+rdx]
+        vmovdqa ymm3,YMMWORD[((96-128))+rdx]
+        vpand   ymm0,ymm0,YMMWORD[((0+128))+rax]
+        vpand   ymm1,ymm1,YMMWORD[((32+128))+rax]
+        vpand   ymm2,ymm2,YMMWORD[((64+128))+rax]
+        vpor    ymm4,ymm1,ymm0
+        vpand   ymm3,ymm3,YMMWORD[((96+128))+rax]
+        vmovdqa ymm0,YMMWORD[((128-128))+rdx]
+        vmovdqa ymm1,YMMWORD[((160-128))+rdx]
+        vpor    ymm5,ymm3,ymm2
+        vmovdqa ymm2,YMMWORD[((192-128))+rdx]
+        vmovdqa ymm3,YMMWORD[((224-128))+rdx]
+        vpand   ymm0,ymm0,YMMWORD[((128+128))+rax]
+        vpand   ymm1,ymm1,YMMWORD[((160+128))+rax]
+        vpand   ymm2,ymm2,YMMWORD[((192+128))+rax]
+        vpor    ymm4,ymm4,ymm0
+        vpand   ymm3,ymm3,YMMWORD[((224+128))+rax]
+        vpand   ymm0,ymm8,YMMWORD[((256-128))+rdx]
+        vpor    ymm5,ymm5,ymm1
+        vpand   ymm1,ymm9,YMMWORD[((288-128))+rdx]
+        vpor    ymm4,ymm4,ymm2
+        vpand   ymm2,ymm10,YMMWORD[((320-128))+rdx]
+        vpor    ymm5,ymm5,ymm3
+        vpand   ymm3,ymm11,YMMWORD[((352-128))+rdx]
+        vpor    ymm4,ymm4,ymm0
+        vpand   ymm0,ymm12,YMMWORD[((384-128))+rdx]
+        vpor    ymm5,ymm5,ymm1
+        vpand   ymm1,ymm13,YMMWORD[((416-128))+rdx]
+        vpor    ymm4,ymm4,ymm2
+        vpand   ymm2,ymm14,YMMWORD[((448-128))+rdx]
+        vpor    ymm5,ymm5,ymm3
+        vpand   ymm3,ymm15,YMMWORD[((480-128))+rdx]
+        lea     rdx,[512+rdx]
+        vpor    ymm4,ymm4,ymm0
+        vpor    ymm5,ymm5,ymm1
+        vpor    ymm4,ymm4,ymm2
+        vpor    ymm5,ymm5,ymm3
+
+        vpor    ymm4,ymm4,ymm5
+        vextracti128    xmm5,ymm4,1
+        vpor    xmm5,xmm5,xmm4
+        vpermd  ymm5,ymm7,ymm5
+        vmovdqu YMMWORD[rcx],ymm5
+        lea     rcx,[32+rcx]
+        dec     r8d
+        jnz     NEAR $L$oop_gather_1024
+
+        vpxor   ymm0,ymm0,ymm0
+        vmovdqu YMMWORD[rcx],ymm0
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        lea     rsp,[r11]
+
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_1024_gather5:
+
+EXTERN  OPENSSL_ia32cap_P
+global  rsaz_avx2_eligible
+
+ALIGN   32
+rsaz_avx2_eligible:
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+8))]
+        mov     ecx,524544
+        mov     edx,0
+        and     ecx,eax
+        cmp     ecx,524544
+        cmove   eax,edx
+        and     eax,32
+        shr     eax,5
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+$L$and_mask:
+        DQ      0x1fffffff,0x1fffffff,0x1fffffff,0x1fffffff
+$L$scatter_permd:
+        DD      0,2,4,6,7,7,7,7
+$L$gather_permd:
+        DD      0,7,1,7,2,7,3,7
+$L$inc:
+        DD      0,0,0,0,1,1,1,1
+        DD      2,2,2,2,3,3,3,3
+        DD      4,4,4,4,4,4,4,4
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+rsaz_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rbp,QWORD[160+r8]
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        cmovc   rax,rbp
+
+        mov     r15,QWORD[((-48))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     rbx,QWORD[((-8))+rax]
+        mov     QWORD[240+r8],r15
+        mov     QWORD[232+r8],r14
+        mov     QWORD[224+r8],r13
+        mov     QWORD[216+r8],r12
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[144+r8],rbx
+
+        lea     rsi,[((-216))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_rsaz_1024_sqr_avx2 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_1024_sqr_avx2 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_1024_sqr_avx2 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_1024_mul_avx2 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_1024_mul_avx2 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_1024_mul_avx2 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_1024_gather5 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_1024_gather5 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_1024_gather5 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_rsaz_1024_sqr_avx2:
+DB      9,0,0,0
+        DD      rsaz_se_handler wrt ..imagebase
+        DD      $L$sqr_1024_body wrt ..imagebase,$L$sqr_1024_epilogue wrt ..imagebase,$L$sqr_1024_in_tail wrt ..imagebase
+        DD      0
+$L$SEH_info_rsaz_1024_mul_avx2:
+DB      9,0,0,0
+        DD      rsaz_se_handler wrt ..imagebase
+        DD      $L$mul_1024_body wrt ..imagebase,$L$mul_1024_epilogue wrt ..imagebase,$L$mul_1024_in_tail wrt ..imagebase
+        DD      0
+$L$SEH_info_rsaz_1024_gather5:
+DB      0x01,0x36,0x17,0x0b
+DB      0x36,0xf8,0x09,0x00
+DB      0x31,0xe8,0x08,0x00
+DB      0x2c,0xd8,0x07,0x00
+DB      0x27,0xc8,0x06,0x00
+DB      0x22,0xb8,0x05,0x00
+DB      0x1d,0xa8,0x04,0x00
+DB      0x18,0x98,0x03,0x00
+DB      0x13,0x88,0x02,0x00
+DB      0x0e,0x78,0x01,0x00
+DB      0x09,0x68,0x00,0x00
+DB      0x04,0x01,0x15,0x00
+DB      0x00,0xb3,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
new file mode 100644
index 0000000000..eb4958e903
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
@@ -0,0 +1,2242 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  rsaz_512_sqr
+
+ALIGN   32
+rsaz_512_sqr:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_sqr:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,128+24
+
+$L$sqr_body:
+        mov     rbp,rdx
+        mov     rdx,QWORD[rsi]
+        mov     rax,QWORD[8+rsi]
+        mov     QWORD[128+rsp],rcx
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$oop_sqrx
+        jmp     NEAR $L$oop_sqr
+
+ALIGN   32
+$L$oop_sqr:
+        mov     DWORD[((128+8))+rsp],r8d
+
+        mov     rbx,rdx
+        mul     rdx
+        mov     r8,rax
+        mov     rax,QWORD[16+rsi]
+        mov     r9,rdx
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[24+rsi]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[32+rsi]
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[40+rsi]
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[48+rsi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[56+rsi]
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,rbx
+        mov     r15,rdx
+        adc     r15,0
+
+        add     r8,r8
+        mov     rcx,r9
+        adc     r9,r9
+
+        mul     rax
+        mov     QWORD[rsp],rax
+        add     r8,rdx
+        adc     r9,0
+
+        mov     QWORD[8+rsp],r8
+        shr     rcx,63
+
+
+        mov     r8,QWORD[8+rsi]
+        mov     rax,QWORD[16+rsi]
+        mul     r8
+        add     r10,rax
+        mov     rax,QWORD[24+rsi]
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r11,rax
+        mov     rax,QWORD[32+rsi]
+        adc     rdx,0
+        add     r11,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r12,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r12,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r13,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r13,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r14,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r14,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r15,rax
+        mov     rax,r8
+        adc     rdx,0
+        add     r15,rbx
+        mov     r8,rdx
+        mov     rdx,r10
+        adc     r8,0
+
+        add     rdx,rdx
+        lea     r10,[r10*2+rcx]
+        mov     rbx,r11
+        adc     r11,r11
+
+        mul     rax
+        add     r9,rax
+        adc     r10,rdx
+        adc     r11,0
+
+        mov     QWORD[16+rsp],r9
+        mov     QWORD[24+rsp],r10
+        shr     rbx,63
+
+
+        mov     r9,QWORD[16+rsi]
+        mov     rax,QWORD[24+rsi]
+        mul     r9
+        add     r12,rax
+        mov     rax,QWORD[32+rsi]
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        add     r13,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r13,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        add     r14,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r14,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        mov     r10,r12
+        lea     r12,[r12*2+rbx]
+        add     r15,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r15,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        shr     r10,63
+        add     r8,rax
+        mov     rax,r9
+        adc     rdx,0
+        add     r8,rcx
+        mov     r9,rdx
+        adc     r9,0
+
+        mov     rcx,r13
+        lea     r13,[r13*2+r10]
+
+        mul     rax
+        add     r11,rax
+        adc     r12,rdx
+        adc     r13,0
+
+        mov     QWORD[32+rsp],r11
+        mov     QWORD[40+rsp],r12
+        shr     rcx,63
+
+
+        mov     r10,QWORD[24+rsi]
+        mov     rax,QWORD[32+rsi]
+        mul     r10
+        add     r14,rax
+        mov     rax,QWORD[40+rsi]
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r10
+        add     r15,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r15,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r10
+        mov     r12,r14
+        lea     r14,[r14*2+rcx]
+        add     r8,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r8,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r10
+        shr     r12,63
+        add     r9,rax
+        mov     rax,r10
+        adc     rdx,0
+        add     r9,rbx
+        mov     r10,rdx
+        adc     r10,0
+
+        mov     rbx,r15
+        lea     r15,[r15*2+r12]
+
+        mul     rax
+        add     r13,rax
+        adc     r14,rdx
+        adc     r15,0
+
+        mov     QWORD[48+rsp],r13
+        mov     QWORD[56+rsp],r14
+        shr     rbx,63
+
+
+        mov     r11,QWORD[32+rsi]
+        mov     rax,QWORD[40+rsi]
+        mul     r11
+        add     r8,rax
+        mov     rax,QWORD[48+rsi]
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r11
+        add     r9,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        mov     r12,r8
+        lea     r8,[r8*2+rbx]
+        add     r9,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r11
+        shr     r12,63
+        add     r10,rax
+        mov     rax,r11
+        adc     rdx,0
+        add     r10,rcx
+        mov     r11,rdx
+        adc     r11,0
+
+        mov     rcx,r9
+        lea     r9,[r9*2+r12]
+
+        mul     rax
+        add     r15,rax
+        adc     r8,rdx
+        adc     r9,0
+
+        mov     QWORD[64+rsp],r15
+        mov     QWORD[72+rsp],r8
+        shr     rcx,63
+
+
+        mov     r12,QWORD[40+rsi]
+        mov     rax,QWORD[48+rsi]
+        mul     r12
+        add     r10,rax
+        mov     rax,QWORD[56+rsi]
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r12
+        add     r11,rax
+        mov     rax,r12
+        mov     r15,r10
+        lea     r10,[r10*2+rcx]
+        adc     rdx,0
+        shr     r15,63
+        add     r11,rbx
+        mov     r12,rdx
+        adc     r12,0
+
+        mov     rbx,r11
+        lea     r11,[r11*2+r15]
+
+        mul     rax
+        add     r9,rax
+        adc     r10,rdx
+        adc     r11,0
+
+        mov     QWORD[80+rsp],r9
+        mov     QWORD[88+rsp],r10
+
+
+        mov     r13,QWORD[48+rsi]
+        mov     rax,QWORD[56+rsi]
+        mul     r13
+        add     r12,rax
+        mov     rax,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        xor     r14,r14
+        shl     rbx,1
+        adc     r12,r12
+        adc     r13,r13
+        adc     r14,r14
+
+        mul     rax
+        add     r11,rax
+        adc     r12,rdx
+        adc     r13,0
+
+        mov     QWORD[96+rsp],r11
+        mov     QWORD[104+rsp],r12
+
+
+        mov     rax,QWORD[56+rsi]
+        mul     rax
+        add     r13,rax
+        adc     rdx,0
+
+        add     r14,rdx
+
+        mov     QWORD[112+rsp],r13
+        mov     QWORD[120+rsp],r14
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        mov     rdx,r8
+        mov     rax,r9
+        mov     r8d,DWORD[((128+8))+rsp]
+        mov     rsi,rdi
+
+        dec     r8d
+        jnz     NEAR $L$oop_sqr
+        jmp     NEAR $L$sqr_tail
+
+ALIGN   32
+$L$oop_sqrx:
+        mov     DWORD[((128+8))+rsp],r8d
+DB      102,72,15,110,199
+DB      102,72,15,110,205
+
+        mulx    r9,r8,rax
+
+        mulx    r10,rcx,QWORD[16+rsi]
+        xor     rbp,rbp
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r9,rcx
+
+        mulx    r12,rcx,QWORD[32+rsi]
+        adcx    r10,rax
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r11,rcx
+
+DB      0xc4,0x62,0xf3,0xf6,0xb6,0x30,0x00,0x00,0x00
+        adcx    r12,rax
+        adcx    r13,rcx
+
+DB      0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+        adcx    r14,rax
+        adcx    r15,rbp
+
+        mov     rcx,r9
+        shld    r9,r8,1
+        shl     r8,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r8,rdx
+        mov     rdx,QWORD[8+rsi]
+        adcx    r9,rbp
+
+        mov     QWORD[rsp],rax
+        mov     QWORD[8+rsp],r8
+
+
+        mulx    rbx,rax,QWORD[16+rsi]
+        adox    r10,rax
+        adcx    r11,rbx
+
+DB      0xc4,0x62,0xc3,0xf6,0x86,0x18,0x00,0x00,0x00
+        adox    r11,rdi
+        adcx    r12,r8
+
+        mulx    rbx,rax,QWORD[32+rsi]
+        adox    r12,rax
+        adcx    r13,rbx
+
+        mulx    r8,rdi,QWORD[40+rsi]
+        adox    r13,rdi
+        adcx    r14,r8
+
+DB      0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+        adox    r14,rax
+        adcx    r15,rbx
+
+DB      0xc4,0x62,0xc3,0xf6,0x86,0x38,0x00,0x00,0x00
+        adox    r15,rdi
+        adcx    r8,rbp
+        adox    r8,rbp
+
+        mov     rbx,r11
+        shld    r11,r10,1
+        shld    r10,rcx,1
+
+        xor     ebp,ebp
+        mulx    rcx,rax,rdx
+        mov     rdx,QWORD[16+rsi]
+        adcx    r9,rax
+        adcx    r10,rcx
+        adcx    r11,rbp
+
+        mov     QWORD[16+rsp],r9
+DB      0x4c,0x89,0x94,0x24,0x18,0x00,0x00,0x00
+
+
+DB      0xc4,0x62,0xc3,0xf6,0x8e,0x18,0x00,0x00,0x00
+        adox    r12,rdi
+        adcx    r13,r9
+
+        mulx    rcx,rax,QWORD[32+rsi]
+        adox    r13,rax
+        adcx    r14,rcx
+
+        mulx    r9,rdi,QWORD[40+rsi]
+        adox    r14,rdi
+        adcx    r15,r9
+
+DB      0xc4,0xe2,0xfb,0xf6,0x8e,0x30,0x00,0x00,0x00
+        adox    r15,rax
+        adcx    r8,rcx
+
+DB      0xc4,0x62,0xc3,0xf6,0x8e,0x38,0x00,0x00,0x00
+        adox    r8,rdi
+        adcx    r9,rbp
+        adox    r9,rbp
+
+        mov     rcx,r13
+        shld    r13,r12,1
+        shld    r12,rbx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r11,rax
+        adcx    r12,rdx
+        mov     rdx,QWORD[24+rsi]
+        adcx    r13,rbp
+
+        mov     QWORD[32+rsp],r11
+DB      0x4c,0x89,0xa4,0x24,0x28,0x00,0x00,0x00
+
+
+DB      0xc4,0xe2,0xfb,0xf6,0x9e,0x20,0x00,0x00,0x00
+        adox    r14,rax
+        adcx    r15,rbx
+
+        mulx    r10,rdi,QWORD[40+rsi]
+        adox    r15,rdi
+        adcx    r8,r10
+
+        mulx    rbx,rax,QWORD[48+rsi]
+        adox    r8,rax
+        adcx    r9,rbx
+
+        mulx    r10,rdi,QWORD[56+rsi]
+        adox    r9,rdi
+        adcx    r10,rbp
+        adox    r10,rbp
+
+DB      0x66
+        mov     rbx,r15
+        shld    r15,r14,1
+        shld    r14,rcx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r13,rax
+        adcx    r14,rdx
+        mov     rdx,QWORD[32+rsi]
+        adcx    r15,rbp
+
+        mov     QWORD[48+rsp],r13
+        mov     QWORD[56+rsp],r14
+
+
+DB      0xc4,0x62,0xc3,0xf6,0x9e,0x28,0x00,0x00,0x00
+        adox    r8,rdi
+        adcx    r9,r11
+
+        mulx    rcx,rax,QWORD[48+rsi]
+        adox    r9,rax
+        adcx    r10,rcx
+
+        mulx    r11,rdi,QWORD[56+rsi]
+        adox    r10,rdi
+        adcx    r11,rbp
+        adox    r11,rbp
+
+        mov     rcx,r9
+        shld    r9,r8,1
+        shld    r8,rbx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r15,rax
+        adcx    r8,rdx
+        mov     rdx,QWORD[40+rsi]
+        adcx    r9,rbp
+
+        mov     QWORD[64+rsp],r15
+        mov     QWORD[72+rsp],r8
+
+
+DB      0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+        adox    r10,rax
+        adcx    r11,rbx
+
+DB      0xc4,0x62,0xc3,0xf6,0xa6,0x38,0x00,0x00,0x00
+        adox    r11,rdi
+        adcx    r12,rbp
+        adox    r12,rbp
+
+        mov     rbx,r11
+        shld    r11,r10,1
+        shld    r10,rcx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r9,rax
+        adcx    r10,rdx
+        mov     rdx,QWORD[48+rsi]
+        adcx    r11,rbp
+
+        mov     QWORD[80+rsp],r9
+        mov     QWORD[88+rsp],r10
+
+
+DB      0xc4,0x62,0xfb,0xf6,0xae,0x38,0x00,0x00,0x00
+        adox    r12,rax
+        adox    r13,rbp
+
+        xor     r14,r14
+        shld    r14,r13,1
+        shld    r13,r12,1
+        shld    r12,rbx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r11,rax
+        adcx    r12,rdx
+        mov     rdx,QWORD[56+rsi]
+        adcx    r13,rbp
+
+DB      0x4c,0x89,0x9c,0x24,0x60,0x00,0x00,0x00
+DB      0x4c,0x89,0xa4,0x24,0x68,0x00,0x00,0x00
+
+
+        mulx    rdx,rax,rdx
+        adox    r13,rax
+        adox    rdx,rbp
+
+DB      0x66
+        add     r14,rdx
+
+        mov     QWORD[112+rsp],r13
+        mov     QWORD[120+rsp],r14
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     rdx,QWORD[128+rsp]
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        mov     rdx,r8
+        mov     rax,r9
+        mov     r8d,DWORD[((128+8))+rsp]
+        mov     rsi,rdi
+
+        dec     r8d
+        jnz     NEAR $L$oop_sqrx
+
+$L$sqr_tail:
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$sqr_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_sqr:
+global  rsaz_512_mul
+
+ALIGN   32
+rsaz_512_mul:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,128+24
+
+$L$mul_body:
+DB      102,72,15,110,199
+DB      102,72,15,110,201
+        mov     QWORD[128+rsp],r8
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx
+        mov     rbx,QWORD[rdx]
+        mov     rbp,rdx
+        call    __rsaz_512_mul
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$mul_tail
+
+ALIGN   32
+$L$mulx:
+        mov     rbp,rdx
+        mov     rdx,QWORD[rdx]
+        call    __rsaz_512_mulx
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     rdx,QWORD[128+rsp]
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+$L$mul_tail:
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul:
+global  rsaz_512_mul_gather4
+
+ALIGN   32
+rsaz_512_mul_gather4:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul_gather4:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,328
+
+        movaps  XMMWORD[160+rsp],xmm6
+        movaps  XMMWORD[176+rsp],xmm7
+        movaps  XMMWORD[192+rsp],xmm8
+        movaps  XMMWORD[208+rsp],xmm9
+        movaps  XMMWORD[224+rsp],xmm10
+        movaps  XMMWORD[240+rsp],xmm11
+        movaps  XMMWORD[256+rsp],xmm12
+        movaps  XMMWORD[272+rsp],xmm13
+        movaps  XMMWORD[288+rsp],xmm14
+        movaps  XMMWORD[304+rsp],xmm15
+$L$mul_gather4_body:
+        movd    xmm8,r9d
+        movdqa  xmm1,XMMWORD[(($L$inc+16))]
+        movdqa  xmm0,XMMWORD[$L$inc]
+
+        pshufd  xmm8,xmm8,0
+        movdqa  xmm7,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm8
+        movdqa  xmm3,xmm7
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm8
+        movdqa  xmm4,xmm7
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm8
+        movdqa  xmm5,xmm7
+        paddd   xmm4,xmm3
+        pcmpeqd xmm3,xmm8
+        movdqa  xmm6,xmm7
+        paddd   xmm5,xmm4
+        pcmpeqd xmm4,xmm8
+        paddd   xmm6,xmm5
+        pcmpeqd xmm5,xmm8
+        paddd   xmm7,xmm6
+        pcmpeqd xmm6,xmm8
+        pcmpeqd xmm7,xmm8
+
+        movdqa  xmm8,XMMWORD[rdx]
+        movdqa  xmm9,XMMWORD[16+rdx]
+        movdqa  xmm10,XMMWORD[32+rdx]
+        movdqa  xmm11,XMMWORD[48+rdx]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rdx]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rdx]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rdx]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rdx]
+        lea     rbp,[128+rdx]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx_gather
+DB      102,76,15,126,195
+
+        mov     QWORD[128+rsp],r8
+        mov     QWORD[((128+8))+rsp],rdi
+        mov     QWORD[((128+16))+rsp],rcx
+
+        mov     rax,QWORD[rsi]
+        mov     rcx,QWORD[8+rsi]
+        mul     rbx
+        mov     QWORD[rsp],rax
+        mov     rax,rcx
+        mov     r8,rdx
+
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[16+rsi]
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[24+rsi]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[32+rsi]
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[40+rsi]
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[48+rsi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[56+rsi]
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[rsi]
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rdi,[8+rsp]
+        mov     ecx,7
+        jmp     NEAR $L$oop_mul_gather
+
+ALIGN   32
+$L$oop_mul_gather:
+        movdqa  xmm8,XMMWORD[rbp]
+        movdqa  xmm9,XMMWORD[16+rbp]
+        movdqa  xmm10,XMMWORD[32+rbp]
+        movdqa  xmm11,XMMWORD[48+rbp]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rbp]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rbp]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rbp]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rbp]
+        lea     rbp,[128+rbp]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+DB      102,76,15,126,195
+
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[8+rsi]
+        mov     QWORD[rdi],r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     r8,r9
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rsi]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rsi]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r15,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rdi,[8+rdi]
+
+        dec     ecx
+        jnz     NEAR $L$oop_mul_gather
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        mov     rdi,QWORD[((128+8))+rsp]
+        mov     rbp,QWORD[((128+16))+rsp]
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$mul_gather_tail
+
+ALIGN   32
+$L$mulx_gather:
+DB      102,76,15,126,194
+
+        mov     QWORD[128+rsp],r8
+        mov     QWORD[((128+8))+rsp],rdi
+        mov     QWORD[((128+16))+rsp],rcx
+
+        mulx    r8,rbx,QWORD[rsi]
+        mov     QWORD[rsp],rbx
+        xor     edi,edi
+
+        mulx    r9,rax,QWORD[8+rsi]
+
+        mulx    r10,rbx,QWORD[16+rsi]
+        adcx    r8,rax
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r9,rbx
+
+        mulx    r12,rbx,QWORD[32+rsi]
+        adcx    r10,rax
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r11,rbx
+
+        mulx    r14,rbx,QWORD[48+rsi]
+        adcx    r12,rax
+
+        mulx    r15,rax,QWORD[56+rsi]
+        adcx    r13,rbx
+        adcx    r14,rax
+DB      0x67
+        mov     rbx,r8
+        adcx    r15,rdi
+
+        mov     rcx,-7
+        jmp     NEAR $L$oop_mulx_gather
+
+ALIGN   32
+$L$oop_mulx_gather:
+        movdqa  xmm8,XMMWORD[rbp]
+        movdqa  xmm9,XMMWORD[16+rbp]
+        movdqa  xmm10,XMMWORD[32+rbp]
+        movdqa  xmm11,XMMWORD[48+rbp]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rbp]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rbp]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rbp]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rbp]
+        lea     rbp,[128+rbp]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+DB      102,76,15,126,194
+
+DB      0xc4,0x62,0xfb,0xf6,0x86,0x00,0x00,0x00,0x00
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rsi]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rsi]
+        adcx    r9,rax
+        adox    r10,r11
+
+DB      0xc4,0x62,0xfb,0xf6,0x9e,0x18,0x00,0x00,0x00
+        adcx    r10,rax
+        adox    r11,r12
+
+        mulx    r12,rax,QWORD[32+rsi]
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r12,rax
+        adox    r13,r14
+
+DB      0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+        adcx    r13,rax
+DB      0x67
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rsi]
+        mov     QWORD[64+rcx*8+rsp],rbx
+        adcx    r14,rax
+        adox    r15,rdi
+        mov     rbx,r8
+        adcx    r15,rdi
+
+        inc     rcx
+        jnz     NEAR $L$oop_mulx_gather
+
+        mov     QWORD[64+rsp],r8
+        mov     QWORD[((64+8))+rsp],r9
+        mov     QWORD[((64+16))+rsp],r10
+        mov     QWORD[((64+24))+rsp],r11
+        mov     QWORD[((64+32))+rsp],r12
+        mov     QWORD[((64+40))+rsp],r13
+        mov     QWORD[((64+48))+rsp],r14
+        mov     QWORD[((64+56))+rsp],r15
+
+        mov     rdx,QWORD[128+rsp]
+        mov     rdi,QWORD[((128+8))+rsp]
+        mov     rbp,QWORD[((128+16))+rsp]
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+
+$L$mul_gather_tail:
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        lea     rax,[((128+24+48))+rsp]
+        movaps  xmm6,XMMWORD[((160-200))+rax]
+        movaps  xmm7,XMMWORD[((176-200))+rax]
+        movaps  xmm8,XMMWORD[((192-200))+rax]
+        movaps  xmm9,XMMWORD[((208-200))+rax]
+        movaps  xmm10,XMMWORD[((224-200))+rax]
+        movaps  xmm11,XMMWORD[((240-200))+rax]
+        movaps  xmm12,XMMWORD[((256-200))+rax]
+        movaps  xmm13,XMMWORD[((272-200))+rax]
+        movaps  xmm14,XMMWORD[((288-200))+rax]
+        movaps  xmm15,XMMWORD[((304-200))+rax]
+        lea     rax,[176+rax]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_gather4_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul_gather4:
+global  rsaz_512_mul_scatter4
+
+ALIGN   32
+rsaz_512_mul_scatter4:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul_scatter4:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        mov     r9d,r9d
+        sub     rsp,128+24
+
+$L$mul_scatter4_body:
+        lea     r8,[r9*8+r8]
+DB      102,72,15,110,199
+DB      102,72,15,110,202
+DB      102,73,15,110,208
+        mov     QWORD[128+rsp],rcx
+
+        mov     rbp,rdi
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx_scatter
+        mov     rbx,QWORD[rdi]
+        call    __rsaz_512_mul
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$mul_scatter_tail
+
+ALIGN   32
+$L$mulx_scatter:
+        mov     rdx,QWORD[rdi]
+        call    __rsaz_512_mulx
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     rdx,QWORD[128+rsp]
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+
+$L$mul_scatter_tail:
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+DB      102,72,15,126,214
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        mov     QWORD[rsi],r8
+        mov     QWORD[128+rsi],r9
+        mov     QWORD[256+rsi],r10
+        mov     QWORD[384+rsi],r11
+        mov     QWORD[512+rsi],r12
+        mov     QWORD[640+rsi],r13
+        mov     QWORD[768+rsi],r14
+        mov     QWORD[896+rsi],r15
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_scatter4_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul_scatter4:
+global  rsaz_512_mul_by_one
+
+ALIGN   32
+rsaz_512_mul_by_one:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul_by_one:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,128+24
+
+$L$mul_by_one_body:
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+8))]
+        mov     rbp,rdx
+        mov     QWORD[128+rsp],rcx
+
+        mov     r8,QWORD[rsi]
+        pxor    xmm0,xmm0
+        mov     r9,QWORD[8+rsi]
+        mov     r10,QWORD[16+rsi]
+        mov     r11,QWORD[24+rsi]
+        mov     r12,QWORD[32+rsi]
+        mov     r13,QWORD[40+rsi]
+        mov     r14,QWORD[48+rsi]
+        mov     r15,QWORD[56+rsi]
+
+        movdqa  XMMWORD[rsp],xmm0
+        movdqa  XMMWORD[16+rsp],xmm0
+        movdqa  XMMWORD[32+rsp],xmm0
+        movdqa  XMMWORD[48+rsp],xmm0
+        movdqa  XMMWORD[64+rsp],xmm0
+        movdqa  XMMWORD[80+rsp],xmm0
+        movdqa  XMMWORD[96+rsp],xmm0
+        and     eax,0x80100
+        cmp     eax,0x80100
+        je      NEAR $L$by_one_callx
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$by_one_tail
+ALIGN   32
+$L$by_one_callx:
+        mov     rdx,QWORD[128+rsp]
+        call    __rsaz_512_reducex
+$L$by_one_tail:
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_by_one_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul_by_one:
+
+ALIGN   32
+__rsaz_512_reduce:
+        mov     rbx,r8
+        imul    rbx,QWORD[((128+8))+rsp]
+        mov     rax,QWORD[rbp]
+        mov     ecx,8
+        jmp     NEAR $L$reduction_loop
+
+ALIGN   32
+$L$reduction_loop:
+        mul     rbx
+        mov     rax,QWORD[8+rbp]
+        neg     r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rbp]
+        adc     rdx,0
+        add     r8,r9
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rbp]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rbp]
+        adc     rdx,0
+        add     r10,r11
+        mov     rsi,QWORD[((128+8))+rsp]
+
+
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rbp]
+        adc     rdx,0
+        imul    rsi,r8
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rbp]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rbp]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        mov     rbx,rsi
+        add     r15,rax
+        mov     rax,QWORD[rbp]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        dec     ecx
+        jne     NEAR $L$reduction_loop
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_reducex:
+
+        imul    rdx,r8
+        xor     rsi,rsi
+        mov     ecx,8
+        jmp     NEAR $L$reduction_loopx
+
+ALIGN   32
+$L$reduction_loopx:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rax,rbx
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rbp]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rbx,QWORD[16+rbp]
+        adcx    r9,rbx
+        adox    r10,r11
+
+        mulx    r11,rbx,QWORD[24+rbp]
+        adcx    r10,rbx
+        adox    r11,r12
+
+DB      0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+        mov     rax,rdx
+        mov     rdx,r8
+        adcx    r11,rbx
+        adox    r12,r13
+
+        mulx    rdx,rbx,QWORD[((128+8))+rsp]
+        mov     rdx,rax
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+DB      0xc4,0x62,0xfb,0xf6,0xb5,0x30,0x00,0x00,0x00
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rbp]
+        mov     rdx,rbx
+        adcx    r14,rax
+        adox    r15,rsi
+        adcx    r15,rsi
+
+        dec     ecx
+        jne     NEAR $L$reduction_loopx
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_subtract:
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        mov     r8,QWORD[rbp]
+        mov     r9,QWORD[8+rbp]
+        neg     r8
+        not     r9
+        and     r8,rcx
+        mov     r10,QWORD[16+rbp]
+        and     r9,rcx
+        not     r10
+        mov     r11,QWORD[24+rbp]
+        and     r10,rcx
+        not     r11
+        mov     r12,QWORD[32+rbp]
+        and     r11,rcx
+        not     r12
+        mov     r13,QWORD[40+rbp]
+        and     r12,rcx
+        not     r13
+        mov     r14,QWORD[48+rbp]
+        and     r13,rcx
+        not     r14
+        mov     r15,QWORD[56+rbp]
+        and     r14,rcx
+        not     r15
+        and     r15,rcx
+
+        add     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_mul:
+        lea     rdi,[8+rsp]
+
+        mov     rax,QWORD[rsi]
+        mul     rbx
+        mov     QWORD[rdi],rax
+        mov     rax,QWORD[8+rsi]
+        mov     r8,rdx
+
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[16+rsi]
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[24+rsi]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[32+rsi]
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[40+rsi]
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[48+rsi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[56+rsi]
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[rsi]
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rbp,[8+rbp]
+        lea     rdi,[8+rdi]
+
+        mov     ecx,7
+        jmp     NEAR $L$oop_mul
+
+ALIGN   32
+$L$oop_mul:
+        mov     rbx,QWORD[rbp]
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[8+rsi]
+        mov     QWORD[rdi],r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     r8,r9
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rsi]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rsi]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        lea     rbp,[8+rbp]
+        adc     r14,0
+
+        mul     rbx
+        add     r15,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rdi,[8+rdi]
+
+        dec     ecx
+        jnz     NEAR $L$oop_mul
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_mulx:
+        mulx    r8,rbx,QWORD[rsi]
+        mov     rcx,-6
+
+        mulx    r9,rax,QWORD[8+rsi]
+        mov     QWORD[8+rsp],rbx
+
+        mulx    r10,rbx,QWORD[16+rsi]
+        adc     r8,rax
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adc     r9,rbx
+
+        mulx    r12,rbx,QWORD[32+rsi]
+        adc     r10,rax
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adc     r11,rbx
+
+        mulx    r14,rbx,QWORD[48+rsi]
+        adc     r12,rax
+
+        mulx    r15,rax,QWORD[56+rsi]
+        mov     rdx,QWORD[8+rbp]
+        adc     r13,rbx
+        adc     r14,rax
+        adc     r15,0
+
+        xor     rdi,rdi
+        jmp     NEAR $L$oop_mulx
+
+ALIGN   32
+$L$oop_mulx:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rsi]
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rsi]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rsi]
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r10,rax
+        adox    r11,r12
+
+DB      0x3e,0xc4,0x62,0xfb,0xf6,0xa6,0x20,0x00,0x00,0x00
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rsi]
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rsi]
+        mov     rdx,QWORD[64+rcx*8+rbp]
+        mov     QWORD[((8+64-8))+rcx*8+rsp],rbx
+        adcx    r14,rax
+        adox    r15,rdi
+        adcx    r15,rdi
+
+        inc     rcx
+        jnz     NEAR $L$oop_mulx
+
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rsi]
+        adcx    rbx,rax
+        adox    r8,r9
+
+DB      0xc4,0x62,0xfb,0xf6,0x8e,0x08,0x00,0x00,0x00
+        adcx    r8,rax
+        adox    r9,r10
+
+DB      0xc4,0x62,0xfb,0xf6,0x96,0x10,0x00,0x00,0x00
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r10,rax
+        adox    r11,r12
+
+        mulx    r12,rax,QWORD[32+rsi]
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r12,rax
+        adox    r13,r14
+
+DB      0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+        adcx    r13,rax
+        adox    r14,r15
+
+DB      0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+        adcx    r14,rax
+        adox    r15,rdi
+        adcx    r15,rdi
+
+        mov     QWORD[((8+64-8))+rsp],rbx
+        mov     QWORD[((8+64))+rsp],r8
+        mov     QWORD[((8+64+8))+rsp],r9
+        mov     QWORD[((8+64+16))+rsp],r10
+        mov     QWORD[((8+64+24))+rsp],r11
+        mov     QWORD[((8+64+32))+rsp],r12
+        mov     QWORD[((8+64+40))+rsp],r13
+        mov     QWORD[((8+64+48))+rsp],r14
+        mov     QWORD[((8+64+56))+rsp],r15
+
+        DB      0F3h,0C3h               ;repret
+
+global  rsaz_512_scatter4
+
+ALIGN   16
+rsaz_512_scatter4:
+        lea     rcx,[r8*8+rcx]
+        mov     r9d,8
+        jmp     NEAR $L$oop_scatter
+ALIGN   16
+$L$oop_scatter:
+        mov     rax,QWORD[rdx]
+        lea     rdx,[8+rdx]
+        mov     QWORD[rcx],rax
+        lea     rcx,[128+rcx]
+        dec     r9d
+        jnz     NEAR $L$oop_scatter
+        DB      0F3h,0C3h               ;repret
+
+
+global  rsaz_512_gather4
+
+ALIGN   16
+rsaz_512_gather4:
+$L$SEH_begin_rsaz_512_gather4:
+DB      0x48,0x81,0xec,0xa8,0x00,0x00,0x00
+DB      0x0f,0x29,0x34,0x24
+DB      0x0f,0x29,0x7c,0x24,0x10
+DB      0x44,0x0f,0x29,0x44,0x24,0x20
+DB      0x44,0x0f,0x29,0x4c,0x24,0x30
+DB      0x44,0x0f,0x29,0x54,0x24,0x40
+DB      0x44,0x0f,0x29,0x5c,0x24,0x50
+DB      0x44,0x0f,0x29,0x64,0x24,0x60
+DB      0x44,0x0f,0x29,0x6c,0x24,0x70
+DB      0x44,0x0f,0x29,0xb4,0x24,0x80,0,0,0
+DB      0x44,0x0f,0x29,0xbc,0x24,0x90,0,0,0
+        movd    xmm8,r8d
+        movdqa  xmm1,XMMWORD[(($L$inc+16))]
+        movdqa  xmm0,XMMWORD[$L$inc]
+
+        pshufd  xmm8,xmm8,0
+        movdqa  xmm7,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm8
+        movdqa  xmm3,xmm7
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm8
+        movdqa  xmm4,xmm7
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm8
+        movdqa  xmm5,xmm7
+        paddd   xmm4,xmm3
+        pcmpeqd xmm3,xmm8
+        movdqa  xmm6,xmm7
+        paddd   xmm5,xmm4
+        pcmpeqd xmm4,xmm8
+        paddd   xmm6,xmm5
+        pcmpeqd xmm5,xmm8
+        paddd   xmm7,xmm6
+        pcmpeqd xmm6,xmm8
+        pcmpeqd xmm7,xmm8
+        mov     r9d,8
+        jmp     NEAR $L$oop_gather
+ALIGN   16
+$L$oop_gather:
+        movdqa  xmm8,XMMWORD[rdx]
+        movdqa  xmm9,XMMWORD[16+rdx]
+        movdqa  xmm10,XMMWORD[32+rdx]
+        movdqa  xmm11,XMMWORD[48+rdx]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rdx]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rdx]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rdx]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rdx]
+        lea     rdx,[128+rdx]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+        movq    QWORD[rcx],xmm8
+        lea     rcx,[8+rcx]
+        dec     r9d
+        jnz     NEAR $L$oop_gather
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        add     rsp,0xa8
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_rsaz_512_gather4:
+
+
+ALIGN   64
+$L$inc:
+        DD      0,0,1,1
+        DD      2,2,2,2
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rax,[((128+24+48))+rax]
+
+        lea     rbx,[$L$mul_gather4_epilogue]
+        cmp     rbx,r10
+        jne     NEAR $L$se_not_in_mul_gather4
+
+        lea     rax,[176+rax]
+
+        lea     rsi,[((-48-168))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$se_not_in_mul_gather4:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_rsaz_512_sqr wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_sqr wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_sqr wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul_gather4 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul_gather4 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul_gather4 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul_scatter4 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul_scatter4 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul_scatter4 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul_by_one wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul_by_one wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul_by_one wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_gather4 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_gather4 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_gather4 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_rsaz_512_sqr:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$sqr_body wrt ..imagebase,$L$sqr_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_gather4:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_gather4_body wrt ..imagebase,$L$mul_gather4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_scatter4:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_scatter4_body wrt ..imagebase,$L$mul_scatter4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_by_one:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_by_one_body wrt ..imagebase,$L$mul_by_one_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_gather4:
+DB      0x01,0x46,0x16,0x00
+DB      0x46,0xf8,0x09,0x00
+DB      0x3d,0xe8,0x08,0x00
+DB      0x34,0xd8,0x07,0x00
+DB      0x2e,0xc8,0x06,0x00
+DB      0x28,0xb8,0x05,0x00
+DB      0x22,0xa8,0x04,0x00
+DB      0x1c,0x98,0x03,0x00
+DB      0x16,0x88,0x02,0x00
+DB      0x10,0x78,0x01,0x00
+DB      0x0b,0x68,0x00,0x00
+DB      0x07,0x01,0x15,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
new file mode 100644
index 0000000000..b96e85a35a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
@@ -0,0 +1,432 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN   16
+_mul_1x1:
+
+        sub     rsp,128+8
+
+        mov     r9,-1
+        lea     rsi,[rax*1+rax]
+        shr     r9,3
+        lea     rdi,[rax*4]
+        and     r9,rax
+        lea     r12,[rax*8]
+        sar     rax,63
+        lea     r10,[r9*1+r9]
+        sar     rsi,63
+        lea     r11,[r9*4]
+        and     rax,rbp
+        sar     rdi,63
+        mov     rdx,rax
+        shl     rax,63
+        and     rsi,rbp
+        shr     rdx,1
+        mov     rcx,rsi
+        shl     rsi,62
+        and     rdi,rbp
+        shr     rcx,2
+        xor     rax,rsi
+        mov     rbx,rdi
+        shl     rdi,61
+        xor     rdx,rcx
+        shr     rbx,3
+        xor     rax,rdi
+        xor     rdx,rbx
+
+        mov     r13,r9
+        mov     QWORD[rsp],0
+        xor     r13,r10
+        mov     QWORD[8+rsp],r9
+        mov     r14,r11
+        mov     QWORD[16+rsp],r10
+        xor     r14,r12
+        mov     QWORD[24+rsp],r13
+
+        xor     r9,r11
+        mov     QWORD[32+rsp],r11
+        xor     r10,r11
+        mov     QWORD[40+rsp],r9
+        xor     r13,r11
+        mov     QWORD[48+rsp],r10
+        xor     r9,r14
+        mov     QWORD[56+rsp],r13
+        xor     r10,r14
+
+        mov     QWORD[64+rsp],r12
+        xor     r13,r14
+        mov     QWORD[72+rsp],r9
+        xor     r9,r11
+        mov     QWORD[80+rsp],r10
+        xor     r10,r11
+        mov     QWORD[88+rsp],r13
+
+        xor     r13,r11
+        mov     QWORD[96+rsp],r14
+        mov     rsi,r8
+        mov     QWORD[104+rsp],r9
+        and     rsi,rbp
+        mov     QWORD[112+rsp],r10
+        shr     rbp,4
+        mov     QWORD[120+rsp],r13
+        mov     rdi,r8
+        and     rdi,rbp
+        shr     rbp,4
+
+        movq    xmm0,QWORD[rsi*8+rsp]
+        mov     rsi,r8
+        and     rsi,rbp
+        shr     rbp,4
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,4
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,60
+        xor     rax,rcx
+        pslldq  xmm1,1
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,12
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,52
+        xor     rax,rcx
+        pslldq  xmm1,2
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,20
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,44
+        xor     rax,rcx
+        pslldq  xmm1,3
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,28
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,36
+        xor     rax,rcx
+        pslldq  xmm1,4
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,36
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,28
+        xor     rax,rcx
+        pslldq  xmm1,5
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,44
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,20
+        xor     rax,rcx
+        pslldq  xmm1,6
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,52
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,12
+        xor     rax,rcx
+        pslldq  xmm1,7
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rbx,rcx
+        shl     rcx,60
+DB      102,72,15,126,198
+        shr     rbx,4
+        xor     rax,rcx
+        psrldq  xmm0,8
+        xor     rdx,rbx
+DB      102,72,15,126,199
+        xor     rax,rsi
+        xor     rdx,rdi
+
+        add     rsp,128+8
+
+        DB      0F3h,0C3h               ;repret
+$L$end_mul_1x1:
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  bn_GF2m_mul_2x2
+
+ALIGN   16
+bn_GF2m_mul_2x2:
+
+        mov     rax,rsp
+        mov     r10,QWORD[OPENSSL_ia32cap_P]
+        bt      r10,33
+        jnc     NEAR $L$vanilla_mul_2x2
+
+DB      102,72,15,110,194
+DB      102,73,15,110,201
+DB      102,73,15,110,208
+        movq    xmm3,QWORD[40+rsp]
+        movdqa  xmm4,xmm0
+        movdqa  xmm5,xmm1
+DB      102,15,58,68,193,0
+        pxor    xmm4,xmm2
+        pxor    xmm5,xmm3
+DB      102,15,58,68,211,0
+DB      102,15,58,68,229,0
+        xorps   xmm4,xmm0
+        xorps   xmm4,xmm2
+        movdqa  xmm5,xmm4
+        pslldq  xmm4,8
+        psrldq  xmm5,8
+        pxor    xmm2,xmm4
+        pxor    xmm0,xmm5
+        movdqu  XMMWORD[rcx],xmm2
+        movdqu  XMMWORD[16+rcx],xmm0
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$vanilla_mul_2x2:
+        lea     rsp,[((-136))+rsp]
+
+        mov     r10,QWORD[176+rsp]
+        mov     QWORD[120+rsp],rdi
+        mov     QWORD[128+rsp],rsi
+        mov     QWORD[80+rsp],r14
+
+        mov     QWORD[88+rsp],r13
+
+        mov     QWORD[96+rsp],r12
+
+        mov     QWORD[104+rsp],rbp
+
+        mov     QWORD[112+rsp],rbx
+
+$L$body_mul_2x2:
+        mov     QWORD[32+rsp],rcx
+        mov     QWORD[40+rsp],rdx
+        mov     QWORD[48+rsp],r8
+        mov     QWORD[56+rsp],r9
+        mov     QWORD[64+rsp],r10
+
+        mov     r8,0xf
+        mov     rax,rdx
+        mov     rbp,r9
+        call    _mul_1x1
+        mov     QWORD[16+rsp],rax
+        mov     QWORD[24+rsp],rdx
+
+        mov     rax,QWORD[48+rsp]
+        mov     rbp,QWORD[64+rsp]
+        call    _mul_1x1
+        mov     QWORD[rsp],rax
+        mov     QWORD[8+rsp],rdx
+
+        mov     rax,QWORD[40+rsp]
+        mov     rbp,QWORD[56+rsp]
+        xor     rax,QWORD[48+rsp]
+        xor     rbp,QWORD[64+rsp]
+        call    _mul_1x1
+        mov     rbx,QWORD[rsp]
+        mov     rcx,QWORD[8+rsp]
+        mov     rdi,QWORD[16+rsp]
+        mov     rsi,QWORD[24+rsp]
+        mov     rbp,QWORD[32+rsp]
+
+        xor     rax,rdx
+        xor     rdx,rcx
+        xor     rax,rbx
+        mov     QWORD[rbp],rbx
+        xor     rdx,rdi
+        mov     QWORD[24+rbp],rsi
+        xor     rax,rsi
+        xor     rdx,rsi
+        xor     rax,rdx
+        mov     QWORD[16+rbp],rdx
+        mov     QWORD[8+rbp],rax
+
+        mov     r14,QWORD[80+rsp]
+
+        mov     r13,QWORD[88+rsp]
+
+        mov     r12,QWORD[96+rsp]
+
+        mov     rbp,QWORD[104+rsp]
+
+        mov     rbx,QWORD[112+rsp]
+
+        mov     rdi,QWORD[120+rsp]
+        mov     rsi,QWORD[128+rsp]
+        lea     rsp,[136+rsp]
+
+$L$epilogue_mul_2x2:
+        DB      0F3h,0C3h               ;repret
+$L$end_mul_2x2:
+
+
+DB      71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+DB      99,97,116,105,111,110,32,102,111,114,32,120,56,54,95,54
+DB      52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB      111,114,103,62,0
+ALIGN   16
+EXTERN  __imp_RtlVirtualUnwind
+
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$body_mul_2x2]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue_mul_2x2]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     r14,QWORD[80+rax]
+        mov     r13,QWORD[88+rax]
+        mov     r12,QWORD[96+rax]
+        mov     rbp,QWORD[104+rax]
+        mov     rbx,QWORD[112+rax]
+        mov     rdi,QWORD[120+rax]
+        mov     rsi,QWORD[128+rax]
+
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+        lea     rax,[136+rax]
+
+$L$in_prologue:
+        mov     QWORD[152+r8],rax
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      _mul_1x1 wrt ..imagebase
+        DD      $L$end_mul_1x1 wrt ..imagebase
+        DD      $L$SEH_info_1x1 wrt ..imagebase
+
+        DD      $L$vanilla_mul_2x2 wrt ..imagebase
+        DD      $L$end_mul_2x2 wrt ..imagebase
+        DD      $L$SEH_info_2x2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_1x1:
+DB      0x01,0x07,0x02,0x00
+DB      0x07,0x01,0x11,0x00
+$L$SEH_info_2x2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
new file mode 100644
index 0000000000..9ff8ec428f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
@@ -0,0 +1,1479 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  bn_mul_mont
+
+ALIGN   16
+bn_mul_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r9d,r9d
+        mov     rax,rsp
+
+        test    r9d,3
+        jnz     NEAR $L$mul_enter
+        cmp     r9d,8
+        jb      NEAR $L$mul_enter
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     rdx,rsi
+        jne     NEAR $L$mul4x_enter
+        test    r9d,7
+        jz      NEAR $L$sqr8x_enter
+        jmp     NEAR $L$mul4x_enter
+
+ALIGN   16
+$L$mul_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        neg     r9
+        mov     r11,rsp
+        lea     r10,[((-16))+r9*8+rsp]
+        neg     r9
+        and     r10,-1024
+
+
+
+
+
+
+
+
+
+        sub     r11,r10
+        and     r11,-4096
+        lea     rsp,[r11*1+r10]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+        jmp     NEAR $L$mul_page_walk_done
+
+ALIGN   16
+$L$mul_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+        mov     QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+        mov     r12,rdx
+        mov     r8,QWORD[r8]
+        mov     rbx,QWORD[r12]
+        mov     rax,QWORD[rsi]
+
+        xor     r14,r14
+        xor     r15,r15
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$1st_enter
+
+ALIGN   16
+$L$1st:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r11
+        mov     r11,r10
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$1st_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        lea     r15,[1+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$1st
+
+        add     r13,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+        mov     r11,r10
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        jmp     NEAR $L$outer
+ALIGN   16
+$L$outer:
+        mov     rbx,QWORD[r14*8+r12]
+        xor     r15,r15
+        mov     rbp,r8
+        mov     r10,QWORD[rsp]
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r10,QWORD[8+rsp]
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$inner_enter
+
+ALIGN   16
+$L$inner:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$inner_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+        lea     r15,[1+r15]
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$inner
+
+        add     r13,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        cmp     r14,r9
+        jb      NEAR $L$outer
+
+        xor     r14,r14
+        mov     rax,QWORD[rsp]
+        mov     r15,r9
+
+ALIGN   16
+$L$sub: sbb     rax,QWORD[r14*8+rcx]
+        mov     QWORD[r14*8+rdi],rax
+        mov     rax,QWORD[8+r14*8+rsp]
+        lea     r14,[1+r14]
+        dec     r15
+        jnz     NEAR $L$sub
+
+        sbb     rax,0
+        mov     rbx,-1
+        xor     rbx,rax
+        xor     r14,r14
+        mov     r15,r9
+
+$L$copy:
+        mov     rcx,QWORD[r14*8+rdi]
+        mov     rdx,QWORD[r14*8+rsp]
+        and     rcx,rbx
+        and     rdx,rax
+        mov     QWORD[r14*8+rsp],r9
+        or      rdx,rcx
+        mov     QWORD[r14*8+rdi],rdx
+        lea     r14,[1+r14]
+        sub     r15,1
+        jnz     NEAR $L$copy
+
+        mov     rsi,QWORD[8+r9*8+rsp]
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul_mont:
+
+ALIGN   16
+bn_mul4x_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul4x_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r9d,r9d
+        mov     rax,rsp
+
+$L$mul4x_enter:
+        and     r11d,0x80100
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx4x_enter
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        neg     r9
+        mov     r11,rsp
+        lea     r10,[((-32))+r9*8+rsp]
+        neg     r9
+        and     r10,-1024
+
+        sub     r11,r10
+        and     r11,-4096
+        lea     rsp,[r11*1+r10]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul4x_page_walk
+        jmp     NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+        mov     QWORD[8+r9*8+rsp],rax
+
+$L$mul4x_body:
+        mov     QWORD[16+r9*8+rsp],rdi
+        mov     r12,rdx
+        mov     r8,QWORD[r8]
+        mov     rbx,QWORD[r12]
+        mov     rax,QWORD[rsi]
+
+        xor     r14,r14
+        xor     r15,r15
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[4+r15]
+        adc     rdx,0
+        mov     QWORD[rsp],rdi
+        mov     r13,rdx
+        jmp     NEAR $L$1st4x
+ALIGN   16
+$L$1st4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+r15*8+rcx]
+        adc     rdx,0
+        lea     r15,[4+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[((-16))+r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-32))+r15*8+rsp],rdi
+        mov     r13,rdx
+        cmp     r15,r9
+        jb      NEAR $L$1st4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     QWORD[r15*8+rsp],rdi
+
+        lea     r14,[1+r14]
+ALIGN   4
+$L$outer4x:
+        mov     rbx,QWORD[r14*8+r12]
+        xor     r15,r15
+        mov     r10,QWORD[rsp]
+        mov     rbp,r8
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+rsp]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[4+r15]
+        adc     rdx,0
+        mov     QWORD[rsp],rdi
+        mov     r13,rdx
+        jmp     NEAR $L$inner4x
+ALIGN   16
+$L$inner4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        add     r10,QWORD[((-16))+r15*8+rsp]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r15*8+rsp]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        add     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+r15*8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+r15*8+rsp]
+        adc     rdx,0
+        lea     r15,[4+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[((-16))+r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-32))+r15*8+rsp],rdi
+        mov     r13,rdx
+        cmp     r15,r9
+        jb      NEAR $L$inner4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        add     r10,QWORD[((-16))+r15*8+rsp]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r15*8+rsp]
+        adc     rdx,0
+        lea     r14,[1+r14]
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        add     r13,QWORD[r9*8+rsp]
+        adc     rdi,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     QWORD[r15*8+rsp],rdi
+
+        cmp     r14,r9
+        jb      NEAR $L$outer4x
+        mov     rdi,QWORD[16+r9*8+rsp]
+        lea     r15,[((-4))+r9]
+        mov     rax,QWORD[rsp]
+        mov     rdx,QWORD[8+rsp]
+        shr     r15,2
+        lea     rsi,[rsp]
+        xor     r14,r14
+
+        sub     rax,QWORD[rcx]
+        mov     rbx,QWORD[16+rsi]
+        mov     rbp,QWORD[24+rsi]
+        sbb     rdx,QWORD[8+rcx]
+
+$L$sub4x:
+        mov     QWORD[r14*8+rdi],rax
+        mov     QWORD[8+r14*8+rdi],rdx
+        sbb     rbx,QWORD[16+r14*8+rcx]
+        mov     rax,QWORD[32+r14*8+rsi]
+        mov     rdx,QWORD[40+r14*8+rsi]
+        sbb     rbp,QWORD[24+r14*8+rcx]
+        mov     QWORD[16+r14*8+rdi],rbx
+        mov     QWORD[24+r14*8+rdi],rbp
+        sbb     rax,QWORD[32+r14*8+rcx]
+        mov     rbx,QWORD[48+r14*8+rsi]
+        mov     rbp,QWORD[56+r14*8+rsi]
+        sbb     rdx,QWORD[40+r14*8+rcx]
+        lea     r14,[4+r14]
+        dec     r15
+        jnz     NEAR $L$sub4x
+
+        mov     QWORD[r14*8+rdi],rax
+        mov     rax,QWORD[32+r14*8+rsi]
+        sbb     rbx,QWORD[16+r14*8+rcx]
+        mov     QWORD[8+r14*8+rdi],rdx
+        sbb     rbp,QWORD[24+r14*8+rcx]
+        mov     QWORD[16+r14*8+rdi],rbx
+
+        sbb     rax,0
+        mov     QWORD[24+r14*8+rdi],rbp
+        pxor    xmm0,xmm0
+DB      102,72,15,110,224
+        pcmpeqd xmm5,xmm5
+        pshufd  xmm4,xmm4,0
+        mov     r15,r9
+        pxor    xmm5,xmm4
+        shr     r15,2
+        xor     eax,eax
+
+        jmp     NEAR $L$copy4x
+ALIGN   16
+$L$copy4x:
+        movdqa  xmm1,XMMWORD[rax*1+rsp]
+        movdqu  xmm2,XMMWORD[rax*1+rdi]
+        pand    xmm1,xmm4
+        pand    xmm2,xmm5
+        movdqa  xmm3,XMMWORD[16+rax*1+rsp]
+        movdqa  XMMWORD[rax*1+rsp],xmm0
+        por     xmm1,xmm2
+        movdqu  xmm2,XMMWORD[16+rax*1+rdi]
+        movdqu  XMMWORD[rax*1+rdi],xmm1
+        pand    xmm3,xmm4
+        pand    xmm2,xmm5
+        movdqa  XMMWORD[16+rax*1+rsp],xmm0
+        por     xmm3,xmm2
+        movdqu  XMMWORD[16+rax*1+rdi],xmm3
+        lea     rax,[32+rax]
+        dec     r15
+        jnz     NEAR $L$copy4x
+        mov     rsi,QWORD[8+r9*8+rsp]
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul4x_mont:
+EXTERN  bn_sqrx8x_internal
+EXTERN  bn_sqr8x_internal
+
+
+ALIGN   32
+bn_sqr8x_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_sqr8x_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$sqr8x_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$sqr8x_prologue:
+
+        mov     r10d,r9d
+        shl     r9d,3
+        shl     r10,3+2
+        neg     r9
+
+
+
+
+
+
+        lea     r11,[((-64))+r9*2+rsp]
+        mov     rbp,rsp
+        mov     r8,QWORD[r8]
+        sub     r11,rsi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$sqr8x_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-64))+r9*2+rbp]
+        jmp     NEAR $L$sqr8x_sp_done
+
+ALIGN   32
+$L$sqr8x_sp_alt:
+        lea     r10,[((4096-64))+r9*2]
+        lea     rbp,[((-64))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$sqr8x_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$sqr8x_page_walk
+        jmp     NEAR $L$sqr8x_page_walk_done
+
+ALIGN   16
+$L$sqr8x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$sqr8x_page_walk
+$L$sqr8x_page_walk_done:
+
+        mov     r10,r9
+        neg     r9
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$sqr8x_body:
+
+DB      102,72,15,110,209
+        pxor    xmm0,xmm0
+DB      102,72,15,110,207
+DB      102,73,15,110,218
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+8))]
+        and     eax,0x80100
+        cmp     eax,0x80100
+        jne     NEAR $L$sqr8x_nox
+
+        call    bn_sqrx8x_internal
+
+
+
+
+        lea     rbx,[rcx*1+r8]
+        mov     r9,rcx
+        mov     rdx,rcx
+DB      102,72,15,126,207
+        sar     rcx,3+2
+        jmp     NEAR $L$sqr8x_sub
+
+ALIGN   32
+$L$sqr8x_nox:
+        call    bn_sqr8x_internal
+
+
+
+
+        lea     rbx,[r9*1+rdi]
+        mov     rcx,r9
+        mov     rdx,r9
+DB      102,72,15,126,207
+        sar     rcx,3+2
+        jmp     NEAR $L$sqr8x_sub
+
+ALIGN   32
+$L$sqr8x_sub:
+        mov     r12,QWORD[rbx]
+        mov     r13,QWORD[8+rbx]
+        mov     r14,QWORD[16+rbx]
+        mov     r15,QWORD[24+rbx]
+        lea     rbx,[32+rbx]
+        sbb     r12,QWORD[rbp]
+        sbb     r13,QWORD[8+rbp]
+        sbb     r14,QWORD[16+rbp]
+        sbb     r15,QWORD[24+rbp]
+        lea     rbp,[32+rbp]
+        mov     QWORD[rdi],r12
+        mov     QWORD[8+rdi],r13
+        mov     QWORD[16+rdi],r14
+        mov     QWORD[24+rdi],r15
+        lea     rdi,[32+rdi]
+        inc     rcx
+        jnz     NEAR $L$sqr8x_sub
+
+        sbb     rax,0
+        lea     rbx,[r9*1+rbx]
+        lea     rdi,[r9*1+rdi]
+
+DB      102,72,15,110,200
+        pxor    xmm0,xmm0
+        pshufd  xmm1,xmm1,0
+        mov     rsi,QWORD[40+rsp]
+
+        jmp     NEAR $L$sqr8x_cond_copy
+
+ALIGN   32
+$L$sqr8x_cond_copy:
+        movdqa  xmm2,XMMWORD[rbx]
+        movdqa  xmm3,XMMWORD[16+rbx]
+        lea     rbx,[32+rbx]
+        movdqu  xmm4,XMMWORD[rdi]
+        movdqu  xmm5,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        movdqa  XMMWORD[(-32)+rbx],xmm0
+        movdqa  XMMWORD[(-16)+rbx],xmm0
+        movdqa  XMMWORD[(-32)+rdx*1+rbx],xmm0
+        movdqa  XMMWORD[(-16)+rdx*1+rbx],xmm0
+        pcmpeqd xmm0,xmm1
+        pand    xmm2,xmm1
+        pand    xmm3,xmm1
+        pand    xmm4,xmm0
+        pand    xmm5,xmm0
+        pxor    xmm0,xmm0
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqu  XMMWORD[(-32)+rdi],xmm4
+        movdqu  XMMWORD[(-16)+rdi],xmm5
+        add     r9,32
+        jnz     NEAR $L$sqr8x_cond_copy
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$sqr8x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_sqr8x_mont:
+
+ALIGN   32
+bn_mulx4x_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mulx4x_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$mulx4x_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$mulx4x_prologue:
+
+        shl     r9d,3
+        xor     r10,r10
+        sub     r10,r9
+        mov     r8,QWORD[r8]
+        lea     rbp,[((-72))+r10*1+rsp]
+        and     rbp,-128
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+        jmp     NEAR $L$mulx4x_page_walk_done
+
+ALIGN   16
+$L$mulx4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+        lea     r10,[r9*1+rdx]
+
+
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[rsp],r9
+        shr     r9,5
+        mov     QWORD[16+rsp],r10
+        sub     r9,1
+        mov     QWORD[24+rsp],r8
+        mov     QWORD[32+rsp],rdi
+        mov     QWORD[40+rsp],rax
+
+        mov     QWORD[48+rsp],r9
+        jmp     NEAR $L$mulx4x_body
+
+ALIGN   32
+$L$mulx4x_body:
+        lea     rdi,[8+rdx]
+        mov     rdx,QWORD[rdx]
+        lea     rbx,[((64+32))+rsp]
+        mov     r9,rdx
+
+        mulx    rax,r8,QWORD[rsi]
+        mulx    r14,r11,QWORD[8+rsi]
+        add     r11,rax
+        mov     QWORD[8+rsp],rdi
+        mulx    r13,r12,QWORD[16+rsi]
+        adc     r12,r14
+        adc     r13,0
+
+        mov     rdi,r8
+        imul    r8,QWORD[24+rsp]
+        xor     rbp,rbp
+
+        mulx    r14,rax,QWORD[24+rsi]
+        mov     rdx,r8
+        lea     rsi,[32+rsi]
+        adcx    r13,rax
+        adcx    r14,rbp
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    rdi,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+DB      0xc4,0x62,0xfb,0xf6,0xa1,0x10,0x00,0x00,0x00
+        mov     rdi,QWORD[48+rsp]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r11
+        adcx    r12,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r12
+
+        jmp     NEAR $L$mulx4x_1st
+
+ALIGN   32
+$L$mulx4x_1st:
+        adcx    r15,rbp
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+DB      0x67,0x67
+        mov     rdx,r8
+        adcx    r13,rax
+        adcx    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        mov     QWORD[((-32))+rbx],r11
+        adox    r13,r15
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_1st
+
+        mov     rax,QWORD[rsp]
+        mov     rdi,QWORD[8+rsp]
+        adc     r15,rbp
+        add     r14,r15
+        sbb     r15,r15
+        mov     QWORD[((-8))+rbx],r14
+        jmp     NEAR $L$mulx4x_outer
+
+ALIGN   32
+$L$mulx4x_outer:
+        mov     rdx,QWORD[rdi]
+        lea     rdi,[8+rdi]
+        sub     rsi,rax
+        mov     QWORD[rbx],r15
+        lea     rbx,[((64+32))+rsp]
+        sub     rcx,rax
+
+        mulx    r11,r8,QWORD[rsi]
+        xor     ebp,ebp
+        mov     r9,rdx
+        mulx    r12,r14,QWORD[8+rsi]
+        adox    r8,QWORD[((-32))+rbx]
+        adcx    r11,r14
+        mulx    r13,r15,QWORD[16+rsi]
+        adox    r11,QWORD[((-24))+rbx]
+        adcx    r12,r15
+        adox    r12,QWORD[((-16))+rbx]
+        adcx    r13,rbp
+        adox    r13,rbp
+
+        mov     QWORD[8+rsp],rdi
+        mov     r15,r8
+        imul    r8,QWORD[24+rsp]
+        xor     ebp,ebp
+
+        mulx    r14,rax,QWORD[24+rsi]
+        mov     rdx,r8
+        adcx    r13,rax
+        adox    r13,QWORD[((-8))+rbx]
+        adcx    r14,rbp
+        lea     rsi,[32+rsi]
+        adox    r14,rbp
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    r15,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+        mulx    r12,rax,QWORD[16+rcx]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r11
+        lea     rcx,[32+rcx]
+        adcx    r12,rax
+        adox    r15,rbp
+        mov     rdi,QWORD[48+rsp]
+        mov     QWORD[((-16))+rbx],r12
+
+        jmp     NEAR $L$mulx4x_inner
+
+ALIGN   32
+$L$mulx4x_inner:
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r15,rbp
+        adox    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r10,QWORD[rbx]
+        adox    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r11,QWORD[8+rbx]
+        adox    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+        mov     rdx,r8
+        adcx    r12,QWORD[16+rbx]
+        adox    r13,rax
+        adcx    r13,QWORD[24+rbx]
+        adox    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+        adcx    r14,rbp
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        adox    r13,r15
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-32))+rbx],r11
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_inner
+
+        mov     rax,QWORD[rsp]
+        mov     rdi,QWORD[8+rsp]
+        adc     r15,rbp
+        sub     rbp,QWORD[rbx]
+        adc     r14,r15
+        sbb     r15,r15
+        mov     QWORD[((-8))+rbx],r14
+
+        cmp     rdi,QWORD[16+rsp]
+        jne     NEAR $L$mulx4x_outer
+
+        lea     rbx,[64+rsp]
+        sub     rcx,rax
+        neg     r15
+        mov     rdx,rax
+        shr     rax,3+2
+        mov     rdi,QWORD[32+rsp]
+        jmp     NEAR $L$mulx4x_sub
+
+ALIGN   32
+$L$mulx4x_sub:
+        mov     r11,QWORD[rbx]
+        mov     r12,QWORD[8+rbx]
+        mov     r13,QWORD[16+rbx]
+        mov     r14,QWORD[24+rbx]
+        lea     rbx,[32+rbx]
+        sbb     r11,QWORD[rcx]
+        sbb     r12,QWORD[8+rcx]
+        sbb     r13,QWORD[16+rcx]
+        sbb     r14,QWORD[24+rcx]
+        lea     rcx,[32+rcx]
+        mov     QWORD[rdi],r11
+        mov     QWORD[8+rdi],r12
+        mov     QWORD[16+rdi],r13
+        mov     QWORD[24+rdi],r14
+        lea     rdi,[32+rdi]
+        dec     rax
+        jnz     NEAR $L$mulx4x_sub
+
+        sbb     r15,0
+        lea     rbx,[64+rsp]
+        sub     rdi,rdx
+
+DB      102,73,15,110,207
+        pxor    xmm0,xmm0
+        pshufd  xmm1,xmm1,0
+        mov     rsi,QWORD[40+rsp]
+
+        jmp     NEAR $L$mulx4x_cond_copy
+
+ALIGN   32
+$L$mulx4x_cond_copy:
+        movdqa  xmm2,XMMWORD[rbx]
+        movdqa  xmm3,XMMWORD[16+rbx]
+        lea     rbx,[32+rbx]
+        movdqu  xmm4,XMMWORD[rdi]
+        movdqu  xmm5,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        movdqa  XMMWORD[(-32)+rbx],xmm0
+        movdqa  XMMWORD[(-16)+rbx],xmm0
+        pcmpeqd xmm0,xmm1
+        pand    xmm2,xmm1
+        pand    xmm3,xmm1
+        pand    xmm4,xmm0
+        pand    xmm5,xmm0
+        pxor    xmm0,xmm0
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqu  XMMWORD[(-32)+rdi],xmm4
+        movdqu  XMMWORD[(-16)+rdi],xmm5
+        sub     rdx,32
+        jnz     NEAR $L$mulx4x_cond_copy
+
+        mov     QWORD[rbx],rdx
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mulx4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mulx4x_mont:
+DB      77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB      112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+DB      54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83
+DB      32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB      115,108,46,111,114,103,62,0
+ALIGN   16
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+mul_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     r10,QWORD[192+r8]
+        mov     rax,QWORD[8+r10*8+rax]
+
+        jmp     NEAR $L$common_pop_regs
+
+
+
+ALIGN   16
+sqr_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_pop_regs
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[40+rax]
+
+$L$common_pop_regs:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_bn_mul_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_mul_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_mul_mont wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_mul4x_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_mul4x_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_mul4x_mont wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_sqr8x_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_sqr8x_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_sqr8x_mont wrt ..imagebase
+        DD      $L$SEH_begin_bn_mulx4x_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_mulx4x_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_mulx4x_mont wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_bn_mul_mont:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_bn_mul4x_mont:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+$L$SEH_info_bn_sqr8x_mont:
+DB      9,0,0,0
+        DD      sqr_handler wrt ..imagebase
+        DD      $L$sqr8x_prologue wrt ..imagebase,$L$sqr8x_body wrt ..imagebase,$L$sqr8x_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_mulx4x_mont:
+DB      9,0,0,0
+        DD      sqr_handler wrt ..imagebase
+        DD      $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN   8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
new file mode 100644
index 0000000000..f256a94476
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
@@ -0,0 +1,4033 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  bn_mul_mont_gather5
+
+ALIGN   64
+bn_mul_mont_gather5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul_mont_gather5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r9d,r9d
+        mov     rax,rsp
+
+        test    r9d,7
+        jnz     NEAR $L$mul_enter
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        jmp     NEAR $L$mul4x_enter
+
+ALIGN   16
+$L$mul_enter:
+        movd    xmm5,DWORD[56+rsp]
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        neg     r9
+        mov     r11,rsp
+        lea     r10,[((-280))+r9*8+rsp]
+        neg     r9
+        and     r10,-1024
+
+
+
+
+
+
+
+
+
+        sub     r11,r10
+        and     r11,-4096
+        lea     rsp,[r11*1+r10]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+        jmp     NEAR $L$mul_page_walk_done
+
+$L$mul_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+        lea     r10,[$L$inc]
+        mov     QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+
+        lea     r12,[128+rdx]
+        movdqa  xmm0,XMMWORD[r10]
+        movdqa  xmm1,XMMWORD[16+r10]
+        lea     r10,[((24-112))+r9*8+rsp]
+        and     r10,-16
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+DB      0x67
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[112+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[128+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[144+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[160+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[176+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[192+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[208+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[224+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[240+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[256+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[272+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[288+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[304+r10],xmm0
+
+        paddd   xmm3,xmm2
+DB      0x67
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[320+r10],xmm1
+
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[336+r10],xmm2
+        pand    xmm0,XMMWORD[64+r12]
+
+        pand    xmm1,XMMWORD[80+r12]
+        pand    xmm2,XMMWORD[96+r12]
+        movdqa  XMMWORD[352+r10],xmm3
+        pand    xmm3,XMMWORD[112+r12]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-128))+r12]
+        movdqa  xmm5,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        pand    xmm4,XMMWORD[112+r10]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm5,XMMWORD[128+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[144+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[160+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-64))+r12]
+        movdqa  xmm5,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        pand    xmm4,XMMWORD[176+r10]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm5,XMMWORD[192+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[208+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[224+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[r12]
+        movdqa  xmm5,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        pand    xmm4,XMMWORD[240+r10]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm5,XMMWORD[256+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[272+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[288+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        por     xmm0,xmm1
+        pshufd  xmm1,xmm0,0x4e
+        por     xmm0,xmm1
+        lea     r12,[256+r12]
+DB      102,72,15,126,195
+
+        mov     r8,QWORD[r8]
+        mov     rax,QWORD[rsi]
+
+        xor     r14,r14
+        xor     r15,r15
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$1st_enter
+
+ALIGN   16
+$L$1st:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r11
+        mov     r11,r10
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$1st_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        lea     r15,[1+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$1st
+
+
+        add     r13,rax
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r9*8+rsp],r13
+        mov     r13,rdx
+        mov     r11,r10
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        jmp     NEAR $L$outer
+ALIGN   16
+$L$outer:
+        lea     rdx,[((24+128))+r9*8+rsp]
+        and     rdx,-16
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+r12]
+        movdqa  xmm1,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm0,XMMWORD[((-128))+rdx]
+        pand    xmm1,XMMWORD[((-112))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-96))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-80))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+r12]
+        movdqa  xmm1,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm0,XMMWORD[((-64))+rdx]
+        pand    xmm1,XMMWORD[((-48))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-32))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-16))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[r12]
+        movdqa  xmm1,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm0,XMMWORD[rdx]
+        pand    xmm1,XMMWORD[16+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[32+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[48+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+r12]
+        movdqa  xmm1,XMMWORD[80+r12]
+        movdqa  xmm2,XMMWORD[96+r12]
+        movdqa  xmm3,XMMWORD[112+r12]
+        pand    xmm0,XMMWORD[64+rdx]
+        pand    xmm1,XMMWORD[80+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[96+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[112+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        lea     r12,[256+r12]
+
+        mov     rax,QWORD[rsi]
+DB      102,72,15,126,195
+
+        xor     r15,r15
+        mov     rbp,r8
+        mov     r10,QWORD[rsp]
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r10,QWORD[8+rsp]
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$inner_enter
+
+ALIGN   16
+$L$inner:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$inner_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+        lea     r15,[1+r15]
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$inner
+
+        add     r13,rax
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r9*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r9*8+rsp],r13
+        mov     r13,rdx
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        cmp     r14,r9
+        jb      NEAR $L$outer
+
+        xor     r14,r14
+        mov     rax,QWORD[rsp]
+        lea     rsi,[rsp]
+        mov     r15,r9
+        jmp     NEAR $L$sub
+ALIGN   16
+$L$sub: sbb     rax,QWORD[r14*8+rcx]
+        mov     QWORD[r14*8+rdi],rax
+        mov     rax,QWORD[8+r14*8+rsi]
+        lea     r14,[1+r14]
+        dec     r15
+        jnz     NEAR $L$sub
+
+        sbb     rax,0
+        mov     rbx,-1
+        xor     rbx,rax
+        xor     r14,r14
+        mov     r15,r9
+
+$L$copy:
+        mov     rcx,QWORD[r14*8+rdi]
+        mov     rdx,QWORD[r14*8+rsp]
+        and     rcx,rbx
+        and     rdx,rax
+        mov     QWORD[r14*8+rsp],r14
+        or      rdx,rcx
+        mov     QWORD[r14*8+rdi],rdx
+        lea     r14,[1+r14]
+        sub     r15,1
+        jnz     NEAR $L$copy
+
+        mov     rsi,QWORD[8+r9*8+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul_mont_gather5:
+
+ALIGN   32
+bn_mul4x_mont_gather5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul4x_mont_gather5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+DB      0x67
+        mov     rax,rsp
+
+$L$mul4x_enter:
+        and     r11d,0x80108
+        cmp     r11d,0x80108
+        je      NEAR $L$mulx4x_enter
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$mul4x_prologue:
+
+DB      0x67
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$mul4xsp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$mul4xsp_done
+
+ALIGN   32
+$L$mul4xsp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$mul4xsp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mul4x_page_walk
+        jmp     NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+        neg     r9
+
+        mov     QWORD[40+rsp],rax
+
+$L$mul4x_body:
+
+        call    mul4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul4x_mont_gather5:
+
+
+ALIGN   32
+mul4x_internal:
+        shl     r9,5
+        movd    xmm5,DWORD[56+rax]
+        lea     rax,[$L$inc]
+        lea     r13,[128+r9*1+rdx]
+        shr     r9,5
+        movdqa  xmm0,XMMWORD[rax]
+        movdqa  xmm1,XMMWORD[16+rax]
+        lea     r10,[((88-112))+r9*1+rsp]
+        lea     r12,[128+rdx]
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+DB      0x67,0x67
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+DB      0x67
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[112+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[128+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[144+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[160+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[176+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[192+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[208+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[224+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[240+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[256+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[272+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[288+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[304+r10],xmm0
+
+        paddd   xmm3,xmm2
+DB      0x67
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[320+r10],xmm1
+
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[336+r10],xmm2
+        pand    xmm0,XMMWORD[64+r12]
+
+        pand    xmm1,XMMWORD[80+r12]
+        pand    xmm2,XMMWORD[96+r12]
+        movdqa  XMMWORD[352+r10],xmm3
+        pand    xmm3,XMMWORD[112+r12]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-128))+r12]
+        movdqa  xmm5,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        pand    xmm4,XMMWORD[112+r10]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm5,XMMWORD[128+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[144+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[160+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-64))+r12]
+        movdqa  xmm5,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        pand    xmm4,XMMWORD[176+r10]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm5,XMMWORD[192+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[208+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[224+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[r12]
+        movdqa  xmm5,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        pand    xmm4,XMMWORD[240+r10]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm5,XMMWORD[256+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[272+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[288+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        por     xmm0,xmm1
+        pshufd  xmm1,xmm0,0x4e
+        por     xmm0,xmm1
+        lea     r12,[256+r12]
+DB      102,72,15,126,195
+
+        mov     QWORD[((16+8))+rsp],r13
+        mov     QWORD[((56+8))+rsp],rdi
+
+        mov     r8,QWORD[r8]
+        mov     rax,QWORD[rsi]
+        lea     rsi,[r9*1+rsi]
+        neg     r9
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        lea     r14,[((64+8))+rsp]
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+r9*1+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[32+r9]
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     QWORD[r14],rdi
+        mov     r13,rdx
+        jmp     NEAR $L$1st4x
+
+ALIGN   32
+$L$1st4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r14],rdi
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r14],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     QWORD[r14],rdi
+        mov     r13,rdx
+
+        add     r15,32
+        jnz     NEAR $L$1st4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r14],rdi
+        mov     r13,rdx
+
+        lea     rcx,[r9*1+rcx]
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        mov     QWORD[((-8))+r14],r13
+
+        jmp     NEAR $L$outer4x
+
+ALIGN   32
+$L$outer4x:
+        lea     rdx,[((16+128))+r14]
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+r12]
+        movdqa  xmm1,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm0,XMMWORD[((-128))+rdx]
+        pand    xmm1,XMMWORD[((-112))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-96))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-80))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+r12]
+        movdqa  xmm1,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm0,XMMWORD[((-64))+rdx]
+        pand    xmm1,XMMWORD[((-48))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-32))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-16))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[r12]
+        movdqa  xmm1,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm0,XMMWORD[rdx]
+        pand    xmm1,XMMWORD[16+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[32+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[48+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+r12]
+        movdqa  xmm1,XMMWORD[80+r12]
+        movdqa  xmm2,XMMWORD[96+r12]
+        movdqa  xmm3,XMMWORD[112+r12]
+        pand    xmm0,XMMWORD[64+rdx]
+        pand    xmm1,XMMWORD[80+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[96+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[112+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        lea     r12,[256+r12]
+DB      102,72,15,126,195
+
+        mov     r10,QWORD[r9*1+r14]
+        mov     rbp,r8
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+        mov     QWORD[r14],rdi
+
+        lea     r14,[r9*1+r14]
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+r9*1+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[32+r9]
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     r13,rdx
+        jmp     NEAR $L$inner4x
+
+ALIGN   32
+$L$inner4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        adc     rdx,0
+        add     r10,QWORD[16+r14]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-32))+r14],rdi
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+        add     r10,QWORD[r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-16))+r14],rdi
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     QWORD[((-8))+r14],r13
+        mov     r13,rdx
+
+        add     r15,32
+        jnz     NEAR $L$inner4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        adc     rdx,0
+        add     r10,QWORD[16+r14]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-32))+r14],rdi
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,rbp
+        mov     rbp,QWORD[((-8))+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     r13,rdx
+
+        mov     QWORD[((-16))+r14],rdi
+        lea     rcx,[r9*1+rcx]
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        add     r13,QWORD[r14]
+        adc     rdi,0
+        mov     QWORD[((-8))+r14],r13
+
+        cmp     r12,QWORD[((16+8))+rsp]
+        jb      NEAR $L$outer4x
+        xor     rax,rax
+        sub     rbp,r13
+        adc     r15,r15
+        or      rdi,r15
+        sub     rax,rdi
+        lea     rbx,[r9*1+r14]
+        mov     r12,QWORD[rcx]
+        lea     rbp,[rcx]
+        mov     rcx,r9
+        sar     rcx,3+2
+        mov     rdi,QWORD[((56+8))+rsp]
+        dec     r12
+        xor     r10,r10
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqr4x_sub_entry
+
+global  bn_power5
+
+ALIGN   32
+bn_power5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_power5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        and     r11d,0x80108
+        cmp     r11d,0x80108
+        je      NEAR $L$powerx5_enter
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$power5_prologue:
+
+        shl     r9d,3
+        lea     r10d,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$pwr_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$pwr_sp_done
+
+ALIGN   32
+$L$pwr_sp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$pwr_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwr_page_walk
+        jmp     NEAR $L$pwr_page_walk_done
+
+$L$pwr_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwr_page_walk
+$L$pwr_page_walk_done:
+
+        mov     r10,r9
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$power5_body:
+DB      102,72,15,110,207
+DB      102,72,15,110,209
+DB      102,73,15,110,218
+DB      102,72,15,110,226
+
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+
+DB      102,72,15,126,209
+DB      102,72,15,126,226
+        mov     rdi,rsi
+        mov     rax,QWORD[40+rsp]
+        lea     r8,[32+rsp]
+
+        call    mul4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$power5_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_power5:
+
+global  bn_sqr8x_internal
+
+
+ALIGN   32
+bn_sqr8x_internal:
+__bn_sqr8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+        lea     rbp,[32+r10]
+        lea     rsi,[r9*1+rsi]
+
+        mov     rcx,r9
+
+
+        mov     r14,QWORD[((-32))+rbp*1+rsi]
+        lea     rdi,[((48+8))+r9*2+rsp]
+        mov     rax,QWORD[((-24))+rbp*1+rsi]
+        lea     rdi,[((-32))+rbp*1+rdi]
+        mov     rbx,QWORD[((-16))+rbp*1+rsi]
+        mov     r15,rax
+
+        mul     r14
+        mov     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        mov     QWORD[((-24))+rbp*1+rdi],r10
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        adc     rdx,0
+        mov     QWORD[((-16))+rbp*1+rdi],r11
+        mov     r10,rdx
+
+
+        mov     rbx,QWORD[((-8))+rbp*1+rsi]
+        mul     r15
+        mov     r12,rax
+        mov     rax,rbx
+        mov     r13,rdx
+
+        lea     rcx,[rbp]
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+        mov     QWORD[((-8))+rcx*1+rdi],r10
+        jmp     NEAR $L$sqr4x_1st
+
+ALIGN   32
+$L$sqr4x_1st:
+        mov     rbx,QWORD[rcx*1+rsi]
+        mul     r15
+        add     r13,rax
+        mov     rax,rbx
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[8+rcx*1+rsi]
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        adc     r10,0
+
+
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        mov     QWORD[rcx*1+rdi],r11
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[16+rcx*1+rsi]
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+
+        mul     r15
+        add     r13,rax
+        mov     rax,rbx
+        mov     QWORD[8+rcx*1+rdi],r10
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[24+rcx*1+rsi]
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        adc     r10,0
+
+
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        mov     QWORD[16+rcx*1+rdi],r11
+        mov     r13,rdx
+        adc     r13,0
+        lea     rcx,[32+rcx]
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+        mov     QWORD[((-8))+rcx*1+rdi],r10
+
+        cmp     rcx,0
+        jne     NEAR $L$sqr4x_1st
+
+        mul     r15
+        add     r13,rax
+        lea     rbp,[16+rbp]
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+
+        mov     QWORD[rdi],r13
+        mov     r12,rdx
+        mov     QWORD[8+rdi],rdx
+        jmp     NEAR $L$sqr4x_outer
+
+ALIGN   32
+$L$sqr4x_outer:
+        mov     r14,QWORD[((-32))+rbp*1+rsi]
+        lea     rdi,[((48+8))+r9*2+rsp]
+        mov     rax,QWORD[((-24))+rbp*1+rsi]
+        lea     rdi,[((-32))+rbp*1+rdi]
+        mov     rbx,QWORD[((-16))+rbp*1+rsi]
+        mov     r15,rax
+
+        mul     r14
+        mov     r10,QWORD[((-24))+rbp*1+rdi]
+        add     r10,rax
+        mov     rax,rbx
+        adc     rdx,0
+        mov     QWORD[((-24))+rbp*1+rdi],r10
+        mov     r11,rdx
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r11,QWORD[((-16))+rbp*1+rdi]
+        mov     r10,rdx
+        adc     r10,0
+        mov     QWORD[((-16))+rbp*1+rdi],r11
+
+        xor     r12,r12
+
+        mov     rbx,QWORD[((-8))+rbp*1+rsi]
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r12,QWORD[((-8))+rbp*1+rdi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r10,r12
+        mov     r11,rdx
+        adc     r11,0
+        mov     QWORD[((-8))+rbp*1+rdi],r10
+
+        lea     rcx,[rbp]
+        jmp     NEAR $L$sqr4x_inner
+
+ALIGN   32
+$L$sqr4x_inner:
+        mov     rbx,QWORD[rcx*1+rsi]
+        mul     r15
+        add     r13,rax
+        mov     rax,rbx
+        mov     r12,rdx
+        adc     r12,0
+        add     r13,QWORD[rcx*1+rdi]
+        adc     r12,0
+
+DB      0x67
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[8+rcx*1+rsi]
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        adc     r10,0
+
+        mul     r15
+        add     r12,rax
+        mov     QWORD[rcx*1+rdi],r11
+        mov     rax,rbx
+        mov     r13,rdx
+        adc     r13,0
+        add     r12,QWORD[8+rcx*1+rdi]
+        lea     rcx,[16+rcx]
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r10,r12
+        mov     r11,rdx
+        adc     r11,0
+        mov     QWORD[((-8))+rcx*1+rdi],r10
+
+        cmp     rcx,0
+        jne     NEAR $L$sqr4x_inner
+
+DB      0x67
+        mul     r15
+        add     r13,rax
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+
+        mov     QWORD[rdi],r13
+        mov     r12,rdx
+        mov     QWORD[8+rdi],rdx
+
+        add     rbp,16
+        jnz     NEAR $L$sqr4x_outer
+
+
+        mov     r14,QWORD[((-32))+rsi]
+        lea     rdi,[((48+8))+r9*2+rsp]
+        mov     rax,QWORD[((-24))+rsi]
+        lea     rdi,[((-32))+rbp*1+rdi]
+        mov     rbx,QWORD[((-16))+rsi]
+        mov     r15,rax
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     QWORD[((-24))+rdi],r10
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        mov     rbx,QWORD[((-8))+rsi]
+        adc     r10,0
+
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        mov     QWORD[((-16))+rdi],r11
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+        mov     QWORD[((-8))+rdi],r10
+
+        mul     r15
+        add     r13,rax
+        mov     rax,QWORD[((-16))+rsi]
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+
+        mov     QWORD[rdi],r13
+        mov     r12,rdx
+        mov     QWORD[8+rdi],rdx
+
+        mul     rbx
+        add     rbp,16
+        xor     r14,r14
+        sub     rbp,r9
+        xor     r15,r15
+
+        add     rax,r12
+        adc     rdx,0
+        mov     QWORD[8+rdi],rax
+        mov     QWORD[16+rdi],rdx
+        mov     QWORD[24+rdi],r15
+
+        mov     rax,QWORD[((-16))+rbp*1+rsi]
+        lea     rdi,[((48+8))+rsp]
+        xor     r10,r10
+        mov     r11,QWORD[8+rdi]
+
+        lea     r12,[r10*2+r14]
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[16+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[24+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[((-8))+rbp*1+rsi]
+        mov     QWORD[rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[8+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mov     r10,QWORD[32+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[40+rdi]
+        adc     rbx,rax
+        mov     rax,QWORD[rbp*1+rsi]
+        mov     QWORD[16+rdi],rbx
+        adc     r8,rdx
+        lea     rbp,[16+rbp]
+        mov     QWORD[24+rdi],r8
+        sbb     r15,r15
+        lea     rdi,[64+rdi]
+        jmp     NEAR $L$sqr4x_shift_n_add
+
+ALIGN   32
+$L$sqr4x_shift_n_add:
+        lea     r12,[r10*2+r14]
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[((-16))+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[((-8))+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[((-8))+rbp*1+rsi]
+        mov     QWORD[((-32))+rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[((-24))+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mov     r10,QWORD[rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[8+rdi]
+        adc     rbx,rax
+        mov     rax,QWORD[rbp*1+rsi]
+        mov     QWORD[((-16))+rdi],rbx
+        adc     r8,rdx
+
+        lea     r12,[r10*2+r14]
+        mov     QWORD[((-8))+rdi],r8
+        sbb     r15,r15
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[16+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[24+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[8+rbp*1+rsi]
+        mov     QWORD[rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[8+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mov     r10,QWORD[32+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[40+rdi]
+        adc     rbx,rax
+        mov     rax,QWORD[16+rbp*1+rsi]
+        mov     QWORD[16+rdi],rbx
+        adc     r8,rdx
+        mov     QWORD[24+rdi],r8
+        sbb     r15,r15
+        lea     rdi,[64+rdi]
+        add     rbp,32
+        jnz     NEAR $L$sqr4x_shift_n_add
+
+        lea     r12,[r10*2+r14]
+DB      0x67
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[((-16))+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[((-8))+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[((-8))+rsi]
+        mov     QWORD[((-32))+rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[((-24))+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mul     rax
+        neg     r15
+        adc     rbx,rax
+        adc     r8,rdx
+        mov     QWORD[((-16))+rdi],rbx
+        mov     QWORD[((-8))+rdi],r8
+DB      102,72,15,126,213
+__bn_sqr8x_reduction:
+        xor     rax,rax
+        lea     rcx,[rbp*1+r9]
+        lea     rdx,[((48+8))+r9*2+rsp]
+        mov     QWORD[((0+8))+rsp],rcx
+        lea     rdi,[((48+8))+r9*1+rsp]
+        mov     QWORD[((8+8))+rsp],rdx
+        neg     r9
+        jmp     NEAR $L$8x_reduction_loop
+
+ALIGN   32
+$L$8x_reduction_loop:
+        lea     rdi,[r9*1+rdi]
+DB      0x66
+        mov     rbx,QWORD[rdi]
+        mov     r9,QWORD[8+rdi]
+        mov     r10,QWORD[16+rdi]
+        mov     r11,QWORD[24+rdi]
+        mov     r12,QWORD[32+rdi]
+        mov     r13,QWORD[40+rdi]
+        mov     r14,QWORD[48+rdi]
+        mov     r15,QWORD[56+rdi]
+        mov     QWORD[rdx],rax
+        lea     rdi,[64+rdi]
+
+DB      0x67
+        mov     r8,rbx
+        imul    rbx,QWORD[((32+8))+rsp]
+        mov     rax,QWORD[rbp]
+        mov     ecx,8
+        jmp     NEAR $L$8x_reduce
+
+ALIGN   32
+$L$8x_reduce:
+        mul     rbx
+        mov     rax,QWORD[8+rbp]
+        neg     r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rbp]
+        adc     rdx,0
+        add     r8,r9
+        mov     QWORD[((48-8+8))+rcx*8+rsp],rbx
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rbp]
+        adc     rdx,0
+        add     r9,r10
+        mov     rsi,QWORD[((32+8))+rsp]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rbp]
+        adc     rdx,0
+        imul    rsi,r8
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rbp]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rbp]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rbp]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        mov     rbx,rsi
+        add     r15,rax
+        mov     rax,QWORD[rbp]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        dec     ecx
+        jnz     NEAR $L$8x_reduce
+
+        lea     rbp,[64+rbp]
+        xor     rax,rax
+        mov     rdx,QWORD[((8+8))+rsp]
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$8x_no_tail
+
+DB      0x66
+        add     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        sbb     rsi,rsi
+
+        mov     rbx,QWORD[((48+56+8))+rsp]
+        mov     ecx,8
+        mov     rax,QWORD[rbp]
+        jmp     NEAR $L$8x_tail
+
+ALIGN   32
+$L$8x_tail:
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[8+rbp]
+        mov     QWORD[rdi],r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rbp]
+        adc     rdx,0
+        add     r8,r9
+        lea     rdi,[8+rdi]
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rbp]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rbp]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rbp]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rbp]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rbp]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        mov     rbx,QWORD[((48-16+8))+rcx*8+rsp]
+        add     r15,rax
+        adc     rdx,0
+        add     r14,r15
+        mov     rax,QWORD[rbp]
+        mov     r15,rdx
+        adc     r15,0
+
+        dec     ecx
+        jnz     NEAR $L$8x_tail
+
+        lea     rbp,[64+rbp]
+        mov     rdx,QWORD[((8+8))+rsp]
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$8x_tail_done
+
+        mov     rbx,QWORD[((48+56+8))+rsp]
+        neg     rsi
+        mov     rax,QWORD[rbp]
+        adc     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        sbb     rsi,rsi
+
+        mov     ecx,8
+        jmp     NEAR $L$8x_tail
+
+ALIGN   32
+$L$8x_tail_done:
+        xor     rax,rax
+        add     r8,QWORD[rdx]
+        adc     r9,0
+        adc     r10,0
+        adc     r11,0
+        adc     r12,0
+        adc     r13,0
+        adc     r14,0
+        adc     r15,0
+        adc     rax,0
+
+        neg     rsi
+$L$8x_no_tail:
+        adc     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        adc     rax,0
+        mov     rcx,QWORD[((-8))+rbp]
+        xor     rsi,rsi
+
+DB      102,72,15,126,213
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+DB      102,73,15,126,217
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+        lea     rdi,[64+rdi]
+
+        cmp     rdi,rdx
+        jb      NEAR $L$8x_reduction_loop
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__bn_post4x_internal:
+        mov     r12,QWORD[rbp]
+        lea     rbx,[r9*1+rdi]
+        mov     rcx,r9
+DB      102,72,15,126,207
+        neg     rax
+DB      102,72,15,126,206
+        sar     rcx,3+2
+        dec     r12
+        xor     r10,r10
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqr4x_sub_entry
+
+ALIGN   16
+$L$sqr4x_sub:
+        mov     r12,QWORD[rbp]
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+$L$sqr4x_sub_entry:
+        lea     rbp,[32+rbp]
+        not     r12
+        not     r13
+        not     r14
+        not     r15
+        and     r12,rax
+        and     r13,rax
+        and     r14,rax
+        and     r15,rax
+
+        neg     r10
+        adc     r12,QWORD[rbx]
+        adc     r13,QWORD[8+rbx]
+        adc     r14,QWORD[16+rbx]
+        adc     r15,QWORD[24+rbx]
+        mov     QWORD[rdi],r12
+        lea     rbx,[32+rbx]
+        mov     QWORD[8+rdi],r13
+        sbb     r10,r10
+        mov     QWORD[16+rdi],r14
+        mov     QWORD[24+rdi],r15
+        lea     rdi,[32+rdi]
+
+        inc     rcx
+        jnz     NEAR $L$sqr4x_sub
+
+        mov     r10,r9
+        neg     r9
+        DB      0F3h,0C3h               ;repret
+
+global  bn_from_montgomery
+
+ALIGN   32
+bn_from_montgomery:
+        test    DWORD[48+rsp],7
+        jz      NEAR bn_from_mont8x
+        xor     eax,eax
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+bn_from_mont8x:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_from_mont8x:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+DB      0x67
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$from_prologue:
+
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$from_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$from_sp_done
+
+ALIGN   32
+$L$from_sp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$from_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$from_page_walk
+        jmp     NEAR $L$from_page_walk_done
+
+$L$from_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$from_page_walk
+$L$from_page_walk_done:
+
+        mov     r10,r9
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$from_body:
+        mov     r11,r9
+        lea     rax,[48+rsp]
+        pxor    xmm0,xmm0
+        jmp     NEAR $L$mul_by_1
+
+ALIGN   32
+$L$mul_by_1:
+        movdqu  xmm1,XMMWORD[rsi]
+        movdqu  xmm2,XMMWORD[16+rsi]
+        movdqu  xmm3,XMMWORD[32+rsi]
+        movdqa  XMMWORD[r9*1+rax],xmm0
+        movdqu  xmm4,XMMWORD[48+rsi]
+        movdqa  XMMWORD[16+r9*1+rax],xmm0
+DB      0x48,0x8d,0xb6,0x40,0x00,0x00,0x00
+        movdqa  XMMWORD[rax],xmm1
+        movdqa  XMMWORD[32+r9*1+rax],xmm0
+        movdqa  XMMWORD[16+rax],xmm2
+        movdqa  XMMWORD[48+r9*1+rax],xmm0
+        movdqa  XMMWORD[32+rax],xmm3
+        movdqa  XMMWORD[48+rax],xmm4
+        lea     rax,[64+rax]
+        sub     r11,64
+        jnz     NEAR $L$mul_by_1
+
+DB      102,72,15,110,207
+DB      102,72,15,110,209
+DB      0x67
+        mov     rbp,rcx
+DB      102,73,15,110,218
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        and     r11d,0x80108
+        cmp     r11d,0x80108
+        jne     NEAR $L$from_mont_nox
+
+        lea     rdi,[r9*1+rax]
+        call    __bn_sqrx8x_reduction
+        call    __bn_postx4x_internal
+
+        pxor    xmm0,xmm0
+        lea     rax,[48+rsp]
+        jmp     NEAR $L$from_mont_zero
+
+ALIGN   32
+$L$from_mont_nox:
+        call    __bn_sqr8x_reduction
+        call    __bn_post4x_internal
+
+        pxor    xmm0,xmm0
+        lea     rax,[48+rsp]
+        jmp     NEAR $L$from_mont_zero
+
+ALIGN   32
+$L$from_mont_zero:
+        mov     rsi,QWORD[40+rsp]
+
+        movdqa  XMMWORD[rax],xmm0
+        movdqa  XMMWORD[16+rax],xmm0
+        movdqa  XMMWORD[32+rax],xmm0
+        movdqa  XMMWORD[48+rax],xmm0
+        lea     rax,[64+rax]
+        sub     r9,32
+        jnz     NEAR $L$from_mont_zero
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$from_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_from_mont8x:
+
+ALIGN   32
+bn_mulx4x_mont_gather5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mulx4x_mont_gather5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$mulx4x_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$mulx4x_prologue:
+
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$mulx4xsp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$mulx4xsp_done
+
+$L$mulx4xsp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$mulx4xsp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+        jmp     NEAR $L$mulx4x_page_walk_done
+
+$L$mulx4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+
+
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$mulx4x_body:
+        call    mulx4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mulx4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mulx4x_mont_gather5:
+
+
+ALIGN   32
+mulx4x_internal:
+        mov     QWORD[8+rsp],r9
+        mov     r10,r9
+        neg     r9
+        shl     r9,5
+        neg     r10
+        lea     r13,[128+r9*1+rdx]
+        shr     r9,5+5
+        movd    xmm5,DWORD[56+rax]
+        sub     r9,1
+        lea     rax,[$L$inc]
+        mov     QWORD[((16+8))+rsp],r13
+        mov     QWORD[((24+8))+rsp],r9
+        mov     QWORD[((56+8))+rsp],rdi
+        movdqa  xmm0,XMMWORD[rax]
+        movdqa  xmm1,XMMWORD[16+rax]
+        lea     r10,[((88-112))+r10*1+rsp]
+        lea     rdi,[128+rdx]
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+DB      0x67
+        movdqa  xmm2,xmm1
+DB      0x67
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[112+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[128+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[144+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[160+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[176+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[192+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[208+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[224+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[240+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[256+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[272+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[288+r10],xmm3
+        movdqa  xmm3,xmm4
+DB      0x67
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[304+r10],xmm0
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[320+r10],xmm1
+
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[336+r10],xmm2
+
+        pand    xmm0,XMMWORD[64+rdi]
+        pand    xmm1,XMMWORD[80+rdi]
+        pand    xmm2,XMMWORD[96+rdi]
+        movdqa  XMMWORD[352+r10],xmm3
+        pand    xmm3,XMMWORD[112+rdi]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-128))+rdi]
+        movdqa  xmm5,XMMWORD[((-112))+rdi]
+        movdqa  xmm2,XMMWORD[((-96))+rdi]
+        pand    xmm4,XMMWORD[112+r10]
+        movdqa  xmm3,XMMWORD[((-80))+rdi]
+        pand    xmm5,XMMWORD[128+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[144+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[160+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-64))+rdi]
+        movdqa  xmm5,XMMWORD[((-48))+rdi]
+        movdqa  xmm2,XMMWORD[((-32))+rdi]
+        pand    xmm4,XMMWORD[176+r10]
+        movdqa  xmm3,XMMWORD[((-16))+rdi]
+        pand    xmm5,XMMWORD[192+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[208+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[224+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[rdi]
+        movdqa  xmm5,XMMWORD[16+rdi]
+        movdqa  xmm2,XMMWORD[32+rdi]
+        pand    xmm4,XMMWORD[240+r10]
+        movdqa  xmm3,XMMWORD[48+rdi]
+        pand    xmm5,XMMWORD[256+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[272+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[288+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        pxor    xmm0,xmm1
+        pshufd  xmm1,xmm0,0x4e
+        por     xmm0,xmm1
+        lea     rdi,[256+rdi]
+DB      102,72,15,126,194
+        lea     rbx,[((64+32+8))+rsp]
+
+        mov     r9,rdx
+        mulx    rax,r8,QWORD[rsi]
+        mulx    r12,r11,QWORD[8+rsi]
+        add     r11,rax
+        mulx    r13,rax,QWORD[16+rsi]
+        adc     r12,rax
+        adc     r13,0
+        mulx    r14,rax,QWORD[24+rsi]
+
+        mov     r15,r8
+        imul    r8,QWORD[((32+8))+rsp]
+        xor     rbp,rbp
+        mov     rdx,r8
+
+        mov     QWORD[((8+8))+rsp],rdi
+
+        lea     rsi,[32+rsi]
+        adcx    r13,rax
+        adcx    r14,rbp
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    r15,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+        mulx    r12,rax,QWORD[16+rcx]
+        mov     rdi,QWORD[((24+8))+rsp]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r11
+        adcx    r12,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r12
+        jmp     NEAR $L$mulx4x_1st
+
+ALIGN   32
+$L$mulx4x_1st:
+        adcx    r15,rbp
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+DB      0x67,0x67
+        mov     rdx,r8
+        adcx    r13,rax
+        adcx    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        mov     QWORD[((-32))+rbx],r11
+        adox    r13,r15
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_1st
+
+        mov     rax,QWORD[8+rsp]
+        adc     r15,rbp
+        lea     rsi,[rax*1+rsi]
+        add     r14,r15
+        mov     rdi,QWORD[((8+8))+rsp]
+        adc     rbp,rbp
+        mov     QWORD[((-8))+rbx],r14
+        jmp     NEAR $L$mulx4x_outer
+
+ALIGN   32
+$L$mulx4x_outer:
+        lea     r10,[((16-256))+rbx]
+        pxor    xmm4,xmm4
+DB      0x67,0x67
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+rdi]
+        movdqa  xmm1,XMMWORD[((-112))+rdi]
+        movdqa  xmm2,XMMWORD[((-96))+rdi]
+        pand    xmm0,XMMWORD[256+r10]
+        movdqa  xmm3,XMMWORD[((-80))+rdi]
+        pand    xmm1,XMMWORD[272+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[288+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[304+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+rdi]
+        movdqa  xmm1,XMMWORD[((-48))+rdi]
+        movdqa  xmm2,XMMWORD[((-32))+rdi]
+        pand    xmm0,XMMWORD[320+r10]
+        movdqa  xmm3,XMMWORD[((-16))+rdi]
+        pand    xmm1,XMMWORD[336+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[352+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[368+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[rdi]
+        movdqa  xmm1,XMMWORD[16+rdi]
+        movdqa  xmm2,XMMWORD[32+rdi]
+        pand    xmm0,XMMWORD[384+r10]
+        movdqa  xmm3,XMMWORD[48+rdi]
+        pand    xmm1,XMMWORD[400+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[416+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[432+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+rdi]
+        movdqa  xmm1,XMMWORD[80+rdi]
+        movdqa  xmm2,XMMWORD[96+rdi]
+        pand    xmm0,XMMWORD[448+r10]
+        movdqa  xmm3,XMMWORD[112+rdi]
+        pand    xmm1,XMMWORD[464+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[480+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[496+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        lea     rdi,[256+rdi]
+DB      102,72,15,126,194
+
+        mov     QWORD[rbx],rbp
+        lea     rbx,[32+rax*1+rbx]
+        mulx    r11,r8,QWORD[rsi]
+        xor     rbp,rbp
+        mov     r9,rdx
+        mulx    r12,r14,QWORD[8+rsi]
+        adox    r8,QWORD[((-32))+rbx]
+        adcx    r11,r14
+        mulx    r13,r15,QWORD[16+rsi]
+        adox    r11,QWORD[((-24))+rbx]
+        adcx    r12,r15
+        mulx    r14,rdx,QWORD[24+rsi]
+        adox    r12,QWORD[((-16))+rbx]
+        adcx    r13,rdx
+        lea     rcx,[rax*1+rcx]
+        lea     rsi,[32+rsi]
+        adox    r13,QWORD[((-8))+rbx]
+        adcx    r14,rbp
+        adox    r14,rbp
+
+        mov     r15,r8
+        imul    r8,QWORD[((32+8))+rsp]
+
+        mov     rdx,r8
+        xor     rbp,rbp
+        mov     QWORD[((8+8))+rsp],rdi
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    r15,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+        mulx    r12,rax,QWORD[16+rcx]
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     rdi,QWORD[((24+8))+rsp]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r12,rax
+        mov     QWORD[((-24))+rbx],r11
+        adox    r15,rbp
+        mov     QWORD[((-16))+rbx],r12
+        lea     rcx,[32+rcx]
+        jmp     NEAR $L$mulx4x_inner
+
+ALIGN   32
+$L$mulx4x_inner:
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r15,rbp
+        adox    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r10,QWORD[rbx]
+        adox    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r11,QWORD[8+rbx]
+        adox    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+        mov     rdx,r8
+        adcx    r12,QWORD[16+rbx]
+        adox    r13,rax
+        adcx    r13,QWORD[24+rbx]
+        adox    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+        adcx    r14,rbp
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        adox    r13,r15
+        mov     QWORD[((-32))+rbx],r11
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_inner
+
+        mov     rax,QWORD[((0+8))+rsp]
+        adc     r15,rbp
+        sub     rdi,QWORD[rbx]
+        mov     rdi,QWORD[((8+8))+rsp]
+        mov     r10,QWORD[((16+8))+rsp]
+        adc     r14,r15
+        lea     rsi,[rax*1+rsi]
+        adc     rbp,rbp
+        mov     QWORD[((-8))+rbx],r14
+
+        cmp     rdi,r10
+        jb      NEAR $L$mulx4x_outer
+
+        mov     r10,QWORD[((-8))+rcx]
+        mov     r8,rbp
+        mov     r12,QWORD[rax*1+rcx]
+        lea     rbp,[rax*1+rcx]
+        mov     rcx,rax
+        lea     rdi,[rax*1+rbx]
+        xor     eax,eax
+        xor     r15,r15
+        sub     r10,r14
+        adc     r15,r15
+        or      r8,r15
+        sar     rcx,3+2
+        sub     rax,r8
+        mov     rdx,QWORD[((56+8))+rsp]
+        dec     r12
+        mov     r13,QWORD[8+rbp]
+        xor     r8,r8
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqrx4x_sub_entry
+
+
+ALIGN   32
+bn_powerx5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_powerx5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$powerx5_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$powerx5_prologue:
+
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$pwrx_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$pwrx_sp_done
+
+ALIGN   32
+$L$pwrx_sp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$pwrx_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwrx_page_walk
+        jmp     NEAR $L$pwrx_page_walk_done
+
+$L$pwrx_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwrx_page_walk
+$L$pwrx_page_walk_done:
+
+        mov     r10,r9
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+
+
+        pxor    xmm0,xmm0
+DB      102,72,15,110,207
+DB      102,72,15,110,209
+DB      102,73,15,110,218
+DB      102,72,15,110,226
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$powerx5_body:
+
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+
+        mov     r9,r10
+        mov     rdi,rsi
+DB      102,72,15,126,209
+DB      102,72,15,126,226
+        mov     rax,QWORD[40+rsp]
+
+        call    mulx4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$powerx5_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_powerx5:
+
+global  bn_sqrx8x_internal
+
+
+ALIGN   32
+bn_sqrx8x_internal:
+__bn_sqrx8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+        lea     rdi,[((48+8))+rsp]
+        lea     rbp,[r9*1+rsi]
+        mov     QWORD[((0+8))+rsp],r9
+        mov     QWORD[((8+8))+rsp],rbp
+        jmp     NEAR $L$sqr8x_zero_start
+
+ALIGN   32
+DB      0x66,0x66,0x66,0x2e,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+$L$sqrx8x_zero:
+DB      0x3e
+        movdqa  XMMWORD[rdi],xmm0
+        movdqa  XMMWORD[16+rdi],xmm0
+        movdqa  XMMWORD[32+rdi],xmm0
+        movdqa  XMMWORD[48+rdi],xmm0
+$L$sqr8x_zero_start:
+        movdqa  XMMWORD[64+rdi],xmm0
+        movdqa  XMMWORD[80+rdi],xmm0
+        movdqa  XMMWORD[96+rdi],xmm0
+        movdqa  XMMWORD[112+rdi],xmm0
+        lea     rdi,[128+rdi]
+        sub     r9,64
+        jnz     NEAR $L$sqrx8x_zero
+
+        mov     rdx,QWORD[rsi]
+
+        xor     r10,r10
+        xor     r11,r11
+        xor     r12,r12
+        xor     r13,r13
+        xor     r14,r14
+        xor     r15,r15
+        lea     rdi,[((48+8))+rsp]
+        xor     rbp,rbp
+        jmp     NEAR $L$sqrx8x_outer_loop
+
+ALIGN   32
+$L$sqrx8x_outer_loop:
+        mulx    rax,r8,QWORD[8+rsi]
+        adcx    r8,r9
+        adox    r10,rax
+        mulx    rax,r9,QWORD[16+rsi]
+        adcx    r9,r10
+        adox    r11,rax
+DB      0xc4,0xe2,0xab,0xf6,0x86,0x18,0x00,0x00,0x00
+        adcx    r10,r11
+        adox    r12,rax
+DB      0xc4,0xe2,0xa3,0xf6,0x86,0x20,0x00,0x00,0x00
+        adcx    r11,r12
+        adox    r13,rax
+        mulx    rax,r12,QWORD[40+rsi]
+        adcx    r12,r13
+        adox    r14,rax
+        mulx    rax,r13,QWORD[48+rsi]
+        adcx    r13,r14
+        adox    rax,r15
+        mulx    r15,r14,QWORD[56+rsi]
+        mov     rdx,QWORD[8+rsi]
+        adcx    r14,rax
+        adox    r15,rbp
+        adc     r15,QWORD[64+rdi]
+        mov     QWORD[8+rdi],r8
+        mov     QWORD[16+rdi],r9
+        sbb     rcx,rcx
+        xor     rbp,rbp
+
+
+        mulx    rbx,r8,QWORD[16+rsi]
+        mulx    rax,r9,QWORD[24+rsi]
+        adcx    r8,r10
+        adox    r9,rbx
+        mulx    rbx,r10,QWORD[32+rsi]
+        adcx    r9,r11
+        adox    r10,rax
+DB      0xc4,0xe2,0xa3,0xf6,0x86,0x28,0x00,0x00,0x00
+        adcx    r10,r12
+        adox    r11,rbx
+DB      0xc4,0xe2,0x9b,0xf6,0x9e,0x30,0x00,0x00,0x00
+        adcx    r11,r13
+        adox    r12,r14
+DB      0xc4,0x62,0x93,0xf6,0xb6,0x38,0x00,0x00,0x00
+        mov     rdx,QWORD[16+rsi]
+        adcx    r12,rax
+        adox    r13,rbx
+        adcx    r13,r15
+        adox    r14,rbp
+        adcx    r14,rbp
+
+        mov     QWORD[24+rdi],r8
+        mov     QWORD[32+rdi],r9
+
+        mulx    rbx,r8,QWORD[24+rsi]
+        mulx    rax,r9,QWORD[32+rsi]
+        adcx    r8,r10
+        adox    r9,rbx
+        mulx    rbx,r10,QWORD[40+rsi]
+        adcx    r9,r11
+        adox    r10,rax
+DB      0xc4,0xe2,0xa3,0xf6,0x86,0x30,0x00,0x00,0x00
+        adcx    r10,r12
+        adox    r11,r13
+DB      0xc4,0x62,0x9b,0xf6,0xae,0x38,0x00,0x00,0x00
+DB      0x3e
+        mov     rdx,QWORD[24+rsi]
+        adcx    r11,rbx
+        adox    r12,rax
+        adcx    r12,r14
+        mov     QWORD[40+rdi],r8
+        mov     QWORD[48+rdi],r9
+        mulx    rax,r8,QWORD[32+rsi]
+        adox    r13,rbp
+        adcx    r13,rbp
+
+        mulx    rbx,r9,QWORD[40+rsi]
+        adcx    r8,r10
+        adox    r9,rax
+        mulx    rax,r10,QWORD[48+rsi]
+        adcx    r9,r11
+        adox    r10,r12
+        mulx    r12,r11,QWORD[56+rsi]
+        mov     rdx,QWORD[32+rsi]
+        mov     r14,QWORD[40+rsi]
+        adcx    r10,rbx
+        adox    r11,rax
+        mov     r15,QWORD[48+rsi]
+        adcx    r11,r13
+        adox    r12,rbp
+        adcx    r12,rbp
+
+        mov     QWORD[56+rdi],r8
+        mov     QWORD[64+rdi],r9
+
+        mulx    rax,r9,r14
+        mov     r8,QWORD[56+rsi]
+        adcx    r9,r10
+        mulx    rbx,r10,r15
+        adox    r10,rax
+        adcx    r10,r11
+        mulx    rax,r11,r8
+        mov     rdx,r14
+        adox    r11,rbx
+        adcx    r11,r12
+
+        adcx    rax,rbp
+
+        mulx    rbx,r14,r15
+        mulx    r13,r12,r8
+        mov     rdx,r15
+        lea     rsi,[64+rsi]
+        adcx    r11,r14
+        adox    r12,rbx
+        adcx    r12,rax
+        adox    r13,rbp
+
+DB      0x67,0x67
+        mulx    r14,r8,r8
+        adcx    r13,r8
+        adcx    r14,rbp
+
+        cmp     rsi,QWORD[((8+8))+rsp]
+        je      NEAR $L$sqrx8x_outer_break
+
+        neg     rcx
+        mov     rcx,-8
+        mov     r15,rbp
+        mov     r8,QWORD[64+rdi]
+        adcx    r9,QWORD[72+rdi]
+        adcx    r10,QWORD[80+rdi]
+        adcx    r11,QWORD[88+rdi]
+        adc     r12,QWORD[96+rdi]
+        adc     r13,QWORD[104+rdi]
+        adc     r14,QWORD[112+rdi]
+        adc     r15,QWORD[120+rdi]
+        lea     rbp,[rsi]
+        lea     rdi,[128+rdi]
+        sbb     rax,rax
+
+        mov     rdx,QWORD[((-64))+rsi]
+        mov     QWORD[((16+8))+rsp],rax
+        mov     QWORD[((24+8))+rsp],rdi
+
+
+        xor     eax,eax
+        jmp     NEAR $L$sqrx8x_loop
+
+ALIGN   32
+$L$sqrx8x_loop:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rbp]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rbp]
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rbp]
+        adcx    r10,rax
+        adox    r11,r12
+
+DB      0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rbp]
+        mov     QWORD[rcx*8+rdi],rbx
+        mov     ebx,0
+        adcx    r13,rax
+        adox    r14,r15
+
+DB      0xc4,0x62,0xfb,0xf6,0xbd,0x38,0x00,0x00,0x00
+        mov     rdx,QWORD[8+rcx*8+rsi]
+        adcx    r14,rax
+        adox    r15,rbx
+        adcx    r15,rbx
+
+DB      0x67
+        inc     rcx
+        jnz     NEAR $L$sqrx8x_loop
+
+        lea     rbp,[64+rbp]
+        mov     rcx,-8
+        cmp     rbp,QWORD[((8+8))+rsp]
+        je      NEAR $L$sqrx8x_break
+
+        sub     rbx,QWORD[((16+8))+rsp]
+DB      0x66
+        mov     rdx,QWORD[((-64))+rsi]
+        adcx    r8,QWORD[rdi]
+        adcx    r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        lea     rdi,[64+rdi]
+DB      0x67
+        sbb     rax,rax
+        xor     ebx,ebx
+        mov     QWORD[((16+8))+rsp],rax
+        jmp     NEAR $L$sqrx8x_loop
+
+ALIGN   32
+$L$sqrx8x_break:
+        xor     rbp,rbp
+        sub     rbx,QWORD[((16+8))+rsp]
+        adcx    r8,rbp
+        mov     rcx,QWORD[((24+8))+rsp]
+        adcx    r9,rbp
+        mov     rdx,QWORD[rsi]
+        adc     r10,0
+        mov     QWORD[rdi],r8
+        adc     r11,0
+        adc     r12,0
+        adc     r13,0
+        adc     r14,0
+        adc     r15,0
+        cmp     rdi,rcx
+        je      NEAR $L$sqrx8x_outer_loop
+
+        mov     QWORD[8+rdi],r9
+        mov     r9,QWORD[8+rcx]
+        mov     QWORD[16+rdi],r10
+        mov     r10,QWORD[16+rcx]
+        mov     QWORD[24+rdi],r11
+        mov     r11,QWORD[24+rcx]
+        mov     QWORD[32+rdi],r12
+        mov     r12,QWORD[32+rcx]
+        mov     QWORD[40+rdi],r13
+        mov     r13,QWORD[40+rcx]
+        mov     QWORD[48+rdi],r14
+        mov     r14,QWORD[48+rcx]
+        mov     QWORD[56+rdi],r15
+        mov     r15,QWORD[56+rcx]
+        mov     rdi,rcx
+        jmp     NEAR $L$sqrx8x_outer_loop
+
+ALIGN   32
+$L$sqrx8x_outer_break:
+        mov     QWORD[72+rdi],r9
+DB      102,72,15,126,217
+        mov     QWORD[80+rdi],r10
+        mov     QWORD[88+rdi],r11
+        mov     QWORD[96+rdi],r12
+        mov     QWORD[104+rdi],r13
+        mov     QWORD[112+rdi],r14
+        lea     rdi,[((48+8))+rsp]
+        mov     rdx,QWORD[rcx*1+rsi]
+
+        mov     r11,QWORD[8+rdi]
+        xor     r10,r10
+        mov     r9,QWORD[((0+8))+rsp]
+        adox    r11,r11
+        mov     r12,QWORD[16+rdi]
+        mov     r13,QWORD[24+rdi]
+
+
+ALIGN   32
+$L$sqrx4x_shift_n_add:
+        mulx    rbx,rax,rdx
+        adox    r12,r12
+        adcx    rax,r10
+DB      0x48,0x8b,0x94,0x0e,0x08,0x00,0x00,0x00
+DB      0x4c,0x8b,0x97,0x20,0x00,0x00,0x00
+        adox    r13,r13
+        adcx    rbx,r11
+        mov     r11,QWORD[40+rdi]
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+
+        mulx    rbx,rax,rdx
+        adox    r10,r10
+        adcx    rax,r12
+        mov     rdx,QWORD[16+rcx*1+rsi]
+        mov     r12,QWORD[48+rdi]
+        adox    r11,r11
+        adcx    rbx,r13
+        mov     r13,QWORD[56+rdi]
+        mov     QWORD[16+rdi],rax
+        mov     QWORD[24+rdi],rbx
+
+        mulx    rbx,rax,rdx
+        adox    r12,r12
+        adcx    rax,r10
+        mov     rdx,QWORD[24+rcx*1+rsi]
+        lea     rcx,[32+rcx]
+        mov     r10,QWORD[64+rdi]
+        adox    r13,r13
+        adcx    rbx,r11
+        mov     r11,QWORD[72+rdi]
+        mov     QWORD[32+rdi],rax
+        mov     QWORD[40+rdi],rbx
+
+        mulx    rbx,rax,rdx
+        adox    r10,r10
+        adcx    rax,r12
+        jrcxz   $L$sqrx4x_shift_n_add_break
+DB      0x48,0x8b,0x94,0x0e,0x00,0x00,0x00,0x00
+        adox    r11,r11
+        adcx    rbx,r13
+        mov     r12,QWORD[80+rdi]
+        mov     r13,QWORD[88+rdi]
+        mov     QWORD[48+rdi],rax
+        mov     QWORD[56+rdi],rbx
+        lea     rdi,[64+rdi]
+        nop
+        jmp     NEAR $L$sqrx4x_shift_n_add
+
+ALIGN   32
+$L$sqrx4x_shift_n_add_break:
+        adcx    rbx,r13
+        mov     QWORD[48+rdi],rax
+        mov     QWORD[56+rdi],rbx
+        lea     rdi,[64+rdi]
+DB      102,72,15,126,213
+__bn_sqrx8x_reduction:
+        xor     eax,eax
+        mov     rbx,QWORD[((32+8))+rsp]
+        mov     rdx,QWORD[((48+8))+rsp]
+        lea     rcx,[((-64))+r9*1+rbp]
+
+        mov     QWORD[((0+8))+rsp],rcx
+        mov     QWORD[((8+8))+rsp],rdi
+
+        lea     rdi,[((48+8))+rsp]
+        jmp     NEAR $L$sqrx8x_reduction_loop
+
+ALIGN   32
+$L$sqrx8x_reduction_loop:
+        mov     r9,QWORD[8+rdi]
+        mov     r10,QWORD[16+rdi]
+        mov     r11,QWORD[24+rdi]
+        mov     r12,QWORD[32+rdi]
+        mov     r8,rdx
+        imul    rdx,rbx
+        mov     r13,QWORD[40+rdi]
+        mov     r14,QWORD[48+rdi]
+        mov     r15,QWORD[56+rdi]
+        mov     QWORD[((24+8))+rsp],rax
+
+        lea     rdi,[64+rdi]
+        xor     rsi,rsi
+        mov     rcx,-8
+        jmp     NEAR $L$sqrx8x_reduce
+
+ALIGN   32
+$L$sqrx8x_reduce:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rax,rbx
+        adox    r8,r9
+
+        mulx    r9,rbx,QWORD[8+rbp]
+        adcx    r8,rbx
+        adox    r9,r10
+
+        mulx    r10,rbx,QWORD[16+rbp]
+        adcx    r9,rbx
+        adox    r10,r11
+
+        mulx    r11,rbx,QWORD[24+rbp]
+        adcx    r10,rbx
+        adox    r11,r12
+
+DB      0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+        mov     rax,rdx
+        mov     rdx,r8
+        adcx    r11,rbx
+        adox    r12,r13
+
+        mulx    rdx,rbx,QWORD[((32+8))+rsp]
+        mov     rdx,rax
+        mov     QWORD[((64+48+8))+rcx*8+rsp],rax
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rbp]
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rbp]
+        mov     rdx,rbx
+        adcx    r14,rax
+        adox    r15,rsi
+        adcx    r15,rsi
+
+DB      0x67,0x67,0x67
+        inc     rcx
+        jnz     NEAR $L$sqrx8x_reduce
+
+        mov     rax,rsi
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$sqrx8x_no_tail
+
+        mov     rdx,QWORD[((48+8))+rsp]
+        add     r8,QWORD[rdi]
+        lea     rbp,[64+rbp]
+        mov     rcx,-8
+        adcx    r9,QWORD[8+rdi]
+        adcx    r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        lea     rdi,[64+rdi]
+        sbb     rax,rax
+
+        xor     rsi,rsi
+        mov     QWORD[((16+8))+rsp],rax
+        jmp     NEAR $L$sqrx8x_tail
+
+ALIGN   32
+$L$sqrx8x_tail:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rbp]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rbp]
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rbp]
+        adcx    r10,rax
+        adox    r11,r12
+
+DB      0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rbp]
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rbp]
+        mov     rdx,QWORD[((72+48+8))+rcx*8+rsp]
+        adcx    r14,rax
+        adox    r15,rsi
+        mov     QWORD[rcx*8+rdi],rbx
+        mov     rbx,r8
+        adcx    r15,rsi
+
+        inc     rcx
+        jnz     NEAR $L$sqrx8x_tail
+
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$sqrx8x_tail_done
+
+        sub     rsi,QWORD[((16+8))+rsp]
+        mov     rdx,QWORD[((48+8))+rsp]
+        lea     rbp,[64+rbp]
+        adc     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        lea     rdi,[64+rdi]
+        sbb     rax,rax
+        sub     rcx,8
+
+        xor     rsi,rsi
+        mov     QWORD[((16+8))+rsp],rax
+        jmp     NEAR $L$sqrx8x_tail
+
+ALIGN   32
+$L$sqrx8x_tail_done:
+        xor     rax,rax
+        add     r8,QWORD[((24+8))+rsp]
+        adc     r9,0
+        adc     r10,0
+        adc     r11,0
+        adc     r12,0
+        adc     r13,0
+        adc     r14,0
+        adc     r15,0
+        adc     rax,0
+
+        sub     rsi,QWORD[((16+8))+rsp]
+$L$sqrx8x_no_tail:
+        adc     r8,QWORD[rdi]
+DB      102,72,15,126,217
+        adc     r9,QWORD[8+rdi]
+        mov     rsi,QWORD[56+rbp]
+DB      102,72,15,126,213
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        adc     rax,0
+
+        mov     rbx,QWORD[((32+8))+rsp]
+        mov     rdx,QWORD[64+rcx*1+rdi]
+
+        mov     QWORD[rdi],r8
+        lea     r8,[64+rdi]
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        lea     rdi,[64+rcx*1+rdi]
+        cmp     r8,QWORD[((8+8))+rsp]
+        jb      NEAR $L$sqrx8x_reduction_loop
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__bn_postx4x_internal:
+        mov     r12,QWORD[rbp]
+        mov     r10,rcx
+        mov     r9,rcx
+        neg     rax
+        sar     rcx,3+2
+
+DB      102,72,15,126,202
+DB      102,72,15,126,206
+        dec     r12
+        mov     r13,QWORD[8+rbp]
+        xor     r8,r8
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqrx4x_sub_entry
+
+ALIGN   16
+$L$sqrx4x_sub:
+        mov     r12,QWORD[rbp]
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+$L$sqrx4x_sub_entry:
+        andn    r12,r12,rax
+        lea     rbp,[32+rbp]
+        andn    r13,r13,rax
+        andn    r14,r14,rax
+        andn    r15,r15,rax
+
+        neg     r8
+        adc     r12,QWORD[rdi]
+        adc     r13,QWORD[8+rdi]
+        adc     r14,QWORD[16+rdi]
+        adc     r15,QWORD[24+rdi]
+        mov     QWORD[rdx],r12
+        lea     rdi,[32+rdi]
+        mov     QWORD[8+rdx],r13
+        sbb     r8,r8
+        mov     QWORD[16+rdx],r14
+        mov     QWORD[24+rdx],r15
+        lea     rdx,[32+rdx]
+
+        inc     rcx
+        jnz     NEAR $L$sqrx4x_sub
+
+        neg     r9
+
+        DB      0F3h,0C3h               ;repret
+
+global  bn_get_bits5
+
+ALIGN   16
+bn_get_bits5:
+        lea     r10,[rcx]
+        lea     r11,[1+rcx]
+        mov     ecx,edx
+        shr     edx,4
+        and     ecx,15
+        lea     eax,[((-8))+rcx]
+        cmp     ecx,11
+        cmova   r10,r11
+        cmova   ecx,eax
+        movzx   eax,WORD[rdx*2+r10]
+        shr     eax,cl
+        and     eax,31
+        DB      0F3h,0C3h               ;repret
+
+
+global  bn_scatter5
+
+ALIGN   16
+bn_scatter5:
+        cmp     edx,0
+        jz      NEAR $L$scatter_epilogue
+        lea     r8,[r9*8+r8]
+$L$scatter:
+        mov     rax,QWORD[rcx]
+        lea     rcx,[8+rcx]
+        mov     QWORD[r8],rax
+        lea     r8,[256+r8]
+        sub     edx,1
+        jnz     NEAR $L$scatter
+$L$scatter_epilogue:
+        DB      0F3h,0C3h               ;repret
+
+
+global  bn_gather5
+
+ALIGN   32
+bn_gather5:
+$L$SEH_begin_bn_gather5:
+
+DB      0x4c,0x8d,0x14,0x24
+DB      0x48,0x81,0xec,0x08,0x01,0x00,0x00
+        lea     rax,[$L$inc]
+        and     rsp,-16
+
+        movd    xmm5,r9d
+        movdqa  xmm0,XMMWORD[rax]
+        movdqa  xmm1,XMMWORD[16+rax]
+        lea     r11,[128+r8]
+        lea     rax,[128+rsp]
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[(-128)+rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[(-112)+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[(-96)+rax],xmm2
+        movdqa  xmm2,xmm4
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[(-80)+rax],xmm3
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[(-64)+rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[(-48)+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[(-32)+rax],xmm2
+        movdqa  xmm2,xmm4
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[(-16)+rax],xmm3
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[16+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[32+rax],xmm2
+        movdqa  xmm2,xmm4
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[48+rax],xmm3
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[64+rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[80+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[96+rax],xmm2
+        movdqa  xmm2,xmm4
+        movdqa  XMMWORD[112+rax],xmm3
+        jmp     NEAR $L$gather
+
+ALIGN   32
+$L$gather:
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+r11]
+        movdqa  xmm1,XMMWORD[((-112))+r11]
+        movdqa  xmm2,XMMWORD[((-96))+r11]
+        pand    xmm0,XMMWORD[((-128))+rax]
+        movdqa  xmm3,XMMWORD[((-80))+r11]
+        pand    xmm1,XMMWORD[((-112))+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-96))+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-80))+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+r11]
+        movdqa  xmm1,XMMWORD[((-48))+r11]
+        movdqa  xmm2,XMMWORD[((-32))+r11]
+        pand    xmm0,XMMWORD[((-64))+rax]
+        movdqa  xmm3,XMMWORD[((-16))+r11]
+        pand    xmm1,XMMWORD[((-48))+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-32))+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-16))+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[r11]
+        movdqa  xmm1,XMMWORD[16+r11]
+        movdqa  xmm2,XMMWORD[32+r11]
+        pand    xmm0,XMMWORD[rax]
+        movdqa  xmm3,XMMWORD[48+r11]
+        pand    xmm1,XMMWORD[16+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[32+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[48+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+r11]
+        movdqa  xmm1,XMMWORD[80+r11]
+        movdqa  xmm2,XMMWORD[96+r11]
+        pand    xmm0,XMMWORD[64+rax]
+        movdqa  xmm3,XMMWORD[112+r11]
+        pand    xmm1,XMMWORD[80+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[96+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[112+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        lea     r11,[256+r11]
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        movq    QWORD[rcx],xmm0
+        lea     rcx,[8+rcx]
+        sub     edx,1
+        jnz     NEAR $L$gather
+
+        lea     rsp,[r10]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_bn_gather5:
+
+ALIGN   64
+$L$inc:
+        DD      0,0,1,1
+        DD      2,2,2,2
+DB      77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB      112,108,105,99,97,116,105,111,110,32,119,105,116,104,32,115
+DB      99,97,116,116,101,114,47,103,97,116,104,101,114,32,102,111
+DB      114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79
+DB      71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111
+DB      112,101,110,115,115,108,46,111,114,103,62,0
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+mul_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_pop_regs
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     r10,[$L$mul_epilogue]
+        cmp     rbx,r10
+        ja      NEAR $L$body_40
+
+        mov     r10,QWORD[192+r8]
+        mov     rax,QWORD[8+r10*8+rax]
+
+        jmp     NEAR $L$common_pop_regs
+
+$L$body_40:
+        mov     rax,QWORD[40+rax]
+$L$common_pop_regs:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_bn_mul_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_mul_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_mul_mont_gather5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_mul4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_mul4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_mul4x_mont_gather5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_power5 wrt ..imagebase
+        DD      $L$SEH_end_bn_power5 wrt ..imagebase
+        DD      $L$SEH_info_bn_power5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_from_mont8x wrt ..imagebase
+        DD      $L$SEH_end_bn_from_mont8x wrt ..imagebase
+        DD      $L$SEH_info_bn_from_mont8x wrt ..imagebase
+        DD      $L$SEH_begin_bn_mulx4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_mulx4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_mulx4x_mont_gather5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_powerx5 wrt ..imagebase
+        DD      $L$SEH_end_bn_powerx5 wrt ..imagebase
+        DD      $L$SEH_info_bn_powerx5 wrt ..imagebase
+        DD      $L$SEH_begin_bn_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_gather5 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_bn_mul_mont_gather5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul_body wrt ..imagebase,$L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_mul4x_mont_gather5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul4x_prologue wrt ..imagebase,$L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_power5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$power5_prologue wrt ..imagebase,$L$power5_body wrt ..imagebase,$L$power5_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_from_mont8x:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$from_prologue wrt ..imagebase,$L$from_body wrt ..imagebase,$L$from_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_mulx4x_mont_gather5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_powerx5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$powerx5_prologue wrt ..imagebase,$L$powerx5_body wrt ..imagebase,$L$powerx5_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_gather5:
+DB      0x01,0x0b,0x03,0x0a
+DB      0x0b,0x01,0x21,0x00
+DB      0x04,0xa3,0x00,0x00
+ALIGN   8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
new file mode 100644
index 0000000000..ff688eeb06
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
@@ -0,0 +1,794 @@
+; Author: Marc Bevand <bevand_m (at) epita.fr>
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN   16
+
+global  md5_block_asm_data_order
+
+md5_block_asm_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_md5_block_asm_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        push    rbp
+
+        push    rbx
+
+        push    r12
+
+        push    r14
+
+        push    r15
+
+$L$prologue:
+
+
+
+
+        mov     rbp,rdi
+        shl     rdx,6
+        lea     rdi,[rdx*1+rsi]
+        mov     eax,DWORD[rbp]
+        mov     ebx,DWORD[4+rbp]
+        mov     ecx,DWORD[8+rbp]
+        mov     edx,DWORD[12+rbp]
+
+
+
+
+
+
+
+        cmp     rsi,rdi
+        je      NEAR $L$end
+
+
+$L$loop:
+        mov     r8d,eax
+        mov     r9d,ebx
+        mov     r14d,ecx
+        mov     r15d,edx
+        mov     r10d,DWORD[rsi]
+        mov     r11d,edx
+        xor     r11d,ecx
+        lea     eax,[((-680876936))+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[((-389564586))+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[8+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[606105819+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[12+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[((-1044525330))+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[16+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        xor     r11d,ecx
+        lea     eax,[((-176418897))+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[20+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[1200080426+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[24+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[((-1473231341))+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[28+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[((-45705983))+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[32+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        xor     r11d,ecx
+        lea     eax,[1770035416+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[36+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[((-1958414417))+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[40+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[((-42063))+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[44+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[((-1990404162))+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[48+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        xor     r11d,ecx
+        lea     eax,[1804603682+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[52+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[((-40341101))+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[56+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[((-1502002290))+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[60+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[1236535329+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        mov     r11d,edx
+        mov     r12d,edx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[((-165796510))+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[24+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[((-1069501632))+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[44+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[643717713+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[((-373897302))+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[20+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[((-701558691))+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[40+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[38016083+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[60+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[((-660478335))+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[16+rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[((-405537848))+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[36+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[568446438+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[56+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[((-1019803690))+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[12+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[((-187363961))+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[32+rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[1163531501+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[52+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[((-1444681467))+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[8+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[((-51403784))+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[28+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[1735328473+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[48+rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[((-1926607734))+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[20+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        mov     r11d,ecx
+        lea     eax,[((-378558))+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[32+rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[((-2022574463))+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[44+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[1839030562+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[56+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[((-35309556))+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        lea     eax,[((-1530992060))+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[16+rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[1272893353+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[28+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[((-155497632))+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[40+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[((-1094730640))+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[52+rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        lea     eax,[681279174+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[((-358537222))+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[12+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[((-722521979))+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[24+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[76029189+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[36+rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        lea     eax,[((-640364487))+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[48+rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[((-421815835))+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[60+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[530742520+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[8+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[((-995338651))+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        mov     r11d,0xffffffff
+        xor     r11d,edx
+        lea     eax,[((-198630844))+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[28+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[1126891415+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[56+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[((-1416354905))+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[20+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[((-57434055))+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[48+rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+        lea     eax,[1700485571+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[12+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[((-1894986606))+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[40+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[((-1051523))+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[((-2054922799))+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[32+rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+        lea     eax,[1873313359+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[60+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[((-30611744))+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[24+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[((-1560198380))+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[52+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[1309151649+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[16+rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+        lea     eax,[((-145523070))+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[44+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[((-1120210379))+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[8+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[718787259+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[36+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[((-343485551))+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+
+        add     eax,r8d
+        add     ebx,r9d
+        add     ecx,r14d
+        add     edx,r15d
+
+
+        add     rsi,64
+        cmp     rsi,rdi
+        jb      NEAR $L$loop
+
+
+$L$end:
+        mov     DWORD[rbp],eax
+        mov     DWORD[4+rbp],ebx
+        mov     DWORD[8+rbp],ecx
+        mov     DWORD[12+rbp],edx
+
+        mov     r15,QWORD[rsp]
+
+        mov     r14,QWORD[8+rsp]
+
+        mov     r12,QWORD[16+rsp]
+
+        mov     rbx,QWORD[24+rsp]
+
+        mov     rbp,QWORD[32+rsp]
+
+        add     rsp,40
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_md5_block_asm_data_order:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rax,[40+rax]
+
+        mov     rbp,QWORD[((-8))+rax]
+        mov     rbx,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r14,QWORD[((-32))+rax]
+        mov     r15,QWORD[((-40))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_md5_block_asm_data_order wrt ..imagebase
+        DD      $L$SEH_end_md5_block_asm_data_order wrt ..imagebase
+        DD      $L$SEH_info_md5_block_asm_data_order wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_md5_block_asm_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
new file mode 100644
index 0000000000..3951121452
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
@@ -0,0 +1,984 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN   32
+_aesni_ctr32_ghash_6x:
+        vmovdqu xmm2,XMMWORD[32+r11]
+        sub     rdx,6
+        vpxor   xmm4,xmm4,xmm4
+        vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+        vpaddb  xmm10,xmm1,xmm2
+        vpaddb  xmm11,xmm10,xmm2
+        vpaddb  xmm12,xmm11,xmm2
+        vpaddb  xmm13,xmm12,xmm2
+        vpaddb  xmm14,xmm13,xmm2
+        vpxor   xmm9,xmm1,xmm15
+        vmovdqu XMMWORD[(16+8)+rsp],xmm4
+        jmp     NEAR $L$oop6x
+
+ALIGN   32
+$L$oop6x:
+        add     ebx,100663296
+        jc      NEAR $L$handle_ctr32
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpaddb  xmm1,xmm14,xmm2
+        vpxor   xmm10,xmm10,xmm15
+        vpxor   xmm11,xmm11,xmm15
+
+$L$resume_ctr32:
+        vmovdqu XMMWORD[r8],xmm1
+        vpclmulqdq      xmm5,xmm7,xmm3,0x10
+        vpxor   xmm12,xmm12,xmm15
+        vmovups xmm2,XMMWORD[((16-128))+rcx]
+        vpclmulqdq      xmm6,xmm7,xmm3,0x01
+        xor     r12,r12
+        cmp     r15,r14
+
+        vaesenc xmm9,xmm9,xmm2
+        vmovdqu xmm0,XMMWORD[((48+8))+rsp]
+        vpxor   xmm13,xmm13,xmm15
+        vpclmulqdq      xmm1,xmm7,xmm3,0x00
+        vaesenc xmm10,xmm10,xmm2
+        vpxor   xmm14,xmm14,xmm15
+        setnc   r12b
+        vpclmulqdq      xmm7,xmm7,xmm3,0x11
+        vaesenc xmm11,xmm11,xmm2
+        vmovdqu xmm3,XMMWORD[((16-32))+r9]
+        neg     r12
+        vaesenc xmm12,xmm12,xmm2
+        vpxor   xmm6,xmm6,xmm5
+        vpclmulqdq      xmm5,xmm0,xmm3,0x00
+        vpxor   xmm8,xmm8,xmm4
+        vaesenc xmm13,xmm13,xmm2
+        vpxor   xmm4,xmm1,xmm5
+        and     r12,0x60
+        vmovups xmm15,XMMWORD[((32-128))+rcx]
+        vpclmulqdq      xmm1,xmm0,xmm3,0x10
+        vaesenc xmm14,xmm14,xmm2
+
+        vpclmulqdq      xmm2,xmm0,xmm3,0x01
+        lea     r14,[r12*1+r14]
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm8,xmm8,XMMWORD[((16+8))+rsp]
+        vpclmulqdq      xmm3,xmm0,xmm3,0x11
+        vmovdqu xmm0,XMMWORD[((64+8))+rsp]
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[88+r14]
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[80+r14]
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((32+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((40+8))+rsp],r12
+        vmovdqu xmm5,XMMWORD[((48-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((48-128))+rcx]
+        vpxor   xmm6,xmm6,xmm1
+        vpclmulqdq      xmm1,xmm0,xmm5,0x00
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm2
+        vpclmulqdq      xmm2,xmm0,xmm5,0x10
+        vaesenc xmm10,xmm10,xmm15
+        vpxor   xmm7,xmm7,xmm3
+        vpclmulqdq      xmm3,xmm0,xmm5,0x01
+        vaesenc xmm11,xmm11,xmm15
+        vpclmulqdq      xmm5,xmm0,xmm5,0x11
+        vmovdqu xmm0,XMMWORD[((80+8))+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqu xmm1,XMMWORD[((64-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((64-128))+rcx]
+        vpxor   xmm6,xmm6,xmm2
+        vpclmulqdq      xmm2,xmm0,xmm1,0x00
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm3
+        vpclmulqdq      xmm3,xmm0,xmm1,0x10
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[72+r14]
+        vpxor   xmm7,xmm7,xmm5
+        vpclmulqdq      xmm5,xmm0,xmm1,0x01
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[64+r14]
+        vpclmulqdq      xmm1,xmm0,xmm1,0x11
+        vmovdqu xmm0,XMMWORD[((96+8))+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((48+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((56+8))+rsp],r12
+        vpxor   xmm4,xmm4,xmm2
+        vmovdqu xmm2,XMMWORD[((96-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((80-128))+rcx]
+        vpxor   xmm6,xmm6,xmm3
+        vpclmulqdq      xmm3,xmm0,xmm2,0x00
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm5
+        vpclmulqdq      xmm5,xmm0,xmm2,0x10
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[56+r14]
+        vpxor   xmm7,xmm7,xmm1
+        vpclmulqdq      xmm1,xmm0,xmm2,0x01
+        vpxor   xmm8,xmm8,XMMWORD[((112+8))+rsp]
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[48+r14]
+        vpclmulqdq      xmm2,xmm0,xmm2,0x11
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((64+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((72+8))+rsp],r12
+        vpxor   xmm4,xmm4,xmm3
+        vmovdqu xmm3,XMMWORD[((112-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((96-128))+rcx]
+        vpxor   xmm6,xmm6,xmm5
+        vpclmulqdq      xmm5,xmm8,xmm3,0x10
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm1
+        vpclmulqdq      xmm1,xmm8,xmm3,0x01
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[40+r14]
+        vpxor   xmm7,xmm7,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm3,0x00
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[32+r14]
+        vpclmulqdq      xmm8,xmm8,xmm3,0x11
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((80+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((88+8))+rsp],r12
+        vpxor   xmm6,xmm6,xmm5
+        vaesenc xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm6,xmm1
+
+        vmovups xmm15,XMMWORD[((112-128))+rcx]
+        vpslldq xmm5,xmm6,8
+        vpxor   xmm4,xmm4,xmm2
+        vmovdqu xmm3,XMMWORD[16+r11]
+
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm7,xmm7,xmm8
+        vaesenc xmm10,xmm10,xmm15
+        vpxor   xmm4,xmm4,xmm5
+        movbe   r13,QWORD[24+r14]
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[16+r14]
+        vpalignr        xmm0,xmm4,xmm4,8
+        vpclmulqdq      xmm4,xmm4,xmm3,0x10
+        mov     QWORD[((96+8))+rsp],r13
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((104+8))+rsp],r12
+        vaesenc xmm13,xmm13,xmm15
+        vmovups xmm1,XMMWORD[((128-128))+rcx]
+        vaesenc xmm14,xmm14,xmm15
+
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm15,XMMWORD[((144-128))+rcx]
+        vaesenc xmm10,xmm10,xmm1
+        vpsrldq xmm6,xmm6,8
+        vaesenc xmm11,xmm11,xmm1
+        vpxor   xmm7,xmm7,xmm6
+        vaesenc xmm12,xmm12,xmm1
+        vpxor   xmm4,xmm4,xmm0
+        movbe   r13,QWORD[8+r14]
+        vaesenc xmm13,xmm13,xmm1
+        movbe   r12,QWORD[r14]
+        vaesenc xmm14,xmm14,xmm1
+        vmovups xmm1,XMMWORD[((160-128))+rcx]
+        cmp     ebp,11
+        jb      NEAR $L$enc_tail
+
+        vaesenc xmm9,xmm9,xmm15
+        vaesenc xmm10,xmm10,xmm15
+        vaesenc xmm11,xmm11,xmm15
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vaesenc xmm14,xmm14,xmm15
+
+        vaesenc xmm9,xmm9,xmm1
+        vaesenc xmm10,xmm10,xmm1
+        vaesenc xmm11,xmm11,xmm1
+        vaesenc xmm12,xmm12,xmm1
+        vaesenc xmm13,xmm13,xmm1
+        vmovups xmm15,XMMWORD[((176-128))+rcx]
+        vaesenc xmm14,xmm14,xmm1
+        vmovups xmm1,XMMWORD[((192-128))+rcx]
+        je      NEAR $L$enc_tail
+
+        vaesenc xmm9,xmm9,xmm15
+        vaesenc xmm10,xmm10,xmm15
+        vaesenc xmm11,xmm11,xmm15
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vaesenc xmm14,xmm14,xmm15
+
+        vaesenc xmm9,xmm9,xmm1
+        vaesenc xmm10,xmm10,xmm1
+        vaesenc xmm11,xmm11,xmm1
+        vaesenc xmm12,xmm12,xmm1
+        vaesenc xmm13,xmm13,xmm1
+        vmovups xmm15,XMMWORD[((208-128))+rcx]
+        vaesenc xmm14,xmm14,xmm1
+        vmovups xmm1,XMMWORD[((224-128))+rcx]
+        jmp     NEAR $L$enc_tail
+
+ALIGN   32
+$L$handle_ctr32:
+        vmovdqu xmm0,XMMWORD[r11]
+        vpshufb xmm6,xmm1,xmm0
+        vmovdqu xmm5,XMMWORD[48+r11]
+        vpaddd  xmm10,xmm6,XMMWORD[64+r11]
+        vpaddd  xmm11,xmm6,xmm5
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpaddd  xmm12,xmm10,xmm5
+        vpshufb xmm10,xmm10,xmm0
+        vpaddd  xmm13,xmm11,xmm5
+        vpshufb xmm11,xmm11,xmm0
+        vpxor   xmm10,xmm10,xmm15
+        vpaddd  xmm14,xmm12,xmm5
+        vpshufb xmm12,xmm12,xmm0
+        vpxor   xmm11,xmm11,xmm15
+        vpaddd  xmm1,xmm13,xmm5
+        vpshufb xmm13,xmm13,xmm0
+        vpshufb xmm14,xmm14,xmm0
+        vpshufb xmm1,xmm1,xmm0
+        jmp     NEAR $L$resume_ctr32
+
+ALIGN   32
+$L$enc_tail:
+        vaesenc xmm9,xmm9,xmm15
+        vmovdqu XMMWORD[(16+8)+rsp],xmm7
+        vpalignr        xmm8,xmm4,xmm4,8
+        vaesenc xmm10,xmm10,xmm15
+        vpclmulqdq      xmm4,xmm4,xmm3,0x10
+        vpxor   xmm2,xmm1,XMMWORD[rdi]
+        vaesenc xmm11,xmm11,xmm15
+        vpxor   xmm0,xmm1,XMMWORD[16+rdi]
+        vaesenc xmm12,xmm12,xmm15
+        vpxor   xmm5,xmm1,XMMWORD[32+rdi]
+        vaesenc xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm1,XMMWORD[48+rdi]
+        vaesenc xmm14,xmm14,xmm15
+        vpxor   xmm7,xmm1,XMMWORD[64+rdi]
+        vpxor   xmm3,xmm1,XMMWORD[80+rdi]
+        vmovdqu xmm1,XMMWORD[r8]
+
+        vaesenclast     xmm9,xmm9,xmm2
+        vmovdqu xmm2,XMMWORD[32+r11]
+        vaesenclast     xmm10,xmm10,xmm0
+        vpaddb  xmm0,xmm1,xmm2
+        mov     QWORD[((112+8))+rsp],r13
+        lea     rdi,[96+rdi]
+        vaesenclast     xmm11,xmm11,xmm5
+        vpaddb  xmm5,xmm0,xmm2
+        mov     QWORD[((120+8))+rsp],r12
+        lea     rsi,[96+rsi]
+        vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+        vaesenclast     xmm12,xmm12,xmm6
+        vpaddb  xmm6,xmm5,xmm2
+        vaesenclast     xmm13,xmm13,xmm7
+        vpaddb  xmm7,xmm6,xmm2
+        vaesenclast     xmm14,xmm14,xmm3
+        vpaddb  xmm3,xmm7,xmm2
+
+        add     r10,0x60
+        sub     rdx,0x6
+        jc      NEAR $L$6x_done
+
+        vmovups XMMWORD[(-96)+rsi],xmm9
+        vpxor   xmm9,xmm1,xmm15
+        vmovups XMMWORD[(-80)+rsi],xmm10
+        vmovdqa xmm10,xmm0
+        vmovups XMMWORD[(-64)+rsi],xmm11
+        vmovdqa xmm11,xmm5
+        vmovups XMMWORD[(-48)+rsi],xmm12
+        vmovdqa xmm12,xmm6
+        vmovups XMMWORD[(-32)+rsi],xmm13
+        vmovdqa xmm13,xmm7
+        vmovups XMMWORD[(-16)+rsi],xmm14
+        vmovdqa xmm14,xmm3
+        vmovdqu xmm7,XMMWORD[((32+8))+rsp]
+        jmp     NEAR $L$oop6x
+
+$L$6x_done:
+        vpxor   xmm8,xmm8,XMMWORD[((16+8))+rsp]
+        vpxor   xmm8,xmm8,xmm4
+
+        DB      0F3h,0C3h               ;repret
+
+global  aesni_gcm_decrypt
+
+ALIGN   32
+aesni_gcm_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_gcm_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        xor     r10,r10
+        cmp     rdx,0x60
+        jb      NEAR $L$gcm_dec_abort
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-216)+rax],xmm6
+        movaps  XMMWORD[(-200)+rax],xmm7
+        movaps  XMMWORD[(-184)+rax],xmm8
+        movaps  XMMWORD[(-168)+rax],xmm9
+        movaps  XMMWORD[(-152)+rax],xmm10
+        movaps  XMMWORD[(-136)+rax],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+$L$gcm_dec_body:
+        vzeroupper
+
+        vmovdqu xmm1,XMMWORD[r8]
+        add     rsp,-128
+        mov     ebx,DWORD[12+r8]
+        lea     r11,[$L$bswap_mask]
+        lea     r14,[((-128))+rcx]
+        mov     r15,0xf80
+        vmovdqu xmm8,XMMWORD[r9]
+        and     rsp,-128
+        vmovdqu xmm0,XMMWORD[r11]
+        lea     rcx,[128+rcx]
+        lea     r9,[((32+32))+r9]
+        mov     ebp,DWORD[((240-128))+rcx]
+        vpshufb xmm8,xmm8,xmm0
+
+        and     r14,r15
+        and     r15,rsp
+        sub     r15,r14
+        jc      NEAR $L$dec_no_key_aliasing
+        cmp     r15,768
+        jnc     NEAR $L$dec_no_key_aliasing
+        sub     rsp,r15
+$L$dec_no_key_aliasing:
+
+        vmovdqu xmm7,XMMWORD[80+rdi]
+        lea     r14,[rdi]
+        vmovdqu xmm4,XMMWORD[64+rdi]
+        lea     r15,[((-192))+rdx*1+rdi]
+        vmovdqu xmm5,XMMWORD[48+rdi]
+        shr     rdx,4
+        xor     r10,r10
+        vmovdqu xmm6,XMMWORD[32+rdi]
+        vpshufb xmm7,xmm7,xmm0
+        vmovdqu xmm2,XMMWORD[16+rdi]
+        vpshufb xmm4,xmm4,xmm0
+        vmovdqu xmm3,XMMWORD[rdi]
+        vpshufb xmm5,xmm5,xmm0
+        vmovdqu XMMWORD[48+rsp],xmm4
+        vpshufb xmm6,xmm6,xmm0
+        vmovdqu XMMWORD[64+rsp],xmm5
+        vpshufb xmm2,xmm2,xmm0
+        vmovdqu XMMWORD[80+rsp],xmm6
+        vpshufb xmm3,xmm3,xmm0
+        vmovdqu XMMWORD[96+rsp],xmm2
+        vmovdqu XMMWORD[112+rsp],xmm3
+
+        call    _aesni_ctr32_ghash_6x
+
+        vmovups XMMWORD[(-96)+rsi],xmm9
+        vmovups XMMWORD[(-80)+rsi],xmm10
+        vmovups XMMWORD[(-64)+rsi],xmm11
+        vmovups XMMWORD[(-48)+rsi],xmm12
+        vmovups XMMWORD[(-32)+rsi],xmm13
+        vmovups XMMWORD[(-16)+rsi],xmm14
+
+        vpshufb xmm8,xmm8,XMMWORD[r11]
+        vmovdqu XMMWORD[(-64)+r9],xmm8
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$gcm_dec_abort:
+        mov     rax,r10
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_gcm_decrypt:
+
+ALIGN   32
+_aesni_ctr32_6x:
+        vmovdqu xmm4,XMMWORD[((0-128))+rcx]
+        vmovdqu xmm2,XMMWORD[32+r11]
+        lea     r13,[((-1))+rbp]
+        vmovups xmm15,XMMWORD[((16-128))+rcx]
+        lea     r12,[((32-128))+rcx]
+        vpxor   xmm9,xmm1,xmm4
+        add     ebx,100663296
+        jc      NEAR $L$handle_ctr32_2
+        vpaddb  xmm10,xmm1,xmm2
+        vpaddb  xmm11,xmm10,xmm2
+        vpxor   xmm10,xmm10,xmm4
+        vpaddb  xmm12,xmm11,xmm2
+        vpxor   xmm11,xmm11,xmm4
+        vpaddb  xmm13,xmm12,xmm2
+        vpxor   xmm12,xmm12,xmm4
+        vpaddb  xmm14,xmm13,xmm2
+        vpxor   xmm13,xmm13,xmm4
+        vpaddb  xmm1,xmm14,xmm2
+        vpxor   xmm14,xmm14,xmm4
+        jmp     NEAR $L$oop_ctr32
+
+ALIGN   16
+$L$oop_ctr32:
+        vaesenc xmm9,xmm9,xmm15
+        vaesenc xmm10,xmm10,xmm15
+        vaesenc xmm11,xmm11,xmm15
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vaesenc xmm14,xmm14,xmm15
+        vmovups xmm15,XMMWORD[r12]
+        lea     r12,[16+r12]
+        dec     r13d
+        jnz     NEAR $L$oop_ctr32
+
+        vmovdqu xmm3,XMMWORD[r12]
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm4,xmm3,XMMWORD[rdi]
+        vaesenc xmm10,xmm10,xmm15
+        vpxor   xmm5,xmm3,XMMWORD[16+rdi]
+        vaesenc xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm3,XMMWORD[32+rdi]
+        vaesenc xmm12,xmm12,xmm15
+        vpxor   xmm8,xmm3,XMMWORD[48+rdi]
+        vaesenc xmm13,xmm13,xmm15
+        vpxor   xmm2,xmm3,XMMWORD[64+rdi]
+        vaesenc xmm14,xmm14,xmm15
+        vpxor   xmm3,xmm3,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+
+        vaesenclast     xmm9,xmm9,xmm4
+        vaesenclast     xmm10,xmm10,xmm5
+        vaesenclast     xmm11,xmm11,xmm6
+        vaesenclast     xmm12,xmm12,xmm8
+        vaesenclast     xmm13,xmm13,xmm2
+        vaesenclast     xmm14,xmm14,xmm3
+        vmovups XMMWORD[rsi],xmm9
+        vmovups XMMWORD[16+rsi],xmm10
+        vmovups XMMWORD[32+rsi],xmm11
+        vmovups XMMWORD[48+rsi],xmm12
+        vmovups XMMWORD[64+rsi],xmm13
+        vmovups XMMWORD[80+rsi],xmm14
+        lea     rsi,[96+rsi]
+
+        DB      0F3h,0C3h               ;repret
+ALIGN   32
+$L$handle_ctr32_2:
+        vpshufb xmm6,xmm1,xmm0
+        vmovdqu xmm5,XMMWORD[48+r11]
+        vpaddd  xmm10,xmm6,XMMWORD[64+r11]
+        vpaddd  xmm11,xmm6,xmm5
+        vpaddd  xmm12,xmm10,xmm5
+        vpshufb xmm10,xmm10,xmm0
+        vpaddd  xmm13,xmm11,xmm5
+        vpshufb xmm11,xmm11,xmm0
+        vpxor   xmm10,xmm10,xmm4
+        vpaddd  xmm14,xmm12,xmm5
+        vpshufb xmm12,xmm12,xmm0
+        vpxor   xmm11,xmm11,xmm4
+        vpaddd  xmm1,xmm13,xmm5
+        vpshufb xmm13,xmm13,xmm0
+        vpxor   xmm12,xmm12,xmm4
+        vpshufb xmm14,xmm14,xmm0
+        vpxor   xmm13,xmm13,xmm4
+        vpshufb xmm1,xmm1,xmm0
+        vpxor   xmm14,xmm14,xmm4
+        jmp     NEAR $L$oop_ctr32
+
+
+global  aesni_gcm_encrypt
+
+ALIGN   32
+aesni_gcm_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_gcm_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        xor     r10,r10
+        cmp     rdx,0x60*3
+        jb      NEAR $L$gcm_enc_abort
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-216)+rax],xmm6
+        movaps  XMMWORD[(-200)+rax],xmm7
+        movaps  XMMWORD[(-184)+rax],xmm8
+        movaps  XMMWORD[(-168)+rax],xmm9
+        movaps  XMMWORD[(-152)+rax],xmm10
+        movaps  XMMWORD[(-136)+rax],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+$L$gcm_enc_body:
+        vzeroupper
+
+        vmovdqu xmm1,XMMWORD[r8]
+        add     rsp,-128
+        mov     ebx,DWORD[12+r8]
+        lea     r11,[$L$bswap_mask]
+        lea     r14,[((-128))+rcx]
+        mov     r15,0xf80
+        lea     rcx,[128+rcx]
+        vmovdqu xmm0,XMMWORD[r11]
+        and     rsp,-128
+        mov     ebp,DWORD[((240-128))+rcx]
+
+        and     r14,r15
+        and     r15,rsp
+        sub     r15,r14
+        jc      NEAR $L$enc_no_key_aliasing
+        cmp     r15,768
+        jnc     NEAR $L$enc_no_key_aliasing
+        sub     rsp,r15
+$L$enc_no_key_aliasing:
+
+        lea     r14,[rsi]
+        lea     r15,[((-192))+rdx*1+rsi]
+        shr     rdx,4
+
+        call    _aesni_ctr32_6x
+        vpshufb xmm8,xmm9,xmm0
+        vpshufb xmm2,xmm10,xmm0
+        vmovdqu XMMWORD[112+rsp],xmm8
+        vpshufb xmm4,xmm11,xmm0
+        vmovdqu XMMWORD[96+rsp],xmm2
+        vpshufb xmm5,xmm12,xmm0
+        vmovdqu XMMWORD[80+rsp],xmm4
+        vpshufb xmm6,xmm13,xmm0
+        vmovdqu XMMWORD[64+rsp],xmm5
+        vpshufb xmm7,xmm14,xmm0
+        vmovdqu XMMWORD[48+rsp],xmm6
+
+        call    _aesni_ctr32_6x
+
+        vmovdqu xmm8,XMMWORD[r9]
+        lea     r9,[((32+32))+r9]
+        sub     rdx,12
+        mov     r10,0x60*2
+        vpshufb xmm8,xmm8,xmm0
+
+        call    _aesni_ctr32_ghash_6x
+        vmovdqu xmm7,XMMWORD[32+rsp]
+        vmovdqu xmm0,XMMWORD[r11]
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpunpckhqdq     xmm1,xmm7,xmm7
+        vmovdqu xmm15,XMMWORD[((32-32))+r9]
+        vmovups XMMWORD[(-96)+rsi],xmm9
+        vpshufb xmm9,xmm9,xmm0
+        vpxor   xmm1,xmm1,xmm7
+        vmovups XMMWORD[(-80)+rsi],xmm10
+        vpshufb xmm10,xmm10,xmm0
+        vmovups XMMWORD[(-64)+rsi],xmm11
+        vpshufb xmm11,xmm11,xmm0
+        vmovups XMMWORD[(-48)+rsi],xmm12
+        vpshufb xmm12,xmm12,xmm0
+        vmovups XMMWORD[(-32)+rsi],xmm13
+        vpshufb xmm13,xmm13,xmm0
+        vmovups XMMWORD[(-16)+rsi],xmm14
+        vpshufb xmm14,xmm14,xmm0
+        vmovdqu XMMWORD[16+rsp],xmm9
+        vmovdqu xmm6,XMMWORD[48+rsp]
+        vmovdqu xmm0,XMMWORD[((16-32))+r9]
+        vpunpckhqdq     xmm2,xmm6,xmm6
+        vpclmulqdq      xmm5,xmm7,xmm3,0x00
+        vpxor   xmm2,xmm2,xmm6
+        vpclmulqdq      xmm7,xmm7,xmm3,0x11
+        vpclmulqdq      xmm1,xmm1,xmm15,0x00
+
+        vmovdqu xmm9,XMMWORD[64+rsp]
+        vpclmulqdq      xmm4,xmm6,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((48-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+        vpunpckhqdq     xmm5,xmm9,xmm9
+        vpclmulqdq      xmm6,xmm6,xmm0,0x11
+        vpxor   xmm5,xmm5,xmm9
+        vpxor   xmm6,xmm6,xmm7
+        vpclmulqdq      xmm2,xmm2,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((80-32))+r9]
+        vpxor   xmm2,xmm2,xmm1
+
+        vmovdqu xmm1,XMMWORD[80+rsp]
+        vpclmulqdq      xmm7,xmm9,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((64-32))+r9]
+        vpxor   xmm7,xmm7,xmm4
+        vpunpckhqdq     xmm4,xmm1,xmm1
+        vpclmulqdq      xmm9,xmm9,xmm3,0x11
+        vpxor   xmm4,xmm4,xmm1
+        vpxor   xmm9,xmm9,xmm6
+        vpclmulqdq      xmm5,xmm5,xmm15,0x00
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm2,XMMWORD[96+rsp]
+        vpclmulqdq      xmm6,xmm1,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((96-32))+r9]
+        vpxor   xmm6,xmm6,xmm7
+        vpunpckhqdq     xmm7,xmm2,xmm2
+        vpclmulqdq      xmm1,xmm1,xmm0,0x11
+        vpxor   xmm7,xmm7,xmm2
+        vpxor   xmm1,xmm1,xmm9
+        vpclmulqdq      xmm4,xmm4,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((128-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+
+        vpxor   xmm8,xmm8,XMMWORD[112+rsp]
+        vpclmulqdq      xmm5,xmm2,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((112-32))+r9]
+        vpunpckhqdq     xmm9,xmm8,xmm8
+        vpxor   xmm5,xmm5,xmm6
+        vpclmulqdq      xmm2,xmm2,xmm3,0x11
+        vpxor   xmm9,xmm9,xmm8
+        vpxor   xmm2,xmm2,xmm1
+        vpclmulqdq      xmm7,xmm7,xmm15,0x00
+        vpxor   xmm4,xmm7,xmm4
+
+        vpclmulqdq      xmm6,xmm8,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpunpckhqdq     xmm1,xmm14,xmm14
+        vpclmulqdq      xmm8,xmm8,xmm0,0x11
+        vpxor   xmm1,xmm1,xmm14
+        vpxor   xmm5,xmm6,xmm5
+        vpclmulqdq      xmm9,xmm9,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((32-32))+r9]
+        vpxor   xmm7,xmm8,xmm2
+        vpxor   xmm6,xmm9,xmm4
+
+        vmovdqu xmm0,XMMWORD[((16-32))+r9]
+        vpxor   xmm9,xmm7,xmm5
+        vpclmulqdq      xmm4,xmm14,xmm3,0x00
+        vpxor   xmm6,xmm6,xmm9
+        vpunpckhqdq     xmm2,xmm13,xmm13
+        vpclmulqdq      xmm14,xmm14,xmm3,0x11
+        vpxor   xmm2,xmm2,xmm13
+        vpslldq xmm9,xmm6,8
+        vpclmulqdq      xmm1,xmm1,xmm15,0x00
+        vpxor   xmm8,xmm5,xmm9
+        vpsrldq xmm6,xmm6,8
+        vpxor   xmm7,xmm7,xmm6
+
+        vpclmulqdq      xmm5,xmm13,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((48-32))+r9]
+        vpxor   xmm5,xmm5,xmm4
+        vpunpckhqdq     xmm9,xmm12,xmm12
+        vpclmulqdq      xmm13,xmm13,xmm0,0x11
+        vpxor   xmm9,xmm9,xmm12
+        vpxor   xmm13,xmm13,xmm14
+        vpalignr        xmm14,xmm8,xmm8,8
+        vpclmulqdq      xmm2,xmm2,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((80-32))+r9]
+        vpxor   xmm2,xmm2,xmm1
+
+        vpclmulqdq      xmm4,xmm12,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((64-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+        vpunpckhqdq     xmm1,xmm11,xmm11
+        vpclmulqdq      xmm12,xmm12,xmm3,0x11
+        vpxor   xmm1,xmm1,xmm11
+        vpxor   xmm12,xmm12,xmm13
+        vxorps  xmm7,xmm7,XMMWORD[16+rsp]
+        vpclmulqdq      xmm9,xmm9,xmm15,0x00
+        vpxor   xmm9,xmm9,xmm2
+
+        vpclmulqdq      xmm8,xmm8,XMMWORD[16+r11],0x10
+        vxorps  xmm8,xmm8,xmm14
+
+        vpclmulqdq      xmm5,xmm11,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((96-32))+r9]
+        vpxor   xmm5,xmm5,xmm4
+        vpunpckhqdq     xmm2,xmm10,xmm10
+        vpclmulqdq      xmm11,xmm11,xmm0,0x11
+        vpxor   xmm2,xmm2,xmm10
+        vpalignr        xmm14,xmm8,xmm8,8
+        vpxor   xmm11,xmm11,xmm12
+        vpclmulqdq      xmm1,xmm1,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((128-32))+r9]
+        vpxor   xmm1,xmm1,xmm9
+
+        vxorps  xmm14,xmm14,xmm7
+        vpclmulqdq      xmm8,xmm8,XMMWORD[16+r11],0x10
+        vxorps  xmm8,xmm8,xmm14
+
+        vpclmulqdq      xmm4,xmm10,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((112-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+        vpunpckhqdq     xmm9,xmm8,xmm8
+        vpclmulqdq      xmm10,xmm10,xmm3,0x11
+        vpxor   xmm9,xmm9,xmm8
+        vpxor   xmm10,xmm10,xmm11
+        vpclmulqdq      xmm2,xmm2,xmm15,0x00
+        vpxor   xmm2,xmm2,xmm1
+
+        vpclmulqdq      xmm5,xmm8,xmm0,0x00
+        vpclmulqdq      xmm7,xmm8,xmm0,0x11
+        vpxor   xmm5,xmm5,xmm4
+        vpclmulqdq      xmm6,xmm9,xmm15,0x10
+        vpxor   xmm7,xmm7,xmm10
+        vpxor   xmm6,xmm6,xmm2
+
+        vpxor   xmm4,xmm7,xmm5
+        vpxor   xmm6,xmm6,xmm4
+        vpslldq xmm1,xmm6,8
+        vmovdqu xmm3,XMMWORD[16+r11]
+        vpsrldq xmm6,xmm6,8
+        vpxor   xmm8,xmm5,xmm1
+        vpxor   xmm7,xmm7,xmm6
+
+        vpalignr        xmm2,xmm8,xmm8,8
+        vpclmulqdq      xmm8,xmm8,xmm3,0x10
+        vpxor   xmm8,xmm8,xmm2
+
+        vpalignr        xmm2,xmm8,xmm8,8
+        vpclmulqdq      xmm8,xmm8,xmm3,0x10
+        vpxor   xmm2,xmm2,xmm7
+        vpxor   xmm8,xmm8,xmm2
+        vpshufb xmm8,xmm8,XMMWORD[r11]
+        vmovdqu XMMWORD[(-64)+r9],xmm8
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$gcm_enc_abort:
+        mov     rax,r10
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_gcm_encrypt:
+ALIGN   64
+$L$bswap_mask:
+DB      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$poly:
+DB      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$one_msb:
+DB      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$two_lsb:
+DB      2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+$L$one_lsb:
+DB      1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+DB      65,69,83,45,78,73,32,71,67,77,32,109,111,100,117,108
+DB      101,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82
+DB      89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+DB      114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+gcm_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[120+r8]
+
+        mov     r15,QWORD[((-48))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     rbx,QWORD[((-8))+rax]
+        mov     QWORD[240+r8],r15
+        mov     QWORD[232+r8],r14
+        mov     QWORD[224+r8],r13
+        mov     QWORD[216+r8],r12
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[144+r8],rbx
+
+        lea     rsi,[((-216))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_gcm_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_gcm_decrypt wrt ..imagebase
+        DD      $L$SEH_gcm_dec_info wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_gcm_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_gcm_encrypt wrt ..imagebase
+        DD      $L$SEH_gcm_enc_info wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_gcm_dec_info:
+DB      9,0,0,0
+        DD      gcm_se_handler wrt ..imagebase
+        DD      $L$gcm_dec_body wrt ..imagebase,$L$gcm_dec_abort wrt ..imagebase
+$L$SEH_gcm_enc_info:
+DB      9,0,0,0
+        DD      gcm_se_handler wrt ..imagebase
+        DD      $L$gcm_enc_body wrt ..imagebase,$L$gcm_enc_abort wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
new file mode 100644
index 0000000000..3d67e12775
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
@@ -0,0 +1,2077 @@
+; Copyright 2010-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  gcm_gmult_4bit
+
+ALIGN   16
+gcm_gmult_4bit:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_gcm_gmult_4bit:
+        mov     rdi,rcx
+        mov     rsi,rdx
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,280
+
+$L$gmult_prologue:
+
+        movzx   r8,BYTE[15+rdi]
+        lea     r11,[$L$rem_4bit]
+        xor     rax,rax
+        xor     rbx,rbx
+        mov     al,r8b
+        mov     bl,r8b
+        shl     al,4
+        mov     rcx,14
+        mov     r8,QWORD[8+rax*1+rsi]
+        mov     r9,QWORD[rax*1+rsi]
+        and     bl,0xf0
+        mov     rdx,r8
+        jmp     NEAR $L$oop1
+
+ALIGN   16
+$L$oop1:
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        mov     al,BYTE[rcx*1+rdi]
+        shr     r9,4
+        xor     r8,QWORD[8+rbx*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rbx*1+rsi]
+        mov     bl,al
+        xor     r9,QWORD[rdx*8+r11]
+        mov     rdx,r8
+        shl     al,4
+        xor     r8,r10
+        dec     rcx
+        js      NEAR $L$break1
+
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        shr     r9,4
+        xor     r8,QWORD[8+rax*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rax*1+rsi]
+        and     bl,0xf0
+        xor     r9,QWORD[rdx*8+r11]
+        mov     rdx,r8
+        xor     r8,r10
+        jmp     NEAR $L$oop1
+
+ALIGN   16
+$L$break1:
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        shr     r9,4
+        xor     r8,QWORD[8+rax*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rax*1+rsi]
+        and     bl,0xf0
+        xor     r9,QWORD[rdx*8+r11]
+        mov     rdx,r8
+        xor     r8,r10
+
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        shr     r9,4
+        xor     r8,QWORD[8+rbx*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rbx*1+rsi]
+        xor     r8,r10
+        xor     r9,QWORD[rdx*8+r11]
+
+        bswap   r8
+        bswap   r9
+        mov     QWORD[8+rdi],r8
+        mov     QWORD[rdi],r9
+
+        lea     rsi,[((280+48))+rsp]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$gmult_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_gcm_gmult_4bit:
+global  gcm_ghash_4bit
+
+ALIGN   16
+gcm_ghash_4bit:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_gcm_ghash_4bit:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,280
+
+$L$ghash_prologue:
+        mov     r14,rdx
+        mov     r15,rcx
+        sub     rsi,-128
+        lea     rbp,[((16+128))+rsp]
+        xor     edx,edx
+        mov     r8,QWORD[((0+0-128))+rsi]
+        mov     rax,QWORD[((0+8-128))+rsi]
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     r9,QWORD[((16+0-128))+rsi]
+        shl     dl,4
+        mov     rbx,QWORD[((16+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[rbp],r8
+        mov     r8,QWORD[((32+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((0-128))+rbp],rax
+        mov     rax,QWORD[((32+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[1+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[8+rbp],r9
+        mov     r9,QWORD[((48+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((8-128))+rbp],rbx
+        mov     rbx,QWORD[((48+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[2+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[16+rbp],r8
+        mov     r8,QWORD[((64+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((16-128))+rbp],rax
+        mov     rax,QWORD[((64+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[3+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[24+rbp],r9
+        mov     r9,QWORD[((80+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((24-128))+rbp],rbx
+        mov     rbx,QWORD[((80+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[4+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[32+rbp],r8
+        mov     r8,QWORD[((96+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((32-128))+rbp],rax
+        mov     rax,QWORD[((96+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[5+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[40+rbp],r9
+        mov     r9,QWORD[((112+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((40-128))+rbp],rbx
+        mov     rbx,QWORD[((112+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[6+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[48+rbp],r8
+        mov     r8,QWORD[((128+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((48-128))+rbp],rax
+        mov     rax,QWORD[((128+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[7+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[56+rbp],r9
+        mov     r9,QWORD[((144+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((56-128))+rbp],rbx
+        mov     rbx,QWORD[((144+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[8+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[64+rbp],r8
+        mov     r8,QWORD[((160+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((64-128))+rbp],rax
+        mov     rax,QWORD[((160+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[9+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[72+rbp],r9
+        mov     r9,QWORD[((176+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((72-128))+rbp],rbx
+        mov     rbx,QWORD[((176+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[10+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[80+rbp],r8
+        mov     r8,QWORD[((192+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((80-128))+rbp],rax
+        mov     rax,QWORD[((192+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[11+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[88+rbp],r9
+        mov     r9,QWORD[((208+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((88-128))+rbp],rbx
+        mov     rbx,QWORD[((208+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[12+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[96+rbp],r8
+        mov     r8,QWORD[((224+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((96-128))+rbp],rax
+        mov     rax,QWORD[((224+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[13+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[104+rbp],r9
+        mov     r9,QWORD[((240+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((104-128))+rbp],rbx
+        mov     rbx,QWORD[((240+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[14+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[112+rbp],r8
+        shl     dl,4
+        mov     QWORD[((112-128))+rbp],rax
+        shl     r10,60
+        mov     BYTE[15+rsp],dl
+        or      rbx,r10
+        mov     QWORD[120+rbp],r9
+        mov     QWORD[((120-128))+rbp],rbx
+        add     rsi,-128
+        mov     r8,QWORD[8+rdi]
+        mov     r9,QWORD[rdi]
+        add     r15,r14
+        lea     r11,[$L$rem_8bit]
+        jmp     NEAR $L$outer_loop
+ALIGN   16
+$L$outer_loop:
+        xor     r9,QWORD[r14]
+        mov     rdx,QWORD[8+r14]
+        lea     r14,[16+r14]
+        xor     rdx,r8
+        mov     QWORD[rdi],r9
+        mov     QWORD[8+rdi],rdx
+        shr     rdx,32
+        xor     rax,rax
+        rol     edx,8
+        mov     al,dl
+        movzx   ebx,dl
+        shl     al,4
+        shr     ebx,4
+        rol     edx,8
+        mov     r8,QWORD[8+rax*1+rsi]
+        mov     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        xor     r12,r8
+        mov     r10,r9
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[8+rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[4+rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        and     ecx,240
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[((-4))+rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        movzx   r12,WORD[r12*2+r11]
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        shl     r12,48
+        xor     r8,r10
+        xor     r9,r12
+        movzx   r13,r8b
+        shr     r8,4
+        mov     r10,r9
+        shl     r13b,4
+        shr     r9,4
+        xor     r8,QWORD[8+rcx*1+rsi]
+        movzx   r13,WORD[r13*2+r11]
+        shl     r10,60
+        xor     r9,QWORD[rcx*1+rsi]
+        xor     r8,r10
+        shl     r13,48
+        bswap   r8
+        xor     r9,r13
+        bswap   r9
+        cmp     r14,r15
+        jb      NEAR $L$outer_loop
+        mov     QWORD[8+rdi],r8
+        mov     QWORD[rdi],r9
+
+        lea     rsi,[((280+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$ghash_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_gcm_ghash_4bit:
+global  gcm_init_clmul
+
+ALIGN   16
+gcm_init_clmul:
+
+$L$_init_clmul:
+$L$SEH_begin_gcm_init_clmul:
+
+DB      0x48,0x83,0xec,0x18
+DB      0x0f,0x29,0x34,0x24
+        movdqu  xmm2,XMMWORD[rdx]
+        pshufd  xmm2,xmm2,78
+
+
+        pshufd  xmm4,xmm2,255
+        movdqa  xmm3,xmm2
+        psllq   xmm2,1
+        pxor    xmm5,xmm5
+        psrlq   xmm3,63
+        pcmpgtd xmm5,xmm4
+        pslldq  xmm3,8
+        por     xmm2,xmm3
+
+
+        pand    xmm5,XMMWORD[$L$0x1c2_polynomial]
+        pxor    xmm2,xmm5
+
+
+        pshufd  xmm6,xmm2,78
+        movdqa  xmm0,xmm2
+        pxor    xmm6,xmm2
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,222,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm2,78
+        pshufd  xmm4,xmm0,78
+        pxor    xmm3,xmm2
+        movdqu  XMMWORD[rcx],xmm2
+        pxor    xmm4,xmm0
+        movdqu  XMMWORD[16+rcx],xmm0
+DB      102,15,58,15,227,8
+        movdqu  XMMWORD[32+rcx],xmm4
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,222,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        movdqa  xmm5,xmm0
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,222,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm5,78
+        pshufd  xmm4,xmm0,78
+        pxor    xmm3,xmm5
+        movdqu  XMMWORD[48+rcx],xmm5
+        pxor    xmm4,xmm0
+        movdqu  XMMWORD[64+rcx],xmm0
+DB      102,15,58,15,227,8
+        movdqu  XMMWORD[80+rcx],xmm4
+        movaps  xmm6,XMMWORD[rsp]
+        lea     rsp,[24+rsp]
+$L$SEH_end_gcm_init_clmul:
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_gmult_clmul
+
+ALIGN   16
+gcm_gmult_clmul:
+
+$L$_gmult_clmul:
+        movdqu  xmm0,XMMWORD[rcx]
+        movdqa  xmm5,XMMWORD[$L$bswap_mask]
+        movdqu  xmm2,XMMWORD[rdx]
+        movdqu  xmm4,XMMWORD[32+rdx]
+DB      102,15,56,0,197
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,220,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+DB      102,15,56,0,197
+        movdqu  XMMWORD[rcx],xmm0
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_ghash_clmul
+
+ALIGN   32
+gcm_ghash_clmul:
+
+$L$_ghash_clmul:
+        lea     rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_clmul:
+
+DB      0x48,0x8d,0x60,0xe0
+DB      0x0f,0x29,0x70,0xe0
+DB      0x0f,0x29,0x78,0xf0
+DB      0x44,0x0f,0x29,0x00
+DB      0x44,0x0f,0x29,0x48,0x10
+DB      0x44,0x0f,0x29,0x50,0x20
+DB      0x44,0x0f,0x29,0x58,0x30
+DB      0x44,0x0f,0x29,0x60,0x40
+DB      0x44,0x0f,0x29,0x68,0x50
+DB      0x44,0x0f,0x29,0x70,0x60
+DB      0x44,0x0f,0x29,0x78,0x70
+        movdqa  xmm10,XMMWORD[$L$bswap_mask]
+
+        movdqu  xmm0,XMMWORD[rcx]
+        movdqu  xmm2,XMMWORD[rdx]
+        movdqu  xmm7,XMMWORD[32+rdx]
+DB      102,65,15,56,0,194
+
+        sub     r9,0x10
+        jz      NEAR $L$odd_tail
+
+        movdqu  xmm6,XMMWORD[16+rdx]
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+4))]
+        cmp     r9,0x30
+        jb      NEAR $L$skip4x
+
+        and     eax,71303168
+        cmp     eax,4194304
+        je      NEAR $L$skip4x
+
+        sub     r9,0x30
+        mov     rax,0xA040608020C0E000
+        movdqu  xmm14,XMMWORD[48+rdx]
+        movdqu  xmm15,XMMWORD[64+rdx]
+
+
+
+
+        movdqu  xmm3,XMMWORD[48+r8]
+        movdqu  xmm11,XMMWORD[32+r8]
+DB      102,65,15,56,0,218
+DB      102,69,15,56,0,218
+        movdqa  xmm5,xmm3
+        pshufd  xmm4,xmm3,78
+        pxor    xmm4,xmm3
+DB      102,15,58,68,218,0
+DB      102,15,58,68,234,17
+DB      102,15,58,68,231,0
+
+        movdqa  xmm13,xmm11
+        pshufd  xmm12,xmm11,78
+        pxor    xmm12,xmm11
+DB      102,68,15,58,68,222,0
+DB      102,68,15,58,68,238,17
+DB      102,68,15,58,68,231,16
+        xorps   xmm3,xmm11
+        xorps   xmm5,xmm13
+        movups  xmm7,XMMWORD[80+rdx]
+        xorps   xmm4,xmm12
+
+        movdqu  xmm11,XMMWORD[16+r8]
+        movdqu  xmm8,XMMWORD[r8]
+DB      102,69,15,56,0,218
+DB      102,69,15,56,0,194
+        movdqa  xmm13,xmm11
+        pshufd  xmm12,xmm11,78
+        pxor    xmm0,xmm8
+        pxor    xmm12,xmm11
+DB      102,69,15,58,68,222,0
+        movdqa  xmm1,xmm0
+        pshufd  xmm8,xmm0,78
+        pxor    xmm8,xmm0
+DB      102,69,15,58,68,238,17
+DB      102,68,15,58,68,231,0
+        xorps   xmm3,xmm11
+        xorps   xmm5,xmm13
+
+        lea     r8,[64+r8]
+        sub     r9,0x40
+        jc      NEAR $L$tail4x
+
+        jmp     NEAR $L$mod4_loop
+ALIGN   32
+$L$mod4_loop:
+DB      102,65,15,58,68,199,0
+        xorps   xmm4,xmm12
+        movdqu  xmm11,XMMWORD[48+r8]
+DB      102,69,15,56,0,218
+DB      102,65,15,58,68,207,17
+        xorps   xmm0,xmm3
+        movdqu  xmm3,XMMWORD[32+r8]
+        movdqa  xmm13,xmm11
+DB      102,68,15,58,68,199,16
+        pshufd  xmm12,xmm11,78
+        xorps   xmm1,xmm5
+        pxor    xmm12,xmm11
+DB      102,65,15,56,0,218
+        movups  xmm7,XMMWORD[32+rdx]
+        xorps   xmm8,xmm4
+DB      102,68,15,58,68,218,0
+        pshufd  xmm4,xmm3,78
+
+        pxor    xmm8,xmm0
+        movdqa  xmm5,xmm3
+        pxor    xmm8,xmm1
+        pxor    xmm4,xmm3
+        movdqa  xmm9,xmm8
+DB      102,68,15,58,68,234,17
+        pslldq  xmm8,8
+        psrldq  xmm9,8
+        pxor    xmm0,xmm8
+        movdqa  xmm8,XMMWORD[$L$7_mask]
+        pxor    xmm1,xmm9
+DB      102,76,15,110,200
+
+        pand    xmm8,xmm0
+DB      102,69,15,56,0,200
+        pxor    xmm9,xmm0
+DB      102,68,15,58,68,231,0
+        psllq   xmm9,57
+        movdqa  xmm8,xmm9
+        pslldq  xmm9,8
+DB      102,15,58,68,222,0
+        psrldq  xmm8,8
+        pxor    xmm0,xmm9
+        pxor    xmm1,xmm8
+        movdqu  xmm8,XMMWORD[r8]
+
+        movdqa  xmm9,xmm0
+        psrlq   xmm0,1
+DB      102,15,58,68,238,17
+        xorps   xmm3,xmm11
+        movdqu  xmm11,XMMWORD[16+r8]
+DB      102,69,15,56,0,218
+DB      102,15,58,68,231,16
+        xorps   xmm5,xmm13
+        movups  xmm7,XMMWORD[80+rdx]
+DB      102,69,15,56,0,194
+        pxor    xmm1,xmm9
+        pxor    xmm9,xmm0
+        psrlq   xmm0,5
+
+        movdqa  xmm13,xmm11
+        pxor    xmm4,xmm12
+        pshufd  xmm12,xmm11,78
+        pxor    xmm0,xmm9
+        pxor    xmm1,xmm8
+        pxor    xmm12,xmm11
+DB      102,69,15,58,68,222,0
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        movdqa  xmm1,xmm0
+DB      102,69,15,58,68,238,17
+        xorps   xmm3,xmm11
+        pshufd  xmm8,xmm0,78
+        pxor    xmm8,xmm0
+
+DB      102,68,15,58,68,231,0
+        xorps   xmm5,xmm13
+
+        lea     r8,[64+r8]
+        sub     r9,0x40
+        jnc     NEAR $L$mod4_loop
+
+$L$tail4x:
+DB      102,65,15,58,68,199,0
+DB      102,65,15,58,68,207,17
+DB      102,68,15,58,68,199,16
+        xorps   xmm4,xmm12
+        xorps   xmm0,xmm3
+        xorps   xmm1,xmm5
+        pxor    xmm1,xmm0
+        pxor    xmm8,xmm4
+
+        pxor    xmm8,xmm1
+        pxor    xmm1,xmm0
+
+        movdqa  xmm9,xmm8
+        psrldq  xmm8,8
+        pslldq  xmm9,8
+        pxor    xmm1,xmm8
+        pxor    xmm0,xmm9
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        add     r9,0x40
+        jz      NEAR $L$done
+        movdqu  xmm7,XMMWORD[32+rdx]
+        sub     r9,0x10
+        jz      NEAR $L$odd_tail
+$L$skip4x:
+
+
+
+
+
+        movdqu  xmm8,XMMWORD[r8]
+        movdqu  xmm3,XMMWORD[16+r8]
+DB      102,69,15,56,0,194
+DB      102,65,15,56,0,218
+        pxor    xmm0,xmm8
+
+        movdqa  xmm5,xmm3
+        pshufd  xmm4,xmm3,78
+        pxor    xmm4,xmm3
+DB      102,15,58,68,218,0
+DB      102,15,58,68,234,17
+DB      102,15,58,68,231,0
+
+        lea     r8,[32+r8]
+        nop
+        sub     r9,0x20
+        jbe     NEAR $L$even_tail
+        nop
+        jmp     NEAR $L$mod_loop
+
+ALIGN   32
+$L$mod_loop:
+        movdqa  xmm1,xmm0
+        movdqa  xmm8,xmm4
+        pshufd  xmm4,xmm0,78
+        pxor    xmm4,xmm0
+
+DB      102,15,58,68,198,0
+DB      102,15,58,68,206,17
+DB      102,15,58,68,231,16
+
+        pxor    xmm0,xmm3
+        pxor    xmm1,xmm5
+        movdqu  xmm9,XMMWORD[r8]
+        pxor    xmm8,xmm0
+DB      102,69,15,56,0,202
+        movdqu  xmm3,XMMWORD[16+r8]
+
+        pxor    xmm8,xmm1
+        pxor    xmm1,xmm9
+        pxor    xmm4,xmm8
+DB      102,65,15,56,0,218
+        movdqa  xmm8,xmm4
+        psrldq  xmm8,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm8
+        pxor    xmm0,xmm4
+
+        movdqa  xmm5,xmm3
+
+        movdqa  xmm9,xmm0
+        movdqa  xmm8,xmm0
+        psllq   xmm0,5
+        pxor    xmm8,xmm0
+DB      102,15,58,68,218,0
+        psllq   xmm0,1
+        pxor    xmm0,xmm8
+        psllq   xmm0,57
+        movdqa  xmm8,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm8,8
+        pxor    xmm0,xmm9
+        pshufd  xmm4,xmm5,78
+        pxor    xmm1,xmm8
+        pxor    xmm4,xmm5
+
+        movdqa  xmm9,xmm0
+        psrlq   xmm0,1
+DB      102,15,58,68,234,17
+        pxor    xmm1,xmm9
+        pxor    xmm9,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm9
+        lea     r8,[32+r8]
+        psrlq   xmm0,1
+DB      102,15,58,68,231,0
+        pxor    xmm0,xmm1
+
+        sub     r9,0x20
+        ja      NEAR $L$mod_loop
+
+$L$even_tail:
+        movdqa  xmm1,xmm0
+        movdqa  xmm8,xmm4
+        pshufd  xmm4,xmm0,78
+        pxor    xmm4,xmm0
+
+DB      102,15,58,68,198,0
+DB      102,15,58,68,206,17
+DB      102,15,58,68,231,16
+
+        pxor    xmm0,xmm3
+        pxor    xmm1,xmm5
+        pxor    xmm8,xmm0
+        pxor    xmm8,xmm1
+        pxor    xmm4,xmm8
+        movdqa  xmm8,xmm4
+        psrldq  xmm8,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm8
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        test    r9,r9
+        jnz     NEAR $L$done
+
+$L$odd_tail:
+        movdqu  xmm8,XMMWORD[r8]
+DB      102,69,15,56,0,194
+        pxor    xmm0,xmm8
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,223,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+$L$done:
+DB      102,65,15,56,0,194
+        movdqu  XMMWORD[rcx],xmm0
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        lea     rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_clmul:
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_init_avx
+
+ALIGN   32
+gcm_init_avx:
+
+$L$SEH_begin_gcm_init_avx:
+
+DB      0x48,0x83,0xec,0x18
+DB      0x0f,0x29,0x34,0x24
+        vzeroupper
+
+        vmovdqu xmm2,XMMWORD[rdx]
+        vpshufd xmm2,xmm2,78
+
+
+        vpshufd xmm4,xmm2,255
+        vpsrlq  xmm3,xmm2,63
+        vpsllq  xmm2,xmm2,1
+        vpxor   xmm5,xmm5,xmm5
+        vpcmpgtd        xmm5,xmm5,xmm4
+        vpslldq xmm3,xmm3,8
+        vpor    xmm2,xmm2,xmm3
+
+
+        vpand   xmm5,xmm5,XMMWORD[$L$0x1c2_polynomial]
+        vpxor   xmm2,xmm2,xmm5
+
+        vpunpckhqdq     xmm6,xmm2,xmm2
+        vmovdqa xmm0,xmm2
+        vpxor   xmm6,xmm6,xmm2
+        mov     r10,4
+        jmp     NEAR $L$init_start_avx
+ALIGN   32
+$L$init_loop_avx:
+        vpalignr        xmm5,xmm4,xmm3,8
+        vmovdqu XMMWORD[(-16)+rcx],xmm5
+        vpunpckhqdq     xmm3,xmm0,xmm0
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm1,xmm0,xmm2,0x11
+        vpclmulqdq      xmm0,xmm0,xmm2,0x00
+        vpclmulqdq      xmm3,xmm3,xmm6,0x00
+        vpxor   xmm4,xmm1,xmm0
+        vpxor   xmm3,xmm3,xmm4
+
+        vpslldq xmm4,xmm3,8
+        vpsrldq xmm3,xmm3,8
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm1,xmm1,xmm3
+        vpsllq  xmm3,xmm0,57
+        vpsllq  xmm4,xmm0,62
+        vpxor   xmm4,xmm4,xmm3
+        vpsllq  xmm3,xmm0,63
+        vpxor   xmm4,xmm4,xmm3
+        vpslldq xmm3,xmm4,8
+        vpsrldq xmm4,xmm4,8
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm1,xmm1,xmm4
+
+        vpsrlq  xmm4,xmm0,1
+        vpxor   xmm1,xmm1,xmm0
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm4,xmm4,5
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm0,xmm0,1
+        vpxor   xmm0,xmm0,xmm1
+$L$init_start_avx:
+        vmovdqa xmm5,xmm0
+        vpunpckhqdq     xmm3,xmm0,xmm0
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm1,xmm0,xmm2,0x11
+        vpclmulqdq      xmm0,xmm0,xmm2,0x00
+        vpclmulqdq      xmm3,xmm3,xmm6,0x00
+        vpxor   xmm4,xmm1,xmm0
+        vpxor   xmm3,xmm3,xmm4
+
+        vpslldq xmm4,xmm3,8
+        vpsrldq xmm3,xmm3,8
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm1,xmm1,xmm3
+        vpsllq  xmm3,xmm0,57
+        vpsllq  xmm4,xmm0,62
+        vpxor   xmm4,xmm4,xmm3
+        vpsllq  xmm3,xmm0,63
+        vpxor   xmm4,xmm4,xmm3
+        vpslldq xmm3,xmm4,8
+        vpsrldq xmm4,xmm4,8
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm1,xmm1,xmm4
+
+        vpsrlq  xmm4,xmm0,1
+        vpxor   xmm1,xmm1,xmm0
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm4,xmm4,5
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm0,xmm0,1
+        vpxor   xmm0,xmm0,xmm1
+        vpshufd xmm3,xmm5,78
+        vpshufd xmm4,xmm0,78
+        vpxor   xmm3,xmm3,xmm5
+        vmovdqu XMMWORD[rcx],xmm5
+        vpxor   xmm4,xmm4,xmm0
+        vmovdqu XMMWORD[16+rcx],xmm0
+        lea     rcx,[48+rcx]
+        sub     r10,1
+        jnz     NEAR $L$init_loop_avx
+
+        vpalignr        xmm5,xmm3,xmm4,8
+        vmovdqu XMMWORD[(-16)+rcx],xmm5
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[rsp]
+        lea     rsp,[24+rsp]
+$L$SEH_end_gcm_init_avx:
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_gmult_avx
+
+ALIGN   32
+gcm_gmult_avx:
+
+        jmp     NEAR $L$_gmult_clmul
+
+
+global  gcm_ghash_avx
+
+ALIGN   32
+gcm_ghash_avx:
+
+        lea     rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_avx:
+
+DB      0x48,0x8d,0x60,0xe0
+DB      0x0f,0x29,0x70,0xe0
+DB      0x0f,0x29,0x78,0xf0
+DB      0x44,0x0f,0x29,0x00
+DB      0x44,0x0f,0x29,0x48,0x10
+DB      0x44,0x0f,0x29,0x50,0x20
+DB      0x44,0x0f,0x29,0x58,0x30
+DB      0x44,0x0f,0x29,0x60,0x40
+DB      0x44,0x0f,0x29,0x68,0x50
+DB      0x44,0x0f,0x29,0x70,0x60
+DB      0x44,0x0f,0x29,0x78,0x70
+        vzeroupper
+
+        vmovdqu xmm10,XMMWORD[rcx]
+        lea     r10,[$L$0x1c2_polynomial]
+        lea     rdx,[64+rdx]
+        vmovdqu xmm13,XMMWORD[$L$bswap_mask]
+        vpshufb xmm10,xmm10,xmm13
+        cmp     r9,0x80
+        jb      NEAR $L$short_avx
+        sub     r9,0x80
+
+        vmovdqu xmm14,XMMWORD[112+r8]
+        vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+        vpshufb xmm14,xmm14,xmm13
+        vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vmovdqu xmm15,XMMWORD[96+r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm9,xmm9,xmm14
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vmovdqu xmm14,XMMWORD[80+r8]
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+
+        vpshufb xmm14,xmm14,xmm13
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vmovdqu xmm15,XMMWORD[64+r8]
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+
+        vpshufb xmm15,xmm15,xmm13
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm4,xmm4,xmm1
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+
+        vmovdqu xmm14,XMMWORD[48+r8]
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpxor   xmm1,xmm1,xmm4
+        vpshufb xmm14,xmm14,xmm13
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+        vpxor   xmm2,xmm2,xmm5
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+
+        vmovdqu xmm15,XMMWORD[32+r8]
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm4,xmm4,xmm1
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+        vpxor   xmm5,xmm5,xmm2
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+
+        vmovdqu xmm14,XMMWORD[16+r8]
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpxor   xmm1,xmm1,xmm4
+        vpshufb xmm14,xmm14,xmm13
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+        vpxor   xmm2,xmm2,xmm5
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+
+        vmovdqu xmm15,XMMWORD[r8]
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm4,xmm4,xmm1
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm9,xmm7,0x10
+
+        lea     r8,[128+r8]
+        cmp     r9,0x80
+        jb      NEAR $L$tail_avx
+
+        vpxor   xmm15,xmm15,xmm10
+        sub     r9,0x80
+        jmp     NEAR $L$oop8x_avx
+
+ALIGN   32
+$L$oop8x_avx:
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vmovdqu xmm14,XMMWORD[112+r8]
+        vpxor   xmm3,xmm3,xmm0
+        vpxor   xmm8,xmm8,xmm15
+        vpclmulqdq      xmm10,xmm15,xmm6,0x00
+        vpshufb xmm14,xmm14,xmm13
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm11,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm12,xmm8,xmm7,0x00
+        vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+
+        vmovdqu xmm15,XMMWORD[96+r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm10,xmm10,xmm3
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vxorps  xmm11,xmm11,xmm4
+        vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm12,xmm12,xmm5
+        vxorps  xmm8,xmm8,xmm15
+
+        vmovdqu xmm14,XMMWORD[80+r8]
+        vpxor   xmm12,xmm12,xmm10
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpxor   xmm12,xmm12,xmm11
+        vpslldq xmm9,xmm12,8
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vpsrldq xmm12,xmm12,8
+        vpxor   xmm10,xmm10,xmm9
+        vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+        vpshufb xmm14,xmm14,xmm13
+        vxorps  xmm11,xmm11,xmm12
+        vpxor   xmm4,xmm4,xmm1
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm15,XMMWORD[64+r8]
+        vpalignr        xmm12,xmm10,xmm10,8
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpshufb xmm15,xmm15,xmm13
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm1,xmm1,xmm4
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vxorps  xmm8,xmm8,xmm15
+        vpxor   xmm2,xmm2,xmm5
+
+        vmovdqu xmm14,XMMWORD[48+r8]
+        vpclmulqdq      xmm10,xmm10,XMMWORD[r10],0x10
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpshufb xmm14,xmm14,xmm13
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm15,XMMWORD[32+r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpshufb xmm15,xmm15,xmm13
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm1,xmm1,xmm4
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vpxor   xmm2,xmm2,xmm5
+        vxorps  xmm10,xmm10,xmm12
+
+        vmovdqu xmm14,XMMWORD[16+r8]
+        vpalignr        xmm12,xmm10,xmm10,8
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpshufb xmm14,xmm14,xmm13
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+        vpclmulqdq      xmm10,xmm10,XMMWORD[r10],0x10
+        vxorps  xmm12,xmm12,xmm11
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm15,XMMWORD[r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+        vpxor   xmm15,xmm15,xmm12
+        vpclmulqdq      xmm2,xmm9,xmm7,0x10
+        vpxor   xmm15,xmm15,xmm10
+
+        lea     r8,[128+r8]
+        sub     r9,0x80
+        jnc     NEAR $L$oop8x_avx
+
+        add     r9,0x80
+        jmp     NEAR $L$tail_no_xor_avx
+
+ALIGN   32
+$L$short_avx:
+        vmovdqu xmm14,XMMWORD[((-16))+r9*1+r8]
+        lea     r8,[r9*1+r8]
+        vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+        vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+
+        vmovdqa xmm3,xmm0
+        vmovdqa xmm4,xmm1
+        vmovdqa xmm5,xmm2
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-32))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vpsrldq xmm7,xmm7,8
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-48))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-64))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vpsrldq xmm7,xmm7,8
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-80))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-96))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vpsrldq xmm7,xmm7,8
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-112))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vmovq   xmm7,QWORD[((184-64))+rdx]
+        sub     r9,0x10
+        jmp     NEAR $L$tail_avx
+
+ALIGN   32
+$L$tail_avx:
+        vpxor   xmm15,xmm15,xmm10
+$L$tail_no_xor_avx:
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+
+        vmovdqu xmm12,XMMWORD[r10]
+
+        vpxor   xmm10,xmm3,xmm0
+        vpxor   xmm11,xmm4,xmm1
+        vpxor   xmm5,xmm5,xmm2
+
+        vpxor   xmm5,xmm5,xmm10
+        vpxor   xmm5,xmm5,xmm11
+        vpslldq xmm9,xmm5,8
+        vpsrldq xmm5,xmm5,8
+        vpxor   xmm10,xmm10,xmm9
+        vpxor   xmm11,xmm11,xmm5
+
+        vpclmulqdq      xmm9,xmm10,xmm12,0x10
+        vpalignr        xmm10,xmm10,xmm10,8
+        vpxor   xmm10,xmm10,xmm9
+
+        vpclmulqdq      xmm9,xmm10,xmm12,0x10
+        vpalignr        xmm10,xmm10,xmm10,8
+        vpxor   xmm10,xmm10,xmm11
+        vpxor   xmm10,xmm10,xmm9
+
+        cmp     r9,0
+        jne     NEAR $L$short_avx
+
+        vpshufb xmm10,xmm10,xmm13
+        vmovdqu XMMWORD[rcx],xmm10
+        vzeroupper
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        lea     rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_avx:
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+$L$bswap_mask:
+DB      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$0x1c2_polynomial:
+DB      1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$7_mask:
+        DD      7,0,7,0
+$L$7_mask_poly:
+        DD      7,0,450,0
+ALIGN   64
+
+$L$rem_4bit:
+        DD      0,0,0,471859200,0,943718400,0,610271232
+        DD      0,1887436800,0,1822425088,0,1220542464,0,1423966208
+        DD      0,3774873600,0,4246732800,0,3644850176,0,3311403008
+        DD      0,2441084928,0,2376073216,0,2847932416,0,3051356160
+
+$L$rem_8bit:
+        DW      0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E
+        DW      0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E
+        DW      0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E
+        DW      0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E
+        DW      0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E
+        DW      0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E
+        DW      0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E
+        DW      0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E
+        DW      0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE
+        DW      0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE
+        DW      0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE
+        DW      0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE
+        DW      0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E
+        DW      0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E
+        DW      0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE
+        DW      0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE
+        DW      0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E
+        DW      0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E
+        DW      0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E
+        DW      0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E
+        DW      0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E
+        DW      0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E
+        DW      0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E
+        DW      0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E
+        DW      0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE
+        DW      0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE
+        DW      0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE
+        DW      0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE
+        DW      0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E
+        DW      0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E
+        DW      0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE
+        DW      0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE
+
+DB      71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52
+DB      44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB      60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB      114,103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rax,[((48+280))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_gcm_gmult_4bit wrt ..imagebase
+        DD      $L$SEH_end_gcm_gmult_4bit wrt ..imagebase
+        DD      $L$SEH_info_gcm_gmult_4bit wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_ghash_4bit wrt ..imagebase
+        DD      $L$SEH_end_gcm_ghash_4bit wrt ..imagebase
+        DD      $L$SEH_info_gcm_ghash_4bit wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_init_clmul wrt ..imagebase
+        DD      $L$SEH_end_gcm_init_clmul wrt ..imagebase
+        DD      $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_ghash_clmul wrt ..imagebase
+        DD      $L$SEH_end_gcm_ghash_clmul wrt ..imagebase
+        DD      $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+        DD      $L$SEH_begin_gcm_init_avx wrt ..imagebase
+        DD      $L$SEH_end_gcm_init_avx wrt ..imagebase
+        DD      $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_ghash_avx wrt ..imagebase
+        DD      $L$SEH_end_gcm_ghash_avx wrt ..imagebase
+        DD      $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_gcm_gmult_4bit:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$gmult_prologue wrt ..imagebase,$L$gmult_epilogue wrt ..imagebase
+$L$SEH_info_gcm_ghash_4bit:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$ghash_prologue wrt ..imagebase,$L$ghash_epilogue wrt ..imagebase
+$L$SEH_info_gcm_init_clmul:
+DB      0x01,0x08,0x03,0x00
+DB      0x08,0x68,0x00,0x00
+DB      0x04,0x22,0x00,0x00
+$L$SEH_info_gcm_ghash_clmul:
+DB      0x01,0x33,0x16,0x00
+DB      0x33,0xf8,0x09,0x00
+DB      0x2e,0xe8,0x08,0x00
+DB      0x29,0xd8,0x07,0x00
+DB      0x24,0xc8,0x06,0x00
+DB      0x1f,0xb8,0x05,0x00
+DB      0x1a,0xa8,0x04,0x00
+DB      0x15,0x98,0x03,0x00
+DB      0x10,0x88,0x02,0x00
+DB      0x0c,0x78,0x01,0x00
+DB      0x08,0x68,0x00,0x00
+DB      0x04,0x01,0x15,0x00
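(Note on the .pdata/.xdata blocks that close this generated file: each triple of image-relative DDs in .pdata registers one routine with the Win64 unwind machinery, giving its begin address, end address, and unwind record, while the custom se_handler restores the non-volatile registers saved in the prologue before RtlVirtualUnwind continues the unwind. The avx entries reusing the clmul info records match the output of the upstream generator rather than being a copy/paste slip. As a rough C illustration of the record shape only, the struct and field names below follow the documented winnt.h RUNTIME_FUNCTION layout and are not part of this patch:

    /* One .pdata record, as emitted by the "DD ... wrt ..imagebase" triples
       above.  All three fields are image-relative offsets (RVAs).
       Illustration only; the real declaration is RUNTIME_FUNCTION in winnt.h. */
    typedef struct {
        unsigned int BeginAddress;   /* RVA of $L$SEH_begin_<func>           */
        unsigned int EndAddress;     /* RVA of $L$SEH_end_<func>             */
        unsigned int UnwindInfo;     /* RVA of the $L$SEH_info_<func> record */
    } Win64RuntimeFunction;
)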
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
new file mode 100644
index 0000000000..c9a37a47c9
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
@@ -0,0 +1,1395 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN   16
+
+global  rc4_md5_enc
+
+rc4_md5_enc:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rc4_md5_enc:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        cmp     r9,0
+        je      NEAR $L$abort
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,40
+
+$L$body:
+        mov     r11,rcx
+        mov     r12,r9
+        mov     r13,rsi
+        mov     r14,rdx
+        mov     r15,r8
+        xor     rbp,rbp
+        xor     rcx,rcx
+
+        lea     rdi,[8+rdi]
+        mov     bpl,BYTE[((-8))+rdi]
+        mov     cl,BYTE[((-4))+rdi]
+
+        inc     bpl
+        sub     r14,r13
+        mov     eax,DWORD[rbp*4+rdi]
+        add     cl,al
+        lea     rsi,[rbp*4+rdi]
+        shl     r12,6
+        add     r12,r15
+        mov     QWORD[16+rsp],r12
+
+        mov     QWORD[24+rsp],r11
+        mov     r8d,DWORD[r11]
+        mov     r9d,DWORD[4+r11]
+        mov     r10d,DWORD[8+r11]
+        mov     r11d,DWORD[12+r11]
+        jmp     NEAR $L$oop
+
+ALIGN   16
+$L$oop:
+        mov     DWORD[rsp],r8d
+        mov     DWORD[4+rsp],r9d
+        mov     DWORD[8+rsp],r10d
+        mov     r12d,r11d
+        mov     DWORD[12+rsp],r11d
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[r15]
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        add     r8d,3614090360
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[4+r15]
+        add     bl,dl
+        mov     eax,DWORD[8+rsi]
+        add     r11d,3905402710
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[4+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[8+r15]
+        add     al,dl
+        mov     ebx,DWORD[12+rsi]
+        add     r10d,606105819
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[8+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[12+r15]
+        add     bl,dl
+        mov     eax,DWORD[16+rsi]
+        add     r9d,3250441966
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[12+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[16+r15]
+        add     al,dl
+        mov     ebx,DWORD[20+rsi]
+        add     r8d,4118548399
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[16+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[20+r15]
+        add     bl,dl
+        mov     eax,DWORD[24+rsi]
+        add     r11d,1200080426
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[20+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[24+r15]
+        add     al,dl
+        mov     ebx,DWORD[28+rsi]
+        add     r10d,2821735955
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[24+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[28+r15]
+        add     bl,dl
+        mov     eax,DWORD[32+rsi]
+        add     r9d,4249261313
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[28+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[32+r15]
+        add     al,dl
+        mov     ebx,DWORD[36+rsi]
+        add     r8d,1770035416
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[32+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[36+r15]
+        add     bl,dl
+        mov     eax,DWORD[40+rsi]
+        add     r11d,2336552879
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[36+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[40+r15]
+        add     al,dl
+        mov     ebx,DWORD[44+rsi]
+        add     r10d,4294925233
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[40+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[44+r15]
+        add     bl,dl
+        mov     eax,DWORD[48+rsi]
+        add     r9d,2304563134
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[44+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[48+r15]
+        add     al,dl
+        mov     ebx,DWORD[52+rsi]
+        add     r8d,1804603682
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[48+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[52+r15]
+        add     bl,dl
+        mov     eax,DWORD[56+rsi]
+        add     r11d,4254626195
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[52+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[56+r15]
+        add     al,dl
+        mov     ebx,DWORD[60+rsi]
+        add     r10d,2792965006
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[56+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm2,XMMWORD[r13]
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[60+r15]
+        add     bl,dl
+        mov     eax,DWORD[64+rsi]
+        add     r9d,1236535329
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[60+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        psllq   xmm1,8
+        pxor    xmm2,xmm0
+        pxor    xmm2,xmm1
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[4+r15]
+        add     al,dl
+        mov     ebx,DWORD[68+rsi]
+        add     r8d,4129170786
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[64+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[24+r15]
+        add     bl,dl
+        mov     eax,DWORD[72+rsi]
+        add     r11d,3225465664
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[68+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[44+r15]
+        add     al,dl
+        mov     ebx,DWORD[76+rsi]
+        add     r10d,643717713
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[72+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[r15]
+        add     bl,dl
+        mov     eax,DWORD[80+rsi]
+        add     r9d,3921069994
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[76+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[20+r15]
+        add     al,dl
+        mov     ebx,DWORD[84+rsi]
+        add     r8d,3593408605
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[80+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[40+r15]
+        add     bl,dl
+        mov     eax,DWORD[88+rsi]
+        add     r11d,38016083
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[84+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[60+r15]
+        add     al,dl
+        mov     ebx,DWORD[92+rsi]
+        add     r10d,3634488961
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[88+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[16+r15]
+        add     bl,dl
+        mov     eax,DWORD[96+rsi]
+        add     r9d,3889429448
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[92+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[36+r15]
+        add     al,dl
+        mov     ebx,DWORD[100+rsi]
+        add     r8d,568446438
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[96+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[56+r15]
+        add     bl,dl
+        mov     eax,DWORD[104+rsi]
+        add     r11d,3275163606
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[100+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[12+r15]
+        add     al,dl
+        mov     ebx,DWORD[108+rsi]
+        add     r10d,4107603335
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[104+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[32+r15]
+        add     bl,dl
+        mov     eax,DWORD[112+rsi]
+        add     r9d,1163531501
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[108+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[52+r15]
+        add     al,dl
+        mov     ebx,DWORD[116+rsi]
+        add     r8d,2850285829
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[112+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[8+r15]
+        add     bl,dl
+        mov     eax,DWORD[120+rsi]
+        add     r11d,4243563512
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[116+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[28+r15]
+        add     al,dl
+        mov     ebx,DWORD[124+rsi]
+        add     r10d,1735328473
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[120+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm3,XMMWORD[16+r13]
+        add     bpl,32
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[48+r15]
+        add     bl,dl
+        mov     eax,DWORD[rbp*4+rdi]
+        add     r9d,2368359562
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[124+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        mov     rsi,rcx
+        xor     rcx,rcx
+        mov     cl,sil
+        lea     rsi,[rbp*4+rdi]
+        psllq   xmm1,8
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[20+r15]
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        add     r8d,4294588738
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[32+r15]
+        add     bl,dl
+        mov     eax,DWORD[8+rsi]
+        add     r11d,2272392833
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[4+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[44+r15]
+        add     al,dl
+        mov     ebx,DWORD[12+rsi]
+        add     r10d,1839030562
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[8+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[56+r15]
+        add     bl,dl
+        mov     eax,DWORD[16+rsi]
+        add     r9d,4259657740
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[12+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[4+r15]
+        add     al,dl
+        mov     ebx,DWORD[20+rsi]
+        add     r8d,2763975236
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[16+rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[16+r15]
+        add     bl,dl
+        mov     eax,DWORD[24+rsi]
+        add     r11d,1272893353
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[20+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[28+r15]
+        add     al,dl
+        mov     ebx,DWORD[28+rsi]
+        add     r10d,4139469664
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[24+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[40+r15]
+        add     bl,dl
+        mov     eax,DWORD[32+rsi]
+        add     r9d,3200236656
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[28+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[52+r15]
+        add     al,dl
+        mov     ebx,DWORD[36+rsi]
+        add     r8d,681279174
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[32+rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[r15]
+        add     bl,dl
+        mov     eax,DWORD[40+rsi]
+        add     r11d,3936430074
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[36+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[12+r15]
+        add     al,dl
+        mov     ebx,DWORD[44+rsi]
+        add     r10d,3572445317
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[40+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[24+r15]
+        add     bl,dl
+        mov     eax,DWORD[48+rsi]
+        add     r9d,76029189
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[44+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[36+r15]
+        add     al,dl
+        mov     ebx,DWORD[52+rsi]
+        add     r8d,3654602809
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[48+rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[48+r15]
+        add     bl,dl
+        mov     eax,DWORD[56+rsi]
+        add     r11d,3873151461
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[52+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[60+r15]
+        add     al,dl
+        mov     ebx,DWORD[60+rsi]
+        add     r10d,530742520
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[56+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm4,XMMWORD[32+r13]
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[8+r15]
+        add     bl,dl
+        mov     eax,DWORD[64+rsi]
+        add     r9d,3299628645
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[60+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        psllq   xmm1,8
+        pxor    xmm4,xmm0
+        pxor    xmm4,xmm1
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[r15]
+        add     al,dl
+        mov     ebx,DWORD[68+rsi]
+        add     r8d,4096336452
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[64+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[28+r15]
+        add     bl,dl
+        mov     eax,DWORD[72+rsi]
+        add     r11d,1126891415
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[68+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[56+r15]
+        add     al,dl
+        mov     ebx,DWORD[76+rsi]
+        add     r10d,2878612391
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[72+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[20+r15]
+        add     bl,dl
+        mov     eax,DWORD[80+rsi]
+        add     r9d,4237533241
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[76+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[48+r15]
+        add     al,dl
+        mov     ebx,DWORD[84+rsi]
+        add     r8d,1700485571
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[80+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[12+r15]
+        add     bl,dl
+        mov     eax,DWORD[88+rsi]
+        add     r11d,2399980690
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[84+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[40+r15]
+        add     al,dl
+        mov     ebx,DWORD[92+rsi]
+        add     r10d,4293915773
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[88+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[4+r15]
+        add     bl,dl
+        mov     eax,DWORD[96+rsi]
+        add     r9d,2240044497
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[92+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[32+r15]
+        add     al,dl
+        mov     ebx,DWORD[100+rsi]
+        add     r8d,1873313359
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[96+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[60+r15]
+        add     bl,dl
+        mov     eax,DWORD[104+rsi]
+        add     r11d,4264355552
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[100+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[24+r15]
+        add     al,dl
+        mov     ebx,DWORD[108+rsi]
+        add     r10d,2734768916
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[104+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[52+r15]
+        add     bl,dl
+        mov     eax,DWORD[112+rsi]
+        add     r9d,1309151649
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[108+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[16+r15]
+        add     al,dl
+        mov     ebx,DWORD[116+rsi]
+        add     r8d,4149444226
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[112+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[44+r15]
+        add     bl,dl
+        mov     eax,DWORD[120+rsi]
+        add     r11d,3174756917
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[116+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[8+r15]
+        add     al,dl
+        mov     ebx,DWORD[124+rsi]
+        add     r10d,718787259
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[120+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm5,XMMWORD[48+r13]
+        add     bpl,32
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[36+r15]
+        add     bl,dl
+        mov     eax,DWORD[rbp*4+rdi]
+        add     r9d,3951481745
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[124+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        mov     rsi,rbp
+        xor     rbp,rbp
+        mov     bpl,sil
+        mov     rsi,rcx
+        xor     rcx,rcx
+        mov     cl,sil
+        lea     rsi,[rbp*4+rdi]
+        psllq   xmm1,8
+        pxor    xmm5,xmm0
+        pxor    xmm5,xmm1
+        add     r8d,DWORD[rsp]
+        add     r9d,DWORD[4+rsp]
+        add     r10d,DWORD[8+rsp]
+        add     r11d,DWORD[12+rsp]
+
+        movdqu  XMMWORD[r13*1+r14],xmm2
+        movdqu  XMMWORD[16+r13*1+r14],xmm3
+        movdqu  XMMWORD[32+r13*1+r14],xmm4
+        movdqu  XMMWORD[48+r13*1+r14],xmm5
+        lea     r15,[64+r15]
+        lea     r13,[64+r13]
+        cmp     r15,QWORD[16+rsp]
+        jb      NEAR $L$oop
+
+        mov     r12,QWORD[24+rsp]
+        sub     cl,al
+        mov     DWORD[r12],r8d
+        mov     DWORD[4+r12],r9d
+        mov     DWORD[8+r12],r10d
+        mov     DWORD[12+r12],r11d
+        sub     bpl,1
+        mov     DWORD[((-8))+rdi],ebp
+        mov     DWORD[((-4))+rdi],ecx
+
+        mov     r15,QWORD[40+rsp]
+
+        mov     r14,QWORD[48+rsp]
+
+        mov     r13,QWORD[56+rsp]
+
+        mov     r12,QWORD[64+rsp]
+
+        mov     rbp,QWORD[72+rsp]
+
+        mov     rbx,QWORD[80+rsp]
+
+        lea     rsp,[88+rsp]
+
+$L$epilogue:
+$L$abort:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rc4_md5_enc:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$body]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     r15,QWORD[40+rax]
+        mov     r14,QWORD[48+rax]
+        mov     r13,QWORD[56+rax]
+        mov     r12,QWORD[64+rax]
+        mov     rbp,QWORD[72+rax]
+        mov     rbx,QWORD[80+rax]
+        lea     rax,[88+rax]
+
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_rc4_md5_enc wrt ..imagebase
+        DD      $L$SEH_end_rc4_md5_enc wrt ..imagebase
+        DD      $L$SEH_info_rc4_md5_enc wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_rc4_md5_enc:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
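(Note on the prologue pattern that recurs in rc4_md5_enc above and in RC4/RC4_set_key below: the CRYPTOGAMS body is written against the System V AMD64 argument order, so the WIN64 prologue spills rdi/rsi into the caller's shadow space and re-shuffles the Microsoft-ABI registers, plus the fifth and sixth arguments at [40+rsp] and [48+rsp], into the SysV ones before falling through. A minimal C-level sketch of that mapping follows; the function name and prototype are hypothetical and used only to show where each argument arrives:

    /* Hypothetical six-argument prototype, for illustration only. */
    void example_fn(void *a, void *b, void *c, void *d, void *e, unsigned long f);

    /* System V AMD64 (the order the generated body expects):
     *   a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9
     *
     * Microsoft x64 (what the WIN64 prologue receives):
     *   a -> rcx, b -> rdx, c -> r8, d -> r9,
     *   e -> [rsp+40], f -> [rsp+48]   (32-byte shadow space + return address)
     *
     * Hence the shim seen above:
     *   mov rdi,rcx / mov rsi,rdx / mov rdx,r8 / mov rcx,r9 /
     *   mov r8,[40+rsp] / mov r9,[48+rsp]                                   */
)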
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
new file mode 100644
index 0000000000..72e3641649
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
@@ -0,0 +1,784 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  RC4
+
+ALIGN   16
+RC4:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_RC4:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+        or      rsi,rsi
+        jne     NEAR $L$entry
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$entry:
+
+        push    rbx
+
+        push    r12
+
+        push    r13
+
+$L$prologue:
+        mov     r11,rsi
+        mov     r12,rdx
+        mov     r13,rcx
+        xor     r10,r10
+        xor     rcx,rcx
+
+        lea     rdi,[8+rdi]
+        mov     r10b,BYTE[((-8))+rdi]
+        mov     cl,BYTE[((-4))+rdi]
+        cmp     DWORD[256+rdi],-1
+        je      NEAR $L$RC4_CHAR
+        mov     r8d,DWORD[OPENSSL_ia32cap_P]
+        xor     rbx,rbx
+        inc     r10b
+        sub     rbx,r10
+        sub     r13,r12
+        mov     eax,DWORD[r10*4+rdi]
+        test    r11,-16
+        jz      NEAR $L$loop1
+        bt      r8d,30
+        jc      NEAR $L$intel
+        and     rbx,7
+        lea     rsi,[1+r10]
+        jz      NEAR $L$oop8
+        sub     r11,rbx
+$L$oop8_warmup:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     DWORD[r10*4+rdi],edx
+        add     al,dl
+        inc     r10b
+        mov     edx,DWORD[rax*4+rdi]
+        mov     eax,DWORD[r10*4+rdi]
+        xor     dl,BYTE[r12]
+        mov     BYTE[r13*1+r12],dl
+        lea     r12,[1+r12]
+        dec     rbx
+        jnz     NEAR $L$oop8_warmup
+
+        lea     rsi,[1+r10]
+        jmp     NEAR $L$oop8
+ALIGN   16
+$L$oop8:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[4+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[4+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[8+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[8+r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[12+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[12+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[16+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[16+r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[20+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[20+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[24+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[24+r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     sil,8
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[((-4))+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[28+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     r10b,8
+        ror     r8,8
+        sub     r11,8
+
+        xor     r8,QWORD[r12]
+        mov     QWORD[r13*1+r12],r8
+        lea     r12,[8+r12]
+
+        test    r11,-8
+        jnz     NEAR $L$oop8
+        cmp     r11,0
+        jne     NEAR $L$loop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$intel:
+        test    r11,-32
+        jz      NEAR $L$loop1
+        and     rbx,15
+        jz      NEAR $L$oop16_is_hot
+        sub     r11,rbx
+$L$oop16_warmup:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     DWORD[r10*4+rdi],edx
+        add     al,dl
+        inc     r10b
+        mov     edx,DWORD[rax*4+rdi]
+        mov     eax,DWORD[r10*4+rdi]
+        xor     dl,BYTE[r12]
+        mov     BYTE[r13*1+r12],dl
+        lea     r12,[1+r12]
+        dec     rbx
+        jnz     NEAR $L$oop16_warmup
+
+        mov     rbx,rcx
+        xor     rcx,rcx
+        mov     cl,bl
+
+$L$oop16_is_hot:
+        lea     rsi,[r10*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        pxor    xmm0,xmm0
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        movzx   eax,al
+        mov     DWORD[rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],0
+        jmp     NEAR $L$oop16_enter
+ALIGN   16
+$L$oop16:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        pxor    xmm2,xmm0
+        psllq   xmm1,8
+        pxor    xmm0,xmm0
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        movzx   eax,al
+        mov     DWORD[rsi],edx
+        pxor    xmm2,xmm1
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],0
+        movdqu  XMMWORD[r13*1+r12],xmm2
+        lea     r12,[16+r12]
+$L$oop16_enter:
+        mov     edx,DWORD[rcx*4+rdi]
+        pxor    xmm1,xmm1
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[8+rsi]
+        movzx   ebx,bl
+        mov     DWORD[4+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],0
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[12+rsi]
+        movzx   eax,al
+        mov     DWORD[8+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[16+rsi]
+        movzx   ebx,bl
+        mov     DWORD[12+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[20+rsi]
+        movzx   eax,al
+        mov     DWORD[16+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[24+rsi]
+        movzx   ebx,bl
+        mov     DWORD[20+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[28+rsi]
+        movzx   eax,al
+        mov     DWORD[24+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[32+rsi]
+        movzx   ebx,bl
+        mov     DWORD[28+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[36+rsi]
+        movzx   eax,al
+        mov     DWORD[32+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[40+rsi]
+        movzx   ebx,bl
+        mov     DWORD[36+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[44+rsi]
+        movzx   eax,al
+        mov     DWORD[40+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[48+rsi]
+        movzx   ebx,bl
+        mov     DWORD[44+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[52+rsi]
+        movzx   eax,al
+        mov     DWORD[48+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[56+rsi]
+        movzx   ebx,bl
+        mov     DWORD[52+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[60+rsi]
+        movzx   eax,al
+        mov     DWORD[56+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+        add     r10b,16
+        movdqu  xmm2,XMMWORD[r12]
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        movzx   ebx,bl
+        mov     DWORD[60+rsi],edx
+        lea     rsi,[r10*4+rdi]
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+        mov     eax,DWORD[rsi]
+        mov     rbx,rcx
+        xor     rcx,rcx
+        sub     r11,16
+        mov     cl,bl
+        test    r11,-16
+        jnz     NEAR $L$oop16
+
+        psllq   xmm1,8
+        pxor    xmm2,xmm0
+        pxor    xmm2,xmm1
+        movdqu  XMMWORD[r13*1+r12],xmm2
+        lea     r12,[16+r12]
+
+        cmp     r11,0
+        jne     NEAR $L$loop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$loop1:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     DWORD[r10*4+rdi],edx
+        add     al,dl
+        inc     r10b
+        mov     edx,DWORD[rax*4+rdi]
+        mov     eax,DWORD[r10*4+rdi]
+        xor     dl,BYTE[r12]
+        mov     BYTE[r13*1+r12],dl
+        lea     r12,[1+r12]
+        dec     r11
+        jnz     NEAR $L$loop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$RC4_CHAR:
+        add     r10b,1
+        movzx   eax,BYTE[r10*1+rdi]
+        test    r11,-8
+        jz      NEAR $L$cloop1
+        jmp     NEAR $L$cloop8
+ALIGN   16
+$L$cloop8:
+        mov     r8d,DWORD[r12]
+        mov     r9d,DWORD[4+r12]
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov0
+        mov     rbx,rax
+$L$cmov0:
+        add     dl,al
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov1
+        mov     rax,rbx
+$L$cmov1:
+        add     dl,bl
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov2
+        mov     rbx,rax
+$L$cmov2:
+        add     dl,al
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov3
+        mov     rax,rbx
+$L$cmov3:
+        add     dl,bl
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov4
+        mov     rbx,rax
+$L$cmov4:
+        add     dl,al
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov5
+        mov     rax,rbx
+$L$cmov5:
+        add     dl,bl
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov6
+        mov     rbx,rax
+$L$cmov6:
+        add     dl,al
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov7
+        mov     rax,rbx
+$L$cmov7:
+        add     dl,bl
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        lea     r11,[((-8))+r11]
+        mov     DWORD[r13],r8d
+        lea     r12,[8+r12]
+        mov     DWORD[4+r13],r9d
+        lea     r13,[8+r13]
+
+        test    r11,-8
+        jnz     NEAR $L$cloop8
+        cmp     r11,0
+        jne     NEAR $L$cloop1
+        jmp     NEAR $L$exit
+ALIGN   16
+$L$cloop1:
+        add     cl,al
+        movzx   ecx,cl
+        movzx   edx,BYTE[rcx*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        mov     BYTE[r10*1+rdi],dl
+        add     dl,al
+        add     r10b,1
+        movzx   edx,dl
+        movzx   r10d,r10b
+        movzx   edx,BYTE[rdx*1+rdi]
+        movzx   eax,BYTE[r10*1+rdi]
+        xor     dl,BYTE[r12]
+        lea     r12,[1+r12]
+        mov     BYTE[r13],dl
+        lea     r13,[1+r13]
+        sub     r11,1
+        jnz     NEAR $L$cloop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$exit:
+        sub     r10b,1
+        mov     DWORD[((-8))+rdi],r10d
+        mov     DWORD[((-4))+rdi],ecx
+
+        mov     r13,QWORD[rsp]
+
+        mov     r12,QWORD[8+rsp]
+
+        mov     rbx,QWORD[16+rsp]
+
+        add     rsp,24
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_RC4:
+global  RC4_set_key
+
+ALIGN   16
+RC4_set_key:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_RC4_set_key:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+        lea     rdi,[8+rdi]
+        lea     rdx,[rsi*1+rdx]
+        neg     rsi
+        mov     rcx,rsi
+        xor     eax,eax
+        xor     r9,r9
+        xor     r10,r10
+        xor     r11,r11
+
+        mov     r8d,DWORD[OPENSSL_ia32cap_P]
+        bt      r8d,20
+        jc      NEAR $L$c1stloop
+        jmp     NEAR $L$w1stloop
+
+ALIGN   16
+$L$w1stloop:
+        mov     DWORD[rax*4+rdi],eax
+        add     al,1
+        jnc     NEAR $L$w1stloop
+
+        xor     r9,r9
+        xor     r8,r8
+ALIGN   16
+$L$w2ndloop:
+        mov     r10d,DWORD[r9*4+rdi]
+        add     r8b,BYTE[rsi*1+rdx]
+        add     r8b,r10b
+        add     rsi,1
+        mov     r11d,DWORD[r8*4+rdi]
+        cmovz   rsi,rcx
+        mov     DWORD[r8*4+rdi],r10d
+        mov     DWORD[r9*4+rdi],r11d
+        add     r9b,1
+        jnc     NEAR $L$w2ndloop
+        jmp     NEAR $L$exit_key
+
+ALIGN   16
+$L$c1stloop:
+        mov     BYTE[rax*1+rdi],al
+        add     al,1
+        jnc     NEAR $L$c1stloop
+
+        xor     r9,r9
+        xor     r8,r8
+ALIGN   16
+$L$c2ndloop:
+        mov     r10b,BYTE[r9*1+rdi]
+        add     r8b,BYTE[rsi*1+rdx]
+        add     r8b,r10b
+        add     rsi,1
+        mov     r11b,BYTE[r8*1+rdi]
+        jnz     NEAR $L$cnowrap
+        mov     rsi,rcx
+$L$cnowrap:
+        mov     BYTE[r8*1+rdi],r10b
+        mov     BYTE[r9*1+rdi],r11b
+        add     r9b,1
+        jnc     NEAR $L$c2ndloop
+        mov     DWORD[256+rdi],-1
+
+ALIGN   16
+$L$exit_key:
+        xor     eax,eax
+        mov     DWORD[((-8))+rdi],eax
+        mov     DWORD[((-4))+rdi],eax
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_RC4_set_key:
+
+global  RC4_options
+
+ALIGN   16
+RC4_options:
+        lea     rax,[$L$opts]
+        mov     edx,DWORD[OPENSSL_ia32cap_P]
+        bt      edx,20
+        jc      NEAR $L$8xchar
+        bt      edx,30
+        jnc     NEAR $L$done
+        add     rax,25
+        DB      0F3h,0C3h               ;repret
+$L$8xchar:
+        add     rax,12
+$L$done:
+        DB      0F3h,0C3h               ;repret
+ALIGN   64
+$L$opts:
+DB      114,99,52,40,56,120,44,105,110,116,41,0
+DB      114,99,52,40,56,120,44,99,104,97,114,41,0
+DB      114,99,52,40,49,54,120,44,105,110,116,41,0
+DB      82,67,52,32,102,111,114,32,120,56,54,95,54,52,44,32
+DB      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+DB      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+DB      62,0
+ALIGN   64
+
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+stream_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rax,[24+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     r12,QWORD[((-16))+rax]
+        mov     r13,QWORD[((-24))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        jmp     NEAR $L$common_seh_exit
+
+
+
+ALIGN   16
+key_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[152+r8]
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+$L$common_seh_exit:
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_RC4 wrt ..imagebase
+        DD      $L$SEH_end_RC4 wrt ..imagebase
+        DD      $L$SEH_info_RC4 wrt ..imagebase
+
+        DD      $L$SEH_begin_RC4_set_key wrt ..imagebase
+        DD      $L$SEH_end_RC4_set_key wrt ..imagebase
+        DD      $L$SEH_info_RC4_set_key wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_RC4:
+DB      9,0,0,0
+        DD      stream_se_handler wrt ..imagebase
+$L$SEH_info_RC4_set_key:
+DB      9,0,0,0
+        DD      key_se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
new file mode 100644
index 0000000000..00eadebf68
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
@@ -0,0 +1,532 @@
+; Copyright 2017-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN   32
+__KeccakF1600:
+        mov     rax,QWORD[60+rdi]
+        mov     rbx,QWORD[68+rdi]
+        mov     rcx,QWORD[76+rdi]
+        mov     rdx,QWORD[84+rdi]
+        mov     rbp,QWORD[92+rdi]
+        jmp     NEAR $L$oop
+
+ALIGN   32
+$L$oop:
+        mov     r8,QWORD[((-100))+rdi]
+        mov     r9,QWORD[((-52))+rdi]
+        mov     r10,QWORD[((-4))+rdi]
+        mov     r11,QWORD[44+rdi]
+
+        xor     rcx,QWORD[((-84))+rdi]
+        xor     rdx,QWORD[((-76))+rdi]
+        xor     rax,r8
+        xor     rbx,QWORD[((-92))+rdi]
+        xor     rcx,QWORD[((-44))+rdi]
+        xor     rax,QWORD[((-60))+rdi]
+        mov     r12,rbp
+        xor     rbp,QWORD[((-68))+rdi]
+
+        xor     rcx,r10
+        xor     rax,QWORD[((-20))+rdi]
+        xor     rdx,QWORD[((-36))+rdi]
+        xor     rbx,r9
+        xor     rbp,QWORD[((-28))+rdi]
+
+        xor     rcx,QWORD[36+rdi]
+        xor     rax,QWORD[20+rdi]
+        xor     rdx,QWORD[4+rdi]
+        xor     rbx,QWORD[((-12))+rdi]
+        xor     rbp,QWORD[12+rdi]
+
+        mov     r13,rcx
+        rol     rcx,1
+        xor     rcx,rax
+        xor     rdx,r11
+
+        rol     rax,1
+        xor     rax,rdx
+        xor     rbx,QWORD[28+rdi]
+
+        rol     rdx,1
+        xor     rdx,rbx
+        xor     rbp,QWORD[52+rdi]
+
+        rol     rbx,1
+        xor     rbx,rbp
+
+        rol     rbp,1
+        xor     rbp,r13
+        xor     r9,rcx
+        xor     r10,rdx
+        rol     r9,44
+        xor     r11,rbp
+        xor     r12,rax
+        rol     r10,43
+        xor     r8,rbx
+        mov     r13,r9
+        rol     r11,21
+        or      r9,r10
+        xor     r9,r8
+        rol     r12,14
+
+        xor     r9,QWORD[r15]
+        lea     r15,[8+r15]
+
+        mov     r14,r12
+        and     r12,r11
+        mov     QWORD[((-100))+rsi],r9
+        xor     r12,r10
+        not     r10
+        mov     QWORD[((-84))+rsi],r12
+
+        or      r10,r11
+        mov     r12,QWORD[76+rdi]
+        xor     r10,r13
+        mov     QWORD[((-92))+rsi],r10
+
+        and     r13,r8
+        mov     r9,QWORD[((-28))+rdi]
+        xor     r13,r14
+        mov     r10,QWORD[((-20))+rdi]
+        mov     QWORD[((-68))+rsi],r13
+
+        or      r14,r8
+        mov     r8,QWORD[((-76))+rdi]
+        xor     r14,r11
+        mov     r11,QWORD[28+rdi]
+        mov     QWORD[((-76))+rsi],r14
+
+
+        xor     r8,rbp
+        xor     r12,rdx
+        rol     r8,28
+        xor     r11,rcx
+        xor     r9,rax
+        rol     r12,61
+        rol     r11,45
+        xor     r10,rbx
+        rol     r9,20
+        mov     r13,r8
+        or      r8,r12
+        rol     r10,3
+
+        xor     r8,r11
+        mov     QWORD[((-36))+rsi],r8
+
+        mov     r14,r9
+        and     r9,r13
+        mov     r8,QWORD[((-92))+rdi]
+        xor     r9,r12
+        not     r12
+        mov     QWORD[((-28))+rsi],r9
+
+        or      r12,r11
+        mov     r9,QWORD[((-44))+rdi]
+        xor     r12,r10
+        mov     QWORD[((-44))+rsi],r12
+
+        and     r11,r10
+        mov     r12,QWORD[60+rdi]
+        xor     r11,r14
+        mov     QWORD[((-52))+rsi],r11
+
+        or      r14,r10
+        mov     r10,QWORD[4+rdi]
+        xor     r14,r13
+        mov     r11,QWORD[52+rdi]
+        mov     QWORD[((-60))+rsi],r14
+
+
+        xor     r10,rbp
+        xor     r11,rax
+        rol     r10,25
+        xor     r9,rdx
+        rol     r11,8
+        xor     r12,rbx
+        rol     r9,6
+        xor     r8,rcx
+        rol     r12,18
+        mov     r13,r10
+        and     r10,r11
+        rol     r8,1
+
+        not     r11
+        xor     r10,r9
+        mov     QWORD[((-12))+rsi],r10
+
+        mov     r14,r12
+        and     r12,r11
+        mov     r10,QWORD[((-12))+rdi]
+        xor     r12,r13
+        mov     QWORD[((-4))+rsi],r12
+
+        or      r13,r9
+        mov     r12,QWORD[84+rdi]
+        xor     r13,r8
+        mov     QWORD[((-20))+rsi],r13
+
+        and     r9,r8
+        xor     r9,r14
+        mov     QWORD[12+rsi],r9
+
+        or      r14,r8
+        mov     r9,QWORD[((-60))+rdi]
+        xor     r14,r11
+        mov     r11,QWORD[36+rdi]
+        mov     QWORD[4+rsi],r14
+
+
+        mov     r8,QWORD[((-68))+rdi]
+
+        xor     r10,rcx
+        xor     r11,rdx
+        rol     r10,10
+        xor     r9,rbx
+        rol     r11,15
+        xor     r12,rbp
+        rol     r9,36
+        xor     r8,rax
+        rol     r12,56
+        mov     r13,r10
+        or      r10,r11
+        rol     r8,27
+
+        not     r11
+        xor     r10,r9
+        mov     QWORD[28+rsi],r10
+
+        mov     r14,r12
+        or      r12,r11
+        xor     r12,r13
+        mov     QWORD[36+rsi],r12
+
+        and     r13,r9
+        xor     r13,r8
+        mov     QWORD[20+rsi],r13
+
+        or      r9,r8
+        xor     r9,r14
+        mov     QWORD[52+rsi],r9
+
+        and     r8,r14
+        xor     r8,r11
+        mov     QWORD[44+rsi],r8
+
+
+        xor     rdx,QWORD[((-84))+rdi]
+        xor     rbp,QWORD[((-36))+rdi]
+        rol     rdx,62
+        xor     rcx,QWORD[68+rdi]
+        rol     rbp,55
+        xor     rax,QWORD[12+rdi]
+        rol     rcx,2
+        xor     rbx,QWORD[20+rdi]
+        xchg    rdi,rsi
+        rol     rax,39
+        rol     rbx,41
+        mov     r13,rdx
+        and     rdx,rbp
+        not     rbp
+        xor     rdx,rcx
+        mov     QWORD[92+rdi],rdx
+
+        mov     r14,rax
+        and     rax,rbp
+        xor     rax,r13
+        mov     QWORD[60+rdi],rax
+
+        or      r13,rcx
+        xor     r13,rbx
+        mov     QWORD[84+rdi],r13
+
+        and     rcx,rbx
+        xor     rcx,r14
+        mov     QWORD[76+rdi],rcx
+
+        or      rbx,r14
+        xor     rbx,rbp
+        mov     QWORD[68+rdi],rbx
+
+        mov     rbp,rdx
+        mov     rdx,r13
+
+        test    r15,255
+        jnz     NEAR $L$oop
+
+        lea     r15,[((-192))+r15]
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+KeccakF1600:
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        lea     rdi,[100+rdi]
+        sub     rsp,200
+
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+
+        lea     r15,[iotas]
+        lea     rsi,[100+rsp]
+
+        call    __KeccakF1600
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+        lea     rdi,[((-100))+rdi]
+
+        add     rsp,200
+
+
+        pop     r15
+
+        pop     r14
+
+        pop     r13
+
+        pop     r12
+
+        pop     rbp
+
+        pop     rbx
+
+        DB      0F3h,0C3h               ;repret
+
+
+global  SHA3_absorb
+
+ALIGN   32
+SHA3_absorb:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_SHA3_absorb:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        lea     rdi,[100+rdi]
+        sub     rsp,232
+
+
+        mov     r9,rsi
+        lea     rsi,[100+rsp]
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+        lea     r15,[iotas]
+
+        mov     QWORD[((216-100))+rsi],rcx
+
+$L$oop_absorb:
+        cmp     rdx,rcx
+        jc      NEAR $L$done_absorb
+
+        shr     rcx,3
+        lea     r8,[((-100))+rdi]
+
+$L$block_absorb:
+        mov     rax,QWORD[r9]
+        lea     r9,[8+r9]
+        xor     rax,QWORD[r8]
+        lea     r8,[8+r8]
+        sub     rdx,8
+        mov     QWORD[((-8))+r8],rax
+        sub     rcx,1
+        jnz     NEAR $L$block_absorb
+
+        mov     QWORD[((200-100))+rsi],r9
+        mov     QWORD[((208-100))+rsi],rdx
+        call    __KeccakF1600
+        mov     r9,QWORD[((200-100))+rsi]
+        mov     rdx,QWORD[((208-100))+rsi]
+        mov     rcx,QWORD[((216-100))+rsi]
+        jmp     NEAR $L$oop_absorb
+
+ALIGN   32
+$L$done_absorb:
+        mov     rax,rdx
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+
+        add     rsp,232
+
+
+        pop     r15
+
+        pop     r14
+
+        pop     r13
+
+        pop     r12
+
+        pop     rbp
+
+        pop     rbx
+
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_SHA3_absorb:
+global  SHA3_squeeze
+
+ALIGN   32
+SHA3_squeeze:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_SHA3_squeeze:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+
+        shr     rcx,3
+        mov     r8,rdi
+        mov     r12,rsi
+        mov     r13,rdx
+        mov     r14,rcx
+        jmp     NEAR $L$oop_squeeze
+
+ALIGN   32
+$L$oop_squeeze:
+        cmp     r13,8
+        jb      NEAR $L$tail_squeeze
+
+        mov     rax,QWORD[r8]
+        lea     r8,[8+r8]
+        mov     QWORD[r12],rax
+        lea     r12,[8+r12]
+        sub     r13,8
+        jz      NEAR $L$done_squeeze
+
+        sub     rcx,1
+        jnz     NEAR $L$oop_squeeze
+
+        call    KeccakF1600
+        mov     r8,rdi
+        mov     rcx,r14
+        jmp     NEAR $L$oop_squeeze
+
+$L$tail_squeeze:
+        mov     rsi,r8
+        mov     rdi,r12
+        mov     rcx,r13
+DB      0xf3,0xa4
+
+$L$done_squeeze:
+        pop     r14
+
+        pop     r13
+
+        pop     r12
+
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_SHA3_squeeze:
+ALIGN   256
+        DQ      0,0,0,0,0,0,0,0
+
+iotas:
+        DQ      0x0000000000000001
+        DQ      0x0000000000008082
+        DQ      0x800000000000808a
+        DQ      0x8000000080008000
+        DQ      0x000000000000808b
+        DQ      0x0000000080000001
+        DQ      0x8000000080008081
+        DQ      0x8000000000008009
+        DQ      0x000000000000008a
+        DQ      0x0000000000000088
+        DQ      0x0000000080008009
+        DQ      0x000000008000000a
+        DQ      0x000000008000808b
+        DQ      0x800000000000008b
+        DQ      0x8000000000008089
+        DQ      0x8000000000008003
+        DQ      0x8000000000008002
+        DQ      0x8000000000000080
+        DQ      0x000000000000800a
+        DQ      0x800000008000000a
+        DQ      0x8000000080008081
+        DQ      0x8000000000008080
+        DQ      0x0000000080000001
+        DQ      0x8000000080008008
+
+DB      75,101,99,99,97,107,45,49,54,48,48,32,97,98,115,111
+DB      114,98,32,97,110,100,32,115,113,117,101,101,122,101,32,102
+DB      111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84
+DB      79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64
+DB      111,112,101,110,115,115,108,46,111,114,103,62,0
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
new file mode 100644
index 0000000000..ea394daa3b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
@@ -0,0 +1,7581 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  sha1_multi_block
+
+ALIGN   32
+sha1_multi_block:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        mov     rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+        bt      rcx,61
+        jc      NEAR _shaext_shortcut
+        test    ecx,268435456
+        jnz     NEAR _avx_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body:
+        lea     rbp,[K_XX_XX]
+        lea     rbx,[256+rsp]
+
+$L$oop_grande:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done
+
+        movdqu  xmm10,XMMWORD[rdi]
+        lea     rax,[128+rsp]
+        movdqu  xmm11,XMMWORD[32+rdi]
+        movdqu  xmm12,XMMWORD[64+rdi]
+        movdqu  xmm13,XMMWORD[96+rdi]
+        movdqu  xmm14,XMMWORD[128+rdi]
+        movdqa  xmm5,XMMWORD[96+rbp]
+        movdqa  xmm15,XMMWORD[((-32))+rbp]
+        jmp     NEAR $L$oop
+
+ALIGN   32
+$L$oop:
+        movd    xmm0,DWORD[r8]
+        lea     r8,[64+r8]
+        movd    xmm2,DWORD[r9]
+        lea     r9,[64+r9]
+        movd    xmm3,DWORD[r10]
+        lea     r10,[64+r10]
+        movd    xmm4,DWORD[r11]
+        lea     r11,[64+r11]
+        punpckldq       xmm0,xmm3
+        movd    xmm1,DWORD[((-60))+r8]
+        punpckldq       xmm2,xmm4
+        movd    xmm9,DWORD[((-60))+r9]
+        punpckldq       xmm0,xmm2
+        movd    xmm8,DWORD[((-60))+r10]
+DB      102,15,56,0,197
+        movd    xmm7,DWORD[((-60))+r11]
+        punpckldq       xmm1,xmm8
+        movdqa  xmm8,xmm10
+        paddd   xmm14,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm11
+        movdqa  xmm6,xmm11
+        pslld   xmm8,5
+        pandn   xmm7,xmm13
+        pand    xmm6,xmm12
+        punpckldq       xmm1,xmm9
+        movdqa  xmm9,xmm10
+
+        movdqa  XMMWORD[(0-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        movd    xmm2,DWORD[((-56))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm11
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-56))+r9]
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+DB      102,15,56,0,205
+        movd    xmm8,DWORD[((-56))+r10]
+        por     xmm11,xmm7
+        movd    xmm7,DWORD[((-56))+r11]
+        punpckldq       xmm2,xmm8
+        movdqa  xmm8,xmm14
+        paddd   xmm13,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm10
+        movdqa  xmm6,xmm10
+        pslld   xmm8,5
+        pandn   xmm7,xmm12
+        pand    xmm6,xmm11
+        punpckldq       xmm2,xmm9
+        movdqa  xmm9,xmm14
+
+        movdqa  XMMWORD[(16-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        movd    xmm3,DWORD[((-52))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm10
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-52))+r9]
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+DB      102,15,56,0,213
+        movd    xmm8,DWORD[((-52))+r10]
+        por     xmm10,xmm7
+        movd    xmm7,DWORD[((-52))+r11]
+        punpckldq       xmm3,xmm8
+        movdqa  xmm8,xmm13
+        paddd   xmm12,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm14
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pandn   xmm7,xmm11
+        pand    xmm6,xmm10
+        punpckldq       xmm3,xmm9
+        movdqa  xmm9,xmm13
+
+        movdqa  XMMWORD[(32-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        movd    xmm4,DWORD[((-48))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm14
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-48))+r9]
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+DB      102,15,56,0,221
+        movd    xmm8,DWORD[((-48))+r10]
+        por     xmm14,xmm7
+        movd    xmm7,DWORD[((-48))+r11]
+        punpckldq       xmm4,xmm8
+        movdqa  xmm8,xmm12
+        paddd   xmm11,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm13
+        movdqa  xmm6,xmm13
+        pslld   xmm8,5
+        pandn   xmm7,xmm10
+        pand    xmm6,xmm14
+        punpckldq       xmm4,xmm9
+        movdqa  xmm9,xmm12
+
+        movdqa  XMMWORD[(48-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        movd    xmm0,DWORD[((-44))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm13
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-44))+r9]
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+DB      102,15,56,0,229
+        movd    xmm8,DWORD[((-44))+r10]
+        por     xmm13,xmm7
+        movd    xmm7,DWORD[((-44))+r11]
+        punpckldq       xmm0,xmm8
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm12
+        movdqa  xmm6,xmm12
+        pslld   xmm8,5
+        pandn   xmm7,xmm14
+        pand    xmm6,xmm13
+        punpckldq       xmm0,xmm9
+        movdqa  xmm9,xmm11
+
+        movdqa  XMMWORD[(64-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        movd    xmm1,DWORD[((-40))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm12
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-40))+r9]
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+DB      102,15,56,0,197
+        movd    xmm8,DWORD[((-40))+r10]
+        por     xmm12,xmm7
+        movd    xmm7,DWORD[((-40))+r11]
+        punpckldq       xmm1,xmm8
+        movdqa  xmm8,xmm10
+        paddd   xmm14,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm11
+        movdqa  xmm6,xmm11
+        pslld   xmm8,5
+        pandn   xmm7,xmm13
+        pand    xmm6,xmm12
+        punpckldq       xmm1,xmm9
+        movdqa  xmm9,xmm10
+
+        movdqa  XMMWORD[(80-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        movd    xmm2,DWORD[((-36))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm11
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-36))+r9]
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+DB      102,15,56,0,205
+        movd    xmm8,DWORD[((-36))+r10]
+        por     xmm11,xmm7
+        movd    xmm7,DWORD[((-36))+r11]
+        punpckldq       xmm2,xmm8
+        movdqa  xmm8,xmm14
+        paddd   xmm13,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm10
+        movdqa  xmm6,xmm10
+        pslld   xmm8,5
+        pandn   xmm7,xmm12
+        pand    xmm6,xmm11
+        punpckldq       xmm2,xmm9
+        movdqa  xmm9,xmm14
+
+        movdqa  XMMWORD[(96-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        movd    xmm3,DWORD[((-32))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm10
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-32))+r9]
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+DB      102,15,56,0,213
+        movd    xmm8,DWORD[((-32))+r10]
+        por     xmm10,xmm7
+        movd    xmm7,DWORD[((-32))+r11]
+        punpckldq       xmm3,xmm8
+        movdqa  xmm8,xmm13
+        paddd   xmm12,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm14
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pandn   xmm7,xmm11
+        pand    xmm6,xmm10
+        punpckldq       xmm3,xmm9
+        movdqa  xmm9,xmm13
+
+        movdqa  XMMWORD[(112-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        movd    xmm4,DWORD[((-28))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm14
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-28))+r9]
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+DB      102,15,56,0,221
+        movd    xmm8,DWORD[((-28))+r10]
+        por     xmm14,xmm7
+        movd    xmm7,DWORD[((-28))+r11]
+        punpckldq       xmm4,xmm8
+        movdqa  xmm8,xmm12
+        paddd   xmm11,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm13
+        movdqa  xmm6,xmm13
+        pslld   xmm8,5
+        pandn   xmm7,xmm10
+        pand    xmm6,xmm14
+        punpckldq       xmm4,xmm9
+        movdqa  xmm9,xmm12
+
+        movdqa  XMMWORD[(128-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        movd    xmm0,DWORD[((-24))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm13
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-24))+r9]
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+DB      102,15,56,0,229
+        movd    xmm8,DWORD[((-24))+r10]
+        por     xmm13,xmm7
+        movd    xmm7,DWORD[((-24))+r11]
+        punpckldq       xmm0,xmm8
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm12
+        movdqa  xmm6,xmm12
+        pslld   xmm8,5
+        pandn   xmm7,xmm14
+        pand    xmm6,xmm13
+        punpckldq       xmm0,xmm9
+        movdqa  xmm9,xmm11
+
+        movdqa  XMMWORD[(144-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        movd    xmm1,DWORD[((-20))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm12
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-20))+r9]
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+DB      102,15,56,0,197
+        movd    xmm8,DWORD[((-20))+r10]
+        por     xmm12,xmm7
+        movd    xmm7,DWORD[((-20))+r11]
+        punpckldq       xmm1,xmm8
+        movdqa  xmm8,xmm10
+        paddd   xmm14,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm11
+        movdqa  xmm6,xmm11
+        pslld   xmm8,5
+        pandn   xmm7,xmm13
+        pand    xmm6,xmm12
+        punpckldq       xmm1,xmm9
+        movdqa  xmm9,xmm10
+
+        movdqa  XMMWORD[(160-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        movd    xmm2,DWORD[((-16))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm11
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-16))+r9]
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+DB      102,15,56,0,205
+        movd    xmm8,DWORD[((-16))+r10]
+        por     xmm11,xmm7
+        movd    xmm7,DWORD[((-16))+r11]
+        punpckldq       xmm2,xmm8
+        movdqa  xmm8,xmm14
+        paddd   xmm13,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm10
+        movdqa  xmm6,xmm10
+        pslld   xmm8,5
+        pandn   xmm7,xmm12
+        pand    xmm6,xmm11
+        punpckldq       xmm2,xmm9
+        movdqa  xmm9,xmm14
+
+        movdqa  XMMWORD[(176-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        movd    xmm3,DWORD[((-12))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm10
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-12))+r9]
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+DB      102,15,56,0,213
+        movd    xmm8,DWORD[((-12))+r10]
+        por     xmm10,xmm7
+        movd    xmm7,DWORD[((-12))+r11]
+        punpckldq       xmm3,xmm8
+        movdqa  xmm8,xmm13
+        paddd   xmm12,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm14
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pandn   xmm7,xmm11
+        pand    xmm6,xmm10
+        punpckldq       xmm3,xmm9
+        movdqa  xmm9,xmm13
+
+        movdqa  XMMWORD[(192-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        movd    xmm4,DWORD[((-8))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm14
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-8))+r9]
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+DB      102,15,56,0,221
+        movd    xmm8,DWORD[((-8))+r10]
+        por     xmm14,xmm7
+        movd    xmm7,DWORD[((-8))+r11]
+        punpckldq       xmm4,xmm8
+        movdqa  xmm8,xmm12
+        paddd   xmm11,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm13
+        movdqa  xmm6,xmm13
+        pslld   xmm8,5
+        pandn   xmm7,xmm10
+        pand    xmm6,xmm14
+        punpckldq       xmm4,xmm9
+        movdqa  xmm9,xmm12
+
+        movdqa  XMMWORD[(208-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        movd    xmm0,DWORD[((-4))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm13
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-4))+r9]
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+DB      102,15,56,0,229
+        movd    xmm8,DWORD[((-4))+r10]
+        por     xmm13,xmm7
+        movdqa  xmm1,XMMWORD[((0-128))+rax]
+        movd    xmm7,DWORD[((-4))+r11]
+        punpckldq       xmm0,xmm8
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm12
+        movdqa  xmm6,xmm12
+        pslld   xmm8,5
+        prefetcht0      [63+r8]
+        pandn   xmm7,xmm14
+        pand    xmm6,xmm13
+        punpckldq       xmm0,xmm9
+        movdqa  xmm9,xmm11
+
+        movdqa  XMMWORD[(224-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm12
+        prefetcht0      [63+r9]
+
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+        prefetcht0      [63+r10]
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+DB      102,15,56,0,197
+        prefetcht0      [63+r11]
+        por     xmm12,xmm7
+        movdqa  xmm2,XMMWORD[((16-128))+rax]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm10
+        pxor    xmm1,XMMWORD[((128-128))+rax]
+        paddd   xmm14,xmm15
+        movdqa  xmm7,xmm11
+        pslld   xmm8,5
+        pxor    xmm1,xmm3
+        movdqa  xmm6,xmm11
+        pandn   xmm7,xmm13
+        movdqa  xmm5,xmm1
+        pand    xmm6,xmm12
+        movdqa  xmm9,xmm10
+        psrld   xmm5,31
+        paddd   xmm1,xmm1
+
+        movdqa  XMMWORD[(240-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm11
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm14
+        pxor    xmm2,XMMWORD[((144-128))+rax]
+        paddd   xmm13,xmm15
+        movdqa  xmm7,xmm10
+        pslld   xmm8,5
+        pxor    xmm2,xmm4
+        movdqa  xmm6,xmm10
+        pandn   xmm7,xmm12
+        movdqa  xmm5,xmm2
+        pand    xmm6,xmm11
+        movdqa  xmm9,xmm14
+        psrld   xmm5,31
+        paddd   xmm2,xmm2
+
+        movdqa  XMMWORD[(0-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm10
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm13
+        pxor    xmm3,XMMWORD[((160-128))+rax]
+        paddd   xmm12,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm8,5
+        pxor    xmm3,xmm0
+        movdqa  xmm6,xmm14
+        pandn   xmm7,xmm11
+        movdqa  xmm5,xmm3
+        pand    xmm6,xmm10
+        movdqa  xmm9,xmm13
+        psrld   xmm5,31
+        paddd   xmm3,xmm3
+
+        movdqa  XMMWORD[(16-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm14
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm12
+        pxor    xmm4,XMMWORD[((176-128))+rax]
+        paddd   xmm11,xmm15
+        movdqa  xmm7,xmm13
+        pslld   xmm8,5
+        pxor    xmm4,xmm1
+        movdqa  xmm6,xmm13
+        pandn   xmm7,xmm10
+        movdqa  xmm5,xmm4
+        pand    xmm6,xmm14
+        movdqa  xmm9,xmm12
+        psrld   xmm5,31
+        paddd   xmm4,xmm4
+
+        movdqa  XMMWORD[(32-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm13
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm11
+        pxor    xmm0,XMMWORD[((192-128))+rax]
+        paddd   xmm10,xmm15
+        movdqa  xmm7,xmm12
+        pslld   xmm8,5
+        pxor    xmm0,xmm2
+        movdqa  xmm6,xmm12
+        pandn   xmm7,xmm14
+        movdqa  xmm5,xmm0
+        pand    xmm6,xmm13
+        movdqa  xmm9,xmm11
+        psrld   xmm5,31
+        paddd   xmm0,xmm0
+
+        movdqa  XMMWORD[(48-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm12
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        movdqa  xmm15,XMMWORD[rbp]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((208-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(64-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((224-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(80-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((240-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(96-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((0-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(112-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((16-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(128-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((32-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(144-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((48-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(160-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((64-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(176-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((80-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(192-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((96-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(208-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((112-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(224-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((128-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(240-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((144-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(0-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((160-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(16-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((176-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(32-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((192-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(48-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((208-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(64-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((224-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(80-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((240-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(96-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((0-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(112-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        movdqa  xmm15,XMMWORD[32+rbp]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((16-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(128-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((32-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(144-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((48-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(160-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((64-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(176-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((80-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(192-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((96-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(208-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((112-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(224-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((128-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(240-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((144-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(0-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((160-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(16-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((176-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(32-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((192-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(48-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((208-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(64-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((224-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(80-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((240-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(96-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((0-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(112-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((16-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(128-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((32-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(144-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((48-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(160-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((64-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(176-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        movdqa  xmm15,XMMWORD[64+rbp]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((80-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(192-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((96-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(208-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((112-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(224-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((128-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(240-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((144-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(0-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((160-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(16-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((176-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(32-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((192-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(48-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((208-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(64-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((224-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(80-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((240-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(96-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((0-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(112-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((16-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((32-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((48-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((64-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((80-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((96-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((112-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        paddd   xmm10,xmm4
+        psrld   xmm9,27
+        movdqa  xmm7,xmm12
+        pxor    xmm6,xmm13
+
+        pslld   xmm7,30
+        por     xmm8,xmm9
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm12,xmm7
+        movdqa  xmm0,XMMWORD[rbx]
+        mov     ecx,1
+        cmp     ecx,DWORD[rbx]
+        pxor    xmm8,xmm8
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        movdqa  xmm1,xmm0
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        pcmpgtd xmm1,xmm8
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        paddd   xmm0,xmm1
+        cmovge  r11,rbp
+
+        movdqu  xmm6,XMMWORD[rdi]
+        pand    xmm10,xmm1
+        movdqu  xmm7,XMMWORD[32+rdi]
+        pand    xmm11,xmm1
+        paddd   xmm10,xmm6
+        movdqu  xmm8,XMMWORD[64+rdi]
+        pand    xmm12,xmm1
+        paddd   xmm11,xmm7
+        movdqu  xmm9,XMMWORD[96+rdi]
+        pand    xmm13,xmm1
+        paddd   xmm12,xmm8
+        movdqu  xmm5,XMMWORD[128+rdi]
+        pand    xmm14,xmm1
+        movdqu  XMMWORD[rdi],xmm10
+        paddd   xmm13,xmm9
+        movdqu  XMMWORD[32+rdi],xmm11
+        paddd   xmm14,xmm5
+        movdqu  XMMWORD[64+rdi],xmm12
+        movdqu  XMMWORD[96+rdi],xmm13
+        movdqu  XMMWORD[128+rdi],xmm14
+
+        movdqa  XMMWORD[rbx],xmm0
+        movdqa  xmm5,XMMWORD[96+rbp]
+        movdqa  xmm15,XMMWORD[((-32))+rbp]
+        dec     edx
+        jnz     NEAR $L$oop
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande
+
+$L$done:
+        mov     rax,QWORD[272+rsp]
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block:
+
+ALIGN   32
+sha1_multi_block_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_shaext_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        shl     edx,1
+        and     rsp,-256
+        lea     rdi,[64+rdi]
+        mov     QWORD[272+rsp],rax
+$L$body_shaext:
+        lea     rbx,[256+rsp]
+        movdqa  xmm3,XMMWORD[((K_XX_XX+128))]
+
+$L$oop_grande_shaext:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rsp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rsp
+        test    edx,edx
+        jz      NEAR $L$done_shaext
+
+        movq    xmm0,QWORD[((0-64))+rdi]
+        movq    xmm4,QWORD[((32-64))+rdi]
+        movq    xmm5,QWORD[((64-64))+rdi]
+        movq    xmm6,QWORD[((96-64))+rdi]
+        movq    xmm7,QWORD[((128-64))+rdi]
+
+        punpckldq       xmm0,xmm4
+        punpckldq       xmm5,xmm6
+
+        movdqa  xmm8,xmm0
+        punpcklqdq      xmm0,xmm5
+        punpckhqdq      xmm8,xmm5
+
+        pshufd  xmm1,xmm7,63
+        pshufd  xmm9,xmm7,127
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm8,xmm8,27
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   32
+$L$oop_shaext:
+        movdqu  xmm4,XMMWORD[r8]
+        movdqu  xmm11,XMMWORD[r9]
+        movdqu  xmm5,XMMWORD[16+r8]
+        movdqu  xmm12,XMMWORD[16+r9]
+        movdqu  xmm6,XMMWORD[32+r8]
+DB      102,15,56,0,227
+        movdqu  xmm13,XMMWORD[32+r9]
+DB      102,68,15,56,0,219
+        movdqu  xmm7,XMMWORD[48+r8]
+        lea     r8,[64+r8]
+DB      102,15,56,0,235
+        movdqu  xmm14,XMMWORD[48+r9]
+        lea     r9,[64+r9]
+DB      102,68,15,56,0,227
+
+        movdqa  XMMWORD[80+rsp],xmm1
+        paddd   xmm1,xmm4
+        movdqa  XMMWORD[112+rsp],xmm9
+        paddd   xmm9,xmm11
+        movdqa  XMMWORD[64+rsp],xmm0
+        movdqa  xmm2,xmm0
+        movdqa  XMMWORD[96+rsp],xmm8
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,0
+DB      15,56,200,213
+DB      69,15,58,204,193,0
+DB      69,15,56,200,212
+DB      102,15,56,0,243
+        prefetcht0      [127+r8]
+DB      15,56,201,229
+DB      102,68,15,56,0,235
+        prefetcht0      [127+r9]
+DB      69,15,56,201,220
+
+DB      102,15,56,0,251
+        movdqa  xmm1,xmm0
+DB      102,68,15,56,0,243
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,0
+DB      15,56,200,206
+DB      69,15,58,204,194,0
+DB      69,15,56,200,205
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,0
+DB      15,56,200,215
+DB      69,15,58,204,193,0
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,0
+DB      15,56,200,204
+DB      69,15,58,204,194,0
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,0
+DB      15,56,200,213
+DB      69,15,58,204,193,0
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+DB      15,56,201,229
+        pxor    xmm14,xmm12
+DB      69,15,56,201,220
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,1
+DB      15,56,200,206
+DB      69,15,58,204,194,1
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,1
+DB      15,56,200,215
+DB      69,15,58,204,193,1
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,1
+DB      15,56,200,204
+DB      69,15,58,204,194,1
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,1
+DB      15,56,200,213
+DB      69,15,58,204,193,1
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+DB      15,56,201,229
+        pxor    xmm14,xmm12
+DB      69,15,56,201,220
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,1
+DB      15,56,200,206
+DB      69,15,58,204,194,1
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,2
+DB      15,56,200,215
+DB      69,15,58,204,193,2
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,2
+DB      15,56,200,204
+DB      69,15,58,204,194,2
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,2
+DB      15,56,200,213
+DB      69,15,58,204,193,2
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+DB      15,56,201,229
+        pxor    xmm14,xmm12
+DB      69,15,56,201,220
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,2
+DB      15,56,200,206
+DB      69,15,58,204,194,2
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,2
+DB      15,56,200,215
+DB      69,15,58,204,193,2
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,3
+DB      15,56,200,204
+DB      69,15,58,204,194,3
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,3
+DB      15,56,200,213
+DB      69,15,58,204,193,3
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+        pxor    xmm14,xmm12
+
+        mov     ecx,1
+        pxor    xmm4,xmm4
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rsp
+
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,3
+DB      15,56,200,206
+DB      69,15,58,204,194,3
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rsp
+        movq    xmm6,QWORD[rbx]
+
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,3
+DB      15,56,200,215
+DB      69,15,58,204,193,3
+DB      69,15,56,200,214
+
+        pshufd  xmm11,xmm6,0x00
+        pshufd  xmm12,xmm6,0x55
+        movdqa  xmm7,xmm6
+        pcmpgtd xmm11,xmm4
+        pcmpgtd xmm12,xmm4
+
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,3
+DB      15,56,200,204
+DB      69,15,58,204,194,3
+DB      68,15,56,200,204
+
+        pcmpgtd xmm7,xmm4
+        pand    xmm0,xmm11
+        pand    xmm1,xmm11
+        pand    xmm8,xmm12
+        pand    xmm9,xmm12
+        paddd   xmm6,xmm7
+
+        paddd   xmm0,XMMWORD[64+rsp]
+        paddd   xmm1,XMMWORD[80+rsp]
+        paddd   xmm8,XMMWORD[96+rsp]
+        paddd   xmm9,XMMWORD[112+rsp]
+
+        movq    QWORD[rbx],xmm6
+        dec     edx
+        jnz     NEAR $L$oop_shaext
+
+        mov     edx,DWORD[280+rsp]
+
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm8,xmm8,27
+
+        movdqa  xmm6,xmm0
+        punpckldq       xmm0,xmm8
+        punpckhdq       xmm6,xmm8
+        punpckhdq       xmm1,xmm9
+        movq    QWORD[(0-64)+rdi],xmm0
+        psrldq  xmm0,8
+        movq    QWORD[(64-64)+rdi],xmm6
+        psrldq  xmm6,8
+        movq    QWORD[(32-64)+rdi],xmm0
+        psrldq  xmm1,8
+        movq    QWORD[(96-64)+rdi],xmm6
+        movq    QWORD[(128-64)+rdi],xmm1
+
+        lea     rdi,[8+rdi]
+        lea     rsi,[32+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block_shaext:
+
+ALIGN   32
+sha1_multi_block_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_shortcut:
+        shr     rcx,32
+        cmp     edx,2
+        jb      NEAR $L$avx
+        test    ecx,32
+        jnz     NEAR _avx2_shortcut
+        jmp     NEAR $L$avx
+ALIGN   32
+$L$avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body_avx:
+        lea     rbp,[K_XX_XX]
+        lea     rbx,[256+rsp]
+
+        vzeroupper
+$L$oop_grande_avx:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done_avx
+
+        vmovdqu xmm10,XMMWORD[rdi]
+        lea     rax,[128+rsp]
+        vmovdqu xmm11,XMMWORD[32+rdi]
+        vmovdqu xmm12,XMMWORD[64+rdi]
+        vmovdqu xmm13,XMMWORD[96+rdi]
+        vmovdqu xmm14,XMMWORD[128+rdi]
+        vmovdqu xmm5,XMMWORD[96+rbp]
+        jmp     NEAR $L$oop_avx
+
+ALIGN   32
+$L$oop_avx:
+        vmovdqa xmm15,XMMWORD[((-32))+rbp]
+        vmovd   xmm0,DWORD[r8]
+        lea     r8,[64+r8]
+        vmovd   xmm2,DWORD[r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm0,xmm0,DWORD[r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm2,xmm2,DWORD[r11],1
+        lea     r11,[64+r11]
+        vmovd   xmm1,DWORD[((-60))+r8]
+        vpunpckldq      xmm0,xmm0,xmm2
+        vmovd   xmm9,DWORD[((-60))+r9]
+        vpshufb xmm0,xmm0,xmm5
+        vpinsrd xmm1,xmm1,DWORD[((-60))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-60))+r11],1
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(0-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpunpckldq      xmm1,xmm1,xmm9
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm2,DWORD[((-56))+r8]
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-56))+r9]
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpshufb xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpinsrd xmm2,xmm2,DWORD[((-56))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-56))+r11],1
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(16-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpunpckldq      xmm2,xmm2,xmm9
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm3,DWORD[((-52))+r8]
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-52))+r9]
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpshufb xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpinsrd xmm3,xmm3,DWORD[((-52))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-52))+r11],1
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(32-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpunpckldq      xmm3,xmm3,xmm9
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm4,DWORD[((-48))+r8]
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-48))+r9]
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpshufb xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpinsrd xmm4,xmm4,DWORD[((-48))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-48))+r11],1
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(48-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpunpckldq      xmm4,xmm4,xmm9
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm0,DWORD[((-44))+r8]
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-44))+r9]
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpshufb xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpinsrd xmm0,xmm0,DWORD[((-44))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-44))+r11],1
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(64-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpunpckldq      xmm0,xmm0,xmm9
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm1,DWORD[((-40))+r8]
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-40))+r9]
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpshufb xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpinsrd xmm1,xmm1,DWORD[((-40))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-40))+r11],1
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(80-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpunpckldq      xmm1,xmm1,xmm9
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm2,DWORD[((-36))+r8]
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-36))+r9]
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpshufb xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpinsrd xmm2,xmm2,DWORD[((-36))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-36))+r11],1
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(96-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpunpckldq      xmm2,xmm2,xmm9
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm3,DWORD[((-32))+r8]
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-32))+r9]
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpshufb xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpinsrd xmm3,xmm3,DWORD[((-32))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-32))+r11],1
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(112-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpunpckldq      xmm3,xmm3,xmm9
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm4,DWORD[((-28))+r8]
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-28))+r9]
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpshufb xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpinsrd xmm4,xmm4,DWORD[((-28))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-28))+r11],1
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(128-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpunpckldq      xmm4,xmm4,xmm9
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm0,DWORD[((-24))+r8]
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-24))+r9]
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpshufb xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpinsrd xmm0,xmm0,DWORD[((-24))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-24))+r11],1
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(144-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpunpckldq      xmm0,xmm0,xmm9
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm1,DWORD[((-20))+r8]
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-20))+r9]
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpshufb xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpinsrd xmm1,xmm1,DWORD[((-20))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-20))+r11],1
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(160-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpunpckldq      xmm1,xmm1,xmm9
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm2,DWORD[((-16))+r8]
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-16))+r9]
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpshufb xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpinsrd xmm2,xmm2,DWORD[((-16))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-16))+r11],1
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(176-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpunpckldq      xmm2,xmm2,xmm9
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm3,DWORD[((-12))+r8]
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-12))+r9]
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpshufb xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpinsrd xmm3,xmm3,DWORD[((-12))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-12))+r11],1
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(192-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpunpckldq      xmm3,xmm3,xmm9
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm4,DWORD[((-8))+r8]
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-8))+r9]
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpshufb xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpinsrd xmm4,xmm4,DWORD[((-8))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-8))+r11],1
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(208-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpunpckldq      xmm4,xmm4,xmm9
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm0,DWORD[((-4))+r8]
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-4))+r9]
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpshufb xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vmovdqa xmm1,XMMWORD[((0-128))+rax]
+        vpinsrd xmm0,xmm0,DWORD[((-4))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-4))+r11],1
+        vpaddd  xmm10,xmm10,xmm15
+        prefetcht0      [63+r8]
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(224-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpunpckldq      xmm0,xmm0,xmm9
+        vpsrld  xmm9,xmm11,27
+        prefetcht0      [63+r9]
+        vpxor   xmm6,xmm6,xmm7
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        prefetcht0      [63+r10]
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        prefetcht0      [63+r11]
+        vpshufb xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vmovdqa xmm2,XMMWORD[((16-128))+rax]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(240-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((128-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm1,xmm1,xmm3
+
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(0-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((144-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm2,xmm2,xmm4
+
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(16-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((160-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm3,xmm3,xmm0
+
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((80-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(32-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((176-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm4,xmm4,xmm1
+
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((96-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(48-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((192-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm0,xmm0,xmm2
+
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
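+        ; rounds 20-39: load the next SHA-1 round constant into xmm15;
+        ; the boolean function F switches to parity (b ^ c ^ d)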
+        vmovdqa xmm15,XMMWORD[rbp]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((112-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(64-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((208-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((128-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(80-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((224-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((144-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(96-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((240-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((160-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(112-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((0-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((176-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(128-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((16-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((192-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(144-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((32-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((208-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(160-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((48-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((224-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(176-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((64-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((240-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(192-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((80-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((0-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(208-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((96-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((16-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(224-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((112-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((32-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(240-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((128-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((48-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(0-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((144-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((64-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(16-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((160-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((80-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(32-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((176-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((96-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(48-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((192-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((112-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(64-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((208-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((128-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(80-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((224-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((144-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(96-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((240-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((160-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(112-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((0-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
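+        ; rounds 40-59: next round constant; F becomes the majority
+        ; function Maj(b,c,d), computed here as (c & d) + ((c ^ d) & b)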
+        vmovdqa xmm15,XMMWORD[32+rbp]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((176-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((16-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(128-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((192-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(144-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((208-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(160-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((224-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(176-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((240-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((80-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(192-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((0-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((96-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(208-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((16-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((112-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(224-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((128-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(240-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((144-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(0-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((160-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(16-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((80-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((176-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(32-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((96-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((192-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(48-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((112-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((208-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(64-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((128-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((224-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(80-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((144-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((240-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(96-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((160-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((0-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(112-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((176-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((16-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(128-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((192-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(144-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((208-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(160-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((224-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(176-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
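+        ; rounds 60-79: final round constant; F is the parity function again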
+        vmovdqa xmm15,XMMWORD[64+rbp]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((240-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(192-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((80-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((0-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(208-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((96-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((16-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(224-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((112-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((32-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(240-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((128-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((48-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(0-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((144-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((64-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(16-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((160-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((80-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(32-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((176-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((96-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(48-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((192-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((112-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(64-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((208-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((128-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(80-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((224-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((144-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(96-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((240-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((160-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(112-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((0-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((176-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((16-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((192-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((32-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((208-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((48-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((224-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((64-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((240-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((80-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((0-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((96-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((16-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((112-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+
+        vpsrld  xmm9,xmm11,27
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm6,xmm6,xmm13
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm12,xmm12,xmm7
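+        ; all 80 rounds done for this block: disable exhausted lanes, update
+        ; the per-lane block counters at rbx, and accumulate the new chaining
+        ; values into the hash context at rdi before looping to the next block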
+        mov     ecx,1
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r11,rbp
+        vmovdqu xmm6,XMMWORD[rbx]
+        vpxor   xmm8,xmm8,xmm8
+        vmovdqa xmm7,xmm6
+        vpcmpgtd        xmm7,xmm7,xmm8
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpand   xmm10,xmm10,xmm7
+        vpand   xmm11,xmm11,xmm7
+        vpaddd  xmm10,xmm10,XMMWORD[rdi]
+        vpand   xmm12,xmm12,xmm7
+        vpaddd  xmm11,xmm11,XMMWORD[32+rdi]
+        vpand   xmm13,xmm13,xmm7
+        vpaddd  xmm12,xmm12,XMMWORD[64+rdi]
+        vpand   xmm14,xmm14,xmm7
+        vpaddd  xmm13,xmm13,XMMWORD[96+rdi]
+        vpaddd  xmm14,xmm14,XMMWORD[128+rdi]
+        vmovdqu XMMWORD[rdi],xmm10
+        vmovdqu XMMWORD[32+rdi],xmm11
+        vmovdqu XMMWORD[64+rdi],xmm12
+        vmovdqu XMMWORD[96+rdi],xmm13
+        vmovdqu XMMWORD[128+rdi],xmm14
+
+        vmovdqu XMMWORD[rbx],xmm6
+        vmovdqu xmm5,XMMWORD[96+rbp]
+        dec     edx
+        jnz     NEAR $L$oop_avx
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_avx
+
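+        ; common exit: restore the callee-saved XMM registers and the
+        ; original stack pointer saved by the prologue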
+$L$done_avx:
+        mov     rax,QWORD[272+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block_avx:
+
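+; sha1_multi_block_avx2 hashes up to eight independent SHA-1 streams in
+; parallel, one 32-bit lane per stream in the ymm registers (the AVX path
+; above handles four streams in xmm registers)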
+ALIGN   32
+sha1_multi_block_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+        sub     rsp,576
+        and     rsp,-256
+        mov     QWORD[544+rsp],rax
+
+$L$body_avx2:
+        lea     rbp,[K_XX_XX]
+        shr     edx,1
+
+        vzeroupper
+$L$oop_grande_avx2:
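+        ; read eight (pointer, count) pairs from rsi: the largest count is
+        ; kept in edx, each lane's count is stored at rbx, and lanes with a
+        ; non-positive count get their pointer redirected to a dummy block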
+        mov     DWORD[552+rsp],edx
+        xor     edx,edx
+        lea     rbx,[512+rsp]
+        mov     r12,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r12,rbp
+        mov     r13,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r13,rbp
+        mov     r14,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r14,rbp
+        mov     r15,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r15,rbp
+        mov     r8,QWORD[64+rsi]
+        mov     ecx,DWORD[72+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[16+rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[80+rsi]
+        mov     ecx,DWORD[88+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[20+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[96+rsi]
+        mov     ecx,DWORD[104+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[24+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[112+rsi]
+        mov     ecx,DWORD[120+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[28+rbx],ecx
+        cmovle  r11,rbp
+        vmovdqu ymm0,YMMWORD[rdi]
+        lea     rax,[128+rsp]
+        vmovdqu ymm1,YMMWORD[32+rdi]
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm2,YMMWORD[64+rdi]
+        vmovdqu ymm3,YMMWORD[96+rdi]
+        vmovdqu ymm4,YMMWORD[128+rdi]
+        vmovdqu ymm9,YMMWORD[96+rbp]
+        jmp     NEAR $L$oop_avx2
+
+ALIGN   32
+$L$oop_avx2:
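+        ; one 64-byte block per lane: words are gathered with vmovd/vpinsrd/
+        ; vinserti128, byte-swapped with vpshufb (mask in ymm9), and fed into
+        ; the rounds while the message schedule is built in place at rax/rbx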
+        vmovdqa ymm15,YMMWORD[((-32))+rbp]
+        vmovd   xmm10,DWORD[r12]
+        lea     r12,[64+r12]
+        vmovd   xmm12,DWORD[r8]
+        lea     r8,[64+r8]
+        vmovd   xmm7,DWORD[r13]
+        lea     r13,[64+r13]
+        vmovd   xmm6,DWORD[r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm10,xmm10,DWORD[r14],1
+        lea     r14,[64+r14]
+        vpinsrd xmm12,xmm12,DWORD[r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm7,xmm7,DWORD[r15],1
+        lea     r15,[64+r15]
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[r11],1
+        lea     r11,[64+r11]
+        vpunpckldq      ymm12,ymm12,ymm6
+        vmovd   xmm11,DWORD[((-60))+r12]
+        vinserti128     ymm10,ymm10,xmm12,1
+        vmovd   xmm8,DWORD[((-60))+r8]
+        vpshufb ymm10,ymm10,ymm9
+        vmovd   xmm7,DWORD[((-60))+r13]
+        vmovd   xmm6,DWORD[((-60))+r9]
+        vpinsrd xmm11,xmm11,DWORD[((-60))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-60))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-60))+r15],1
+        vpunpckldq      ymm11,ymm11,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-60))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(0-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vinserti128     ymm11,ymm11,xmm8,1
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm12,DWORD[((-56))+r12]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-56))+r8]
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpshufb ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vmovd   xmm7,DWORD[((-56))+r13]
+        vmovd   xmm6,DWORD[((-56))+r9]
+        vpinsrd xmm12,xmm12,DWORD[((-56))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-56))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-56))+r15],1
+        vpunpckldq      ymm12,ymm12,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-56))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(32-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vinserti128     ymm12,ymm12,xmm8,1
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm13,DWORD[((-52))+r12]
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-52))+r8]
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpshufb ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vmovd   xmm7,DWORD[((-52))+r13]
+        vmovd   xmm6,DWORD[((-52))+r9]
+        vpinsrd xmm13,xmm13,DWORD[((-52))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-52))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-52))+r15],1
+        vpunpckldq      ymm13,ymm13,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-52))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(64-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vinserti128     ymm13,ymm13,xmm8,1
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm14,DWORD[((-48))+r12]
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-48))+r8]
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpshufb ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vmovd   xmm7,DWORD[((-48))+r13]
+        vmovd   xmm6,DWORD[((-48))+r9]
+        vpinsrd xmm14,xmm14,DWORD[((-48))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-48))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-48))+r15],1
+        vpunpckldq      ymm14,ymm14,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-48))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(96-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vinserti128     ymm14,ymm14,xmm8,1
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm10,DWORD[((-44))+r12]
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-44))+r8]
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpshufb ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vmovd   xmm7,DWORD[((-44))+r13]
+        vmovd   xmm6,DWORD[((-44))+r9]
+        vpinsrd xmm10,xmm10,DWORD[((-44))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-44))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-44))+r15],1
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-44))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(128-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vinserti128     ymm10,ymm10,xmm8,1
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm11,DWORD[((-40))+r12]
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-40))+r8]
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpshufb ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovd   xmm7,DWORD[((-40))+r13]
+        vmovd   xmm6,DWORD[((-40))+r9]
+        vpinsrd xmm11,xmm11,DWORD[((-40))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-40))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-40))+r15],1
+        vpunpckldq      ymm11,ymm11,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-40))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(160-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vinserti128     ymm11,ymm11,xmm8,1
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm12,DWORD[((-36))+r12]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-36))+r8]
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpshufb ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vmovd   xmm7,DWORD[((-36))+r13]
+        vmovd   xmm6,DWORD[((-36))+r9]
+        vpinsrd xmm12,xmm12,DWORD[((-36))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-36))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-36))+r15],1
+        vpunpckldq      ymm12,ymm12,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-36))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(192-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vinserti128     ymm12,ymm12,xmm8,1
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm13,DWORD[((-32))+r12]
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-32))+r8]
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpshufb ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vmovd   xmm7,DWORD[((-32))+r13]
+        vmovd   xmm6,DWORD[((-32))+r9]
+        vpinsrd xmm13,xmm13,DWORD[((-32))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-32))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-32))+r15],1
+        vpunpckldq      ymm13,ymm13,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-32))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(224-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vinserti128     ymm13,ymm13,xmm8,1
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm14,DWORD[((-28))+r12]
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-28))+r8]
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpshufb ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vmovd   xmm7,DWORD[((-28))+r13]
+        vmovd   xmm6,DWORD[((-28))+r9]
+        vpinsrd xmm14,xmm14,DWORD[((-28))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-28))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-28))+r15],1
+        vpunpckldq      ymm14,ymm14,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-28))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(256-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vinserti128     ymm14,ymm14,xmm8,1
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm10,DWORD[((-24))+r12]
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-24))+r8]
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpshufb ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vmovd   xmm7,DWORD[((-24))+r13]
+        vmovd   xmm6,DWORD[((-24))+r9]
+        vpinsrd xmm10,xmm10,DWORD[((-24))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-24))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-24))+r15],1
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-24))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(288-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vinserti128     ymm10,ymm10,xmm8,1
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm11,DWORD[((-20))+r12]
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-20))+r8]
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpshufb ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovd   xmm7,DWORD[((-20))+r13]
+        vmovd   xmm6,DWORD[((-20))+r9]
+        vpinsrd xmm11,xmm11,DWORD[((-20))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-20))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-20))+r15],1
+        vpunpckldq      ymm11,ymm11,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-20))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(320-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vinserti128     ymm11,ymm11,xmm8,1
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm12,DWORD[((-16))+r12]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-16))+r8]
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpshufb ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vmovd   xmm7,DWORD[((-16))+r13]
+        vmovd   xmm6,DWORD[((-16))+r9]
+        vpinsrd xmm12,xmm12,DWORD[((-16))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-16))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-16))+r15],1
+        vpunpckldq      ymm12,ymm12,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-16))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(352-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vinserti128     ymm12,ymm12,xmm8,1
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm13,DWORD[((-12))+r12]
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-12))+r8]
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpshufb ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vmovd   xmm7,DWORD[((-12))+r13]
+        vmovd   xmm6,DWORD[((-12))+r9]
+        vpinsrd xmm13,xmm13,DWORD[((-12))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-12))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-12))+r15],1
+        vpunpckldq      ymm13,ymm13,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-12))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(384-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vinserti128     ymm13,ymm13,xmm8,1
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm14,DWORD[((-8))+r12]
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-8))+r8]
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpshufb ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vmovd   xmm7,DWORD[((-8))+r13]
+        vmovd   xmm6,DWORD[((-8))+r9]
+        vpinsrd xmm14,xmm14,DWORD[((-8))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-8))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-8))+r15],1
+        vpunpckldq      ymm14,ymm14,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-8))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(416-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vinserti128     ymm14,ymm14,xmm8,1
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm10,DWORD[((-4))+r12]
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-4))+r8]
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpshufb ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vmovdqa ymm11,YMMWORD[((0-128))+rax]
+        vmovd   xmm7,DWORD[((-4))+r13]
+        vmovd   xmm6,DWORD[((-4))+r9]
+        vpinsrd xmm10,xmm10,DWORD[((-4))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-4))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-4))+r15],1
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-4))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm0,ymm0,ymm15
+        prefetcht0      [63+r12]
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(448-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vinserti128     ymm10,ymm10,xmm8,1
+        vpsrld  ymm8,ymm1,27
+        prefetcht0      [63+r13]
+        vpxor   ymm5,ymm5,ymm6
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        prefetcht0      [63+r14]
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        prefetcht0      [63+r15]
+        vpshufb ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm12,YMMWORD[((32-128))+rax]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        prefetcht0      [63+r8]
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(480-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        prefetcht0      [63+r9]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        prefetcht0      [63+r10]
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        prefetcht0      [63+r11]
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(0-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm12,ymm12,ymm14
+
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(32-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm13,ymm13,ymm10
+
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((160-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(64-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm14,ymm14,ymm11
+
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((192-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(96-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm10,ymm10,ymm12
+
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm15,YMMWORD[rbp]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((224-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(128-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((256-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(160-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((288-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(192-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((320-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(224-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((0-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((352-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(256-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((32-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((384-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(288-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((64-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((416-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(320-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((96-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((448-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(352-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((128-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((480-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(384-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((160-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((0-128))+rax]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(416-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((192-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((32-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(448-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((224-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((64-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(480-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((96-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(0-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((128-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(32-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((160-128))+rax]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(64-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((192-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(96-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((224-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(128-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((256-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(160-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((288-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(192-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((320-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(224-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((0-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm15,YMMWORD[32+rbp]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((352-256-128))+rbx]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((32-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((384-256-128))+rbx]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((416-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((448-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((480-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((160-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(384-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((0-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((192-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(416-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((32-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((224-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(448-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((256-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(480-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((288-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(0-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((320-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(32-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((160-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((352-256-128))+rbx]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(64-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((192-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((384-256-128))+rbx]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(96-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((224-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((416-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(128-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((256-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((448-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(160-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((288-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((480-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(192-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((320-256-128))+rbx]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((0-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(224-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((352-256-128))+rbx]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((32-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((384-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((416-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((448-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm15,YMMWORD[64+rbp]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((480-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(384-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((160-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((0-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(416-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((192-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((32-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(448-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((224-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((64-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(480-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((96-128))+rax]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(0-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((128-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(32-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((160-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(64-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((192-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(96-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((224-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(128-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((256-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(160-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((288-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(192-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((320-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(224-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((0-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((352-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((32-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((384-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((64-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((416-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((96-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((448-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((128-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((480-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((160-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((0-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((192-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((32-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((224-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+
+        vpsrld  ymm8,ymm1,27
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm5,ymm5,ymm3
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm2,ymm2,ymm6
+        mov     ecx,1
+        lea     rbx,[512+rsp]
+        cmp     ecx,DWORD[rbx]
+        cmovge  r12,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r13,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r14,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r15,rbp
+        cmp     ecx,DWORD[16+rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[20+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[24+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[28+rbx]
+        cmovge  r11,rbp
+        vmovdqu ymm5,YMMWORD[rbx]
+        vpxor   ymm7,ymm7,ymm7
+        vmovdqa ymm6,ymm5
+        vpcmpgtd        ymm6,ymm6,ymm7
+        vpaddd  ymm5,ymm5,ymm6
+
+        vpand   ymm0,ymm0,ymm6
+        vpand   ymm1,ymm1,ymm6
+        vpaddd  ymm0,ymm0,YMMWORD[rdi]
+        vpand   ymm2,ymm2,ymm6
+        vpaddd  ymm1,ymm1,YMMWORD[32+rdi]
+        vpand   ymm3,ymm3,ymm6
+        vpaddd  ymm2,ymm2,YMMWORD[64+rdi]
+        vpand   ymm4,ymm4,ymm6
+        vpaddd  ymm3,ymm3,YMMWORD[96+rdi]
+        vpaddd  ymm4,ymm4,YMMWORD[128+rdi]
+        vmovdqu YMMWORD[rdi],ymm0
+        vmovdqu YMMWORD[32+rdi],ymm1
+        vmovdqu YMMWORD[64+rdi],ymm2
+        vmovdqu YMMWORD[96+rdi],ymm3
+        vmovdqu YMMWORD[128+rdi],ymm4
+
+        vmovdqu YMMWORD[rbx],ymm5
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm9,YMMWORD[96+rbp]
+        dec     edx
+        jnz     NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+        mov     rax,QWORD[544+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block_avx2:
+
+ALIGN   256
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+K_XX_XX:
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB      0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB      83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107
+DB      32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120
+DB      56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77
+DB      83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110
+DB      115,115,108,46,111,114,103,62,0
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[272+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+
+        lea     rsi,[((-24-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   16
+avx2_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[544+r8]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((-56-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha1_multi_block wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block wrt ..imagebase
+        DD      $L$SEH_begin_sha1_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha1_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha1_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha1_multi_block:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha1_multi_block_shaext:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx2:
+DB      9,0,0,0
+        DD      avx2_handler wrt ..imagebase
+        DD      $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
new file mode 100644
index 0000000000..3a7655b27f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
@@ -0,0 +1,5773 @@
+; Copyright 2006-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  sha1_block_data_order
+
+ALIGN   16
+sha1_block_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        mov     r9d,DWORD[((OPENSSL_ia32cap_P+0))]
+        mov     r8d,DWORD[((OPENSSL_ia32cap_P+4))]
+        mov     r10d,DWORD[((OPENSSL_ia32cap_P+8))]
+        test    r8d,512
+        jz      NEAR $L$ialu
+        test    r10d,536870912
+        jnz     NEAR _shaext_shortcut
+        and     r10d,296
+        cmp     r10d,296
+        je      NEAR _avx2_shortcut
+        and     r8d,268435456
+        and     r9d,1073741824
+        or      r8d,r9d
+        cmp     r8d,1342177280
+        je      NEAR _avx_shortcut
+        jmp     NEAR _ssse3_shortcut
+
+ALIGN   16
+$L$ialu:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        mov     r8,rdi
+        sub     rsp,72
+        mov     r9,rsi
+        and     rsp,-64
+        mov     r10,rdx
+        mov     QWORD[64+rsp],rax
+
+$L$prologue:
+
+        mov     esi,DWORD[r8]
+        mov     edi,DWORD[4+r8]
+        mov     r11d,DWORD[8+r8]
+        mov     r12d,DWORD[12+r8]
+        mov     r13d,DWORD[16+r8]
+        jmp     NEAR $L$loop
+
+ALIGN   16
+$L$loop:
+        mov     edx,DWORD[r9]
+        bswap   edx
+        mov     ebp,DWORD[4+r9]
+        mov     eax,r12d
+        mov     DWORD[rsp],edx
+        mov     ecx,esi
+        bswap   ebp
+        xor     eax,r11d
+        rol     ecx,5
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+rdx]
+        add     r13d,ecx
+        xor     eax,r12d
+        rol     edi,30
+        add     r13d,eax
+        mov     r14d,DWORD[8+r9]
+        mov     eax,r11d
+        mov     DWORD[4+rsp],ebp
+        mov     ecx,r13d
+        bswap   r14d
+        xor     eax,edi
+        rol     ecx,5
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+rbp]
+        add     r12d,ecx
+        xor     eax,r11d
+        rol     esi,30
+        add     r12d,eax
+        mov     edx,DWORD[12+r9]
+        mov     eax,edi
+        mov     DWORD[8+rsp],r14d
+        mov     ecx,r12d
+        bswap   edx
+        xor     eax,esi
+        rol     ecx,5
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+r14]
+        add     r11d,ecx
+        xor     eax,edi
+        rol     r13d,30
+        add     r11d,eax
+        mov     ebp,DWORD[16+r9]
+        mov     eax,esi
+        mov     DWORD[12+rsp],edx
+        mov     ecx,r11d
+        bswap   ebp
+        xor     eax,r13d
+        rol     ecx,5
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+rdx]
+        add     edi,ecx
+        xor     eax,esi
+        rol     r12d,30
+        add     edi,eax
+        mov     r14d,DWORD[20+r9]
+        mov     eax,r13d
+        mov     DWORD[16+rsp],ebp
+        mov     ecx,edi
+        bswap   r14d
+        xor     eax,r12d
+        rol     ecx,5
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+rbp]
+        add     esi,ecx
+        xor     eax,r13d
+        rol     r11d,30
+        add     esi,eax
+        mov     edx,DWORD[24+r9]
+        mov     eax,r12d
+        mov     DWORD[20+rsp],r14d
+        mov     ecx,esi
+        bswap   edx
+        xor     eax,r11d
+        rol     ecx,5
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+r14]
+        add     r13d,ecx
+        xor     eax,r12d
+        rol     edi,30
+        add     r13d,eax
+        mov     ebp,DWORD[28+r9]
+        mov     eax,r11d
+        mov     DWORD[24+rsp],edx
+        mov     ecx,r13d
+        bswap   ebp
+        xor     eax,edi
+        rol     ecx,5
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+rdx]
+        add     r12d,ecx
+        xor     eax,r11d
+        rol     esi,30
+        add     r12d,eax
+        mov     r14d,DWORD[32+r9]
+        mov     eax,edi
+        mov     DWORD[28+rsp],ebp
+        mov     ecx,r12d
+        bswap   r14d
+        xor     eax,esi
+        rol     ecx,5
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+rbp]
+        add     r11d,ecx
+        xor     eax,edi
+        rol     r13d,30
+        add     r11d,eax
+        mov     edx,DWORD[36+r9]
+        mov     eax,esi
+        mov     DWORD[32+rsp],r14d
+        mov     ecx,r11d
+        bswap   edx
+        xor     eax,r13d
+        rol     ecx,5
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+r14]
+        add     edi,ecx
+        xor     eax,esi
+        rol     r12d,30
+        add     edi,eax
+        mov     ebp,DWORD[40+r9]
+        mov     eax,r13d
+        mov     DWORD[36+rsp],edx
+        mov     ecx,edi
+        bswap   ebp
+        xor     eax,r12d
+        rol     ecx,5
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+rdx]
+        add     esi,ecx
+        xor     eax,r13d
+        rol     r11d,30
+        add     esi,eax
+        mov     r14d,DWORD[44+r9]
+        mov     eax,r12d
+        mov     DWORD[40+rsp],ebp
+        mov     ecx,esi
+        bswap   r14d
+        xor     eax,r11d
+        rol     ecx,5
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+rbp]
+        add     r13d,ecx
+        xor     eax,r12d
+        rol     edi,30
+        add     r13d,eax
+        mov     edx,DWORD[48+r9]
+        mov     eax,r11d
+        mov     DWORD[44+rsp],r14d
+        mov     ecx,r13d
+        bswap   edx
+        xor     eax,edi
+        rol     ecx,5
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+r14]
+        add     r12d,ecx
+        xor     eax,r11d
+        rol     esi,30
+        add     r12d,eax
+        mov     ebp,DWORD[52+r9]
+        mov     eax,edi
+        mov     DWORD[48+rsp],edx
+        mov     ecx,r12d
+        bswap   ebp
+        xor     eax,esi
+        rol     ecx,5
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+rdx]
+        add     r11d,ecx
+        xor     eax,edi
+        rol     r13d,30
+        add     r11d,eax
+        mov     r14d,DWORD[56+r9]
+        mov     eax,esi
+        mov     DWORD[52+rsp],ebp
+        mov     ecx,r11d
+        bswap   r14d
+        xor     eax,r13d
+        rol     ecx,5
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+rbp]
+        add     edi,ecx
+        xor     eax,esi
+        rol     r12d,30
+        add     edi,eax
+        mov     edx,DWORD[60+r9]
+        mov     eax,r13d
+        mov     DWORD[56+rsp],r14d
+        mov     ecx,edi
+        bswap   edx
+        xor     eax,r12d
+        rol     ecx,5
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+r14]
+        add     esi,ecx
+        xor     eax,r13d
+        rol     r11d,30
+        add     esi,eax
+        xor     ebp,DWORD[rsp]
+        mov     eax,r12d
+        mov     DWORD[60+rsp],edx
+        mov     ecx,esi
+        xor     ebp,DWORD[8+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[32+rsp]
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+rdx]
+        rol     edi,30
+        xor     eax,r12d
+        add     r13d,ecx
+        rol     ebp,1
+        add     r13d,eax
+        xor     r14d,DWORD[4+rsp]
+        mov     eax,r11d
+        mov     DWORD[rsp],ebp
+        mov     ecx,r13d
+        xor     r14d,DWORD[12+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[36+rsp]
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+rbp]
+        rol     esi,30
+        xor     eax,r11d
+        add     r12d,ecx
+        rol     r14d,1
+        add     r12d,eax
+        xor     edx,DWORD[8+rsp]
+        mov     eax,edi
+        mov     DWORD[4+rsp],r14d
+        mov     ecx,r12d
+        xor     edx,DWORD[16+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[40+rsp]
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+r14]
+        rol     r13d,30
+        xor     eax,edi
+        add     r11d,ecx
+        rol     edx,1
+        add     r11d,eax
+        xor     ebp,DWORD[12+rsp]
+        mov     eax,esi
+        mov     DWORD[8+rsp],edx
+        mov     ecx,r11d
+        xor     ebp,DWORD[20+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[44+rsp]
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+rdx]
+        rol     r12d,30
+        xor     eax,esi
+        add     edi,ecx
+        rol     ebp,1
+        add     edi,eax
+        xor     r14d,DWORD[16+rsp]
+        mov     eax,r13d
+        mov     DWORD[12+rsp],ebp
+        mov     ecx,edi
+        xor     r14d,DWORD[24+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     r14d,DWORD[48+rsp]
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+rbp]
+        rol     r11d,30
+        xor     eax,r13d
+        add     esi,ecx
+        rol     r14d,1
+        add     esi,eax
+        xor     edx,DWORD[20+rsp]
+        mov     eax,edi
+        mov     DWORD[16+rsp],r14d
+        mov     ecx,esi
+        xor     edx,DWORD[28+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     edx,DWORD[52+rsp]
+        lea     r13d,[1859775393+r13*1+r14]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     edx,1
+        xor     ebp,DWORD[24+rsp]
+        mov     eax,esi
+        mov     DWORD[20+rsp],edx
+        mov     ecx,r13d
+        xor     ebp,DWORD[32+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[56+rsp]
+        lea     r12d,[1859775393+r12*1+rdx]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[28+rsp]
+        mov     eax,r13d
+        mov     DWORD[24+rsp],ebp
+        mov     ecx,r12d
+        xor     r14d,DWORD[36+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[60+rsp]
+        lea     r11d,[1859775393+r11*1+rbp]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     r14d,1
+        xor     edx,DWORD[32+rsp]
+        mov     eax,r12d
+        mov     DWORD[28+rsp],r14d
+        mov     ecx,r11d
+        xor     edx,DWORD[40+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[rsp]
+        lea     edi,[1859775393+rdi*1+r14]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     edx,1
+        xor     ebp,DWORD[36+rsp]
+        mov     eax,r11d
+        mov     DWORD[32+rsp],edx
+        mov     ecx,edi
+        xor     ebp,DWORD[44+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[4+rsp]
+        lea     esi,[1859775393+rsi*1+rdx]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[40+rsp]
+        mov     eax,edi
+        mov     DWORD[36+rsp],ebp
+        mov     ecx,esi
+        xor     r14d,DWORD[48+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     r14d,DWORD[8+rsp]
+        lea     r13d,[1859775393+r13*1+rbp]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     r14d,1
+        xor     edx,DWORD[44+rsp]
+        mov     eax,esi
+        mov     DWORD[40+rsp],r14d
+        mov     ecx,r13d
+        xor     edx,DWORD[52+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     edx,DWORD[12+rsp]
+        lea     r12d,[1859775393+r12*1+r14]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     edx,1
+        xor     ebp,DWORD[48+rsp]
+        mov     eax,r13d
+        mov     DWORD[44+rsp],edx
+        mov     ecx,r12d
+        xor     ebp,DWORD[56+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     ebp,DWORD[16+rsp]
+        lea     r11d,[1859775393+r11*1+rdx]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[52+rsp]
+        mov     eax,r12d
+        mov     DWORD[48+rsp],ebp
+        mov     ecx,r11d
+        xor     r14d,DWORD[60+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     r14d,DWORD[20+rsp]
+        lea     edi,[1859775393+rdi*1+rbp]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     r14d,1
+        xor     edx,DWORD[56+rsp]
+        mov     eax,r11d
+        mov     DWORD[52+rsp],r14d
+        mov     ecx,edi
+        xor     edx,DWORD[rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     edx,DWORD[24+rsp]
+        lea     esi,[1859775393+rsi*1+r14]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     edx,1
+        xor     ebp,DWORD[60+rsp]
+        mov     eax,edi
+        mov     DWORD[56+rsp],edx
+        mov     ecx,esi
+        xor     ebp,DWORD[4+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     ebp,DWORD[28+rsp]
+        lea     r13d,[1859775393+r13*1+rdx]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[rsp]
+        mov     eax,esi
+        mov     DWORD[60+rsp],ebp
+        mov     ecx,r13d
+        xor     r14d,DWORD[8+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     r14d,DWORD[32+rsp]
+        lea     r12d,[1859775393+r12*1+rbp]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     r14d,1
+        xor     edx,DWORD[4+rsp]
+        mov     eax,r13d
+        mov     DWORD[rsp],r14d
+        mov     ecx,r12d
+        xor     edx,DWORD[12+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     edx,DWORD[36+rsp]
+        lea     r11d,[1859775393+r11*1+r14]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     edx,1
+        xor     ebp,DWORD[8+rsp]
+        mov     eax,r12d
+        mov     DWORD[4+rsp],edx
+        mov     ecx,r11d
+        xor     ebp,DWORD[16+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     ebp,DWORD[40+rsp]
+        lea     edi,[1859775393+rdi*1+rdx]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[12+rsp]
+        mov     eax,r11d
+        mov     DWORD[8+rsp],ebp
+        mov     ecx,edi
+        xor     r14d,DWORD[20+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     r14d,DWORD[44+rsp]
+        lea     esi,[1859775393+rsi*1+rbp]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     r14d,1
+        xor     edx,DWORD[16+rsp]
+        mov     eax,edi
+        mov     DWORD[12+rsp],r14d
+        mov     ecx,esi
+        xor     edx,DWORD[24+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     edx,DWORD[48+rsp]
+        lea     r13d,[1859775393+r13*1+r14]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     edx,1
+        xor     ebp,DWORD[20+rsp]
+        mov     eax,esi
+        mov     DWORD[16+rsp],edx
+        mov     ecx,r13d
+        xor     ebp,DWORD[28+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[52+rsp]
+        lea     r12d,[1859775393+r12*1+rdx]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[24+rsp]
+        mov     eax,r13d
+        mov     DWORD[20+rsp],ebp
+        mov     ecx,r12d
+        xor     r14d,DWORD[32+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[56+rsp]
+        lea     r11d,[1859775393+r11*1+rbp]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     r14d,1
+        xor     edx,DWORD[28+rsp]
+        mov     eax,r12d
+        mov     DWORD[24+rsp],r14d
+        mov     ecx,r11d
+        xor     edx,DWORD[36+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[60+rsp]
+        lea     edi,[1859775393+rdi*1+r14]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     edx,1
+        xor     ebp,DWORD[32+rsp]
+        mov     eax,r11d
+        mov     DWORD[28+rsp],edx
+        mov     ecx,edi
+        xor     ebp,DWORD[40+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[rsp]
+        lea     esi,[1859775393+rsi*1+rdx]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[36+rsp]
+        mov     eax,r12d
+        mov     DWORD[32+rsp],ebp
+        mov     ebx,r12d
+        xor     r14d,DWORD[44+rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     r14d,DWORD[4+rsp]
+        lea     r13d,[((-1894007588))+r13*1+rbp]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     r14d,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     edx,DWORD[40+rsp]
+        mov     eax,r11d
+        mov     DWORD[36+rsp],r14d
+        mov     ebx,r11d
+        xor     edx,DWORD[48+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     edx,DWORD[8+rsp]
+        lea     r12d,[((-1894007588))+r12*1+r14]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     edx,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     ebp,DWORD[44+rsp]
+        mov     eax,edi
+        mov     DWORD[40+rsp],edx
+        mov     ebx,edi
+        xor     ebp,DWORD[52+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     ebp,DWORD[12+rsp]
+        lea     r11d,[((-1894007588))+r11*1+rdx]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     ebp,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     r14d,DWORD[48+rsp]
+        mov     eax,esi
+        mov     DWORD[44+rsp],ebp
+        mov     ebx,esi
+        xor     r14d,DWORD[56+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     r14d,DWORD[16+rsp]
+        lea     edi,[((-1894007588))+rdi*1+rbp]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     r14d,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     edx,DWORD[52+rsp]
+        mov     eax,r13d
+        mov     DWORD[48+rsp],r14d
+        mov     ebx,r13d
+        xor     edx,DWORD[60+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     edx,DWORD[20+rsp]
+        lea     esi,[((-1894007588))+rsi*1+r14]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     edx,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     ebp,DWORD[56+rsp]
+        mov     eax,r12d
+        mov     DWORD[52+rsp],edx
+        mov     ebx,r12d
+        xor     ebp,DWORD[rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     ebp,DWORD[24+rsp]
+        lea     r13d,[((-1894007588))+r13*1+rdx]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     ebp,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     r14d,DWORD[60+rsp]
+        mov     eax,r11d
+        mov     DWORD[56+rsp],ebp
+        mov     ebx,r11d
+        xor     r14d,DWORD[4+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     r14d,DWORD[28+rsp]
+        lea     r12d,[((-1894007588))+r12*1+rbp]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     r14d,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     edx,DWORD[rsp]
+        mov     eax,edi
+        mov     DWORD[60+rsp],r14d
+        mov     ebx,edi
+        xor     edx,DWORD[8+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     edx,DWORD[32+rsp]
+        lea     r11d,[((-1894007588))+r11*1+r14]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     edx,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     ebp,DWORD[4+rsp]
+        mov     eax,esi
+        mov     DWORD[rsp],edx
+        mov     ebx,esi
+        xor     ebp,DWORD[12+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     ebp,DWORD[36+rsp]
+        lea     edi,[((-1894007588))+rdi*1+rdx]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     ebp,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     r14d,DWORD[8+rsp]
+        mov     eax,r13d
+        mov     DWORD[4+rsp],ebp
+        mov     ebx,r13d
+        xor     r14d,DWORD[16+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     r14d,DWORD[40+rsp]
+        lea     esi,[((-1894007588))+rsi*1+rbp]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     r14d,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     edx,DWORD[12+rsp]
+        mov     eax,r12d
+        mov     DWORD[8+rsp],r14d
+        mov     ebx,r12d
+        xor     edx,DWORD[20+rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     edx,DWORD[44+rsp]
+        lea     r13d,[((-1894007588))+r13*1+r14]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     edx,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     ebp,DWORD[16+rsp]
+        mov     eax,r11d
+        mov     DWORD[12+rsp],edx
+        mov     ebx,r11d
+        xor     ebp,DWORD[24+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     ebp,DWORD[48+rsp]
+        lea     r12d,[((-1894007588))+r12*1+rdx]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     ebp,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     r14d,DWORD[20+rsp]
+        mov     eax,edi
+        mov     DWORD[16+rsp],ebp
+        mov     ebx,edi
+        xor     r14d,DWORD[28+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     r14d,DWORD[52+rsp]
+        lea     r11d,[((-1894007588))+r11*1+rbp]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     r14d,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     edx,DWORD[24+rsp]
+        mov     eax,esi
+        mov     DWORD[20+rsp],r14d
+        mov     ebx,esi
+        xor     edx,DWORD[32+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     edx,DWORD[56+rsp]
+        lea     edi,[((-1894007588))+rdi*1+r14]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     edx,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     ebp,DWORD[28+rsp]
+        mov     eax,r13d
+        mov     DWORD[24+rsp],edx
+        mov     ebx,r13d
+        xor     ebp,DWORD[36+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     ebp,DWORD[60+rsp]
+        lea     esi,[((-1894007588))+rsi*1+rdx]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     ebp,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     r14d,DWORD[32+rsp]
+        mov     eax,r12d
+        mov     DWORD[28+rsp],ebp
+        mov     ebx,r12d
+        xor     r14d,DWORD[40+rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     r14d,DWORD[rsp]
+        lea     r13d,[((-1894007588))+r13*1+rbp]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     r14d,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     edx,DWORD[36+rsp]
+        mov     eax,r11d
+        mov     DWORD[32+rsp],r14d
+        mov     ebx,r11d
+        xor     edx,DWORD[44+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     edx,DWORD[4+rsp]
+        lea     r12d,[((-1894007588))+r12*1+r14]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     edx,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     ebp,DWORD[40+rsp]
+        mov     eax,edi
+        mov     DWORD[36+rsp],edx
+        mov     ebx,edi
+        xor     ebp,DWORD[48+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     ebp,DWORD[8+rsp]
+        lea     r11d,[((-1894007588))+r11*1+rdx]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     ebp,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     r14d,DWORD[44+rsp]
+        mov     eax,esi
+        mov     DWORD[40+rsp],ebp
+        mov     ebx,esi
+        xor     r14d,DWORD[52+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     r14d,DWORD[12+rsp]
+        lea     edi,[((-1894007588))+rdi*1+rbp]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     r14d,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     edx,DWORD[48+rsp]
+        mov     eax,r13d
+        mov     DWORD[44+rsp],r14d
+        mov     ebx,r13d
+        xor     edx,DWORD[56+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     edx,DWORD[16+rsp]
+        lea     esi,[((-1894007588))+rsi*1+r14]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     edx,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     ebp,DWORD[52+rsp]
+        mov     eax,edi
+        mov     DWORD[48+rsp],edx
+        mov     ecx,esi
+        xor     ebp,DWORD[60+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     ebp,DWORD[20+rsp]
+        lea     r13d,[((-899497514))+r13*1+rdx]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[56+rsp]
+        mov     eax,esi
+        mov     DWORD[52+rsp],ebp
+        mov     ecx,r13d
+        xor     r14d,DWORD[rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     r14d,DWORD[24+rsp]
+        lea     r12d,[((-899497514))+r12*1+rbp]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     r14d,1
+        xor     edx,DWORD[60+rsp]
+        mov     eax,r13d
+        mov     DWORD[56+rsp],r14d
+        mov     ecx,r12d
+        xor     edx,DWORD[4+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     edx,DWORD[28+rsp]
+        lea     r11d,[((-899497514))+r11*1+r14]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     edx,1
+        xor     ebp,DWORD[rsp]
+        mov     eax,r12d
+        mov     DWORD[60+rsp],edx
+        mov     ecx,r11d
+        xor     ebp,DWORD[8+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     ebp,DWORD[32+rsp]
+        lea     edi,[((-899497514))+rdi*1+rdx]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[4+rsp]
+        mov     eax,r11d
+        mov     DWORD[rsp],ebp
+        mov     ecx,edi
+        xor     r14d,DWORD[12+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     r14d,DWORD[36+rsp]
+        lea     esi,[((-899497514))+rsi*1+rbp]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     r14d,1
+        xor     edx,DWORD[8+rsp]
+        mov     eax,edi
+        mov     DWORD[4+rsp],r14d
+        mov     ecx,esi
+        xor     edx,DWORD[16+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     edx,DWORD[40+rsp]
+        lea     r13d,[((-899497514))+r13*1+r14]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     edx,1
+        xor     ebp,DWORD[12+rsp]
+        mov     eax,esi
+        mov     DWORD[8+rsp],edx
+        mov     ecx,r13d
+        xor     ebp,DWORD[20+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[44+rsp]
+        lea     r12d,[((-899497514))+r12*1+rdx]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[16+rsp]
+        mov     eax,r13d
+        mov     DWORD[12+rsp],ebp
+        mov     ecx,r12d
+        xor     r14d,DWORD[24+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[48+rsp]
+        lea     r11d,[((-899497514))+r11*1+rbp]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     r14d,1
+        xor     edx,DWORD[20+rsp]
+        mov     eax,r12d
+        mov     DWORD[16+rsp],r14d
+        mov     ecx,r11d
+        xor     edx,DWORD[28+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[52+rsp]
+        lea     edi,[((-899497514))+rdi*1+r14]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     edx,1
+        xor     ebp,DWORD[24+rsp]
+        mov     eax,r11d
+        mov     DWORD[20+rsp],edx
+        mov     ecx,edi
+        xor     ebp,DWORD[32+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[56+rsp]
+        lea     esi,[((-899497514))+rsi*1+rdx]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[28+rsp]
+        mov     eax,edi
+        mov     DWORD[24+rsp],ebp
+        mov     ecx,esi
+        xor     r14d,DWORD[36+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     r14d,DWORD[60+rsp]
+        lea     r13d,[((-899497514))+r13*1+rbp]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     r14d,1
+        xor     edx,DWORD[32+rsp]
+        mov     eax,esi
+        mov     DWORD[28+rsp],r14d
+        mov     ecx,r13d
+        xor     edx,DWORD[40+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     edx,DWORD[rsp]
+        lea     r12d,[((-899497514))+r12*1+r14]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     edx,1
+        xor     ebp,DWORD[36+rsp]
+        mov     eax,r13d
+
+        mov     ecx,r12d
+        xor     ebp,DWORD[44+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     ebp,DWORD[4+rsp]
+        lea     r11d,[((-899497514))+r11*1+rdx]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[40+rsp]
+        mov     eax,r12d
+
+        mov     ecx,r11d
+        xor     r14d,DWORD[48+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     r14d,DWORD[8+rsp]
+        lea     edi,[((-899497514))+rdi*1+rbp]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     r14d,1
+        xor     edx,DWORD[44+rsp]
+        mov     eax,r11d
+
+        mov     ecx,edi
+        xor     edx,DWORD[52+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     edx,DWORD[12+rsp]
+        lea     esi,[((-899497514))+rsi*1+r14]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     edx,1
+        xor     ebp,DWORD[48+rsp]
+        mov     eax,edi
+
+        mov     ecx,esi
+        xor     ebp,DWORD[56+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     ebp,DWORD[16+rsp]
+        lea     r13d,[((-899497514))+r13*1+rdx]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[52+rsp]
+        mov     eax,esi
+
+        mov     ecx,r13d
+        xor     r14d,DWORD[60+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     r14d,DWORD[20+rsp]
+        lea     r12d,[((-899497514))+r12*1+rbp]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     r14d,1
+        xor     edx,DWORD[56+rsp]
+        mov     eax,r13d
+
+        mov     ecx,r12d
+        xor     edx,DWORD[rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     edx,DWORD[24+rsp]
+        lea     r11d,[((-899497514))+r11*1+r14]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     edx,1
+        xor     ebp,DWORD[60+rsp]
+        mov     eax,r12d
+
+        mov     ecx,r11d
+        xor     ebp,DWORD[4+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     ebp,DWORD[28+rsp]
+        lea     edi,[((-899497514))+rdi*1+rdx]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     ebp,1
+        mov     eax,r11d
+        mov     ecx,edi
+        xor     eax,r13d
+        lea     esi,[((-899497514))+rsi*1+rbp]
+        rol     ecx,5
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        add     esi,DWORD[r8]
+        add     edi,DWORD[4+r8]
+        add     r11d,DWORD[8+r8]
+        add     r12d,DWORD[12+r8]
+        add     r13d,DWORD[16+r8]
+        mov     DWORD[r8],esi
+        mov     DWORD[4+r8],edi
+        mov     DWORD[8+r8],r11d
+        mov     DWORD[12+r8],r12d
+        mov     DWORD[16+r8],r13d
+
+        sub     r10,1
+        lea     r9,[64+r9]
+        jnz     NEAR $L$loop
+
+        mov     rsi,QWORD[64+rsp]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order:
+
+ALIGN   32
+sha1_block_data_order_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_shaext_shortcut:
+
+        lea     rsp,[((-72))+rsp]
+        movaps  XMMWORD[(-8-64)+rax],xmm6
+        movaps  XMMWORD[(-8-48)+rax],xmm7
+        movaps  XMMWORD[(-8-32)+rax],xmm8
+        movaps  XMMWORD[(-8-16)+rax],xmm9
+$L$prologue_shaext:
+        movdqu  xmm0,XMMWORD[rdi]
+        movd    xmm1,DWORD[16+rdi]
+        movdqa  xmm3,XMMWORD[((K_XX_XX+160))]
+
+        movdqu  xmm4,XMMWORD[rsi]
+        pshufd  xmm0,xmm0,27
+        movdqu  xmm5,XMMWORD[16+rsi]
+        pshufd  xmm1,xmm1,27
+        movdqu  xmm6,XMMWORD[32+rsi]
+DB      102,15,56,0,227
+        movdqu  xmm7,XMMWORD[48+rsi]
+DB      102,15,56,0,235
+DB      102,15,56,0,243
+        movdqa  xmm9,xmm1
+DB      102,15,56,0,251
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
+        dec     rdx
+        lea     r8,[64+rsi]
+        paddd   xmm1,xmm4
+        cmovne  rsi,r8
+        movdqa  xmm8,xmm0
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,0
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,0
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,0
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,0
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,0
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,1
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,1
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,1
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,1
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,1
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,2
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,2
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,2
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,2
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,2
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,3
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+        movdqu  xmm4,XMMWORD[rsi]
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,3
+DB      15,56,200,213
+        movdqu  xmm5,XMMWORD[16+rsi]
+DB      102,15,56,0,227
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,3
+DB      15,56,200,206
+        movdqu  xmm6,XMMWORD[32+rsi]
+DB      102,15,56,0,235
+
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,3
+DB      15,56,200,215
+        movdqu  xmm7,XMMWORD[48+rsi]
+DB      102,15,56,0,243
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,3
+DB      65,15,56,200,201
+DB      102,15,56,0,251
+
+        paddd   xmm0,xmm8
+        movdqa  xmm9,xmm1
+
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm1,xmm1,27
+        movdqu  XMMWORD[rdi],xmm0
+        movd    DWORD[16+rdi],xmm1
+        movaps  xmm6,XMMWORD[((-8-64))+rax]
+        movaps  xmm7,XMMWORD[((-8-48))+rax]
+        movaps  xmm8,XMMWORD[((-8-32))+rax]
+        movaps  xmm9,XMMWORD[((-8-16))+rax]
+        mov     rsp,rax
+$L$epilogue_shaext:
+
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_sha1_block_data_order_shaext:
+
+ALIGN   16
+sha1_block_data_order_ssse3:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_ssse3:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_ssse3_shortcut:
+
+        mov     r11,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        movaps  XMMWORD[(-40-96)+r11],xmm6
+        movaps  XMMWORD[(-40-80)+r11],xmm7
+        movaps  XMMWORD[(-40-64)+r11],xmm8
+        movaps  XMMWORD[(-40-48)+r11],xmm9
+        movaps  XMMWORD[(-40-32)+r11],xmm10
+        movaps  XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_ssse3:
+        and     rsp,-64
+        mov     r8,rdi
+        mov     r9,rsi
+        mov     r10,rdx
+
+        shl     r10,6
+        add     r10,r9
+        lea     r14,[((K_XX_XX+64))]
+
+        mov     eax,DWORD[r8]
+        mov     ebx,DWORD[4+r8]
+        mov     ecx,DWORD[8+r8]
+        mov     edx,DWORD[12+r8]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r8]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        movdqa  xmm6,XMMWORD[64+r14]
+        movdqa  xmm9,XMMWORD[((-64))+r14]
+        movdqu  xmm0,XMMWORD[r9]
+        movdqu  xmm1,XMMWORD[16+r9]
+        movdqu  xmm2,XMMWORD[32+r9]
+        movdqu  xmm3,XMMWORD[48+r9]
+DB      102,15,56,0,198
+DB      102,15,56,0,206
+DB      102,15,56,0,214
+        add     r9,64
+        paddd   xmm0,xmm9
+DB      102,15,56,0,222
+        paddd   xmm1,xmm9
+        paddd   xmm2,xmm9
+        movdqa  XMMWORD[rsp],xmm0
+        psubd   xmm0,xmm9
+        movdqa  XMMWORD[16+rsp],xmm1
+        psubd   xmm1,xmm9
+        movdqa  XMMWORD[32+rsp],xmm2
+        psubd   xmm2,xmm9
+        jmp     NEAR $L$oop_ssse3
+ALIGN   16
+$L$oop_ssse3:
+        ror     ebx,2
+        pshufd  xmm4,xmm0,238
+        xor     esi,edx
+        movdqa  xmm8,xmm3
+        paddd   xmm9,xmm3
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        punpcklqdq      xmm4,xmm1
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        psrldq  xmm8,4
+        and     edi,ebx
+        xor     ebx,ecx
+        pxor    xmm4,xmm0
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm8,xmm2
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        pxor    xmm4,xmm8
+        xor     eax,ebx
+        rol     ebp,5
+        movdqa  XMMWORD[48+rsp],xmm9
+        add     edx,edi
+        and     esi,eax
+        movdqa  xmm10,xmm4
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        movdqa  xmm8,xmm4
+        xor     esi,ebx
+        pslldq  xmm10,12
+        paddd   xmm4,xmm4
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        psrld   xmm8,31
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm9,xmm10
+        and     edi,ebp
+        xor     ebp,eax
+        psrld   xmm10,30
+        add     ecx,edx
+        ror     edx,7
+        por     xmm4,xmm8
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        pslld   xmm9,2
+        pxor    xmm4,xmm10
+        xor     edx,ebp
+        movdqa  xmm10,XMMWORD[((-64))+r14]
+        rol     ecx,5
+        add     ebx,edi
+        and     esi,edx
+        pxor    xmm4,xmm9
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        pshufd  xmm5,xmm1,238
+        xor     esi,ebp
+        movdqa  xmm9,xmm4
+        paddd   xmm10,xmm4
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        punpcklqdq      xmm5,xmm2
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        psrldq  xmm9,4
+        and     edi,ecx
+        xor     ecx,edx
+        pxor    xmm5,xmm1
+        add     eax,ebx
+        ror     ebx,7
+        pxor    xmm9,xmm3
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        pxor    xmm5,xmm9
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  XMMWORD[rsp],xmm10
+        add     ebp,edi
+        and     esi,ebx
+        movdqa  xmm8,xmm5
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        movdqa  xmm9,xmm5
+        xor     esi,ecx
+        pslldq  xmm8,12
+        paddd   xmm5,xmm5
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        psrld   xmm9,31
+        xor     eax,ebx
+        rol     ebp,5
+        add     edx,esi
+        movdqa  xmm10,xmm8
+        and     edi,eax
+        xor     eax,ebx
+        psrld   xmm8,30
+        add     edx,ebp
+        ror     ebp,7
+        por     xmm5,xmm9
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        pslld   xmm10,2
+        pxor    xmm5,xmm8
+        xor     ebp,eax
+        movdqa  xmm8,XMMWORD[((-32))+r14]
+        rol     edx,5
+        add     ecx,edi
+        and     esi,ebp
+        pxor    xmm5,xmm10
+        xor     ebp,eax
+        add     ecx,edx
+        ror     edx,7
+        pshufd  xmm6,xmm2,238
+        xor     esi,eax
+        movdqa  xmm10,xmm5
+        paddd   xmm8,xmm5
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        punpcklqdq      xmm6,xmm3
+        xor     edx,ebp
+        rol     ecx,5
+        add     ebx,esi
+        psrldq  xmm10,4
+        and     edi,edx
+        xor     edx,ebp
+        pxor    xmm6,xmm2
+        add     ebx,ecx
+        ror     ecx,7
+        pxor    xmm10,xmm4
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        pxor    xmm6,xmm10
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  XMMWORD[16+rsp],xmm8
+        add     eax,edi
+        and     esi,ecx
+        movdqa  xmm9,xmm6
+        xor     ecx,edx
+        add     eax,ebx
+        ror     ebx,7
+        movdqa  xmm10,xmm6
+        xor     esi,edx
+        pslldq  xmm9,12
+        paddd   xmm6,xmm6
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        psrld   xmm10,31
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm8,xmm9
+        and     edi,ebx
+        xor     ebx,ecx
+        psrld   xmm9,30
+        add     ebp,eax
+        ror     eax,7
+        por     xmm6,xmm10
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        pslld   xmm8,2
+        pxor    xmm6,xmm9
+        xor     eax,ebx
+        movdqa  xmm9,XMMWORD[((-32))+r14]
+        rol     ebp,5
+        add     edx,edi
+        and     esi,eax
+        pxor    xmm6,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        pshufd  xmm7,xmm3,238
+        xor     esi,ebx
+        movdqa  xmm8,xmm6
+        paddd   xmm9,xmm6
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        punpcklqdq      xmm7,xmm4
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        psrldq  xmm8,4
+        and     edi,ebp
+        xor     ebp,eax
+        pxor    xmm7,xmm3
+        add     ecx,edx
+        ror     edx,7
+        pxor    xmm8,xmm5
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        pxor    xmm7,xmm8
+        xor     edx,ebp
+        rol     ecx,5
+        movdqa  XMMWORD[32+rsp],xmm9
+        add     ebx,edi
+        and     esi,edx
+        movdqa  xmm10,xmm7
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        movdqa  xmm8,xmm7
+        xor     esi,ebp
+        pslldq  xmm10,12
+        paddd   xmm7,xmm7
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        psrld   xmm8,31
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm9,xmm10
+        and     edi,ecx
+        xor     ecx,edx
+        psrld   xmm10,30
+        add     eax,ebx
+        ror     ebx,7
+        por     xmm7,xmm8
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        pslld   xmm9,2
+        pxor    xmm7,xmm10
+        xor     ebx,ecx
+        movdqa  xmm10,XMMWORD[((-32))+r14]
+        rol     eax,5
+        add     ebp,edi
+        and     esi,ebx
+        pxor    xmm7,xmm9
+        pshufd  xmm9,xmm6,238
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm0,xmm4
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        punpcklqdq      xmm9,xmm7
+        xor     eax,ebx
+        rol     ebp,5
+        pxor    xmm0,xmm1
+        add     edx,esi
+        and     edi,eax
+        movdqa  xmm8,xmm10
+        xor     eax,ebx
+        paddd   xmm10,xmm7
+        add     edx,ebp
+        pxor    xmm0,xmm9
+        ror     ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        movdqa  xmm9,xmm0
+        xor     ebp,eax
+        rol     edx,5
+        movdqa  XMMWORD[48+rsp],xmm10
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        pslld   xmm0,2
+        add     ecx,edx
+        ror     edx,7
+        psrld   xmm9,30
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        por     xmm0,xmm9
+        xor     edx,ebp
+        rol     ecx,5
+        pshufd  xmm10,xmm7,238
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        pxor    xmm1,xmm5
+        add     ebp,DWORD[16+rsp]
+        xor     esi,ecx
+        punpcklqdq      xmm10,xmm0
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm1,xmm2
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm9,xmm8
+        ror     ebx,7
+        paddd   xmm8,xmm0
+        add     ebp,eax
+        pxor    xmm1,xmm10
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm10,xmm1
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[rsp],xmm8
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[24+rsp]
+        pslld   xmm1,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm10,30
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm1,xmm10
+        add     ecx,edx
+        add     ebx,DWORD[28+rsp]
+        pshufd  xmm8,xmm0,238
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        pxor    xmm2,xmm6
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        punpcklqdq      xmm8,xmm1
+        mov     edi,ebx
+        rol     ebx,5
+        pxor    xmm2,xmm3
+        add     eax,esi
+        xor     edi,edx
+        movdqa  xmm10,XMMWORD[r14]
+        ror     ecx,7
+        paddd   xmm9,xmm1
+        add     eax,ebx
+        pxor    xmm2,xmm8
+        add     ebp,DWORD[36+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        movdqa  xmm8,xmm2
+        add     ebp,edi
+        xor     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm9
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[40+rsp]
+        pslld   xmm2,2
+        xor     esi,ebx
+        mov     edi,ebp
+        psrld   xmm8,30
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        por     xmm2,xmm8
+        add     edx,ebp
+        add     ecx,DWORD[44+rsp]
+        pshufd  xmm9,xmm1,238
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        pxor    xmm3,xmm7
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        punpcklqdq      xmm9,xmm2
+        mov     edi,ecx
+        rol     ecx,5
+        pxor    xmm3,xmm4
+        add     ebx,esi
+        xor     edi,ebp
+        movdqa  xmm8,xmm10
+        ror     edx,7
+        paddd   xmm10,xmm2
+        add     ebx,ecx
+        pxor    xmm3,xmm9
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        movdqa  xmm9,xmm3
+        add     eax,edi
+        xor     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm10
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[56+rsp]
+        pslld   xmm3,2
+        xor     esi,ecx
+        mov     edi,eax
+        psrld   xmm9,30
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        por     xmm3,xmm9
+        add     ebp,eax
+        add     edx,DWORD[60+rsp]
+        pshufd  xmm10,xmm2,238
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        pxor    xmm4,xmm0
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        punpcklqdq      xmm10,xmm3
+        mov     edi,edx
+        rol     edx,5
+        pxor    xmm4,xmm5
+        add     ecx,esi
+        xor     edi,eax
+        movdqa  xmm9,xmm8
+        ror     ebp,7
+        paddd   xmm8,xmm3
+        add     ecx,edx
+        pxor    xmm4,xmm10
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        movdqa  xmm10,xmm4
+        add     ebx,edi
+        xor     esi,ebp
+        movdqa  XMMWORD[48+rsp],xmm8
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[8+rsp]
+        pslld   xmm4,2
+        xor     esi,edx
+        mov     edi,ebx
+        psrld   xmm10,30
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        por     xmm4,xmm10
+        add     eax,ebx
+        add     ebp,DWORD[12+rsp]
+        pshufd  xmm8,xmm3,238
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        pxor    xmm5,xmm1
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        punpcklqdq      xmm8,xmm4
+        mov     edi,ebp
+        rol     ebp,5
+        pxor    xmm5,xmm6
+        add     edx,esi
+        xor     edi,ebx
+        movdqa  xmm10,xmm9
+        ror     eax,7
+        paddd   xmm9,xmm4
+        add     edx,ebp
+        pxor    xmm5,xmm8
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        movdqa  xmm8,xmm5
+        add     ecx,edi
+        xor     esi,eax
+        movdqa  XMMWORD[rsp],xmm9
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[24+rsp]
+        pslld   xmm5,2
+        xor     esi,ebp
+        mov     edi,ecx
+        psrld   xmm8,30
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        por     xmm5,xmm8
+        add     ebx,ecx
+        add     eax,DWORD[28+rsp]
+        pshufd  xmm9,xmm4,238
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        pxor    xmm6,xmm2
+        add     ebp,DWORD[32+rsp]
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        punpcklqdq      xmm9,xmm5
+        mov     edi,eax
+        xor     esi,ecx
+        pxor    xmm6,xmm7
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm8,xmm10
+        xor     edi,ebx
+        paddd   xmm10,xmm5
+        xor     ebx,ecx
+        pxor    xmm6,xmm9
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        movdqa  xmm9,xmm6
+        mov     esi,ebp
+        xor     edi,ebx
+        movdqa  XMMWORD[16+rsp],xmm10
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,eax
+        pslld   xmm6,2
+        xor     eax,ebx
+        add     edx,ebp
+        psrld   xmm9,30
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        por     xmm6,xmm9
+        ror     ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        rol     edx,5
+        pshufd  xmm10,xmm5,238
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        mov     esi,ecx
+        xor     edi,ebp
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        pxor    xmm7,xmm3
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        punpcklqdq      xmm10,xmm6
+        mov     edi,ebx
+        xor     esi,edx
+        pxor    xmm7,xmm0
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm9,XMMWORD[32+r14]
+        xor     edi,ecx
+        paddd   xmm8,xmm6
+        xor     ecx,edx
+        pxor    xmm7,xmm10
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        movdqa  xmm10,xmm7
+        mov     esi,eax
+        xor     edi,ecx
+        movdqa  XMMWORD[32+rsp],xmm8
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        pslld   xmm7,2
+        xor     ebx,ecx
+        add     ebp,eax
+        psrld   xmm10,30
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        por     xmm7,xmm10
+        ror     eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        rol     ebp,5
+        pshufd  xmm8,xmm6,238
+        add     edx,esi
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        pxor    xmm0,xmm4
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        punpcklqdq      xmm8,xmm7
+        mov     edi,ecx
+        xor     esi,ebp
+        pxor    xmm0,xmm1
+        rol     ecx,5
+        add     ebx,esi
+        movdqa  xmm10,xmm9
+        xor     edi,edx
+        paddd   xmm9,xmm7
+        xor     edx,ebp
+        pxor    xmm0,xmm8
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        movdqa  xmm8,xmm0
+        mov     esi,ebx
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm9
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        pslld   xmm0,2
+        xor     ecx,edx
+        add     eax,ebx
+        psrld   xmm8,30
+        add     ebp,DWORD[8+rsp]
+        and     esi,ecx
+        xor     ecx,edx
+        por     xmm0,xmm8
+        ror     ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        rol     eax,5
+        pshufd  xmm9,xmm7,238
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        pxor    xmm1,xmm5
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        punpcklqdq      xmm9,xmm0
+        mov     edi,edx
+        xor     esi,eax
+        pxor    xmm1,xmm2
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm8,xmm10
+        xor     edi,ebp
+        paddd   xmm10,xmm0
+        xor     ebp,eax
+        pxor    xmm1,xmm9
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movdqa  xmm9,xmm1
+        mov     esi,ecx
+        xor     edi,ebp
+        movdqa  XMMWORD[rsp],xmm10
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        pslld   xmm1,2
+        xor     edx,ebp
+        add     ebx,ecx
+        psrld   xmm9,30
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        por     xmm1,xmm9
+        ror     ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        rol     ebx,5
+        pshufd  xmm10,xmm0,238
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        pxor    xmm2,xmm6
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        punpcklqdq      xmm10,xmm1
+        mov     edi,ebp
+        xor     esi,ebx
+        pxor    xmm2,xmm3
+        rol     ebp,5
+        add     edx,esi
+        movdqa  xmm9,xmm8
+        xor     edi,eax
+        paddd   xmm8,xmm1
+        xor     eax,ebx
+        pxor    xmm2,xmm10
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        movdqa  xmm10,xmm2
+        mov     esi,edx
+        xor     edi,eax
+        movdqa  XMMWORD[16+rsp],xmm8
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        pslld   xmm2,2
+        xor     ebp,eax
+        add     ecx,edx
+        psrld   xmm10,30
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        por     xmm2,xmm10
+        ror     edx,7
+        mov     edi,ecx
+        xor     esi,ebp
+        rol     ecx,5
+        pshufd  xmm8,xmm1,238
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        pxor    xmm3,xmm7
+        add     ebp,DWORD[48+rsp]
+        xor     esi,ecx
+        punpcklqdq      xmm8,xmm2
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm3,xmm4
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm10,xmm9
+        ror     ebx,7
+        paddd   xmm9,xmm2
+        add     ebp,eax
+        pxor    xmm3,xmm8
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm8,xmm3
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[32+rsp],xmm9
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[56+rsp]
+        pslld   xmm3,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm8,30
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm3,xmm8
+        add     ecx,edx
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        paddd   xmm10,xmm3
+        add     eax,esi
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm10
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        cmp     r9,r10
+        je      NEAR $L$done_ssse3
+        movdqa  xmm6,XMMWORD[64+r14]
+        movdqa  xmm9,XMMWORD[((-64))+r14]
+        movdqu  xmm0,XMMWORD[r9]
+        movdqu  xmm1,XMMWORD[16+r9]
+        movdqu  xmm2,XMMWORD[32+r9]
+        movdqu  xmm3,XMMWORD[48+r9]
+DB      102,15,56,0,198
+        add     r9,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+DB      102,15,56,0,206
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        paddd   xmm0,xmm9
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        movdqa  XMMWORD[rsp],xmm0
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        psubd   xmm0,xmm9
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+DB      102,15,56,0,214
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        paddd   xmm1,xmm9
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm1
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        psubd   xmm1,xmm9
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+DB      102,15,56,0,222
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        paddd   xmm2,xmm9
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm2
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        psubd   xmm2,xmm9
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        add     edx,DWORD[12+r8]
+        mov     DWORD[r8],eax
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[4+r8],esi
+        mov     ebx,esi
+        mov     DWORD[8+r8],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r8],edx
+        xor     edi,edx
+        mov     DWORD[16+r8],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_ssse3
+
+ALIGN   16
+$L$done_ssse3:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        mov     DWORD[r8],eax
+        add     edx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[8+r8],ecx
+        mov     DWORD[12+r8],edx
+        mov     DWORD[16+r8],ebp
+        movaps  xmm6,XMMWORD[((-40-96))+r11]
+        movaps  xmm7,XMMWORD[((-40-80))+r11]
+        movaps  xmm8,XMMWORD[((-40-64))+r11]
+        movaps  xmm9,XMMWORD[((-40-48))+r11]
+        movaps  xmm10,XMMWORD[((-40-32))+r11]
+        movaps  xmm11,XMMWORD[((-40-16))+r11]
+        mov     r14,QWORD[((-40))+r11]
+
+        mov     r13,QWORD[((-32))+r11]
+
+        mov     r12,QWORD[((-24))+r11]
+
+        mov     rbp,QWORD[((-16))+r11]
+
+        mov     rbx,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$epilogue_ssse3:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order_ssse3:
+
+ALIGN   16
+sha1_block_data_order_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_avx_shortcut:
+
+        mov     r11,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        vzeroupper
+        vmovaps XMMWORD[(-40-96)+r11],xmm6
+        vmovaps XMMWORD[(-40-80)+r11],xmm7
+        vmovaps XMMWORD[(-40-64)+r11],xmm8
+        vmovaps XMMWORD[(-40-48)+r11],xmm9
+        vmovaps XMMWORD[(-40-32)+r11],xmm10
+        vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx:
+        and     rsp,-64
+        mov     r8,rdi
+        mov     r9,rsi
+        mov     r10,rdx
+
+        shl     r10,6
+        add     r10,r9
+        lea     r14,[((K_XX_XX+64))]
+
+        mov     eax,DWORD[r8]
+        mov     ebx,DWORD[4+r8]
+        mov     ecx,DWORD[8+r8]
+        mov     edx,DWORD[12+r8]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r8]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        vmovdqa xmm6,XMMWORD[64+r14]
+        vmovdqa xmm11,XMMWORD[((-64))+r14]
+        vmovdqu xmm0,XMMWORD[r9]
+        vmovdqu xmm1,XMMWORD[16+r9]
+        vmovdqu xmm2,XMMWORD[32+r9]
+        vmovdqu xmm3,XMMWORD[48+r9]
+        vpshufb xmm0,xmm0,xmm6
+        add     r9,64
+        vpshufb xmm1,xmm1,xmm6
+        vpshufb xmm2,xmm2,xmm6
+        vpshufb xmm3,xmm3,xmm6
+        vpaddd  xmm4,xmm0,xmm11
+        vpaddd  xmm5,xmm1,xmm11
+        vpaddd  xmm6,xmm2,xmm11
+        vmovdqa XMMWORD[rsp],xmm4
+        vmovdqa XMMWORD[16+rsp],xmm5
+        vmovdqa XMMWORD[32+rsp],xmm6
+        jmp     NEAR $L$oop_avx
+ALIGN   16
+$L$oop_avx:
+        shrd    ebx,ebx,2
+        xor     esi,edx
+        vpalignr        xmm4,xmm1,xmm0,8
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        vpaddd  xmm9,xmm11,xmm3
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrldq xmm8,xmm3,4
+        add     ebp,esi
+        and     edi,ebx
+        vpxor   xmm4,xmm4,xmm0
+        xor     ebx,ecx
+        add     ebp,eax
+        vpxor   xmm8,xmm8,xmm2
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        vpxor   xmm4,xmm4,xmm8
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     edx,edi
+        and     esi,eax
+        vpsrld  xmm8,xmm4,31
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpslldq xmm10,xmm4,12
+        vpaddd  xmm4,xmm4,xmm4
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm4,xmm4,xmm8
+        add     ecx,esi
+        and     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm4,xmm4,xmm9
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        vpxor   xmm4,xmm4,xmm10
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        and     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpalignr        xmm5,xmm2,xmm1,8
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        vpaddd  xmm9,xmm11,xmm4
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrldq xmm8,xmm4,4
+        add     eax,esi
+        and     edi,ecx
+        vpxor   xmm5,xmm5,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        vpxor   xmm8,xmm8,xmm3
+        shrd    ebx,ebx,7
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        vpxor   xmm5,xmm5,xmm8
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ebp,edi
+        and     esi,ebx
+        vpsrld  xmm8,xmm5,31
+        xor     ebx,ecx
+        add     ebp,eax
+        shrd    eax,eax,7
+        xor     esi,ecx
+        vpslldq xmm10,xmm5,12
+        vpaddd  xmm5,xmm5,xmm5
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm5,xmm5,xmm8
+        add     edx,esi
+        and     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm5,xmm5,xmm9
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        vpxor   xmm5,xmm5,xmm10
+        xor     ebp,eax
+        shld    edx,edx,5
+        vmovdqa xmm11,XMMWORD[((-32))+r14]
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        vpalignr        xmm6,xmm3,xmm2,8
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        vpaddd  xmm9,xmm11,xmm5
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vpsrldq xmm8,xmm5,4
+        add     ebx,esi
+        and     edi,edx
+        vpxor   xmm6,xmm6,xmm2
+        xor     edx,ebp
+        add     ebx,ecx
+        vpxor   xmm8,xmm8,xmm4
+        shrd    ecx,ecx,7
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        vpxor   xmm6,xmm6,xmm8
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     eax,edi
+        and     esi,ecx
+        vpsrld  xmm8,xmm6,31
+        xor     ecx,edx
+        add     eax,ebx
+        shrd    ebx,ebx,7
+        xor     esi,edx
+        vpslldq xmm10,xmm6,12
+        vpaddd  xmm6,xmm6,xmm6
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm6,xmm6,xmm8
+        add     ebp,esi
+        and     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm6,xmm6,xmm9
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        vpxor   xmm6,xmm6,xmm10
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        and     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpalignr        xmm7,xmm4,xmm3,8
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        vpaddd  xmm9,xmm11,xmm6
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpsrldq xmm8,xmm6,4
+        add     ecx,esi
+        and     edi,ebp
+        vpxor   xmm7,xmm7,xmm3
+        xor     ebp,eax
+        add     ecx,edx
+        vpxor   xmm8,xmm8,xmm5
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        vpxor   xmm7,xmm7,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     ebx,edi
+        and     esi,edx
+        vpsrld  xmm8,xmm7,31
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpslldq xmm10,xmm7,12
+        vpaddd  xmm7,xmm7,xmm7
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm7,xmm7,xmm8
+        add     eax,esi
+        and     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm7,xmm7,xmm9
+        shrd    ebx,ebx,7
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        vpxor   xmm7,xmm7,xmm10
+        xor     ebx,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        and     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        vpxor   xmm0,xmm0,xmm1
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpaddd  xmm9,xmm11,xmm7
+        add     edx,esi
+        and     edi,eax
+        vpxor   xmm0,xmm0,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpslld  xmm0,xmm0,2
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        vpor    xmm0,xmm0,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ebp,DWORD[16+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm1,xmm1,xmm2
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm11,xmm0
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm1,xmm1,xmm8
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm1,xmm1,2
+        add     ecx,DWORD[24+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm1,xmm1,xmm8
+        add     ebx,DWORD[28+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        vpxor   xmm2,xmm2,xmm3
+        add     eax,esi
+        xor     edi,edx
+        vpaddd  xmm9,xmm11,xmm1
+        vmovdqa xmm11,XMMWORD[r14]
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpxor   xmm2,xmm2,xmm8
+        add     ebp,DWORD[36+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpslld  xmm2,xmm2,2
+        add     edx,DWORD[40+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpor    xmm2,xmm2,xmm8
+        add     ecx,DWORD[44+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebx,esi
+        xor     edi,ebp
+        vpaddd  xmm9,xmm11,xmm2
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpxor   xmm3,xmm3,xmm8
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        add     ebp,DWORD[56+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpor    xmm3,xmm3,xmm8
+        add     edx,DWORD[60+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpalignr        xmm8,xmm3,xmm2,8
+        vpxor   xmm4,xmm4,xmm0
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        vpxor   xmm4,xmm4,xmm5
+        add     ecx,esi
+        xor     edi,eax
+        vpaddd  xmm9,xmm11,xmm3
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpxor   xmm4,xmm4,xmm8
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        vpsrld  xmm8,xmm4,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpslld  xmm4,xmm4,2
+        add     eax,DWORD[8+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpor    xmm4,xmm4,xmm8
+        add     ebp,DWORD[12+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpalignr        xmm8,xmm4,xmm3,8
+        vpxor   xmm5,xmm5,xmm1
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpxor   xmm5,xmm5,xmm6
+        add     edx,esi
+        xor     edi,ebx
+        vpaddd  xmm9,xmm11,xmm4
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpxor   xmm5,xmm5,xmm8
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        vpsrld  xmm8,xmm5,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpslld  xmm5,xmm5,2
+        add     ebx,DWORD[24+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpor    xmm5,xmm5,xmm8
+        add     eax,DWORD[28+rsp]
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm5,xmm4,8
+        vpxor   xmm6,xmm6,xmm2
+        add     ebp,DWORD[32+rsp]
+        and     esi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vpxor   xmm6,xmm6,xmm7
+        mov     edi,eax
+        xor     esi,ecx
+        vpaddd  xmm9,xmm11,xmm5
+        shld    eax,eax,5
+        add     ebp,esi
+        vpxor   xmm6,xmm6,xmm8
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm6,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        vpslld  xmm6,xmm6,2
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        vpor    xmm6,xmm6,xmm8
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        vpalignr        xmm8,xmm6,xmm5,8
+        vpxor   xmm7,xmm7,xmm3
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        vpxor   xmm7,xmm7,xmm0
+        mov     edi,ebx
+        xor     esi,edx
+        vpaddd  xmm9,xmm11,xmm6
+        vmovdqa xmm11,XMMWORD[32+r14]
+        shld    ebx,ebx,5
+        add     eax,esi
+        vpxor   xmm7,xmm7,xmm8
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        vpsrld  xmm8,xmm7,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        vpslld  xmm7,xmm7,2
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        vpor    xmm7,xmm7,xmm8
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vpxor   xmm0,xmm0,xmm1
+        mov     edi,ecx
+        xor     esi,ebp
+        vpaddd  xmm9,xmm11,xmm7
+        shld    ecx,ecx,5
+        add     ebx,esi
+        vpxor   xmm0,xmm0,xmm8
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        vpslld  xmm0,xmm0,2
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[8+rsp]
+        and     esi,ecx
+        vpor    xmm0,xmm0,xmm8
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        vpxor   xmm1,xmm1,xmm2
+        mov     edi,edx
+        xor     esi,eax
+        vpaddd  xmm9,xmm11,xmm0
+        shld    edx,edx,5
+        add     ecx,esi
+        vpxor   xmm1,xmm1,xmm8
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        vpslld  xmm1,xmm1,2
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        vpor    xmm1,xmm1,xmm8
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vpxor   xmm2,xmm2,xmm3
+        mov     edi,ebp
+        xor     esi,ebx
+        vpaddd  xmm9,xmm11,xmm1
+        shld    ebp,ebp,5
+        add     edx,esi
+        vpxor   xmm2,xmm2,xmm8
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        vpslld  xmm2,xmm2,2
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        vpor    xmm2,xmm2,xmm8
+        xor     ebp,eax
+        shrd    edx,edx,7
+        mov     edi,ecx
+        xor     esi,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebp,DWORD[48+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm11,xmm2
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm3,xmm3,xmm8
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm3,xmm3,2
+        add     ecx,DWORD[56+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm3,xmm3,xmm8
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        vpaddd  xmm9,xmm11,xmm3
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        vmovdqa XMMWORD[48+rsp],xmm9
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        cmp     r9,r10
+        je      NEAR $L$done_avx
+        vmovdqa xmm6,XMMWORD[64+r14]
+        vmovdqa xmm11,XMMWORD[((-64))+r14]
+        vmovdqu xmm0,XMMWORD[r9]
+        vmovdqu xmm1,XMMWORD[16+r9]
+        vmovdqu xmm2,XMMWORD[32+r9]
+        vmovdqu xmm3,XMMWORD[48+r9]
+        vpshufb xmm0,xmm0,xmm6
+        add     r9,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        vpshufb xmm1,xmm1,xmm6
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpaddd  xmm4,xmm0,xmm11
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vmovdqa XMMWORD[rsp],xmm4
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        vpshufb xmm2,xmm2,xmm6
+        mov     edi,edx
+        shld    edx,edx,5
+        vpaddd  xmm5,xmm1,xmm11
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vmovdqa XMMWORD[16+rsp],xmm5
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        vpshufb xmm3,xmm3,xmm6
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpaddd  xmm6,xmm2,xmm11
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vmovdqa XMMWORD[32+rsp],xmm6
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        add     edx,DWORD[12+r8]
+        mov     DWORD[r8],eax
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[4+r8],esi
+        mov     ebx,esi
+        mov     DWORD[8+r8],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r8],edx
+        xor     edi,edx
+        mov     DWORD[16+r8],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_avx
+
+ALIGN   16
+$L$done_avx:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vzeroupper
+
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        mov     DWORD[r8],eax
+        add     edx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[8+r8],ecx
+        mov     DWORD[12+r8],edx
+        mov     DWORD[16+r8],ebp
+        movaps  xmm6,XMMWORD[((-40-96))+r11]
+        movaps  xmm7,XMMWORD[((-40-80))+r11]
+        movaps  xmm8,XMMWORD[((-40-64))+r11]
+        movaps  xmm9,XMMWORD[((-40-48))+r11]
+        movaps  xmm10,XMMWORD[((-40-32))+r11]
+        movaps  xmm11,XMMWORD[((-40-16))+r11]
+        mov     r14,QWORD[((-40))+r11]
+
+        mov     r13,QWORD[((-32))+r11]
+
+        mov     r12,QWORD[((-24))+r11]
+
+        mov     rbp,QWORD[((-16))+r11]
+
+        mov     rbx,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order_avx:
+
+ALIGN   16
+sha1_block_data_order_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_avx2_shortcut:
+
+        mov     r11,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        vzeroupper
+        lea     rsp,[((-96))+rsp]
+        vmovaps XMMWORD[(-40-96)+r11],xmm6
+        vmovaps XMMWORD[(-40-80)+r11],xmm7
+        vmovaps XMMWORD[(-40-64)+r11],xmm8
+        vmovaps XMMWORD[(-40-48)+r11],xmm9
+        vmovaps XMMWORD[(-40-32)+r11],xmm10
+        vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx2:
+        mov     r8,rdi
+        mov     r9,rsi
+        mov     r10,rdx
+
+        lea     rsp,[((-640))+rsp]
+        shl     r10,6
+        lea     r13,[64+r9]
+        and     rsp,-128
+        add     r10,r9
+        lea     r14,[((K_XX_XX+64))]
+
+        mov     eax,DWORD[r8]
+        cmp     r13,r10
+        cmovae  r13,r9
+        mov     ebp,DWORD[4+r8]
+        mov     ecx,DWORD[8+r8]
+        mov     edx,DWORD[12+r8]
+        mov     esi,DWORD[16+r8]
+        vmovdqu ymm6,YMMWORD[64+r14]
+
+        vmovdqu xmm0,XMMWORD[r9]
+        vmovdqu xmm1,XMMWORD[16+r9]
+        vmovdqu xmm2,XMMWORD[32+r9]
+        vmovdqu xmm3,XMMWORD[48+r9]
+        lea     r9,[64+r9]
+        vinserti128     ymm0,ymm0,XMMWORD[r13],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r13],1
+        vpshufb ymm0,ymm0,ymm6
+        vinserti128     ymm2,ymm2,XMMWORD[32+r13],1
+        vpshufb ymm1,ymm1,ymm6
+        vinserti128     ymm3,ymm3,XMMWORD[48+r13],1
+        vpshufb ymm2,ymm2,ymm6
+        vmovdqu ymm11,YMMWORD[((-64))+r14]
+        vpshufb ymm3,ymm3,ymm6
+
+        vpaddd  ymm4,ymm0,ymm11
+        vpaddd  ymm5,ymm1,ymm11
+        vmovdqu YMMWORD[rsp],ymm4
+        vpaddd  ymm6,ymm2,ymm11
+        vmovdqu YMMWORD[32+rsp],ymm5
+        vpaddd  ymm7,ymm3,ymm11
+        vmovdqu YMMWORD[64+rsp],ymm6
+        vmovdqu YMMWORD[96+rsp],ymm7
+        vpalignr        ymm4,ymm1,ymm0,8
+        vpsrldq ymm8,ymm3,4
+        vpxor   ymm4,ymm4,ymm0
+        vpxor   ymm8,ymm8,ymm2
+        vpxor   ymm4,ymm4,ymm8
+        vpsrld  ymm8,ymm4,31
+        vpslldq ymm10,ymm4,12
+        vpaddd  ymm4,ymm4,ymm4
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm4,ymm4,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm4,ymm4,ymm9
+        vpxor   ymm4,ymm4,ymm10
+        vpaddd  ymm9,ymm4,ymm11
+        vmovdqu YMMWORD[128+rsp],ymm9
+        vpalignr        ymm5,ymm2,ymm1,8
+        vpsrldq ymm8,ymm4,4
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm8,ymm8,ymm3
+        vpxor   ymm5,ymm5,ymm8
+        vpsrld  ymm8,ymm5,31
+        vmovdqu ymm11,YMMWORD[((-32))+r14]
+        vpslldq ymm10,ymm5,12
+        vpaddd  ymm5,ymm5,ymm5
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm5,ymm5,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm5,ymm5,ymm9
+        vpxor   ymm5,ymm5,ymm10
+        vpaddd  ymm9,ymm5,ymm11
+        vmovdqu YMMWORD[160+rsp],ymm9
+        vpalignr        ymm6,ymm3,ymm2,8
+        vpsrldq ymm8,ymm5,4
+        vpxor   ymm6,ymm6,ymm2
+        vpxor   ymm8,ymm8,ymm4
+        vpxor   ymm6,ymm6,ymm8
+        vpsrld  ymm8,ymm6,31
+        vpslldq ymm10,ymm6,12
+        vpaddd  ymm6,ymm6,ymm6
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm6,ymm6,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm6,ymm6,ymm9
+        vpxor   ymm6,ymm6,ymm10
+        vpaddd  ymm9,ymm6,ymm11
+        vmovdqu YMMWORD[192+rsp],ymm9
+        vpalignr        ymm7,ymm4,ymm3,8
+        vpsrldq ymm8,ymm6,4
+        vpxor   ymm7,ymm7,ymm3
+        vpxor   ymm8,ymm8,ymm5
+        vpxor   ymm7,ymm7,ymm8
+        vpsrld  ymm8,ymm7,31
+        vpslldq ymm10,ymm7,12
+        vpaddd  ymm7,ymm7,ymm7
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm7,ymm7,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm7,ymm7,ymm9
+        vpxor   ymm7,ymm7,ymm10
+        vpaddd  ymm9,ymm7,ymm11
+        vmovdqu YMMWORD[224+rsp],ymm9
+        lea     r13,[128+rsp]
+        jmp     NEAR $L$oop_avx2
+ALIGN   32
+$L$oop_avx2:
+        rorx    ebx,ebp,2
+        andn    edi,ebp,edx
+        and     ebp,ecx
+        xor     ebp,edi
+        jmp     NEAR $L$align32_1
+ALIGN   32
+$L$align32_1:
+        vpalignr        ymm8,ymm7,ymm6,8
+        vpxor   ymm0,ymm0,ymm4
+        add     esi,DWORD[((-128))+r13]
+        andn    edi,eax,ecx
+        vpxor   ymm0,ymm0,ymm1
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        vpxor   ymm0,ymm0,ymm8
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        vpsrld  ymm8,ymm0,30
+        vpslld  ymm0,ymm0,2
+        add     edx,DWORD[((-124))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        vpor    ymm0,ymm0,ymm8
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-120))+r13]
+        andn    edi,edx,ebp
+        vpaddd  ymm9,ymm0,ymm11
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        vmovdqu YMMWORD[256+rsp],ymm9
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-116))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[((-96))+r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        vpalignr        ymm8,ymm0,ymm7,8
+        vpxor   ymm1,ymm1,ymm5
+        add     eax,DWORD[((-92))+r13]
+        andn    edi,ebp,edx
+        vpxor   ymm1,ymm1,ymm2
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        vpxor   ymm1,ymm1,ymm8
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        vpsrld  ymm8,ymm1,30
+        vpslld  ymm1,ymm1,2
+        add     esi,DWORD[((-88))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        vpor    ymm1,ymm1,ymm8
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-84))+r13]
+        andn    edi,esi,ebx
+        vpaddd  ymm9,ymm1,ymm11
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        vmovdqu YMMWORD[288+rsp],ymm9
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-64))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-60))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        vpalignr        ymm8,ymm1,ymm0,8
+        vpxor   ymm2,ymm2,ymm6
+        add     ebp,DWORD[((-56))+r13]
+        andn    edi,ebx,esi
+        vpxor   ymm2,ymm2,ymm3
+        vmovdqu ymm11,YMMWORD[r14]
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        vpxor   ymm2,ymm2,ymm8
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        vpsrld  ymm8,ymm2,30
+        vpslld  ymm2,ymm2,2
+        add     eax,DWORD[((-52))+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        vpor    ymm2,ymm2,ymm8
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[((-32))+r13]
+        andn    edi,eax,ecx
+        vpaddd  ymm9,ymm2,ymm11
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        vmovdqu YMMWORD[320+rsp],ymm9
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-28))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-24))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        vpalignr        ymm8,ymm2,ymm1,8
+        vpxor   ymm3,ymm3,ymm7
+        add     ebx,DWORD[((-20))+r13]
+        andn    edi,ecx,eax
+        vpxor   ymm3,ymm3,ymm4
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        vpxor   ymm3,ymm3,ymm8
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        vpsrld  ymm8,ymm3,30
+        vpslld  ymm3,ymm3,2
+        add     ebp,DWORD[r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        vpor    ymm3,ymm3,ymm8
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[4+r13]
+        andn    edi,ebp,edx
+        vpaddd  ymm9,ymm3,ymm11
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        vmovdqu YMMWORD[352+rsp],ymm9
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[8+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[12+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        vpalignr        ymm8,ymm3,ymm2,8
+        vpxor   ymm4,ymm4,ymm0
+        add     ecx,DWORD[32+r13]
+        lea     ecx,[rsi*1+rcx]
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpxor   ymm4,ymm4,ymm8
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[36+r13]
+        vpsrld  ymm8,ymm4,30
+        vpslld  ymm4,ymm4,2
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        vpor    ymm4,ymm4,ymm8
+        add     ebp,DWORD[40+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        vpaddd  ymm9,ymm4,ymm11
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[44+r13]
+        vmovdqu YMMWORD[384+rsp],ymm9
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[64+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vpalignr        ymm8,ymm4,ymm3,8
+        vpxor   ymm5,ymm5,ymm1
+        add     edx,DWORD[68+r13]
+        lea     edx,[rax*1+rdx]
+        vpxor   ymm5,ymm5,ymm6
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        vpxor   ymm5,ymm5,ymm8
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[72+r13]
+        vpsrld  ymm8,ymm5,30
+        vpslld  ymm5,ymm5,2
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        vpor    ymm5,ymm5,ymm8
+        add     ebx,DWORD[76+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        vpaddd  ymm9,ymm5,ymm11
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[96+r13]
+        vmovdqu YMMWORD[416+rsp],ymm9
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[100+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpalignr        ymm8,ymm5,ymm4,8
+        vpxor   ymm6,ymm6,ymm2
+        add     esi,DWORD[104+r13]
+        lea     esi,[rbp*1+rsi]
+        vpxor   ymm6,ymm6,ymm7
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        vpxor   ymm6,ymm6,ymm8
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        vpsrld  ymm8,ymm6,30
+        vpslld  ymm6,ymm6,2
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        vpor    ymm6,ymm6,ymm8
+        add     ecx,DWORD[((-128))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        vpaddd  ymm9,ymm6,ymm11
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-124))+r13]
+        vmovdqu YMMWORD[448+rsp],ymm9
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-120))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vpalignr        ymm8,ymm6,ymm5,8
+        vpxor   ymm7,ymm7,ymm3
+        add     eax,DWORD[((-116))+r13]
+        lea     eax,[rbx*1+rax]
+        vpxor   ymm7,ymm7,ymm0
+        vmovdqu ymm11,YMMWORD[32+r14]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        vpxor   ymm7,ymm7,ymm8
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-96))+r13]
+        vpsrld  ymm8,ymm7,30
+        vpslld  ymm7,ymm7,2
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vpor    ymm7,ymm7,ymm8
+        add     edx,DWORD[((-92))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpaddd  ymm9,ymm7,ymm11
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-88))+r13]
+        vmovdqu YMMWORD[480+rsp],ymm9
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-84))+r13]
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        jmp     NEAR $L$align32_2
+ALIGN   32
+$L$align32_2:
+        vpalignr        ymm8,ymm7,ymm6,8
+        vpxor   ymm0,ymm0,ymm4
+        add     ebp,DWORD[((-64))+r13]
+        xor     ecx,esi
+        vpxor   ymm0,ymm0,ymm1
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        vpxor   ymm0,ymm0,ymm8
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        vpsrld  ymm8,ymm0,30
+        vpslld  ymm0,ymm0,2
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-60))+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        vpor    ymm0,ymm0,ymm8
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        vpaddd  ymm9,ymm0,ymm11
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[((-56))+r13]
+        xor     ebp,ecx
+        vmovdqu YMMWORD[512+rsp],ymm9
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[((-52))+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        add     ecx,DWORD[((-32))+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        and     edx,edi
+        vpalignr        ymm8,ymm0,ymm7,8
+        vpxor   ymm1,ymm1,ymm5
+        add     ebx,DWORD[((-28))+r13]
+        xor     edx,eax
+        vpxor   ymm1,ymm1,ymm2
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        vpxor   ymm1,ymm1,ymm8
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        vpsrld  ymm8,ymm1,30
+        vpslld  ymm1,ymm1,2
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[((-24))+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        vpor    ymm1,ymm1,ymm8
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        vpaddd  ymm9,ymm1,ymm11
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-20))+r13]
+        xor     ebx,edx
+        vmovdqu YMMWORD[544+rsp],ymm9
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[4+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        vpalignr        ymm8,ymm1,ymm0,8
+        vpxor   ymm2,ymm2,ymm6
+        add     ecx,DWORD[8+r13]
+        xor     esi,ebp
+        vpxor   ymm2,ymm2,ymm3
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        vpxor   ymm2,ymm2,ymm8
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpsrld  ymm8,ymm2,30
+        vpslld  ymm2,ymm2,2
+        add     ecx,r12d
+        and     edx,edi
+        add     ebx,DWORD[12+r13]
+        xor     edx,eax
+        mov     edi,esi
+        xor     edi,eax
+        vpor    ymm2,ymm2,ymm8
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        vpaddd  ymm9,ymm2,ymm11
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[32+r13]
+        xor     ecx,esi
+        vmovdqu YMMWORD[576+rsp],ymm9
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[36+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[40+r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        vpalignr        ymm8,ymm2,ymm1,8
+        vpxor   ymm3,ymm3,ymm7
+        add     edx,DWORD[44+r13]
+        xor     eax,ebx
+        vpxor   ymm3,ymm3,ymm4
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        vpxor   ymm3,ymm3,ymm8
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        vpsrld  ymm8,ymm3,30
+        vpslld  ymm3,ymm3,2
+        add     edx,r12d
+        and     esi,edi
+        add     ecx,DWORD[64+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        vpor    ymm3,ymm3,ymm8
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpaddd  ymm9,ymm3,ymm11
+        add     ecx,r12d
+        and     edx,edi
+        add     ebx,DWORD[68+r13]
+        xor     edx,eax
+        vmovdqu YMMWORD[608+rsp],ymm9
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[72+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[76+r13]
+        xor     ebx,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[96+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[100+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[104+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-128))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-124))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-120))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-116))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-96))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-92))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-88))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-84))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-64))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-60))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-56))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-52))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-32))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-28))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-24))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-20))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        add     edx,r12d
+        lea     r13,[128+r9]
+        lea     rdi,[128+r9]
+        cmp     r13,r10
+        cmovae  r13,r9
+
+
+        add     edx,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ebp,DWORD[8+r8]
+        mov     DWORD[r8],edx
+        add     ebx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        mov     eax,edx
+        add     ecx,DWORD[16+r8]
+        mov     r12d,ebp
+        mov     DWORD[8+r8],ebp
+        mov     edx,ebx
+
+        mov     DWORD[12+r8],ebx
+        mov     ebp,esi
+        mov     DWORD[16+r8],ecx
+
+        mov     esi,ecx
+        mov     ecx,r12d
+
+
+        cmp     r9,r10
+        je      NEAR $L$done_avx2
+        vmovdqu ymm6,YMMWORD[64+r14]
+        cmp     rdi,r10
+        ja      NEAR $L$ast_avx2
+
+        vmovdqu xmm0,XMMWORD[((-64))+rdi]
+        vmovdqu xmm1,XMMWORD[((-48))+rdi]
+        vmovdqu xmm2,XMMWORD[((-32))+rdi]
+        vmovdqu xmm3,XMMWORD[((-16))+rdi]
+        vinserti128     ymm0,ymm0,XMMWORD[r13],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r13],1
+        vinserti128     ymm2,ymm2,XMMWORD[32+r13],1
+        vinserti128     ymm3,ymm3,XMMWORD[48+r13],1
+        jmp     NEAR $L$ast_avx2
+
+ALIGN   32
+$L$ast_avx2:
+        lea     r13,[((128+16))+rsp]
+        rorx    ebx,ebp,2
+        andn    edi,ebp,edx
+        and     ebp,ecx
+        xor     ebp,edi
+        sub     r9,-128
+        add     esi,DWORD[((-128))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-124))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-120))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-116))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[((-96))+r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[((-92))+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[((-88))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-84))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-64))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-60))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[((-56))+r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[((-52))+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[((-32))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-28))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-24))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-20))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[4+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[8+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[12+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[32+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[36+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[40+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[44+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[64+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vmovdqu ymm11,YMMWORD[((-64))+r14]
+        vpshufb ymm0,ymm0,ymm6
+        add     edx,DWORD[68+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[72+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[76+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[96+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[100+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpshufb ymm1,ymm1,ymm6
+        vpaddd  ymm8,ymm0,ymm11
+        add     esi,DWORD[104+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-128))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-124))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-120))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vmovdqu YMMWORD[rsp],ymm8
+        vpshufb ymm2,ymm2,ymm6
+        vpaddd  ymm9,ymm1,ymm11
+        add     eax,DWORD[((-116))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-96))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-92))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-88))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-84))+r13]
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        vmovdqu YMMWORD[32+rsp],ymm9
+        vpshufb ymm3,ymm3,ymm6
+        vpaddd  ymm6,ymm2,ymm11
+        add     ebp,DWORD[((-64))+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-60))+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[((-56))+r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[((-52))+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        add     ecx,DWORD[((-32))+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        and     edx,edi
+        jmp     NEAR $L$align32_3
+ALIGN   32
+$L$align32_3:
+        vmovdqu YMMWORD[64+rsp],ymm6
+        vpaddd  ymm7,ymm3,ymm11
+        add     ebx,DWORD[((-28))+r13]
+        xor     edx,eax
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[((-24))+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-20))+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[4+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        vmovdqu YMMWORD[96+rsp],ymm7
+        add     ecx,DWORD[8+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        and     edx,edi
+        add     ebx,DWORD[12+r13]
+        xor     edx,eax
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[32+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[36+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[40+r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        vpalignr        ymm4,ymm1,ymm0,8
+        add     edx,DWORD[44+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        vpsrldq ymm8,ymm3,4
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpxor   ymm4,ymm4,ymm0
+        vpxor   ymm8,ymm8,ymm2
+        xor     esi,ebp
+        add     edx,r12d
+        vpxor   ymm4,ymm4,ymm8
+        and     esi,edi
+        add     ecx,DWORD[64+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        vpsrld  ymm8,ymm4,31
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        vpslldq ymm10,ymm4,12
+        vpaddd  ymm4,ymm4,ymm4
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm4,ymm4,ymm8
+        add     ecx,r12d
+        and     edx,edi
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm4,ymm4,ymm9
+        add     ebx,DWORD[68+r13]
+        xor     edx,eax
+        vpxor   ymm4,ymm4,ymm10
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        vpaddd  ymm9,ymm4,ymm11
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        vmovdqu YMMWORD[128+rsp],ymm9
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[72+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[76+r13]
+        xor     ebx,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpalignr        ymm5,ymm2,ymm1,8
+        add     esi,DWORD[96+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        vpsrldq ymm8,ymm4,4
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm8,ymm8,ymm3
+        add     edx,DWORD[100+r13]
+        lea     edx,[rax*1+rdx]
+        vpxor   ymm5,ymm5,ymm8
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        vpsrld  ymm8,ymm5,31
+        vmovdqu ymm11,YMMWORD[((-32))+r14]
+        xor     esi,ebx
+        add     ecx,DWORD[104+r13]
+        lea     ecx,[rsi*1+rcx]
+        vpslldq ymm10,ymm5,12
+        vpaddd  ymm5,ymm5,ymm5
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm5,ymm5,ymm8
+        xor     edx,eax
+        add     ecx,r12d
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm5,ymm5,ymm9
+        xor     edx,ebp
+        add     ebx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        vpxor   ymm5,ymm5,ymm10
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        vpaddd  ymm9,ymm5,ymm11
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        vmovdqu YMMWORD[160+rsp],ymm9
+        add     ebp,DWORD[((-128))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vpalignr        ymm6,ymm3,ymm2,8
+        add     eax,DWORD[((-124))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        vpsrldq ymm8,ymm5,4
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpxor   ymm6,ymm6,ymm2
+        vpxor   ymm8,ymm8,ymm4
+        add     esi,DWORD[((-120))+r13]
+        lea     esi,[rbp*1+rsi]
+        vpxor   ymm6,ymm6,ymm8
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        vpsrld  ymm8,ymm6,31
+        xor     eax,ecx
+        add     edx,DWORD[((-116))+r13]
+        lea     edx,[rax*1+rdx]
+        vpslldq ymm10,ymm6,12
+        vpaddd  ymm6,ymm6,ymm6
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm6,ymm6,ymm8
+        xor     esi,ebp
+        add     edx,r12d
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm6,ymm6,ymm9
+        xor     esi,ebx
+        add     ecx,DWORD[((-96))+r13]
+        vpxor   ymm6,ymm6,ymm10
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        vpaddd  ymm9,ymm6,ymm11
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        vmovdqu YMMWORD[192+rsp],ymm9
+        add     ebx,DWORD[((-92))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        vpalignr        ymm7,ymm4,ymm3,8
+        add     ebp,DWORD[((-88))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        vpsrldq ymm8,ymm6,4
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vpxor   ymm7,ymm7,ymm3
+        vpxor   ymm8,ymm8,ymm5
+        add     eax,DWORD[((-84))+r13]
+        lea     eax,[rbx*1+rax]
+        vpxor   ymm7,ymm7,ymm8
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        vpsrld  ymm8,ymm7,31
+        xor     ebp,edx
+        add     esi,DWORD[((-64))+r13]
+        lea     esi,[rbp*1+rsi]
+        vpslldq ymm10,ymm7,12
+        vpaddd  ymm7,ymm7,ymm7
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm7,ymm7,ymm8
+        xor     eax,ebx
+        add     esi,r12d
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm7,ymm7,ymm9
+        xor     eax,ecx
+        add     edx,DWORD[((-60))+r13]
+        vpxor   ymm7,ymm7,ymm10
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpaddd  ymm9,ymm7,ymm11
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        vmovdqu YMMWORD[224+rsp],ymm9
+        add     ecx,DWORD[((-56))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-52))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-32))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-28))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-24))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-20))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        add     edx,r12d
+        lea     r13,[128+rsp]
+
+
+        add     edx,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ebp,DWORD[8+r8]
+        mov     DWORD[r8],edx
+        add     ebx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        mov     eax,edx
+        add     ecx,DWORD[16+r8]
+        mov     r12d,ebp
+        mov     DWORD[8+r8],ebp
+        mov     edx,ebx
+
+        mov     DWORD[12+r8],ebx
+        mov     ebp,esi
+        mov     DWORD[16+r8],ecx
+
+        mov     esi,ecx
+        mov     ecx,r12d
+
+
+        cmp     r9,r10
+        jbe     NEAR $L$oop_avx2
+
+$L$done_avx2:
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-40-96))+r11]
+        movaps  xmm7,XMMWORD[((-40-80))+r11]
+        movaps  xmm8,XMMWORD[((-40-64))+r11]
+        movaps  xmm9,XMMWORD[((-40-48))+r11]
+        movaps  xmm10,XMMWORD[((-40-32))+r11]
+        movaps  xmm11,XMMWORD[((-40-16))+r11]
+        mov     r14,QWORD[((-40))+r11]
+
+        mov     r13,QWORD[((-32))+r11]
+
+        mov     r12,QWORD[((-24))+r11]
+
+        mov     rbp,QWORD[((-16))+r11]
+
+        mov     rbx,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order_avx2:
+ALIGN   64
+K_XX_XX:
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB      0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB      83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+DB      102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44
+DB      32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60
+DB      97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114
+DB      103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[64+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+        jmp     NEAR $L$common_seh_tail
+
+
+ALIGN   16
+shaext_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        lea     r10,[$L$epilogue_shaext]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[((-8-64))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,8
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$common_seh_tail
+
+
+ALIGN   16
+ssse3_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[208+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[((-40-96))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,12
+        DD      0xa548f3fc
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha1_block_data_order wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha1_block_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_shaext:
+DB      9,0,0,0
+        DD      shaext_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_ssse3:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx2:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
new file mode 100644
index 0000000000..5940112c1f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
@@ -0,0 +1,8262 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  sha256_multi_block
+
+ALIGN   32
+sha256_multi_block:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        mov     rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+        bt      rcx,61
+        jc      NEAR _shaext_shortcut
+        test    ecx,268435456
+        jnz     NEAR _avx_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body:
+        lea     rbp,[((K256+128))]
+        lea     rbx,[256+rsp]
+        lea     rdi,[128+rdi]
+
+$L$oop_grande:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done
+
+        movdqu  xmm8,XMMWORD[((0-128))+rdi]
+        lea     rax,[128+rsp]
+        movdqu  xmm9,XMMWORD[((32-128))+rdi]
+        movdqu  xmm10,XMMWORD[((64-128))+rdi]
+        movdqu  xmm11,XMMWORD[((96-128))+rdi]
+        movdqu  xmm12,XMMWORD[((128-128))+rdi]
+        movdqu  xmm13,XMMWORD[((160-128))+rdi]
+        movdqu  xmm14,XMMWORD[((192-128))+rdi]
+        movdqu  xmm15,XMMWORD[((224-128))+rdi]
+        movdqu  xmm6,XMMWORD[$L$pbswap]
+        jmp     NEAR $L$oop
+
+ALIGN   32
+$L$oop:
+        movdqa  xmm4,xmm10
+        pxor    xmm4,xmm9
+        movd    xmm5,DWORD[r8]
+        movd    xmm0,DWORD[r9]
+        movd    xmm1,DWORD[r10]
+        movd    xmm2,DWORD[r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm12
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(0-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movd    xmm5,DWORD[4+r8]
+        movd    xmm0,DWORD[4+r9]
+        movd    xmm1,DWORD[4+r10]
+        movd    xmm2,DWORD[4+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(16-128)+rax],xmm5
+        paddd   xmm5,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm5
+        paddd   xmm14,xmm7
+        movd    xmm5,DWORD[8+r8]
+        movd    xmm0,DWORD[8+r9]
+        movd    xmm1,DWORD[8+r10]
+        movd    xmm2,DWORD[8+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm10
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(32-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movd    xmm5,DWORD[12+r8]
+        movd    xmm0,DWORD[12+r9]
+        movd    xmm1,DWORD[12+r10]
+        movd    xmm2,DWORD[12+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(48-128)+rax],xmm5
+        paddd   xmm5,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm5
+        paddd   xmm12,xmm7
+        movd    xmm5,DWORD[16+r8]
+        movd    xmm0,DWORD[16+r9]
+        movd    xmm1,DWORD[16+r10]
+        movd    xmm2,DWORD[16+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm8
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(64-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movd    xmm5,DWORD[20+r8]
+        movd    xmm0,DWORD[20+r9]
+        movd    xmm1,DWORD[20+r10]
+        movd    xmm2,DWORD[20+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(80-128)+rax],xmm5
+        paddd   xmm5,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm5
+        paddd   xmm10,xmm7
+        movd    xmm5,DWORD[24+r8]
+        movd    xmm0,DWORD[24+r9]
+        movd    xmm1,DWORD[24+r10]
+        movd    xmm2,DWORD[24+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm14
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(96-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movd    xmm5,DWORD[28+r8]
+        movd    xmm0,DWORD[28+r9]
+        movd    xmm1,DWORD[28+r10]
+        movd    xmm2,DWORD[28+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(112-128)+rax],xmm5
+        paddd   xmm5,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm5
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        movd    xmm5,DWORD[32+r8]
+        movd    xmm0,DWORD[32+r9]
+        movd    xmm1,DWORD[32+r10]
+        movd    xmm2,DWORD[32+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm12
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(128-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movd    xmm5,DWORD[36+r8]
+        movd    xmm0,DWORD[36+r9]
+        movd    xmm1,DWORD[36+r10]
+        movd    xmm2,DWORD[36+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(144-128)+rax],xmm5
+        paddd   xmm5,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm5
+        paddd   xmm14,xmm7
+        movd    xmm5,DWORD[40+r8]
+        movd    xmm0,DWORD[40+r9]
+        movd    xmm1,DWORD[40+r10]
+        movd    xmm2,DWORD[40+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm10
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(160-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movd    xmm5,DWORD[44+r8]
+        movd    xmm0,DWORD[44+r9]
+        movd    xmm1,DWORD[44+r10]
+        movd    xmm2,DWORD[44+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(176-128)+rax],xmm5
+        paddd   xmm5,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm5
+        paddd   xmm12,xmm7
+        movd    xmm5,DWORD[48+r8]
+        movd    xmm0,DWORD[48+r9]
+        movd    xmm1,DWORD[48+r10]
+        movd    xmm2,DWORD[48+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm8
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(192-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movd    xmm5,DWORD[52+r8]
+        movd    xmm0,DWORD[52+r9]
+        movd    xmm1,DWORD[52+r10]
+        movd    xmm2,DWORD[52+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(208-128)+rax],xmm5
+        paddd   xmm5,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm5
+        paddd   xmm10,xmm7
+        movd    xmm5,DWORD[56+r8]
+        movd    xmm0,DWORD[56+r9]
+        movd    xmm1,DWORD[56+r10]
+        movd    xmm2,DWORD[56+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm14
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(224-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movd    xmm5,DWORD[60+r8]
+        lea     r8,[64+r8]
+        movd    xmm0,DWORD[60+r9]
+        lea     r9,[64+r9]
+        movd    xmm1,DWORD[60+r10]
+        lea     r10,[64+r10]
+        movd    xmm2,DWORD[60+r11]
+        lea     r11,[64+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(240-128)+rax],xmm5
+        paddd   xmm5,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+        prefetcht0      [63+r8]
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+        prefetcht0      [63+r9]
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+        prefetcht0      [63+r10]
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+        prefetcht0      [63+r11]
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm5
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        movdqu  xmm5,XMMWORD[((0-128))+rax]
+        mov     ecx,3
+        jmp     NEAR $L$oop_16_xx
+ALIGN   32
+$L$oop_16_xx:
+        movdqa  xmm6,XMMWORD[((16-128))+rax]
+        paddd   xmm5,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((224-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm12
+
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(0-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movdqa  xmm5,XMMWORD[((32-128))+rax]
+        paddd   xmm6,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((240-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(16-128)+rax],xmm6
+        paddd   xmm6,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm6
+        paddd   xmm14,xmm7
+        movdqa  xmm6,XMMWORD[((48-128))+rax]
+        paddd   xmm5,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((0-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm10
+
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(32-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movdqa  xmm5,XMMWORD[((64-128))+rax]
+        paddd   xmm6,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((16-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(48-128)+rax],xmm6
+        paddd   xmm6,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm6
+        paddd   xmm12,xmm7
+        movdqa  xmm6,XMMWORD[((80-128))+rax]
+        paddd   xmm5,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((32-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm8
+
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(64-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movdqa  xmm5,XMMWORD[((96-128))+rax]
+        paddd   xmm6,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((48-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(80-128)+rax],xmm6
+        paddd   xmm6,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm6
+        paddd   xmm10,xmm7
+        movdqa  xmm6,XMMWORD[((112-128))+rax]
+        paddd   xmm5,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((64-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm14
+
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(96-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movdqa  xmm5,XMMWORD[((128-128))+rax]
+        paddd   xmm6,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((80-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(112-128)+rax],xmm6
+        paddd   xmm6,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm6
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        movdqa  xmm6,XMMWORD[((144-128))+rax]
+        paddd   xmm5,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((96-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm12
+
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(128-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movdqa  xmm5,XMMWORD[((160-128))+rax]
+        paddd   xmm6,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((112-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(144-128)+rax],xmm6
+        paddd   xmm6,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm6
+        paddd   xmm14,xmm7
+        movdqa  xmm6,XMMWORD[((176-128))+rax]
+        paddd   xmm5,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((128-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm10
+
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(160-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movdqa  xmm5,XMMWORD[((192-128))+rax]
+        paddd   xmm6,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((144-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(176-128)+rax],xmm6
+        paddd   xmm6,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm6
+        paddd   xmm12,xmm7
+        movdqa  xmm6,XMMWORD[((208-128))+rax]
+        paddd   xmm5,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((160-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm8
+
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(192-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movdqa  xmm5,XMMWORD[((224-128))+rax]
+        paddd   xmm6,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((176-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(208-128)+rax],xmm6
+        paddd   xmm6,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm6
+        paddd   xmm10,xmm7
+        movdqa  xmm6,XMMWORD[((240-128))+rax]
+        paddd   xmm5,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((192-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm14
+
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(224-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movdqa  xmm5,XMMWORD[((0-128))+rax]
+        paddd   xmm6,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((208-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(240-128)+rax],xmm6
+        paddd   xmm6,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm6
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        dec     ecx
+        jnz     NEAR $L$oop_16_xx
+
+        mov     ecx,1
+        lea     rbp,[((K256+128))]
+
+        movdqa  xmm7,XMMWORD[rbx]
+        cmp     ecx,DWORD[rbx]
+        pxor    xmm0,xmm0
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        movdqa  xmm6,xmm7
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        pcmpgtd xmm6,xmm0
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        paddd   xmm7,xmm6
+        cmovge  r11,rbp
+
+        movdqu  xmm0,XMMWORD[((0-128))+rdi]
+        pand    xmm8,xmm6
+        movdqu  xmm1,XMMWORD[((32-128))+rdi]
+        pand    xmm9,xmm6
+        movdqu  xmm2,XMMWORD[((64-128))+rdi]
+        pand    xmm10,xmm6
+        movdqu  xmm5,XMMWORD[((96-128))+rdi]
+        pand    xmm11,xmm6
+        paddd   xmm8,xmm0
+        movdqu  xmm0,XMMWORD[((128-128))+rdi]
+        pand    xmm12,xmm6
+        paddd   xmm9,xmm1
+        movdqu  xmm1,XMMWORD[((160-128))+rdi]
+        pand    xmm13,xmm6
+        paddd   xmm10,xmm2
+        movdqu  xmm2,XMMWORD[((192-128))+rdi]
+        pand    xmm14,xmm6
+        paddd   xmm11,xmm5
+        movdqu  xmm5,XMMWORD[((224-128))+rdi]
+        pand    xmm15,xmm6
+        paddd   xmm12,xmm0
+        paddd   xmm13,xmm1
+        movdqu  XMMWORD[(0-128)+rdi],xmm8
+        paddd   xmm14,xmm2
+        movdqu  XMMWORD[(32-128)+rdi],xmm9
+        paddd   xmm15,xmm5
+        movdqu  XMMWORD[(64-128)+rdi],xmm10
+        movdqu  XMMWORD[(96-128)+rdi],xmm11
+        movdqu  XMMWORD[(128-128)+rdi],xmm12
+        movdqu  XMMWORD[(160-128)+rdi],xmm13
+        movdqu  XMMWORD[(192-128)+rdi],xmm14
+        movdqu  XMMWORD[(224-128)+rdi],xmm15
+
+        movdqa  XMMWORD[rbx],xmm7
+        movdqa  xmm6,XMMWORD[$L$pbswap]
+        dec     edx
+        jnz     NEAR $L$oop
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande
+
+$L$done:
+        mov     rax,QWORD[272+rsp]
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block:
+
+ALIGN   32
+sha256_multi_block_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_shaext_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        shl     edx,1
+        and     rsp,-256
+        lea     rdi,[128+rdi]
+        mov     QWORD[272+rsp],rax
+$L$body_shaext:
+        lea     rbx,[256+rsp]
+        lea     rbp,[((K256_shaext+128))]
+
+$L$oop_grande_shaext:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rsp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rsp
+        test    edx,edx
+        jz      NEAR $L$done_shaext
+
+        movq    xmm12,QWORD[((0-128))+rdi]
+        movq    xmm4,QWORD[((32-128))+rdi]
+        movq    xmm13,QWORD[((64-128))+rdi]
+        movq    xmm5,QWORD[((96-128))+rdi]
+        movq    xmm8,QWORD[((128-128))+rdi]
+        movq    xmm9,QWORD[((160-128))+rdi]
+        movq    xmm10,QWORD[((192-128))+rdi]
+        movq    xmm11,QWORD[((224-128))+rdi]
+
+        punpckldq       xmm12,xmm4
+        punpckldq       xmm13,xmm5
+        punpckldq       xmm8,xmm9
+        punpckldq       xmm10,xmm11
+        movdqa  xmm3,XMMWORD[((K256_shaext-16))]
+
+        movdqa  xmm14,xmm12
+        movdqa  xmm15,xmm13
+        punpcklqdq      xmm12,xmm8
+        punpcklqdq      xmm13,xmm10
+        punpckhqdq      xmm14,xmm8
+        punpckhqdq      xmm15,xmm10
+
+        pshufd  xmm12,xmm12,27
+        pshufd  xmm13,xmm13,27
+        pshufd  xmm14,xmm14,27
+        pshufd  xmm15,xmm15,27
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   32
+$L$oop_shaext:
+        movdqu  xmm4,XMMWORD[r8]
+        movdqu  xmm8,XMMWORD[r9]
+        movdqu  xmm5,XMMWORD[16+r8]
+        movdqu  xmm9,XMMWORD[16+r9]
+        movdqu  xmm6,XMMWORD[32+r8]
+DB      102,15,56,0,227
+        movdqu  xmm10,XMMWORD[32+r9]
+DB      102,68,15,56,0,195
+        movdqu  xmm7,XMMWORD[48+r8]
+        lea     r8,[64+r8]
+        movdqu  xmm11,XMMWORD[48+r9]
+        lea     r9,[64+r9]
+
+        movdqa  xmm0,XMMWORD[((0-128))+rbp]
+DB      102,15,56,0,235
+        paddd   xmm0,xmm4
+        pxor    xmm4,xmm12
+        movdqa  xmm1,xmm0
+        movdqa  xmm2,XMMWORD[((0-128))+rbp]
+DB      102,68,15,56,0,203
+        paddd   xmm2,xmm8
+        movdqa  XMMWORD[80+rsp],xmm13
+DB      69,15,56,203,236
+        pxor    xmm8,xmm14
+        movdqa  xmm0,xmm2
+        movdqa  XMMWORD[112+rsp],xmm15
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+        pxor    xmm4,xmm12
+        movdqa  XMMWORD[64+rsp],xmm12
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        pxor    xmm8,xmm14
+        movdqa  XMMWORD[96+rsp],xmm14
+        movdqa  xmm1,XMMWORD[((16-128))+rbp]
+        paddd   xmm1,xmm5
+DB      102,15,56,0,243
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((16-128))+rbp]
+        paddd   xmm2,xmm9
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        prefetcht0      [127+r8]
+DB      102,15,56,0,251
+DB      102,68,15,56,0,211
+        prefetcht0      [127+r9]
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+DB      102,68,15,56,0,219
+DB      15,56,204,229
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((32-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((32-128))+rbp]
+        paddd   xmm2,xmm10
+DB      69,15,56,203,236
+DB      69,15,56,204,193
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm7
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+DB      102,15,58,15,222,4
+        paddd   xmm4,xmm3
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+DB      15,56,204,238
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((48-128))+rbp]
+        paddd   xmm1,xmm7
+DB      69,15,56,203,247
+DB      69,15,56,204,202
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((48-128))+rbp]
+        paddd   xmm8,xmm3
+        paddd   xmm2,xmm11
+DB      15,56,205,231
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm4
+DB      102,15,58,15,223,4
+DB      69,15,56,203,254
+DB      69,15,56,205,195
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm5,xmm3
+        movdqa  xmm3,xmm8
+DB      102,65,15,58,15,219,4
+DB      15,56,204,247
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((64-128))+rbp]
+        paddd   xmm1,xmm4
+DB      69,15,56,203,247
+DB      69,15,56,204,211
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((64-128))+rbp]
+        paddd   xmm9,xmm3
+        paddd   xmm2,xmm8
+DB      15,56,205,236
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm5
+DB      102,15,58,15,220,4
+DB      69,15,56,203,254
+DB      69,15,56,205,200
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm6,xmm3
+        movdqa  xmm3,xmm9
+DB      102,65,15,58,15,216,4
+DB      15,56,204,252
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((80-128))+rbp]
+        paddd   xmm1,xmm5
+DB      69,15,56,203,247
+DB      69,15,56,204,216
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((80-128))+rbp]
+        paddd   xmm10,xmm3
+        paddd   xmm2,xmm9
+DB      15,56,205,245
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm6
+DB      102,15,58,15,221,4
+DB      69,15,56,203,254
+DB      69,15,56,205,209
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm7,xmm3
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,217,4
+DB      15,56,204,229
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((96-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+DB      69,15,56,204,193
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((96-128))+rbp]
+        paddd   xmm11,xmm3
+        paddd   xmm2,xmm10
+DB      15,56,205,254
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm7
+DB      102,15,58,15,222,4
+DB      69,15,56,203,254
+DB      69,15,56,205,218
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm4,xmm3
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+DB      15,56,204,238
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((112-128))+rbp]
+        paddd   xmm1,xmm7
+DB      69,15,56,203,247
+DB      69,15,56,204,202
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((112-128))+rbp]
+        paddd   xmm8,xmm3
+        paddd   xmm2,xmm11
+DB      15,56,205,231
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm4
+DB      102,15,58,15,223,4
+DB      69,15,56,203,254
+DB      69,15,56,205,195
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm5,xmm3
+        movdqa  xmm3,xmm8
+DB      102,65,15,58,15,219,4
+DB      15,56,204,247
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((128-128))+rbp]
+        paddd   xmm1,xmm4
+DB      69,15,56,203,247
+DB      69,15,56,204,211
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((128-128))+rbp]
+        paddd   xmm9,xmm3
+        paddd   xmm2,xmm8
+DB      15,56,205,236
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm5
+DB      102,15,58,15,220,4
+DB      69,15,56,203,254
+DB      69,15,56,205,200
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm6,xmm3
+        movdqa  xmm3,xmm9
+DB      102,65,15,58,15,216,4
+DB      15,56,204,252
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((144-128))+rbp]
+        paddd   xmm1,xmm5
+DB      69,15,56,203,247
+DB      69,15,56,204,216
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((144-128))+rbp]
+        paddd   xmm10,xmm3
+        paddd   xmm2,xmm9
+DB      15,56,205,245
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm6
+DB      102,15,58,15,221,4
+DB      69,15,56,203,254
+DB      69,15,56,205,209
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm7,xmm3
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,217,4
+DB      15,56,204,229
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((160-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+DB      69,15,56,204,193
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((160-128))+rbp]
+        paddd   xmm11,xmm3
+        paddd   xmm2,xmm10
+DB      15,56,205,254
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm7
+DB      102,15,58,15,222,4
+DB      69,15,56,203,254
+DB      69,15,56,205,218
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm4,xmm3
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+DB      15,56,204,238
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((176-128))+rbp]
+        paddd   xmm1,xmm7
+DB      69,15,56,203,247
+DB      69,15,56,204,202
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((176-128))+rbp]
+        paddd   xmm8,xmm3
+        paddd   xmm2,xmm11
+DB      15,56,205,231
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm4
+DB      102,15,58,15,223,4
+DB      69,15,56,203,254
+DB      69,15,56,205,195
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm5,xmm3
+        movdqa  xmm3,xmm8
+DB      102,65,15,58,15,219,4
+DB      15,56,204,247
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((192-128))+rbp]
+        paddd   xmm1,xmm4
+DB      69,15,56,203,247
+DB      69,15,56,204,211
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((192-128))+rbp]
+        paddd   xmm9,xmm3
+        paddd   xmm2,xmm8
+DB      15,56,205,236
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm5
+DB      102,15,58,15,220,4
+DB      69,15,56,203,254
+DB      69,15,56,205,200
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm6,xmm3
+        movdqa  xmm3,xmm9
+DB      102,65,15,58,15,216,4
+DB      15,56,204,252
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((208-128))+rbp]
+        paddd   xmm1,xmm5
+DB      69,15,56,203,247
+DB      69,15,56,204,216
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((208-128))+rbp]
+        paddd   xmm10,xmm3
+        paddd   xmm2,xmm9
+DB      15,56,205,245
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm6
+DB      102,15,58,15,221,4
+DB      69,15,56,203,254
+DB      69,15,56,205,209
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm7,xmm3
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,217,4
+        nop
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((224-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((224-128))+rbp]
+        paddd   xmm11,xmm3
+        paddd   xmm2,xmm10
+DB      15,56,205,254
+        nop
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        mov     ecx,1
+        pxor    xmm6,xmm6
+DB      69,15,56,203,254
+DB      69,15,56,205,218
+        pshufd  xmm0,xmm1,0x0e
+        movdqa  xmm1,XMMWORD[((240-128))+rbp]
+        paddd   xmm1,xmm7
+        movq    xmm7,QWORD[rbx]
+        nop
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm2,XMMWORD[((240-128))+rbp]
+        paddd   xmm2,xmm11
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rsp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rsp
+        pshufd  xmm9,xmm7,0x00
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        pshufd  xmm10,xmm7,0x55
+        movdqa  xmm11,xmm7
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+        pcmpgtd xmm9,xmm6
+        pcmpgtd xmm10,xmm6
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        pcmpgtd xmm11,xmm6
+        movdqa  xmm3,XMMWORD[((K256_shaext-16))]
+DB      69,15,56,203,247
+
+        pand    xmm13,xmm9
+        pand    xmm15,xmm10
+        pand    xmm12,xmm9
+        pand    xmm14,xmm10
+        paddd   xmm11,xmm7
+
+        paddd   xmm13,XMMWORD[80+rsp]
+        paddd   xmm15,XMMWORD[112+rsp]
+        paddd   xmm12,XMMWORD[64+rsp]
+        paddd   xmm14,XMMWORD[96+rsp]
+
+        movq    QWORD[rbx],xmm11
+        dec     edx
+        jnz     NEAR $L$oop_shaext
+
+        mov     edx,DWORD[280+rsp]
+
+        pshufd  xmm12,xmm12,27
+        pshufd  xmm13,xmm13,27
+        pshufd  xmm14,xmm14,27
+        pshufd  xmm15,xmm15,27
+
+        movdqa  xmm5,xmm12
+        movdqa  xmm6,xmm13
+        punpckldq       xmm12,xmm14
+        punpckhdq       xmm5,xmm14
+        punpckldq       xmm13,xmm15
+        punpckhdq       xmm6,xmm15
+
+        movq    QWORD[(0-128)+rdi],xmm12
+        psrldq  xmm12,8
+        movq    QWORD[(128-128)+rdi],xmm5
+        psrldq  xmm5,8
+        movq    QWORD[(32-128)+rdi],xmm12
+        movq    QWORD[(160-128)+rdi],xmm5
+
+        movq    QWORD[(64-128)+rdi],xmm13
+        psrldq  xmm13,8
+        movq    QWORD[(192-128)+rdi],xmm6
+        psrldq  xmm6,8
+        movq    QWORD[(96-128)+rdi],xmm13
+        movq    QWORD[(224-128)+rdi],xmm6
+
+        lea     rdi,[8+rdi]
+        lea     rsi,[32+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block_shaext:
+
+ALIGN   32
+sha256_multi_block_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_shortcut:
+        shr     rcx,32
+        cmp     edx,2
+        jb      NEAR $L$avx
+        test    ecx,32
+        jnz     NEAR _avx2_shortcut
+        jmp     NEAR $L$avx
+ALIGN   32
+$L$avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body_avx:
+        lea     rbp,[((K256+128))]
+        lea     rbx,[256+rsp]
+        lea     rdi,[128+rdi]
+
+$L$oop_grande_avx:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done_avx
+
+        vmovdqu xmm8,XMMWORD[((0-128))+rdi]
+        lea     rax,[128+rsp]
+        vmovdqu xmm9,XMMWORD[((32-128))+rdi]
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        vmovdqu xmm11,XMMWORD[((96-128))+rdi]
+        vmovdqu xmm12,XMMWORD[((128-128))+rdi]
+        vmovdqu xmm13,XMMWORD[((160-128))+rdi]
+        vmovdqu xmm14,XMMWORD[((192-128))+rdi]
+        vmovdqu xmm15,XMMWORD[((224-128))+rdi]
+        vmovdqu xmm6,XMMWORD[$L$pbswap]
+        jmp     NEAR $L$oop_avx
+
+ALIGN   32
+$L$oop_avx:
+        vpxor   xmm4,xmm10,xmm9
+        vmovd   xmm5,DWORD[r8]
+        vmovd   xmm0,DWORD[r9]
+        vpinsrd xmm5,xmm5,DWORD[r10],1
+        vpinsrd xmm0,xmm0,DWORD[r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(0-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovd   xmm5,DWORD[4+r8]
+        vmovd   xmm0,DWORD[4+r9]
+        vpinsrd xmm5,xmm5,DWORD[4+r10],1
+        vpinsrd xmm0,xmm0,DWORD[4+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(16-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm5
+        vpaddd  xmm14,xmm14,xmm7
+        vmovd   xmm5,DWORD[8+r8]
+        vmovd   xmm0,DWORD[8+r9]
+        vpinsrd xmm5,xmm5,DWORD[8+r10],1
+        vpinsrd xmm0,xmm0,DWORD[8+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(32-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovd   xmm5,DWORD[12+r8]
+        vmovd   xmm0,DWORD[12+r9]
+        vpinsrd xmm5,xmm5,DWORD[12+r10],1
+        vpinsrd xmm0,xmm0,DWORD[12+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(48-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm5
+        vpaddd  xmm12,xmm12,xmm7
+        vmovd   xmm5,DWORD[16+r8]
+        vmovd   xmm0,DWORD[16+r9]
+        vpinsrd xmm5,xmm5,DWORD[16+r10],1
+        vpinsrd xmm0,xmm0,DWORD[16+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(64-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovd   xmm5,DWORD[20+r8]
+        vmovd   xmm0,DWORD[20+r9]
+        vpinsrd xmm5,xmm5,DWORD[20+r10],1
+        vpinsrd xmm0,xmm0,DWORD[20+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(80-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm5,xmm5,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm5
+        vpaddd  xmm10,xmm10,xmm7
+        vmovd   xmm5,DWORD[24+r8]
+        vmovd   xmm0,DWORD[24+r9]
+        vpinsrd xmm5,xmm5,DWORD[24+r10],1
+        vpinsrd xmm0,xmm0,DWORD[24+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(96-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovd   xmm5,DWORD[28+r8]
+        vmovd   xmm0,DWORD[28+r9]
+        vpinsrd xmm5,xmm5,DWORD[28+r10],1
+        vpinsrd xmm0,xmm0,DWORD[28+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(112-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm5,xmm5,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm5
+        vpaddd  xmm8,xmm8,xmm7
+        add     rbp,256
+        vmovd   xmm5,DWORD[32+r8]
+        vmovd   xmm0,DWORD[32+r9]
+        vpinsrd xmm5,xmm5,DWORD[32+r10],1
+        vpinsrd xmm0,xmm0,DWORD[32+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(128-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovd   xmm5,DWORD[36+r8]
+        vmovd   xmm0,DWORD[36+r9]
+        vpinsrd xmm5,xmm5,DWORD[36+r10],1
+        vpinsrd xmm0,xmm0,DWORD[36+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(144-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm5
+        vpaddd  xmm14,xmm14,xmm7
+        vmovd   xmm5,DWORD[40+r8]
+        vmovd   xmm0,DWORD[40+r9]
+        vpinsrd xmm5,xmm5,DWORD[40+r10],1
+        vpinsrd xmm0,xmm0,DWORD[40+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(160-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovd   xmm5,DWORD[44+r8]
+        vmovd   xmm0,DWORD[44+r9]
+        vpinsrd xmm5,xmm5,DWORD[44+r10],1
+        vpinsrd xmm0,xmm0,DWORD[44+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(176-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm5
+        vpaddd  xmm12,xmm12,xmm7
+        vmovd   xmm5,DWORD[48+r8]
+        vmovd   xmm0,DWORD[48+r9]
+        vpinsrd xmm5,xmm5,DWORD[48+r10],1
+        vpinsrd xmm0,xmm0,DWORD[48+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(192-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovd   xmm5,DWORD[52+r8]
+        vmovd   xmm0,DWORD[52+r9]
+        vpinsrd xmm5,xmm5,DWORD[52+r10],1
+        vpinsrd xmm0,xmm0,DWORD[52+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(208-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm5,xmm5,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm5
+        vpaddd  xmm10,xmm10,xmm7
+        vmovd   xmm5,DWORD[56+r8]
+        vmovd   xmm0,DWORD[56+r9]
+        vpinsrd xmm5,xmm5,DWORD[56+r10],1
+        vpinsrd xmm0,xmm0,DWORD[56+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(224-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovd   xmm5,DWORD[60+r8]
+        lea     r8,[64+r8]
+        vmovd   xmm0,DWORD[60+r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm5,xmm5,DWORD[60+r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm0,xmm0,DWORD[60+r11],1
+        lea     r11,[64+r11]
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(240-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm5,xmm5,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+        prefetcht0      [63+r8]
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+        prefetcht0      [63+r9]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+        prefetcht0      [63+r10]
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+        prefetcht0      [63+r11]
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm5
+        vpaddd  xmm8,xmm8,xmm7
+        add     rbp,256
+        vmovdqu xmm5,XMMWORD[((0-128))+rax]
+        mov     ecx,3
+        jmp     NEAR $L$oop_16_xx_avx
+ALIGN   32
+$L$oop_16_xx_avx:
+        vmovdqu xmm6,XMMWORD[((16-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((144-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((224-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(0-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovdqu xmm5,XMMWORD[((32-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((160-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((240-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(16-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm6
+        vpaddd  xmm14,xmm14,xmm7
+        vmovdqu xmm6,XMMWORD[((48-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((176-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((0-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(32-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovdqu xmm5,XMMWORD[((64-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((192-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((16-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(48-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm6
+        vpaddd  xmm12,xmm12,xmm7
+        vmovdqu xmm6,XMMWORD[((80-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((208-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((32-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(64-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovdqu xmm5,XMMWORD[((96-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((224-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((48-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(80-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm6,xmm6,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm6
+        vpaddd  xmm10,xmm10,xmm7
+        vmovdqu xmm6,XMMWORD[((112-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((240-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((64-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(96-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovdqu xmm5,XMMWORD[((128-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((0-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((80-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(112-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm6,xmm6,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm6
+        vpaddd  xmm8,xmm8,xmm7
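+        ; Eight more schedule/compression rounds are complete; advance rbp
+        ; by 256 bytes to the next eight broadcast K256 round constants
+        ; (each constant is stored replicated across the SIMD lanes).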
+        add     rbp,256
+        vmovdqu xmm6,XMMWORD[((144-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((16-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((96-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(128-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovdqu xmm5,XMMWORD[((160-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((32-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((112-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(144-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm6
+        vpaddd  xmm14,xmm14,xmm7
+        vmovdqu xmm6,XMMWORD[((176-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((48-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((128-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(160-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovdqu xmm5,XMMWORD[((192-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((64-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((144-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(176-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm6
+        vpaddd  xmm12,xmm12,xmm7
+        vmovdqu xmm6,XMMWORD[((208-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((80-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((160-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(192-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovdqu xmm5,XMMWORD[((224-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((96-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((176-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(208-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm6,xmm6,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm6
+        vpaddd  xmm10,xmm10,xmm7
+        vmovdqu xmm6,XMMWORD[((240-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((112-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((192-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(224-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovdqu xmm5,XMMWORD[((0-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((128-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((208-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(240-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm6,xmm6,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm6
+        vpaddd  xmm8,xmm8,xmm7
+        add     rbp,256
+        dec     ecx
+        jnz     NEAR $L$oop_16_xx_avx
+
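+        ; End of the 64-round compression for this block.  With ecx = 1,
+        ; any lane whose remaining block count at [rbx] is down to its last
+        ; block (or already exhausted) has its data pointer redirected to
+        ; the K256 table, so loads issued for it on later, masked-out
+        ; iterations read harmless constant data instead of running past
+        ; the end of its input.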
+        mov     ecx,1
+        lea     rbp,[((K256+128))]
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r11,rbp
+        vmovdqa xmm7,XMMWORD[rbx]
+        vpxor   xmm0,xmm0,xmm0
+        vmovdqa xmm6,xmm7
+        vpcmpgtd        xmm6,xmm6,xmm0
+        vpaddd  xmm7,xmm7,xmm6
+
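+        ; xmm6 now holds an all-ones mask for lanes that still had blocks
+        ; left (their counters were just decremented); only those lanes
+        ; fold this block's result into the digest words at [rdi], so
+        ; finished lanes keep their saved state unchanged.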
+        vmovdqu xmm0,XMMWORD[((0-128))+rdi]
+        vpand   xmm8,xmm8,xmm6
+        vmovdqu xmm1,XMMWORD[((32-128))+rdi]
+        vpand   xmm9,xmm9,xmm6
+        vmovdqu xmm2,XMMWORD[((64-128))+rdi]
+        vpand   xmm10,xmm10,xmm6
+        vmovdqu xmm5,XMMWORD[((96-128))+rdi]
+        vpand   xmm11,xmm11,xmm6
+        vpaddd  xmm8,xmm8,xmm0
+        vmovdqu xmm0,XMMWORD[((128-128))+rdi]
+        vpand   xmm12,xmm12,xmm6
+        vpaddd  xmm9,xmm9,xmm1
+        vmovdqu xmm1,XMMWORD[((160-128))+rdi]
+        vpand   xmm13,xmm13,xmm6
+        vpaddd  xmm10,xmm10,xmm2
+        vmovdqu xmm2,XMMWORD[((192-128))+rdi]
+        vpand   xmm14,xmm14,xmm6
+        vpaddd  xmm11,xmm11,xmm5
+        vmovdqu xmm5,XMMWORD[((224-128))+rdi]
+        vpand   xmm15,xmm15,xmm6
+        vpaddd  xmm12,xmm12,xmm0
+        vpaddd  xmm13,xmm13,xmm1
+        vmovdqu XMMWORD[(0-128)+rdi],xmm8
+        vpaddd  xmm14,xmm14,xmm2
+        vmovdqu XMMWORD[(32-128)+rdi],xmm9
+        vpaddd  xmm15,xmm15,xmm5
+        vmovdqu XMMWORD[(64-128)+rdi],xmm10
+        vmovdqu XMMWORD[(96-128)+rdi],xmm11
+        vmovdqu XMMWORD[(128-128)+rdi],xmm12
+        vmovdqu XMMWORD[(160-128)+rdi],xmm13
+        vmovdqu XMMWORD[(192-128)+rdi],xmm14
+        vmovdqu XMMWORD[(224-128)+rdi],xmm15
+
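+        ; Store the decremented per-lane block counters, reload the
+        ; byte-swap mask (xmm6 was reused as the lane mask above), and loop
+        ; while the longest stream in this group still has blocks to hash.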
+        vmovdqu XMMWORD[rbx],xmm7
+        vmovdqu xmm6,XMMWORD[$L$pbswap]
+        dec     edx
+        jnz     NEAR $L$oop_avx
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_avx
+
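+        ; Common exit: restore the callee-saved xmm6-xmm15, rbp and rbx
+        ; spilled by the prologue, then unwind the stack frame.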
+$L$done_avx:
+        mov     rax,QWORD[272+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block_avx:
+
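+        ; AVX2 flavour of the same algorithm: up to eight independent
+        ; streams are hashed in parallel, one 32-bit lane of each ymm
+        ; register per stream, with ymm8-ymm15 carrying the SHA-256
+        ; working variables a-h (their roles rotate every round).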
+ALIGN   32
+sha256_multi_block_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+        sub     rsp,576
+        and     rsp,-256
+        mov     QWORD[544+rsp],rax
+
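+        ; Reserve a 256-byte-aligned scratch frame for the interleaved
+        ; message schedule and the per-lane block counters; the original
+        ; stack pointer is kept at [544+rsp] for the epilogue.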
+$L$body_avx2:
+        lea     rbp,[((K256+128))]
+        lea     rdi,[128+rdi]
+
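+        ; For each of the up-to-eight streams, load its data pointer and
+        ; block count from the 16-byte descriptors at rsi; edx tracks the
+        ; largest block count, and streams with no data have their pointer
+        ; aimed at the constant table so later, masked-out loads stay in
+        ; bounds.  Then load the transposed digest state from [rdi] and the
+        ; byte-swap mask.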
+$L$oop_grande_avx2:
+        mov     DWORD[552+rsp],edx
+        xor     edx,edx
+        lea     rbx,[512+rsp]
+        mov     r12,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r12,rbp
+        mov     r13,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r13,rbp
+        mov     r14,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r14,rbp
+        mov     r15,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r15,rbp
+        mov     r8,QWORD[64+rsi]
+        mov     ecx,DWORD[72+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[16+rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[80+rsi]
+        mov     ecx,DWORD[88+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[20+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[96+rsi]
+        mov     ecx,DWORD[104+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[24+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[112+rsi]
+        mov     ecx,DWORD[120+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[28+rbx],ecx
+        cmovle  r11,rbp
+        vmovdqu ymm8,YMMWORD[((0-128))+rdi]
+        lea     rax,[128+rsp]
+        vmovdqu ymm9,YMMWORD[((32-128))+rdi]
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm10,YMMWORD[((64-128))+rdi]
+        vmovdqu ymm11,YMMWORD[((96-128))+rdi]
+        vmovdqu ymm12,YMMWORD[((128-128))+rdi]
+        vmovdqu ymm13,YMMWORD[((160-128))+rdi]
+        vmovdqu ymm14,YMMWORD[((192-128))+rdi]
+        vmovdqu ymm15,YMMWORD[((224-128))+rdi]
+        vmovdqu ymm6,YMMWORD[$L$pbswap]
+        jmp     NEAR $L$oop_avx2
+
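+        ; One 64-byte block per stream per iteration.  Rounds 0-15 gather a
+        ; 32-bit word from each stream (vmovd/vpinsrd/vpunpckldq/
+        ; vinserti128), byte-swap it with vpshufb, and apply the SHA-256
+        ; round function: Sigma1(e) from rotates by 6/11/25, Ch(e,f,g),
+        ; Sigma0(a) from rotates by 2/13/22 and Maj(a,b,c), plus the K256
+        ; constant addressed through rbp.  Each rotate is built from a
+        ; shift-left/shift-right/xor pair, since AVX2 has no 32-bit
+        ; per-lane rotate.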
+ALIGN   32
+$L$oop_avx2:
+        vpxor   ymm4,ymm10,ymm9
+        vmovd   xmm5,DWORD[r12]
+        vmovd   xmm0,DWORD[r8]
+        vmovd   xmm1,DWORD[r13]
+        vmovd   xmm2,DWORD[r9]
+        vpinsrd xmm5,xmm5,DWORD[r14],1
+        vpinsrd xmm0,xmm0,DWORD[r10],1
+        vpinsrd xmm1,xmm1,DWORD[r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(0-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovd   xmm5,DWORD[4+r12]
+        vmovd   xmm0,DWORD[4+r8]
+        vmovd   xmm1,DWORD[4+r13]
+        vmovd   xmm2,DWORD[4+r9]
+        vpinsrd xmm5,xmm5,DWORD[4+r14],1
+        vpinsrd xmm0,xmm0,DWORD[4+r10],1
+        vpinsrd xmm1,xmm1,DWORD[4+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[4+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(32-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm5
+        vpaddd  ymm14,ymm14,ymm7
+        vmovd   xmm5,DWORD[8+r12]
+        vmovd   xmm0,DWORD[8+r8]
+        vmovd   xmm1,DWORD[8+r13]
+        vmovd   xmm2,DWORD[8+r9]
+        vpinsrd xmm5,xmm5,DWORD[8+r14],1
+        vpinsrd xmm0,xmm0,DWORD[8+r10],1
+        vpinsrd xmm1,xmm1,DWORD[8+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[8+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(64-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovd   xmm5,DWORD[12+r12]
+        vmovd   xmm0,DWORD[12+r8]
+        vmovd   xmm1,DWORD[12+r13]
+        vmovd   xmm2,DWORD[12+r9]
+        vpinsrd xmm5,xmm5,DWORD[12+r14],1
+        vpinsrd xmm0,xmm0,DWORD[12+r10],1
+        vpinsrd xmm1,xmm1,DWORD[12+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[12+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(96-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm5
+        vpaddd  ymm12,ymm12,ymm7
+        vmovd   xmm5,DWORD[16+r12]
+        vmovd   xmm0,DWORD[16+r8]
+        vmovd   xmm1,DWORD[16+r13]
+        vmovd   xmm2,DWORD[16+r9]
+        vpinsrd xmm5,xmm5,DWORD[16+r14],1
+        vpinsrd xmm0,xmm0,DWORD[16+r10],1
+        vpinsrd xmm1,xmm1,DWORD[16+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[16+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(128-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovd   xmm5,DWORD[20+r12]
+        vmovd   xmm0,DWORD[20+r8]
+        vmovd   xmm1,DWORD[20+r13]
+        vmovd   xmm2,DWORD[20+r9]
+        vpinsrd xmm5,xmm5,DWORD[20+r14],1
+        vpinsrd xmm0,xmm0,DWORD[20+r10],1
+        vpinsrd xmm1,xmm1,DWORD[20+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[20+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(160-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm5,ymm5,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm5
+        vpaddd  ymm10,ymm10,ymm7
+        vmovd   xmm5,DWORD[24+r12]
+        vmovd   xmm0,DWORD[24+r8]
+        vmovd   xmm1,DWORD[24+r13]
+        vmovd   xmm2,DWORD[24+r9]
+        vpinsrd xmm5,xmm5,DWORD[24+r14],1
+        vpinsrd xmm0,xmm0,DWORD[24+r10],1
+        vpinsrd xmm1,xmm1,DWORD[24+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[24+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(192-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovd   xmm5,DWORD[28+r12]
+        vmovd   xmm0,DWORD[28+r8]
+        vmovd   xmm1,DWORD[28+r13]
+        vmovd   xmm2,DWORD[28+r9]
+        vpinsrd xmm5,xmm5,DWORD[28+r14],1
+        vpinsrd xmm0,xmm0,DWORD[28+r10],1
+        vpinsrd xmm1,xmm1,DWORD[28+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[28+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(224-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm5,ymm5,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm9,13
+
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm5
+        vpaddd  ymm8,ymm8,ymm7
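+        ; Half of the sixteen input rounds are done; rbp advances 256
+        ; bytes to the next eight broadcast K256 constants.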
+        add     rbp,256
+        vmovd   xmm5,DWORD[32+r12]
+        vmovd   xmm0,DWORD[32+r8]
+        vmovd   xmm1,DWORD[32+r13]
+        vmovd   xmm2,DWORD[32+r9]
+        vpinsrd xmm5,xmm5,DWORD[32+r14],1
+        vpinsrd xmm0,xmm0,DWORD[32+r10],1
+        vpinsrd xmm1,xmm1,DWORD[32+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[32+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovd   xmm5,DWORD[36+r12]
+        vmovd   xmm0,DWORD[36+r8]
+        vmovd   xmm1,DWORD[36+r13]
+        vmovd   xmm2,DWORD[36+r9]
+        vpinsrd xmm5,xmm5,DWORD[36+r14],1
+        vpinsrd xmm0,xmm0,DWORD[36+r10],1
+        vpinsrd xmm1,xmm1,DWORD[36+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[36+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm5
+        vpaddd  ymm14,ymm14,ymm7
+        vmovd   xmm5,DWORD[40+r12]
+        vmovd   xmm0,DWORD[40+r8]
+        vmovd   xmm1,DWORD[40+r13]
+        vmovd   xmm2,DWORD[40+r9]
+        vpinsrd xmm5,xmm5,DWORD[40+r14],1
+        vpinsrd xmm0,xmm0,DWORD[40+r10],1
+        vpinsrd xmm1,xmm1,DWORD[40+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[40+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovd   xmm5,DWORD[44+r12]
+        vmovd   xmm0,DWORD[44+r8]
+        vmovd   xmm1,DWORD[44+r13]
+        vmovd   xmm2,DWORD[44+r9]
+        vpinsrd xmm5,xmm5,DWORD[44+r14],1
+        vpinsrd xmm0,xmm0,DWORD[44+r10],1
+        vpinsrd xmm1,xmm1,DWORD[44+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[44+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm5
+        vpaddd  ymm12,ymm12,ymm7
+        vmovd   xmm5,DWORD[48+r12]
+        vmovd   xmm0,DWORD[48+r8]
+        vmovd   xmm1,DWORD[48+r13]
+        vmovd   xmm2,DWORD[48+r9]
+        vpinsrd xmm5,xmm5,DWORD[48+r14],1
+        vpinsrd xmm0,xmm0,DWORD[48+r10],1
+        vpinsrd xmm1,xmm1,DWORD[48+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[48+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovd   xmm5,DWORD[52+r12]
+        vmovd   xmm0,DWORD[52+r8]
+        vmovd   xmm1,DWORD[52+r13]
+        vmovd   xmm2,DWORD[52+r9]
+        vpinsrd xmm5,xmm5,DWORD[52+r14],1
+        vpinsrd xmm0,xmm0,DWORD[52+r10],1
+        vpinsrd xmm1,xmm1,DWORD[52+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[52+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(416-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm5,ymm5,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm5
+        vpaddd  ymm10,ymm10,ymm7
+        vmovd   xmm5,DWORD[56+r12]
+        vmovd   xmm0,DWORD[56+r8]
+        vmovd   xmm1,DWORD[56+r13]
+        vmovd   xmm2,DWORD[56+r9]
+        vpinsrd xmm5,xmm5,DWORD[56+r14],1
+        vpinsrd xmm0,xmm0,DWORD[56+r10],1
+        vpinsrd xmm1,xmm1,DWORD[56+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[56+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovd   xmm5,DWORD[60+r12]
+        lea     r12,[64+r12]
+        vmovd   xmm0,DWORD[60+r8]
+        lea     r8,[64+r8]
+        vmovd   xmm1,DWORD[60+r13]
+        lea     r13,[64+r13]
+        vmovd   xmm2,DWORD[60+r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm5,xmm5,DWORD[60+r14],1
+        lea     r14,[64+r14]
+        vpinsrd xmm0,xmm0,DWORD[60+r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm1,xmm1,DWORD[60+r15],1
+        lea     r15,[64+r15]
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[60+r11],1
+        lea     r11,[64+r11]
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(480-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm5,ymm5,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
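+        ; The input pointers were already bumped to the next block above;
+        ; prefetch a cache line of each stream's next block while the last
+        ; round of this one finishes.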
+        prefetcht0      [63+r12]
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+        prefetcht0      [63+r13]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+        prefetcht0      [63+r14]
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+        prefetcht0      [63+r15]
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm9,13
+        prefetcht0      [63+r8]
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+        prefetcht0      [63+r9]
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+        prefetcht0      [63+r10]
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm5
+        prefetcht0      [63+r11]
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm5
+        vpaddd  ymm8,ymm8,ymm7
+        add     rbp,256
+        vmovdqu ymm5,YMMWORD[((0-128))+rax]
+        mov     ecx,3
+        jmp     NEAR $L$oop_16_xx_avx2
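+        ; Rounds 16-63 run as three passes of sixteen: each pass expands
+        ; the message schedule in place, W[t] = sigma1(W[t-2]) + W[t-7] +
+        ; sigma0(W[t-15]) + W[t-16] with sigma0 = ROTR7^ROTR18^SHR3 and
+        ; sigma1 = ROTR17^ROTR19^SHR10, and performs the matching
+        ; compression rounds.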
+ALIGN   32
+$L$oop_16_xx_avx2:
+        vmovdqu ymm6,YMMWORD[((32-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((288-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(0-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovdqu ymm5,YMMWORD[((64-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((320-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(32-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm6
+        vpaddd  ymm14,ymm14,ymm7
+        vmovdqu ymm6,YMMWORD[((96-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((352-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((0-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(64-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovdqu ymm5,YMMWORD[((128-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((384-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((32-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(96-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm6
+        vpaddd  ymm12,ymm12,ymm7
+        vmovdqu ymm6,YMMWORD[((160-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((416-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((64-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(128-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovdqu ymm5,YMMWORD[((192-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((448-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((96-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(160-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm6,ymm6,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm6
+        vpaddd  ymm10,ymm10,ymm7
+        vmovdqu ymm6,YMMWORD[((224-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((480-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((128-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(192-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovdqu ymm5,YMMWORD[((256-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((0-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((160-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(224-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm6,ymm6,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm9,13
+
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm6
+        vpaddd  ymm8,ymm8,ymm7
+        add     rbp,256
+        vmovdqu ymm6,YMMWORD[((288-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((32-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((192-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovdqu ymm5,YMMWORD[((320-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((64-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((224-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm6
+        vpaddd  ymm14,ymm14,ymm7
+        vmovdqu ymm6,YMMWORD[((352-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((96-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovdqu ymm5,YMMWORD[((384-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((128-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm6
+        vpaddd  ymm12,ymm12,ymm7
+        vmovdqu ymm6,YMMWORD[((416-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((160-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovdqu ymm5,YMMWORD[((448-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((192-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(416-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm6,ymm6,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm6
+        vpaddd  ymm10,ymm10,ymm7
+        vmovdqu ymm6,YMMWORD[((480-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((224-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovdqu ymm5,YMMWORD[((0-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((256-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(480-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm6,ymm6,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm9,13
+
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm6
+        vpaddd  ymm8,ymm8,ymm7
+        add     rbp,256
+        dec     ecx
+        jnz     NEAR $L$oop_16_xx_avx2
+
+        mov     ecx,1
+        lea     rbx,[512+rsp]
+        lea     rbp,[((K256+128))]
+        cmp     ecx,DWORD[rbx]
+        cmovge  r12,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r13,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r14,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r15,rbp
+        cmp     ecx,DWORD[16+rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[20+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[24+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[28+rbx]
+        cmovge  r11,rbp
+        vmovdqa ymm7,YMMWORD[rbx]
+        vpxor   ymm0,ymm0,ymm0
+        vmovdqa ymm6,ymm7
+        vpcmpgtd        ymm6,ymm6,ymm0
+        vpaddd  ymm7,ymm7,ymm6
+
+        vmovdqu ymm0,YMMWORD[((0-128))+rdi]
+        vpand   ymm8,ymm8,ymm6
+        vmovdqu ymm1,YMMWORD[((32-128))+rdi]
+        vpand   ymm9,ymm9,ymm6
+        vmovdqu ymm2,YMMWORD[((64-128))+rdi]
+        vpand   ymm10,ymm10,ymm6
+        vmovdqu ymm5,YMMWORD[((96-128))+rdi]
+        vpand   ymm11,ymm11,ymm6
+        vpaddd  ymm8,ymm8,ymm0
+        vmovdqu ymm0,YMMWORD[((128-128))+rdi]
+        vpand   ymm12,ymm12,ymm6
+        vpaddd  ymm9,ymm9,ymm1
+        vmovdqu ymm1,YMMWORD[((160-128))+rdi]
+        vpand   ymm13,ymm13,ymm6
+        vpaddd  ymm10,ymm10,ymm2
+        vmovdqu ymm2,YMMWORD[((192-128))+rdi]
+        vpand   ymm14,ymm14,ymm6
+        vpaddd  ymm11,ymm11,ymm5
+        vmovdqu ymm5,YMMWORD[((224-128))+rdi]
+        vpand   ymm15,ymm15,ymm6
+        vpaddd  ymm12,ymm12,ymm0
+        vpaddd  ymm13,ymm13,ymm1
+        vmovdqu YMMWORD[(0-128)+rdi],ymm8
+        vpaddd  ymm14,ymm14,ymm2
+        vmovdqu YMMWORD[(32-128)+rdi],ymm9
+        vpaddd  ymm15,ymm15,ymm5
+        vmovdqu YMMWORD[(64-128)+rdi],ymm10
+        vmovdqu YMMWORD[(96-128)+rdi],ymm11
+        vmovdqu YMMWORD[(128-128)+rdi],ymm12
+        vmovdqu YMMWORD[(160-128)+rdi],ymm13
+        vmovdqu YMMWORD[(192-128)+rdi],ymm14
+        vmovdqu YMMWORD[(224-128)+rdi],ymm15
+
+        vmovdqu YMMWORD[rbx],ymm7
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm6,YMMWORD[$L$pbswap]
+        dec     edx
+        jnz     NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+        mov     rax,QWORD[544+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block_avx2:
+ALIGN   256
+K256:
+        DD      1116352408,1116352408,1116352408,1116352408
+        DD      1116352408,1116352408,1116352408,1116352408
+        DD      1899447441,1899447441,1899447441,1899447441
+        DD      1899447441,1899447441,1899447441,1899447441
+        DD      3049323471,3049323471,3049323471,3049323471
+        DD      3049323471,3049323471,3049323471,3049323471
+        DD      3921009573,3921009573,3921009573,3921009573
+        DD      3921009573,3921009573,3921009573,3921009573
+        DD      961987163,961987163,961987163,961987163
+        DD      961987163,961987163,961987163,961987163
+        DD      1508970993,1508970993,1508970993,1508970993
+        DD      1508970993,1508970993,1508970993,1508970993
+        DD      2453635748,2453635748,2453635748,2453635748
+        DD      2453635748,2453635748,2453635748,2453635748
+        DD      2870763221,2870763221,2870763221,2870763221
+        DD      2870763221,2870763221,2870763221,2870763221
+        DD      3624381080,3624381080,3624381080,3624381080
+        DD      3624381080,3624381080,3624381080,3624381080
+        DD      310598401,310598401,310598401,310598401
+        DD      310598401,310598401,310598401,310598401
+        DD      607225278,607225278,607225278,607225278
+        DD      607225278,607225278,607225278,607225278
+        DD      1426881987,1426881987,1426881987,1426881987
+        DD      1426881987,1426881987,1426881987,1426881987
+        DD      1925078388,1925078388,1925078388,1925078388
+        DD      1925078388,1925078388,1925078388,1925078388
+        DD      2162078206,2162078206,2162078206,2162078206
+        DD      2162078206,2162078206,2162078206,2162078206
+        DD      2614888103,2614888103,2614888103,2614888103
+        DD      2614888103,2614888103,2614888103,2614888103
+        DD      3248222580,3248222580,3248222580,3248222580
+        DD      3248222580,3248222580,3248222580,3248222580
+        DD      3835390401,3835390401,3835390401,3835390401
+        DD      3835390401,3835390401,3835390401,3835390401
+        DD      4022224774,4022224774,4022224774,4022224774
+        DD      4022224774,4022224774,4022224774,4022224774
+        DD      264347078,264347078,264347078,264347078
+        DD      264347078,264347078,264347078,264347078
+        DD      604807628,604807628,604807628,604807628
+        DD      604807628,604807628,604807628,604807628
+        DD      770255983,770255983,770255983,770255983
+        DD      770255983,770255983,770255983,770255983
+        DD      1249150122,1249150122,1249150122,1249150122
+        DD      1249150122,1249150122,1249150122,1249150122
+        DD      1555081692,1555081692,1555081692,1555081692
+        DD      1555081692,1555081692,1555081692,1555081692
+        DD      1996064986,1996064986,1996064986,1996064986
+        DD      1996064986,1996064986,1996064986,1996064986
+        DD      2554220882,2554220882,2554220882,2554220882
+        DD      2554220882,2554220882,2554220882,2554220882
+        DD      2821834349,2821834349,2821834349,2821834349
+        DD      2821834349,2821834349,2821834349,2821834349
+        DD      2952996808,2952996808,2952996808,2952996808
+        DD      2952996808,2952996808,2952996808,2952996808
+        DD      3210313671,3210313671,3210313671,3210313671
+        DD      3210313671,3210313671,3210313671,3210313671
+        DD      3336571891,3336571891,3336571891,3336571891
+        DD      3336571891,3336571891,3336571891,3336571891
+        DD      3584528711,3584528711,3584528711,3584528711
+        DD      3584528711,3584528711,3584528711,3584528711
+        DD      113926993,113926993,113926993,113926993
+        DD      113926993,113926993,113926993,113926993
+        DD      338241895,338241895,338241895,338241895
+        DD      338241895,338241895,338241895,338241895
+        DD      666307205,666307205,666307205,666307205
+        DD      666307205,666307205,666307205,666307205
+        DD      773529912,773529912,773529912,773529912
+        DD      773529912,773529912,773529912,773529912
+        DD      1294757372,1294757372,1294757372,1294757372
+        DD      1294757372,1294757372,1294757372,1294757372
+        DD      1396182291,1396182291,1396182291,1396182291
+        DD      1396182291,1396182291,1396182291,1396182291
+        DD      1695183700,1695183700,1695183700,1695183700
+        DD      1695183700,1695183700,1695183700,1695183700
+        DD      1986661051,1986661051,1986661051,1986661051
+        DD      1986661051,1986661051,1986661051,1986661051
+        DD      2177026350,2177026350,2177026350,2177026350
+        DD      2177026350,2177026350,2177026350,2177026350
+        DD      2456956037,2456956037,2456956037,2456956037
+        DD      2456956037,2456956037,2456956037,2456956037
+        DD      2730485921,2730485921,2730485921,2730485921
+        DD      2730485921,2730485921,2730485921,2730485921
+        DD      2820302411,2820302411,2820302411,2820302411
+        DD      2820302411,2820302411,2820302411,2820302411
+        DD      3259730800,3259730800,3259730800,3259730800
+        DD      3259730800,3259730800,3259730800,3259730800
+        DD      3345764771,3345764771,3345764771,3345764771
+        DD      3345764771,3345764771,3345764771,3345764771
+        DD      3516065817,3516065817,3516065817,3516065817
+        DD      3516065817,3516065817,3516065817,3516065817
+        DD      3600352804,3600352804,3600352804,3600352804
+        DD      3600352804,3600352804,3600352804,3600352804
+        DD      4094571909,4094571909,4094571909,4094571909
+        DD      4094571909,4094571909,4094571909,4094571909
+        DD      275423344,275423344,275423344,275423344
+        DD      275423344,275423344,275423344,275423344
+        DD      430227734,430227734,430227734,430227734
+        DD      430227734,430227734,430227734,430227734
+        DD      506948616,506948616,506948616,506948616
+        DD      506948616,506948616,506948616,506948616
+        DD      659060556,659060556,659060556,659060556
+        DD      659060556,659060556,659060556,659060556
+        DD      883997877,883997877,883997877,883997877
+        DD      883997877,883997877,883997877,883997877
+        DD      958139571,958139571,958139571,958139571
+        DD      958139571,958139571,958139571,958139571
+        DD      1322822218,1322822218,1322822218,1322822218
+        DD      1322822218,1322822218,1322822218,1322822218
+        DD      1537002063,1537002063,1537002063,1537002063
+        DD      1537002063,1537002063,1537002063,1537002063
+        DD      1747873779,1747873779,1747873779,1747873779
+        DD      1747873779,1747873779,1747873779,1747873779
+        DD      1955562222,1955562222,1955562222,1955562222
+        DD      1955562222,1955562222,1955562222,1955562222
+        DD      2024104815,2024104815,2024104815,2024104815
+        DD      2024104815,2024104815,2024104815,2024104815
+        DD      2227730452,2227730452,2227730452,2227730452
+        DD      2227730452,2227730452,2227730452,2227730452
+        DD      2361852424,2361852424,2361852424,2361852424
+        DD      2361852424,2361852424,2361852424,2361852424
+        DD      2428436474,2428436474,2428436474,2428436474
+        DD      2428436474,2428436474,2428436474,2428436474
+        DD      2756734187,2756734187,2756734187,2756734187
+        DD      2756734187,2756734187,2756734187,2756734187
+        DD      3204031479,3204031479,3204031479,3204031479
+        DD      3204031479,3204031479,3204031479,3204031479
+        DD      3329325298,3329325298,3329325298,3329325298
+        DD      3329325298,3329325298,3329325298,3329325298
+$L$pbswap:
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+K256_shaext:
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+DB      83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111
+DB      99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114
+DB      32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71
+DB      65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112
+DB      101,110,115,115,108,46,111,114,103,62,0
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[272+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+
+        lea     rsi,[((-24-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   16
+avx2_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[544+r8]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((-56-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha256_multi_block wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block wrt ..imagebase
+        DD      $L$SEH_begin_sha256_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha256_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha256_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha256_multi_block:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_multi_block_shaext:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx2:
+DB      9,0,0,0
+        DD      avx2_handler wrt ..imagebase
+        DD      $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
new file mode 100644
index 0000000000..e8abeaa668
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
@@ -0,0 +1,5712 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  sha256_block_data_order
+
+ALIGN   16
+sha256_block_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     r11,[OPENSSL_ia32cap_P]
+        mov     r9d,DWORD[r11]
+        mov     r10d,DWORD[4+r11]
+        mov     r11d,DWORD[8+r11]
+        test    r11d,536870912
+        jnz     NEAR _shaext_shortcut
+        and     r11d,296
+        cmp     r11d,296
+        je      NEAR $L$avx2_shortcut
+        and     r9d,1073741824
+        and     r10d,268435968
+        or      r10d,r9d
+        cmp     r10d,1342177792
+        je      NEAR $L$avx_shortcut
+        test    r10d,512
+        jnz     NEAR $L$ssse3_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,16*4+4*8
+        lea     rdx,[rdx*4+rsi]
+        and     rsp,-64
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+$L$prologue:
+
+        mov     eax,DWORD[rdi]
+        mov     ebx,DWORD[4+rdi]
+        mov     ecx,DWORD[8+rdi]
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+        jmp     NEAR $L$loop
+
+ALIGN   16
+$L$loop:
+        mov     edi,ebx
+        lea     rbp,[K256]
+        xor     edi,ecx
+        mov     r12d,DWORD[rsi]
+        mov     r13d,r8d
+        mov     r14d,eax
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r11d,r14d
+        mov     r12d,DWORD[4+rsi]
+        mov     r13d,edx
+        mov     r14d,r11d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[4+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r10d,r14d
+        mov     r12d,DWORD[8+rsi]
+        mov     r13d,ecx
+        mov     r14d,r10d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[8+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r9d,r14d
+        mov     r12d,DWORD[12+rsi]
+        mov     r13d,ebx
+        mov     r14d,r9d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[12+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        add     r8d,r14d
+        mov     r12d,DWORD[16+rsi]
+        mov     r13d,eax
+        mov     r14d,r8d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[16+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        add     edx,r14d
+        mov     r12d,DWORD[20+rsi]
+        mov     r13d,r11d
+        mov     r14d,edx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[20+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ecx,r14d
+        mov     r12d,DWORD[24+rsi]
+        mov     r13d,r10d
+        mov     r14d,ecx
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[24+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ebx,r14d
+        mov     r12d,DWORD[28+rsi]
+        mov     r13d,r9d
+        mov     r14d,ebx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[28+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
+        add     eax,r14d
+        mov     r12d,DWORD[32+rsi]
+        mov     r13d,r8d
+        mov     r14d,eax
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[32+rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r11d,r14d
+        mov     r12d,DWORD[36+rsi]
+        mov     r13d,edx
+        mov     r14d,r11d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[36+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r10d,r14d
+        mov     r12d,DWORD[40+rsi]
+        mov     r13d,ecx
+        mov     r14d,r10d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[40+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r9d,r14d
+        mov     r12d,DWORD[44+rsi]
+        mov     r13d,ebx
+        mov     r14d,r9d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[44+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        add     r8d,r14d
+        mov     r12d,DWORD[48+rsi]
+        mov     r13d,eax
+        mov     r14d,r8d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[48+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        add     edx,r14d
+        mov     r12d,DWORD[52+rsi]
+        mov     r13d,r11d
+        mov     r14d,edx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[52+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ecx,r14d
+        mov     r12d,DWORD[56+rsi]
+        mov     r13d,r10d
+        mov     r14d,ecx
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[56+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ebx,r14d
+        mov     r12d,DWORD[60+rsi]
+        mov     r13d,r9d
+        mov     r14d,ebx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[60+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
+        jmp     NEAR $L$rounds_16_xx
+ALIGN   16
+$L$rounds_16_xx:
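+; Rounds 16..63: each pass rebuilds the next schedule word from the copies
+; kept on the stack (sigma0(W[t-15]) + W[t-7] + sigma1(W[t-2]) + W[t-16])
+; before running the usual round function on the working registers a..h.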
+        mov     r13d,DWORD[4+rsp]
+        mov     r15d,DWORD[56+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     eax,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[36+rsp]
+
+        add     r12d,DWORD[rsp]
+        mov     r13d,r8d
+        add     r12d,r15d
+        mov     r14d,eax
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[8+rsp]
+        mov     edi,DWORD[60+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r11d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[40+rsp]
+
+        add     r12d,DWORD[4+rsp]
+        mov     r13d,edx
+        add     r12d,edi
+        mov     r14d,r11d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[4+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[12+rsp]
+        mov     r15d,DWORD[rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r10d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[44+rsp]
+
+        add     r12d,DWORD[8+rsp]
+        mov     r13d,ecx
+        add     r12d,r15d
+        mov     r14d,r10d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[8+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[16+rsp]
+        mov     edi,DWORD[4+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r9d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[48+rsp]
+
+        add     r12d,DWORD[12+rsp]
+        mov     r13d,ebx
+        add     r12d,edi
+        mov     r14d,r9d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[12+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        mov     r13d,DWORD[20+rsp]
+        mov     r15d,DWORD[8+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r8d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[52+rsp]
+
+        add     r12d,DWORD[16+rsp]
+        mov     r13d,eax
+        add     r12d,r15d
+        mov     r14d,r8d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[16+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[24+rsp]
+        mov     edi,DWORD[12+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     edx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[56+rsp]
+
+        add     r12d,DWORD[20+rsp]
+        mov     r13d,r11d
+        add     r12d,edi
+        mov     r14d,edx
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[20+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[28+rsp]
+        mov     r15d,DWORD[16+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ecx,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[60+rsp]
+
+        add     r12d,DWORD[24+rsp]
+        mov     r13d,r10d
+        add     r12d,r15d
+        mov     r14d,ecx
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[24+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[32+rsp]
+        mov     edi,DWORD[20+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ebx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[rsp]
+
+        add     r12d,DWORD[28+rsp]
+        mov     r13d,r9d
+        add     r12d,edi
+        mov     r14d,ebx
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[28+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
+        mov     r13d,DWORD[36+rsp]
+        mov     r15d,DWORD[24+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     eax,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[4+rsp]
+
+        add     r12d,DWORD[32+rsp]
+        mov     r13d,r8d
+        add     r12d,r15d
+        mov     r14d,eax
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[32+rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[40+rsp]
+        mov     edi,DWORD[28+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r11d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[8+rsp]
+
+        add     r12d,DWORD[36+rsp]
+        mov     r13d,edx
+        add     r12d,edi
+        mov     r14d,r11d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[36+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[44+rsp]
+        mov     r15d,DWORD[32+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r10d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[12+rsp]
+
+        add     r12d,DWORD[40+rsp]
+        mov     r13d,ecx
+        add     r12d,r15d
+        mov     r14d,r10d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[40+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[48+rsp]
+        mov     edi,DWORD[36+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r9d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[16+rsp]
+
+        add     r12d,DWORD[44+rsp]
+        mov     r13d,ebx
+        add     r12d,edi
+        mov     r14d,r9d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[44+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        mov     r13d,DWORD[52+rsp]
+        mov     r15d,DWORD[40+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r8d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[20+rsp]
+
+        add     r12d,DWORD[48+rsp]
+        mov     r13d,eax
+        add     r12d,r15d
+        mov     r14d,r8d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[48+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[56+rsp]
+        mov     edi,DWORD[44+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     edx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[24+rsp]
+
+        add     r12d,DWORD[52+rsp]
+        mov     r13d,r11d
+        add     r12d,edi
+        mov     r14d,edx
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[52+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[60+rsp]
+        mov     r15d,DWORD[48+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ecx,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[28+rsp]
+
+        add     r12d,DWORD[56+rsp]
+        mov     r13d,r10d
+        add     r12d,r15d
+        mov     r14d,ecx
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[56+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[rsp]
+        mov     edi,DWORD[52+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ebx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[32+rsp]
+
+        add     r12d,DWORD[60+rsp]
+        mov     r13d,r9d
+        add     r12d,edi
+        mov     r14d,ebx
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[60+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
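+; rbp has stepped through the whole K256 table once all 64 rounds are done;
+; the byte test below then sees the zero top byte of the shuffle-mask data
+; that follows the round constants and falls through instead of looping.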
+        cmp     BYTE[3+rbp],0
+        jnz     NEAR $L$rounds_16_xx
+
+        mov     rdi,QWORD[((64+0))+rsp]
+        add     eax,r14d
+        lea     rsi,[64+rsi]
+
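+; Fold the working registers back into the eight-word hash state at rdi and
+; loop while the advanced input pointer is still below the end-of-data
+; pointer saved on the stack.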
+        add     eax,DWORD[rdi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+        jb      NEAR $L$loop
+
+        mov     rsi,QWORD[88+rsp]
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order:
+ALIGN   64
+
+K256:
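+; SHA-256 round constants K[0..63]; each row of four is stored twice, and
+; both the scalar rounds (note the +4/+20 stride on rbp) and the SSSE3 path
+; stride over the duplicate rows.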
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
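+; The 0x00010203 rows below are the pshufb byte-swap mask (referenced as
+; K256+512) that converts input words to the big-endian order SHA-256
+; expects; the rows after it are further shuffle masks from the upstream
+; generated table.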
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+        DD      0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+        DD      0xffffffff,0xffffffff,0x03020100,0x0b0a0908
+        DD      0xffffffff,0xffffffff,0x03020100,0x0b0a0908
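+; The DB lines below spell out the generator's attribution string:
+; "SHA256 block transform for x86_64, CRYPTOGAMS by <appro@openssl.org>"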
+DB      83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+DB      110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB      52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB      111,114,103,62,0
+
+ALIGN   64
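+; Variant using the Intel SHA extensions (SHA-NI). The DB byte sequences in
+; this routine are raw opcodes for sha256rnds2 (0F 38 CB), sha256msg1
+; (0F 38 CC), sha256msg2 (0F 38 CD), palignr and pshufb, emitted as bytes so
+; assemblers without SHA-NI mnemonics can still build the file.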
+sha256_block_data_order_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_shaext_shortcut:
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[(-8-80)+rax],xmm6
+        movaps  XMMWORD[(-8-64)+rax],xmm7
+        movaps  XMMWORD[(-8-48)+rax],xmm8
+        movaps  XMMWORD[(-8-32)+rax],xmm9
+        movaps  XMMWORD[(-8-16)+rax],xmm10
+$L$prologue_shaext:
+        lea     rcx,[((K256+128))]
+        movdqu  xmm1,XMMWORD[rdi]
+        movdqu  xmm2,XMMWORD[16+rdi]
+        movdqa  xmm7,XMMWORD[((512-128))+rcx]
+
+        pshufd  xmm0,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        pshufd  xmm2,xmm2,0x1b
+        movdqa  xmm8,xmm7
+DB      102,15,58,15,202,8
+        punpcklqdq      xmm2,xmm0
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
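+; One 64-byte block per iteration: load the block, byte-swap it with pshufb,
+; then run the 64 rounds two at a time with sha256rnds2 while sha256msg1 and
+; sha256msg2 extend the message schedule; rdx counts the remaining blocks.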
+        movdqu  xmm3,XMMWORD[rsi]
+        movdqu  xmm4,XMMWORD[16+rsi]
+        movdqu  xmm5,XMMWORD[32+rsi]
+DB      102,15,56,0,223
+        movdqu  xmm6,XMMWORD[48+rsi]
+
+        movdqa  xmm0,XMMWORD[((0-128))+rcx]
+        paddd   xmm0,xmm3
+DB      102,15,56,0,231
+        movdqa  xmm10,xmm2
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        nop
+        movdqa  xmm9,xmm1
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((32-128))+rcx]
+        paddd   xmm0,xmm4
+DB      102,15,56,0,239
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        lea     rsi,[64+rsi]
+DB      15,56,204,220
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((64-128))+rcx]
+        paddd   xmm0,xmm5
+DB      102,15,56,0,247
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm6
+DB      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+DB      15,56,204,229
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((96-128))+rcx]
+        paddd   xmm0,xmm6
+DB      15,56,205,222
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+DB      15,56,204,238
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((128-128))+rcx]
+        paddd   xmm0,xmm3
+DB      15,56,205,227
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm4
+DB      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+DB      15,56,204,243
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((160-128))+rcx]
+        paddd   xmm0,xmm4
+DB      15,56,205,236
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm5
+DB      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+DB      15,56,204,220
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((192-128))+rcx]
+        paddd   xmm0,xmm5
+DB      15,56,205,245
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm6
+DB      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+DB      15,56,204,229
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((224-128))+rcx]
+        paddd   xmm0,xmm6
+DB      15,56,205,222
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+DB      15,56,204,238
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((256-128))+rcx]
+        paddd   xmm0,xmm3
+DB      15,56,205,227
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm4
+DB      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+DB      15,56,204,243
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((288-128))+rcx]
+        paddd   xmm0,xmm4
+DB      15,56,205,236
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm5
+DB      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+DB      15,56,204,220
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((320-128))+rcx]
+        paddd   xmm0,xmm5
+DB      15,56,205,245
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm6
+DB      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+DB      15,56,204,229
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((352-128))+rcx]
+        paddd   xmm0,xmm6
+DB      15,56,205,222
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+DB      15,56,204,238
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((384-128))+rcx]
+        paddd   xmm0,xmm3
+DB      15,56,205,227
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm4
+DB      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+DB      15,56,204,243
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((416-128))+rcx]
+        paddd   xmm0,xmm4
+DB      15,56,205,236
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm5
+DB      102,15,58,15,252,4
+DB      15,56,203,202
+        paddd   xmm6,xmm7
+
+        movdqa  xmm0,XMMWORD[((448-128))+rcx]
+        paddd   xmm0,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+DB      15,56,205,245
+        movdqa  xmm7,xmm8
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((480-128))+rcx]
+        paddd   xmm0,xmm6
+        nop
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        dec     rdx
+        nop
+DB      15,56,203,202
+
+        paddd   xmm2,xmm10
+        paddd   xmm1,xmm9
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm2,xmm2,0xb1
+        pshufd  xmm7,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        punpckhqdq      xmm1,xmm2
+DB      102,15,58,15,215,8
+
+        movdqu  XMMWORD[rdi],xmm1
+        movdqu  XMMWORD[16+rdi],xmm2
+        movaps  xmm6,XMMWORD[((-8-80))+rax]
+        movaps  xmm7,XMMWORD[((-8-64))+rax]
+        movaps  xmm8,XMMWORD[((-8-48))+rax]
+        movaps  xmm9,XMMWORD[((-8-32))+rax]
+        movaps  xmm10,XMMWORD[((-8-16))+rax]
+        mov     rsp,rax
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_sha256_block_data_order_shaext:
+
+ALIGN   64
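+; SSSE3 variant: the sigma0/sigma1 message-schedule expansion runs on the
+; xmm registers while the round function itself stays in the integer unit.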
+sha256_block_data_order_ssse3:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_ssse3:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$ssse3_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
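+; Convert the block count in rdx into an end-of-input pointer and build a
+; 64-byte-aligned frame: 64 bytes of W+K scratch at rsp, the ctx/input/end
+; pointers at 64+rsp, the caller's rsp at 88+rsp, and (Win64) the saved
+; xmm6-xmm9 registers above that.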
+        shl     rdx,4
+        sub     rsp,160
+        lea     rdx,[rdx*4+rsi]
+        and     rsp,-64
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+        movaps  XMMWORD[(64+32)+rsp],xmm6
+        movaps  XMMWORD[(64+48)+rsp],xmm7
+        movaps  XMMWORD[(64+64)+rsp],xmm8
+        movaps  XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_ssse3:
+
+        mov     eax,DWORD[rdi]
+        mov     ebx,DWORD[4+rdi]
+        mov     ecx,DWORD[8+rdi]
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+
+
+        jmp     NEAR $L$loop_ssse3
+ALIGN   16
+$L$loop_ssse3:
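+; Per-block setup: load the 64-byte block, byte-swap it via pshufb with the
+; mask at K256+512, add the first 16 round constants, and park the resulting
+; W+K values in the scratch area at rsp.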
+        movdqa  xmm7,XMMWORD[((K256+512))]
+        movdqu  xmm0,XMMWORD[rsi]
+        movdqu  xmm1,XMMWORD[16+rsi]
+        movdqu  xmm2,XMMWORD[32+rsi]
+DB      102,15,56,0,199
+        movdqu  xmm3,XMMWORD[48+rsi]
+        lea     rbp,[K256]
+DB      102,15,56,0,207
+        movdqa  xmm4,XMMWORD[rbp]
+        movdqa  xmm5,XMMWORD[32+rbp]
+DB      102,15,56,0,215
+        paddd   xmm4,xmm0
+        movdqa  xmm6,XMMWORD[64+rbp]
+DB      102,15,56,0,223
+        movdqa  xmm7,XMMWORD[96+rbp]
+        paddd   xmm5,xmm1
+        paddd   xmm6,xmm2
+        paddd   xmm7,xmm3
+        movdqa  XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        movdqa  XMMWORD[16+rsp],xmm5
+        mov     edi,ebx
+        movdqa  XMMWORD[32+rsp],xmm6
+        xor     edi,ecx
+        movdqa  XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$ssse3_00_47
+
+ALIGN   16
+$L$ssse3_00_47:
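+; Rounds 0..47: each pass interleaves 16 scalar rounds with the SSSE3
+; computation of the next 16 schedule words; sub rbp,-128 advances rbp to
+; the next 128-byte group of (duplicated) round constants, and the loop
+; exits once the byte test at the bottom runs past the constant table.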
+        sub     rbp,-128
+        ror     r13d,14
+        movdqa  xmm4,xmm1
+        mov     eax,r14d
+        mov     r12d,r9d
+        movdqa  xmm7,xmm3
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+DB      102,15,58,15,224,4
+        and     r12d,r8d
+        xor     r13d,r8d
+DB      102,15,58,15,250,4
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,ebx
+        add     r11d,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        paddd   xmm0,xmm7
+        ror     r14d,2
+        add     edx,r11d
+        psrld   xmm6,7
+        add     r11d,edi
+        mov     r13d,edx
+        pshufd  xmm7,xmm3,250
+        add     r14d,r11d
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,r11d
+        pxor    xmm4,xmm5
+        and     r12d,edx
+        xor     r13d,edx
+        pslld   xmm5,11
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        pxor    xmm4,xmm6
+        xor     r12d,r9d
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,eax
+        add     r10d,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        psrld   xmm7,10
+        add     r10d,r13d
+        xor     r15d,eax
+        paddd   xmm0,xmm4
+        ror     r14d,2
+        add     ecx,r10d
+        psrlq   xmm6,17
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,ecx
+        xor     r12d,r8d
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        pshufd  xmm7,xmm7,128
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        psrldq  xmm7,8
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        paddd   xmm0,xmm7
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        pshufd  xmm7,xmm0,80
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        movdqa  xmm6,xmm7
+        add     r9d,edi
+        mov     r13d,ebx
+        psrld   xmm7,10
+        add     r14d,r9d
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     r9d,r14d
+        mov     r12d,ecx
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        psrlq   xmm6,2
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,r10d
+        add     r8d,r12d
+        movdqa  xmm6,XMMWORD[rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        paddd   xmm0,xmm7
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        paddd   xmm6,xmm0
+        mov     r13d,eax
+        add     r14d,r8d
+        movdqa  XMMWORD[rsp],xmm6
+        ror     r13d,14
+        movdqa  xmm4,xmm2
+        mov     r8d,r14d
+        mov     r12d,ebx
+        movdqa  xmm7,xmm0
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+DB      102,15,58,15,225,4
+        and     r12d,eax
+        xor     r13d,eax
+DB      102,15,58,15,251,4
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,r9d
+        add     edx,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        paddd   xmm1,xmm7
+        ror     r14d,2
+        add     r11d,edx
+        psrld   xmm6,7
+        add     edx,edi
+        mov     r13d,r11d
+        pshufd  xmm7,xmm0,250
+        add     r14d,edx
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     edx,r14d
+        mov     r12d,eax
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,edx
+        pxor    xmm4,xmm5
+        and     r12d,r11d
+        xor     r13d,r11d
+        pslld   xmm5,11
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        pxor    xmm4,xmm6
+        xor     r12d,ebx
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,r8d
+        add     ecx,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        psrld   xmm7,10
+        add     ecx,r13d
+        xor     r15d,r8d
+        paddd   xmm1,xmm4
+        ror     r14d,2
+        add     r10d,ecx
+        psrlq   xmm6,17
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,r10d
+        xor     r12d,eax
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        pshufd  xmm7,xmm7,128
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        psrldq  xmm7,8
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        paddd   xmm1,xmm7
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        pshufd  xmm7,xmm1,80
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        movdqa  xmm6,xmm7
+        add     ebx,edi
+        mov     r13d,r9d
+        psrld   xmm7,10
+        add     r14d,ebx
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     ebx,r14d
+        mov     r12d,r10d
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        psrlq   xmm6,2
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,ecx
+        add     eax,r12d
+        movdqa  xmm6,XMMWORD[32+rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        paddd   xmm1,xmm7
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        paddd   xmm6,xmm1
+        mov     r13d,r8d
+        add     r14d,eax
+        movdqa  XMMWORD[16+rsp],xmm6
+        ror     r13d,14
+        movdqa  xmm4,xmm3
+        mov     eax,r14d
+        mov     r12d,r9d
+        movdqa  xmm7,xmm1
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+DB      102,15,58,15,226,4
+        and     r12d,r8d
+        xor     r13d,r8d
+DB      102,15,58,15,248,4
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,ebx
+        add     r11d,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        paddd   xmm2,xmm7
+        ror     r14d,2
+        add     edx,r11d
+        psrld   xmm6,7
+        add     r11d,edi
+        mov     r13d,edx
+        pshufd  xmm7,xmm1,250
+        add     r14d,r11d
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,r11d
+        pxor    xmm4,xmm5
+        and     r12d,edx
+        xor     r13d,edx
+        pslld   xmm5,11
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        pxor    xmm4,xmm6
+        xor     r12d,r9d
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,eax
+        add     r10d,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        psrld   xmm7,10
+        add     r10d,r13d
+        xor     r15d,eax
+        paddd   xmm2,xmm4
+        ror     r14d,2
+        add     ecx,r10d
+        psrlq   xmm6,17
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,ecx
+        xor     r12d,r8d
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        pshufd  xmm7,xmm7,128
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        psrldq  xmm7,8
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        paddd   xmm2,xmm7
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        pshufd  xmm7,xmm2,80
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        movdqa  xmm6,xmm7
+        add     r9d,edi
+        mov     r13d,ebx
+        psrld   xmm7,10
+        add     r14d,r9d
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     r9d,r14d
+        mov     r12d,ecx
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        psrlq   xmm6,2
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,r10d
+        add     r8d,r12d
+        movdqa  xmm6,XMMWORD[64+rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        paddd   xmm2,xmm7
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        paddd   xmm6,xmm2
+        mov     r13d,eax
+        add     r14d,r8d
+        movdqa  XMMWORD[32+rsp],xmm6
+        ror     r13d,14
+        movdqa  xmm4,xmm0
+        mov     r8d,r14d
+        mov     r12d,ebx
+        movdqa  xmm7,xmm2
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+DB      102,15,58,15,227,4
+        and     r12d,eax
+        xor     r13d,eax
+DB      102,15,58,15,249,4
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,r9d
+        add     edx,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        paddd   xmm3,xmm7
+        ror     r14d,2
+        add     r11d,edx
+        psrld   xmm6,7
+        add     edx,edi
+        mov     r13d,r11d
+        pshufd  xmm7,xmm2,250
+        add     r14d,edx
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     edx,r14d
+        mov     r12d,eax
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,edx
+        pxor    xmm4,xmm5
+        and     r12d,r11d
+        xor     r13d,r11d
+        pslld   xmm5,11
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        pxor    xmm4,xmm6
+        xor     r12d,ebx
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,r8d
+        add     ecx,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        psrld   xmm7,10
+        add     ecx,r13d
+        xor     r15d,r8d
+        paddd   xmm3,xmm4
+        ror     r14d,2
+        add     r10d,ecx
+        psrlq   xmm6,17
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,r10d
+        xor     r12d,eax
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        pshufd  xmm7,xmm7,128
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        psrldq  xmm7,8
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        paddd   xmm3,xmm7
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        pshufd  xmm7,xmm3,80
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        movdqa  xmm6,xmm7
+        add     ebx,edi
+        mov     r13d,r9d
+        psrld   xmm7,10
+        add     r14d,ebx
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     ebx,r14d
+        mov     r12d,r10d
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        psrlq   xmm6,2
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,ecx
+        add     eax,r12d
+        movdqa  xmm6,XMMWORD[96+rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        paddd   xmm3,xmm7
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        paddd   xmm6,xmm3
+        mov     r13d,r8d
+        add     r14d,eax
+        movdqa  XMMWORD[48+rsp],xmm6
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$ssse3_00_47
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        ror     r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        ror     r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        ror     r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        ror     r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        ror     r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        ror     r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        ror     r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        ror     r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        ror     r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        ror     r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        ror     r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        ror     r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     rdi,QWORD[((64+0))+rsp]
+        mov     eax,r14d
+
+        add     eax,DWORD[rdi]
+        lea     rsi,[64+rsi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+        jb      NEAR $L$loop_ssse3
+
+        mov     rsi,QWORD[88+rsp]
+
+        movaps  xmm6,XMMWORD[((64+32))+rsp]
+        movaps  xmm7,XMMWORD[((64+48))+rsp]
+        movaps  xmm8,XMMWORD[((64+64))+rsp]
+        movaps  xmm9,XMMWORD[((64+80))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_ssse3:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order_ssse3:
+
+ALIGN   64
+sha256_block_data_order_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,160
+        lea     rdx,[rdx*4+rsi]
+        and     rsp,-64
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+        movaps  XMMWORD[(64+32)+rsp],xmm6
+        movaps  XMMWORD[(64+48)+rsp],xmm7
+        movaps  XMMWORD[(64+64)+rsp],xmm8
+        movaps  XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx:
+
+        vzeroupper
+        mov     eax,DWORD[rdi]
+        mov     ebx,DWORD[4+rdi]
+        mov     ecx,DWORD[8+rdi]
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+        vmovdqa xmm8,XMMWORD[((K256+512+32))]
+        vmovdqa xmm9,XMMWORD[((K256+512+64))]
+        jmp     NEAR $L$loop_avx
+ALIGN   16
+$L$loop_avx:
+        vmovdqa xmm7,XMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[rsi]
+        vmovdqu xmm1,XMMWORD[16+rsi]
+        vmovdqu xmm2,XMMWORD[32+rsi]
+        vmovdqu xmm3,XMMWORD[48+rsi]
+        vpshufb xmm0,xmm0,xmm7
+        lea     rbp,[K256]
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,XMMWORD[rbp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,XMMWORD[32+rbp]
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        vpaddd  xmm7,xmm3,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm5
+        mov     edi,ebx
+        vmovdqa XMMWORD[32+rsp],xmm6
+        xor     edi,ecx
+        vmovdqa XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$avx_00_47
+
+ALIGN   16
+$L$avx_00_47:
+        sub     rbp,-128
+        vpalignr        xmm4,xmm1,xmm0,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm3,xmm2,4
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm0,xmm0,xmm7
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        vpshufd xmm7,xmm3,250
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        vpaddd  xmm0,xmm0,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpaddd  xmm0,xmm0,xmm6
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        vpshufd xmm7,xmm0,80
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        vpsrlq  xmm7,xmm7,2
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        vpaddd  xmm0,xmm0,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpaddd  xmm6,xmm0,XMMWORD[rbp]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[rsp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm0,xmm3,4
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm1,xmm1,xmm7
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        vpshufd xmm7,xmm0,250
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        vpaddd  xmm1,xmm1,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpaddd  xmm1,xmm1,xmm6
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        vpshufd xmm7,xmm1,80
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        vpsrlq  xmm7,xmm7,2
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        vpaddd  xmm1,xmm1,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpaddd  xmm6,xmm1,XMMWORD[32+rbp]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm1,xmm0,4
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm2,xmm2,xmm7
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        vpshufd xmm7,xmm1,250
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        vpaddd  xmm2,xmm2,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpaddd  xmm2,xmm2,xmm6
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        vpshufd xmm7,xmm2,80
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        vpsrlq  xmm7,xmm7,2
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        vpaddd  xmm2,xmm2,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm2,xmm1,4
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm3,xmm3,xmm7
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        vpshufd xmm7,xmm2,250
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        vpaddd  xmm3,xmm3,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpaddd  xmm3,xmm3,xmm6
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        vpshufd xmm7,xmm3,80
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        vpsrlq  xmm7,xmm7,2
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        vpaddd  xmm3,xmm3,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpaddd  xmm6,xmm3,XMMWORD[96+rbp]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[48+rsp],xmm6
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$avx_00_47
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     rdi,QWORD[((64+0))+rsp]
+        mov     eax,r14d
+
+        add     eax,DWORD[rdi]
+        lea     rsi,[64+rsi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+        jb      NEAR $L$loop_avx
+
+        mov     rsi,QWORD[88+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((64+32))+rsp]
+        movaps  xmm7,XMMWORD[((64+48))+rsp]
+        movaps  xmm8,XMMWORD[((64+64))+rsp]
+        movaps  xmm9,XMMWORD[((64+80))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order_avx:
+
+ALIGN   64
+sha256_block_data_order_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,608
+        shl     rdx,4
+        and     rsp,-256*4
+        lea     rdx,[rdx*4+rsi]
+        add     rsp,448
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+        movaps  XMMWORD[(64+32)+rsp],xmm6
+        movaps  XMMWORD[(64+48)+rsp],xmm7
+        movaps  XMMWORD[(64+64)+rsp],xmm8
+        movaps  XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx2:
+
+        vzeroupper
+        sub     rsi,-16*4
+        mov     eax,DWORD[rdi]
+        mov     r12,rsi
+        mov     ebx,DWORD[4+rdi]
+        cmp     rsi,rdx
+        mov     ecx,DWORD[8+rdi]
+        cmove   r12,rsp
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+        vmovdqa ymm8,YMMWORD[((K256+512+32))]
+        vmovdqa ymm9,YMMWORD[((K256+512+64))]
+        jmp     NEAR $L$oop_avx2
+ALIGN   16
+$L$oop_avx2:
+        vmovdqa ymm7,YMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[((-64+0))+rsi]
+        vmovdqu xmm1,XMMWORD[((-64+16))+rsi]
+        vmovdqu xmm2,XMMWORD[((-64+32))+rsi]
+        vmovdqu xmm3,XMMWORD[((-64+48))+rsi]
+
+        vinserti128     ymm0,ymm0,XMMWORD[r12],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r12],1
+        vpshufb ymm0,ymm0,ymm7
+        vinserti128     ymm2,ymm2,XMMWORD[32+r12],1
+        vpshufb ymm1,ymm1,ymm7
+        vinserti128     ymm3,ymm3,XMMWORD[48+r12],1
+
+        lea     rbp,[K256]
+        vpshufb ymm2,ymm2,ymm7
+        vpaddd  ymm4,ymm0,YMMWORD[rbp]
+        vpshufb ymm3,ymm3,ymm7
+        vpaddd  ymm5,ymm1,YMMWORD[32+rbp]
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        vpaddd  ymm7,ymm3,YMMWORD[96+rbp]
+        vmovdqa YMMWORD[rsp],ymm4
+        xor     r14d,r14d
+        vmovdqa YMMWORD[32+rsp],ymm5
+        lea     rsp,[((-64))+rsp]
+        mov     edi,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        xor     edi,ecx
+        vmovdqa YMMWORD[32+rsp],ymm7
+        mov     r12d,r9d
+        sub     rbp,-16*2*4
+        jmp     NEAR $L$avx2_00_47
+
+ALIGN   16
+$L$avx2_00_47:
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm1,ymm0,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm3,ymm2,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm0,ymm0,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        vpshufd ymm7,ymm3,250
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm0,ymm0,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpaddd  ymm0,ymm0,ymm6
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpshufd ymm7,ymm0,80
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        vpxor   ymm6,ymm6,ymm7
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        vpaddd  ymm0,ymm0,ymm6
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        vpaddd  ymm6,ymm0,YMMWORD[rbp]
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm2,ymm1,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm0,ymm3,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm1,ymm1,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        vpshufd ymm7,ymm0,250
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm1,ymm1,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpaddd  ymm1,ymm1,ymm6
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpshufd ymm7,ymm1,80
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        vpxor   ymm6,ymm6,ymm7
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        vpaddd  ymm1,ymm1,ymm6
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        vpaddd  ymm6,ymm1,YMMWORD[32+rbp]
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm3,ymm2,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm1,ymm0,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm2,ymm2,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        vpshufd ymm7,ymm1,250
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm2,ymm2,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpaddd  ymm2,ymm2,ymm6
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpshufd ymm7,ymm2,80
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        vpxor   ymm6,ymm6,ymm7
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        vpaddd  ymm2,ymm2,ymm6
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm0,ymm3,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm2,ymm1,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm3,ymm3,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        vpshufd ymm7,ymm2,250
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm3,ymm3,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpaddd  ymm3,ymm3,ymm6
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpshufd ymm7,ymm3,80
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        vpxor   ymm6,ymm6,ymm7
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        vpaddd  ymm3,ymm3,ymm6
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        vpaddd  ymm6,ymm3,YMMWORD[96+rbp]
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        lea     rbp,[128+rbp]
+        cmp     BYTE[3+rbp],0
+        jne     NEAR $L$avx2_00_47
+        add     r11d,DWORD[((0+64))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+64))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+64))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+64))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+64))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+64))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+64))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+64))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        add     r11d,DWORD[rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[4+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[8+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[12+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[32+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[36+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[40+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[44+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        mov     rdi,QWORD[512+rsp]
+        add     eax,r14d
+
+        lea     rbp,[448+rsp]
+
+        add     eax,DWORD[rdi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+
+        cmp     rsi,QWORD[80+rbp]
+        je      NEAR $L$done_avx2
+
+        xor     r14d,r14d
+        mov     edi,ebx
+        xor     edi,ecx
+        mov     r12d,r9d
+        jmp     NEAR $L$ower_avx2
+ALIGN   16
+$L$ower_avx2:
+        add     r11d,DWORD[((0+16))+rbp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+16))+rbp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+16))+rbp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+16))+rbp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+16))+rbp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+16))+rbp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+16))+rbp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+16))+rbp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        lea     rbp,[((-64))+rbp]
+        cmp     rbp,rsp
+        jae     NEAR $L$ower_avx2
+
+        mov     rdi,QWORD[512+rsp]
+        add     eax,r14d
+
+        lea     rsp,[448+rsp]
+
+        add     eax,DWORD[rdi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        lea     rsi,[128+rsi]
+        add     r10d,DWORD[24+rdi]
+        mov     r12,rsi
+        add     r11d,DWORD[28+rdi]
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        cmove   r12,rsp
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+
+        jbe     NEAR $L$oop_avx2
+        lea     rbp,[rsp]
+
+$L$done_avx2:
+        lea     rsp,[rbp]
+        mov     rsi,QWORD[88+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((64+32))+rsp]
+        movaps  xmm7,XMMWORD[((64+48))+rsp]
+        movaps  xmm8,XMMWORD[((64+64))+rsp]
+        movaps  xmm9,XMMWORD[((64+80))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order_avx2:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+        lea     r10,[$L$avx2_shortcut]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_avx2
+
+        and     rax,-256*4
+        add     rax,448
+$L$not_in_avx2:
+        mov     rsi,rax
+        mov     rax,QWORD[((64+24))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        lea     rsi,[((64+32))+rsi]
+        lea     rdi,[512+r8]
+        mov     ecx,8
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   16
+shaext_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        lea     r10,[$L$epilogue_shaext]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rsi,[((-8-80))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,10
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha256_block_data_order wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha256_block_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_shaext:
+DB      9,0,0,0
+        DD      shaext_handler wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_ssse3:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
new file mode 100644
index 0000000000..6d48b93b84
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
@@ -0,0 +1,5668 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  sha512_block_data_order
+
+ALIGN   16
+sha512_block_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     r11,[OPENSSL_ia32cap_P]
+        mov     r9d,DWORD[r11]
+        mov     r10d,DWORD[4+r11]
+        mov     r11d,DWORD[8+r11]
+        test    r10d,2048
+        jnz     NEAR $L$xop_shortcut
+        and     r11d,296
+        cmp     r11d,296
+        je      NEAR $L$avx2_shortcut
+        and     r9d,1073741824
+        and     r10d,268435968
+        or      r10d,r9d
+        cmp     r10d,1342177792
+        je      NEAR $L$avx_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,16*8+4*8
+        lea     rdx,[rdx*8+rsi]
+        and     rsp,-64
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+$L$prologue:
+
+        mov     rax,QWORD[rdi]
+        mov     rbx,QWORD[8+rdi]
+        mov     rcx,QWORD[16+rdi]
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$loop
+
+ALIGN   16
+$L$loop:
+        mov     rdi,rbx
+        lea     rbp,[K512]
+        xor     rdi,rcx
+        mov     r12,QWORD[rsi]
+        mov     r13,r8
+        mov     r14,rax
+        bswap   r12
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        add     r11,r14
+        mov     r12,QWORD[8+rsi]
+        mov     r13,rdx
+        mov     r14,r11
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[8+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        add     r10,r14
+        mov     r12,QWORD[16+rsi]
+        mov     r13,rcx
+        mov     r14,r10
+        bswap   r12
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[16+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        add     r9,r14
+        mov     r12,QWORD[24+rsi]
+        mov     r13,rbx
+        mov     r14,r9
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[24+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        add     r8,r14
+        mov     r12,QWORD[32+rsi]
+        mov     r13,rax
+        mov     r14,r8
+        bswap   r12
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[32+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        add     rdx,r14
+        mov     r12,QWORD[40+rsi]
+        mov     r13,r11
+        mov     r14,rdx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[40+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        add     rcx,r14
+        mov     r12,QWORD[48+rsi]
+        mov     r13,r10
+        mov     r14,rcx
+        bswap   r12
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[48+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        add     rbx,r14
+        mov     r12,QWORD[56+rsi]
+        mov     r13,r9
+        mov     r14,rbx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[56+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        add     rax,r14
+        mov     r12,QWORD[64+rsi]
+        mov     r13,r8
+        mov     r14,rax
+        bswap   r12
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[64+rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        add     r11,r14
+        mov     r12,QWORD[72+rsi]
+        mov     r13,rdx
+        mov     r14,r11
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[72+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        add     r10,r14
+        mov     r12,QWORD[80+rsi]
+        mov     r13,rcx
+        mov     r14,r10
+        bswap   r12
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[80+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        add     r9,r14
+        mov     r12,QWORD[88+rsi]
+        mov     r13,rbx
+        mov     r14,r9
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[88+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        add     r8,r14
+        mov     r12,QWORD[96+rsi]
+        mov     r13,rax
+        mov     r14,r8
+        bswap   r12
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[96+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        add     rdx,r14
+        mov     r12,QWORD[104+rsi]
+        mov     r13,r11
+        mov     r14,rdx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[104+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        add     rcx,r14
+        mov     r12,QWORD[112+rsi]
+        mov     r13,r10
+        mov     r14,rcx
+        bswap   r12
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[112+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        add     rbx,r14
+        mov     r12,QWORD[120+rsi]
+        mov     r13,r9
+        mov     r14,rbx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[120+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        jmp     NEAR $L$rounds_16_xx
+ALIGN   16
+$L$rounds_16_xx:
+        mov     r13,QWORD[8+rsp]
+        mov     r15,QWORD[112+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rax,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[72+rsp]
+
+        add     r12,QWORD[rsp]
+        mov     r13,r8
+        add     r12,r15
+        mov     r14,rax
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[16+rsp]
+        mov     rdi,QWORD[120+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r11,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[80+rsp]
+
+        add     r12,QWORD[8+rsp]
+        mov     r13,rdx
+        add     r12,rdi
+        mov     r14,r11
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[8+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[24+rsp]
+        mov     r15,QWORD[rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r10,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[88+rsp]
+
+        add     r12,QWORD[16+rsp]
+        mov     r13,rcx
+        add     r12,r15
+        mov     r14,r10
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[16+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[32+rsp]
+        mov     rdi,QWORD[8+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r9,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[96+rsp]
+
+        add     r12,QWORD[24+rsp]
+        mov     r13,rbx
+        add     r12,rdi
+        mov     r14,r9
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[24+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[40+rsp]
+        mov     r15,QWORD[16+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r8,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[104+rsp]
+
+        add     r12,QWORD[32+rsp]
+        mov     r13,rax
+        add     r12,r15
+        mov     r14,r8
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[32+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[48+rsp]
+        mov     rdi,QWORD[24+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rdx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[112+rsp]
+
+        add     r12,QWORD[40+rsp]
+        mov     r13,r11
+        add     r12,rdi
+        mov     r14,rdx
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[40+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[56+rsp]
+        mov     r15,QWORD[32+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rcx,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[120+rsp]
+
+        add     r12,QWORD[48+rsp]
+        mov     r13,r10
+        add     r12,r15
+        mov     r14,rcx
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[48+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[64+rsp]
+        mov     rdi,QWORD[40+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rbx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[rsp]
+
+        add     r12,QWORD[56+rsp]
+        mov     r13,r9
+        add     r12,rdi
+        mov     r14,rbx
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[56+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[72+rsp]
+        mov     r15,QWORD[48+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rax,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[8+rsp]
+
+        add     r12,QWORD[64+rsp]
+        mov     r13,r8
+        add     r12,r15
+        mov     r14,rax
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[64+rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[80+rsp]
+        mov     rdi,QWORD[56+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r11,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[16+rsp]
+
+        add     r12,QWORD[72+rsp]
+        mov     r13,rdx
+        add     r12,rdi
+        mov     r14,r11
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[72+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[88+rsp]
+        mov     r15,QWORD[64+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r10,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[24+rsp]
+
+        add     r12,QWORD[80+rsp]
+        mov     r13,rcx
+        add     r12,r15
+        mov     r14,r10
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[80+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[96+rsp]
+        mov     rdi,QWORD[72+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r9,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[32+rsp]
+
+        add     r12,QWORD[88+rsp]
+        mov     r13,rbx
+        add     r12,rdi
+        mov     r14,r9
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[88+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[104+rsp]
+        mov     r15,QWORD[80+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r8,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[40+rsp]
+
+        add     r12,QWORD[96+rsp]
+        mov     r13,rax
+        add     r12,r15
+        mov     r14,r8
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[96+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[112+rsp]
+        mov     rdi,QWORD[88+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rdx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[48+rsp]
+
+        add     r12,QWORD[104+rsp]
+        mov     r13,r11
+        add     r12,rdi
+        mov     r14,rdx
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[104+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[120+rsp]
+        mov     r15,QWORD[96+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rcx,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[56+rsp]
+
+        add     r12,QWORD[112+rsp]
+        mov     r13,r10
+        add     r12,r15
+        mov     r14,rcx
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[112+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[rsp]
+        mov     rdi,QWORD[104+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rbx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[64+rsp]
+
+        add     r12,QWORD[120+rsp]
+        mov     r13,r9
+        add     r12,rdi
+        mov     r14,rbx
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[120+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        cmp     BYTE[7+rbp],0
+        jnz     NEAR $L$rounds_16_xx
+
+        mov     rdi,QWORD[((128+0))+rsp]
+        add     rax,r14
+        lea     rsi,[128+rsi]
+
+        add     rax,QWORD[rdi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+        jb      NEAR $L$loop
+
+        mov     rsi,QWORD[152+rsp]
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order:
+ALIGN   64
+
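+; SHA-512 round constants K[0..79]; each 128-bit pair is emitted twice, so the
+; vector code below steps through the table in 32-byte strides, and the trailing
+; pair at K512+1280 is the big-endian byte-swap mask used with vpshufb.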
+K512:
+        DQ      0x428a2f98d728ae22,0x7137449123ef65cd
+        DQ      0x428a2f98d728ae22,0x7137449123ef65cd
+        DQ      0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+        DQ      0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+        DQ      0x3956c25bf348b538,0x59f111f1b605d019
+        DQ      0x3956c25bf348b538,0x59f111f1b605d019
+        DQ      0x923f82a4af194f9b,0xab1c5ed5da6d8118
+        DQ      0x923f82a4af194f9b,0xab1c5ed5da6d8118
+        DQ      0xd807aa98a3030242,0x12835b0145706fbe
+        DQ      0xd807aa98a3030242,0x12835b0145706fbe
+        DQ      0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+        DQ      0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+        DQ      0x72be5d74f27b896f,0x80deb1fe3b1696b1
+        DQ      0x72be5d74f27b896f,0x80deb1fe3b1696b1
+        DQ      0x9bdc06a725c71235,0xc19bf174cf692694
+        DQ      0x9bdc06a725c71235,0xc19bf174cf692694
+        DQ      0xe49b69c19ef14ad2,0xefbe4786384f25e3
+        DQ      0xe49b69c19ef14ad2,0xefbe4786384f25e3
+        DQ      0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+        DQ      0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+        DQ      0x2de92c6f592b0275,0x4a7484aa6ea6e483
+        DQ      0x2de92c6f592b0275,0x4a7484aa6ea6e483
+        DQ      0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+        DQ      0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+        DQ      0x983e5152ee66dfab,0xa831c66d2db43210
+        DQ      0x983e5152ee66dfab,0xa831c66d2db43210
+        DQ      0xb00327c898fb213f,0xbf597fc7beef0ee4
+        DQ      0xb00327c898fb213f,0xbf597fc7beef0ee4
+        DQ      0xc6e00bf33da88fc2,0xd5a79147930aa725
+        DQ      0xc6e00bf33da88fc2,0xd5a79147930aa725
+        DQ      0x06ca6351e003826f,0x142929670a0e6e70
+        DQ      0x06ca6351e003826f,0x142929670a0e6e70
+        DQ      0x27b70a8546d22ffc,0x2e1b21385c26c926
+        DQ      0x27b70a8546d22ffc,0x2e1b21385c26c926
+        DQ      0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+        DQ      0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+        DQ      0x650a73548baf63de,0x766a0abb3c77b2a8
+        DQ      0x650a73548baf63de,0x766a0abb3c77b2a8
+        DQ      0x81c2c92e47edaee6,0x92722c851482353b
+        DQ      0x81c2c92e47edaee6,0x92722c851482353b
+        DQ      0xa2bfe8a14cf10364,0xa81a664bbc423001
+        DQ      0xa2bfe8a14cf10364,0xa81a664bbc423001
+        DQ      0xc24b8b70d0f89791,0xc76c51a30654be30
+        DQ      0xc24b8b70d0f89791,0xc76c51a30654be30
+        DQ      0xd192e819d6ef5218,0xd69906245565a910
+        DQ      0xd192e819d6ef5218,0xd69906245565a910
+        DQ      0xf40e35855771202a,0x106aa07032bbd1b8
+        DQ      0xf40e35855771202a,0x106aa07032bbd1b8
+        DQ      0x19a4c116b8d2d0c8,0x1e376c085141ab53
+        DQ      0x19a4c116b8d2d0c8,0x1e376c085141ab53
+        DQ      0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+        DQ      0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+        DQ      0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+        DQ      0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+        DQ      0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+        DQ      0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+        DQ      0x748f82ee5defb2fc,0x78a5636f43172f60
+        DQ      0x748f82ee5defb2fc,0x78a5636f43172f60
+        DQ      0x84c87814a1f0ab72,0x8cc702081a6439ec
+        DQ      0x84c87814a1f0ab72,0x8cc702081a6439ec
+        DQ      0x90befffa23631e28,0xa4506cebde82bde9
+        DQ      0x90befffa23631e28,0xa4506cebde82bde9
+        DQ      0xbef9a3f7b2c67915,0xc67178f2e372532b
+        DQ      0xbef9a3f7b2c67915,0xc67178f2e372532b
+        DQ      0xca273eceea26619c,0xd186b8c721c0c207
+        DQ      0xca273eceea26619c,0xd186b8c721c0c207
+        DQ      0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+        DQ      0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+        DQ      0x06f067aa72176fba,0x0a637dc5a2c898a6
+        DQ      0x06f067aa72176fba,0x0a637dc5a2c898a6
+        DQ      0x113f9804bef90dae,0x1b710b35131c471b
+        DQ      0x113f9804bef90dae,0x1b710b35131c471b
+        DQ      0x28db77f523047d84,0x32caab7b40c72493
+        DQ      0x28db77f523047d84,0x32caab7b40c72493
+        DQ      0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+        DQ      0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+        DQ      0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+        DQ      0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+        DQ      0x5fcb6fab3ad6faec,0x6c44198c4a475817
+        DQ      0x5fcb6fab3ad6faec,0x6c44198c4a475817
+
+        DQ      0x0001020304050607,0x08090a0b0c0d0e0f
+        DQ      0x0001020304050607,0x08090a0b0c0d0e0f
+DB      83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+DB      110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB      52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB      111,114,103,62,0
+
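+; XOP code path (AMD XOP extension). The DB 143,... sequences below are
+; hand-encoded XOP rotate (vprotq) instructions emitted as raw opcode bytes.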
+ALIGN   64
+sha512_block_data_order_xop:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order_xop:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$xop_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,256
+        lea     rdx,[rdx*8+rsi]
+        and     rsp,-64
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+        movaps  XMMWORD[(128+32)+rsp],xmm6
+        movaps  XMMWORD[(128+48)+rsp],xmm7
+        movaps  XMMWORD[(128+64)+rsp],xmm8
+        movaps  XMMWORD[(128+80)+rsp],xmm9
+        movaps  XMMWORD[(128+96)+rsp],xmm10
+        movaps  XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_xop:
+
+        vzeroupper
+        mov     rax,QWORD[rdi]
+        mov     rbx,QWORD[8+rdi]
+        mov     rcx,QWORD[16+rdi]
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$loop_xop
+ALIGN   16
+$L$loop_xop:
+        vmovdqa xmm11,XMMWORD[((K512+1280))]
+        vmovdqu xmm0,XMMWORD[rsi]
+        lea     rbp,[((K512+128))]
+        vmovdqu xmm1,XMMWORD[16+rsi]
+        vmovdqu xmm2,XMMWORD[32+rsi]
+        vpshufb xmm0,xmm0,xmm11
+        vmovdqu xmm3,XMMWORD[48+rsi]
+        vpshufb xmm1,xmm1,xmm11
+        vmovdqu xmm4,XMMWORD[64+rsi]
+        vpshufb xmm2,xmm2,xmm11
+        vmovdqu xmm5,XMMWORD[80+rsi]
+        vpshufb xmm3,xmm3,xmm11
+        vmovdqu xmm6,XMMWORD[96+rsi]
+        vpshufb xmm4,xmm4,xmm11
+        vmovdqu xmm7,XMMWORD[112+rsi]
+        vpshufb xmm5,xmm5,xmm11
+        vpaddq  xmm8,xmm0,XMMWORD[((-128))+rbp]
+        vpshufb xmm6,xmm6,xmm11
+        vpaddq  xmm9,xmm1,XMMWORD[((-96))+rbp]
+        vpshufb xmm7,xmm7,xmm11
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        vpaddq  xmm11,xmm3,XMMWORD[((-32))+rbp]
+        vmovdqa XMMWORD[rsp],xmm8
+        vpaddq  xmm8,xmm4,XMMWORD[rbp]
+        vmovdqa XMMWORD[16+rsp],xmm9
+        vpaddq  xmm9,xmm5,XMMWORD[32+rbp]
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        vmovdqa XMMWORD[48+rsp],xmm11
+        vpaddq  xmm11,xmm7,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[64+rsp],xmm8
+        mov     r14,rax
+        vmovdqa XMMWORD[80+rsp],xmm9
+        mov     rdi,rbx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        xor     rdi,rcx
+        vmovdqa XMMWORD[112+rsp],xmm11
+        mov     r13,r8
+        jmp     NEAR $L$xop_00_47
+
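+; Schedule/round loop: each pass expands the next 16 message words with xmm
+; arithmetic while running 16 scalar rounds, stepping rbp through K512 until the
+; byte test at the bottom of the loop falls through to the remaining rounds.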
+ALIGN   16
+$L$xop_00_47:
+        add     rbp,256
+        vpalignr        xmm8,xmm1,xmm0,8
+        ror     r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm5,xmm4,8
+        mov     r12,r9
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r8
+        xor     r12,r10
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rax
+        vpaddq  xmm0,xmm0,xmm11
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+DB      143,72,120,195,209,7
+        xor     r12,r10
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,223,3
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        ror     r14,28
+        vpsrlq  xmm10,xmm7,6
+        add     rdx,r11
+        add     r11,rdi
+        vpaddq  xmm0,xmm0,xmm8
+        mov     r13,rdx
+        add     r14,r11
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r11,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        vpaddq  xmm0,xmm0,xmm11
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        vpaddq  xmm10,xmm0,XMMWORD[((-128))+rbp]
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[rsp],xmm10
+        vpalignr        xmm8,xmm2,xmm1,8
+        ror     r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm6,xmm5,8
+        mov     r12,rdx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rcx
+        xor     r12,r8
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r10
+        vpaddq  xmm1,xmm1,xmm11
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+DB      143,72,120,195,209,7
+        xor     r12,r8
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,216,3
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        ror     r14,28
+        vpsrlq  xmm10,xmm0,6
+        add     rbx,r9
+        add     r9,rdi
+        vpaddq  xmm1,xmm1,xmm8
+        mov     r13,rbx
+        add     r14,r9
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r9,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        vpaddq  xmm1,xmm1,xmm11
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        vpaddq  xmm10,xmm1,XMMWORD[((-96))+rbp]
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[16+rsp],xmm10
+        vpalignr        xmm8,xmm3,xmm2,8
+        ror     r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm7,xmm6,8
+        mov     r12,rbx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rax
+        xor     r12,rcx
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r8
+        vpaddq  xmm2,xmm2,xmm11
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+DB      143,72,120,195,209,7
+        xor     r12,rcx
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,217,3
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        ror     r14,28
+        vpsrlq  xmm10,xmm1,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpaddq  xmm2,xmm2,xmm8
+        mov     r13,r11
+        add     r14,rdx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rdx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        vpaddq  xmm2,xmm2,xmm11
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpalignr        xmm8,xmm4,xmm3,8
+        ror     r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm0,xmm7,8
+        mov     r12,r11
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r10
+        xor     r12,rax
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rcx
+        vpaddq  xmm3,xmm3,xmm11
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+DB      143,72,120,195,209,7
+        xor     r12,rax
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,218,3
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        ror     r14,28
+        vpsrlq  xmm10,xmm2,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpaddq  xmm3,xmm3,xmm8
+        mov     r13,r9
+        add     r14,rbx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rbx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        vpaddq  xmm3,xmm3,xmm11
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        vpaddq  xmm10,xmm3,XMMWORD[((-32))+rbp]
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[48+rsp],xmm10
+        vpalignr        xmm8,xmm5,xmm4,8
+        ror     r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm1,xmm0,8
+        mov     r12,r9
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r8
+        xor     r12,r10
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rax
+        vpaddq  xmm4,xmm4,xmm11
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+DB      143,72,120,195,209,7
+        xor     r12,r10
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,219,3
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        ror     r14,28
+        vpsrlq  xmm10,xmm3,6
+        add     rdx,r11
+        add     r11,rdi
+        vpaddq  xmm4,xmm4,xmm8
+        mov     r13,rdx
+        add     r14,r11
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r11,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        vpaddq  xmm4,xmm4,xmm11
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        vpaddq  xmm10,xmm4,XMMWORD[rbp]
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[64+rsp],xmm10
+        vpalignr        xmm8,xmm6,xmm5,8
+        ror     r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm2,xmm1,8
+        mov     r12,rdx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rcx
+        xor     r12,r8
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r10
+        vpaddq  xmm5,xmm5,xmm11
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+DB      143,72,120,195,209,7
+        xor     r12,r8
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,220,3
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        ror     r14,28
+        vpsrlq  xmm10,xmm4,6
+        add     rbx,r9
+        add     r9,rdi
+        vpaddq  xmm5,xmm5,xmm8
+        mov     r13,rbx
+        add     r14,r9
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r9,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        vpaddq  xmm5,xmm5,xmm11
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        vpaddq  xmm10,xmm5,XMMWORD[32+rbp]
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[80+rsp],xmm10
+        vpalignr        xmm8,xmm7,xmm6,8
+        ror     r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm3,xmm2,8
+        mov     r12,rbx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rax
+        xor     r12,rcx
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r8
+        vpaddq  xmm6,xmm6,xmm11
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+DB      143,72,120,195,209,7
+        xor     r12,rcx
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,221,3
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        ror     r14,28
+        vpsrlq  xmm10,xmm5,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpaddq  xmm6,xmm6,xmm8
+        mov     r13,r11
+        add     r14,rdx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rdx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        vpaddq  xmm6,xmm6,xmm11
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        vpalignr        xmm8,xmm0,xmm7,8
+        ror     r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm4,xmm3,8
+        mov     r12,r11
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r10
+        xor     r12,rax
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rcx
+        vpaddq  xmm7,xmm7,xmm11
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+DB      143,72,120,195,209,7
+        xor     r12,rax
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,222,3
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        ror     r14,28
+        vpsrlq  xmm10,xmm6,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpaddq  xmm7,xmm7,xmm8
+        mov     r13,r9
+        add     r14,rbx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rbx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        vpaddq  xmm7,xmm7,xmm11
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        vpaddq  xmm10,xmm7,XMMWORD[96+rbp]
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[112+rsp],xmm10
+        cmp     BYTE[135+rbp],0
+        jne     NEAR $L$xop_00_47
+        ror     r13,23
+        mov     rax,r14
+        mov     r12,r9
+        ror     r14,5
+        xor     r13,r8
+        xor     r12,r10
+        ror     r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+        xor     r12,r10
+        ror     r14,6
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        ror     r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        ror     r13,23
+        mov     r11,r14
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        ror     r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        ror     r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        ror     r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        ror     r14,6
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        ror     r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        ror     r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        ror     r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        ror     r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        ror     r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        ror     r14,6
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        ror     r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        ror     r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        ror     r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        ror     r14,5
+        xor     r13,r10
+        xor     r12,rax
+        ror     r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        ror     r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        ror     r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        ror     r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        ror     r13,23
+        mov     rax,r14
+        mov     r12,r9
+        ror     r14,5
+        xor     r13,r8
+        xor     r12,r10
+        ror     r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+        xor     r12,r10
+        ror     r14,6
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        ror     r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        ror     r13,23
+        mov     r11,r14
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        ror     r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        ror     r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        ror     r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        ror     r14,6
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        ror     r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        ror     r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        ror     r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        ror     r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        ror     r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        ror     r14,6
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        ror     r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        ror     r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        ror     r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        ror     r14,5
+        xor     r13,r10
+        xor     r12,rax
+        ror     r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        ror     r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        ror     r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        ror     r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        mov     rdi,QWORD[((128+0))+rsp]
+        mov     rax,r14
+
+        add     rax,QWORD[rdi]
+        lea     rsi,[128+rsi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+        jb      NEAR $L$loop_xop
+
+        mov     rsi,QWORD[152+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((128+32))+rsp]
+        movaps  xmm7,XMMWORD[((128+48))+rsp]
+        movaps  xmm8,XMMWORD[((128+64))+rsp]
+        movaps  xmm9,XMMWORD[((128+80))+rsp]
+        movaps  xmm10,XMMWORD[((128+96))+rsp]
+        movaps  xmm11,XMMWORD[((128+112))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_xop:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order_xop:
+
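+; AVX code path: same structure as the XOP path above, but the message-schedule
+; rotates are composed from vpsrlq/vpsllq/vpxor, and the scalar rounds rotate
+; with shrd reg,reg,imm instead of ror.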
+ALIGN   64
+sha512_block_data_order_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,256
+        lea     rdx,[rdx*8+rsi]
+        and     rsp,-64
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+        movaps  XMMWORD[(128+32)+rsp],xmm6
+        movaps  XMMWORD[(128+48)+rsp],xmm7
+        movaps  XMMWORD[(128+64)+rsp],xmm8
+        movaps  XMMWORD[(128+80)+rsp],xmm9
+        movaps  XMMWORD[(128+96)+rsp],xmm10
+        movaps  XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx:
+
+        vzeroupper
+        mov     rax,QWORD[rdi]
+        mov     rbx,QWORD[8+rdi]
+        mov     rcx,QWORD[16+rdi]
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$loop_avx
+ALIGN   16
+$L$loop_avx:
+        vmovdqa xmm11,XMMWORD[((K512+1280))]
+        vmovdqu xmm0,XMMWORD[rsi]
+        lea     rbp,[((K512+128))]
+        vmovdqu xmm1,XMMWORD[16+rsi]
+        vmovdqu xmm2,XMMWORD[32+rsi]
+        vpshufb xmm0,xmm0,xmm11
+        vmovdqu xmm3,XMMWORD[48+rsi]
+        vpshufb xmm1,xmm1,xmm11
+        vmovdqu xmm4,XMMWORD[64+rsi]
+        vpshufb xmm2,xmm2,xmm11
+        vmovdqu xmm5,XMMWORD[80+rsi]
+        vpshufb xmm3,xmm3,xmm11
+        vmovdqu xmm6,XMMWORD[96+rsi]
+        vpshufb xmm4,xmm4,xmm11
+        vmovdqu xmm7,XMMWORD[112+rsi]
+        vpshufb xmm5,xmm5,xmm11
+        vpaddq  xmm8,xmm0,XMMWORD[((-128))+rbp]
+        vpshufb xmm6,xmm6,xmm11
+        vpaddq  xmm9,xmm1,XMMWORD[((-96))+rbp]
+        vpshufb xmm7,xmm7,xmm11
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        vpaddq  xmm11,xmm3,XMMWORD[((-32))+rbp]
+        vmovdqa XMMWORD[rsp],xmm8
+        vpaddq  xmm8,xmm4,XMMWORD[rbp]
+        vmovdqa XMMWORD[16+rsp],xmm9
+        vpaddq  xmm9,xmm5,XMMWORD[32+rbp]
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        vmovdqa XMMWORD[48+rsp],xmm11
+        vpaddq  xmm11,xmm7,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[64+rsp],xmm8
+        mov     r14,rax
+        vmovdqa XMMWORD[80+rsp],xmm9
+        mov     rdi,rbx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        xor     rdi,rcx
+        vmovdqa XMMWORD[112+rsp],xmm11
+        mov     r13,r8
+        jmp     NEAR $L$avx_00_47
+
+ALIGN   16
+$L$avx_00_47:
+        add     rbp,256
+        vpalignr        xmm8,xmm1,xmm0,8
+        shrd    r13,r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm5,xmm4,8
+        mov     r12,r9
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r8
+        xor     r12,r10
+        vpaddq  xmm0,xmm0,xmm11
+        shrd    r13,r13,4
+        xor     r14,rax
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r8
+        xor     r13,r8
+        vpsllq  xmm9,xmm8,56
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r10
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rbx
+        add     r11,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm7,6
+        add     rdx,r11
+        add     r11,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rdx
+        add     r14,r11
+        vpsllq  xmm10,xmm7,3
+        shrd    r13,r13,23
+        mov     r11,r14
+        vpaddq  xmm0,xmm0,xmm8
+        mov     r12,r8
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm7,19
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r11
+        vpsllq  xmm10,xmm10,42
+        and     r12,rdx
+        xor     r13,rdx
+        vpxor   xmm11,xmm11,xmm9
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r9
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rax
+        add     r10,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm0,xmm0,xmm11
+        xor     r14,r11
+        add     r10,r13
+        vpaddq  xmm10,xmm0,XMMWORD[((-128))+rbp]
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[rsp],xmm10
+        vpalignr        xmm8,xmm2,xmm1,8
+        shrd    r13,r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm6,xmm5,8
+        mov     r12,rdx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rcx
+        xor     r12,r8
+        vpaddq  xmm1,xmm1,xmm11
+        shrd    r13,r13,4
+        xor     r14,r10
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rcx
+        xor     r13,rcx
+        vpsllq  xmm9,xmm8,56
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r8
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r11
+        add     r9,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm0,6
+        add     rbx,r9
+        add     r9,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rbx
+        add     r14,r9
+        vpsllq  xmm10,xmm0,3
+        shrd    r13,r13,23
+        mov     r9,r14
+        vpaddq  xmm1,xmm1,xmm8
+        mov     r12,rcx
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm0,19
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r9
+        vpsllq  xmm10,xmm10,42
+        and     r12,rbx
+        xor     r13,rbx
+        vpxor   xmm11,xmm11,xmm9
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rdx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r10
+        add     r8,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm1,xmm1,xmm11
+        xor     r14,r9
+        add     r8,r13
+        vpaddq  xmm10,xmm1,XMMWORD[((-96))+rbp]
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[16+rsp],xmm10
+        vpalignr        xmm8,xmm3,xmm2,8
+        shrd    r13,r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm7,xmm6,8
+        mov     r12,rbx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rax
+        xor     r12,rcx
+        vpaddq  xmm2,xmm2,xmm11
+        shrd    r13,r13,4
+        xor     r14,r8
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rax
+        xor     r13,rax
+        vpsllq  xmm9,xmm8,56
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rcx
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r9
+        add     rdx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm1,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r11
+        add     r14,rdx
+        vpsllq  xmm10,xmm1,3
+        shrd    r13,r13,23
+        mov     rdx,r14
+        vpaddq  xmm2,xmm2,xmm8
+        mov     r12,rax
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm1,19
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rdx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r11
+        xor     r13,r11
+        vpxor   xmm11,xmm11,xmm9
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rbx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r8
+        add     rcx,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm2,xmm2,xmm11
+        xor     r14,rdx
+        add     rcx,r13
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpalignr        xmm8,xmm4,xmm3,8
+        shrd    r13,r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm0,xmm7,8
+        mov     r12,r11
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r10
+        xor     r12,rax
+        vpaddq  xmm3,xmm3,xmm11
+        shrd    r13,r13,4
+        xor     r14,rcx
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r10
+        xor     r13,r10
+        vpsllq  xmm9,xmm8,56
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rax
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rdx
+        add     rbx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm2,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r9
+        add     r14,rbx
+        vpsllq  xmm10,xmm2,3
+        shrd    r13,r13,23
+        mov     rbx,r14
+        vpaddq  xmm3,xmm3,xmm8
+        mov     r12,r10
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm2,19
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rbx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r9
+        xor     r13,r9
+        vpxor   xmm11,xmm11,xmm9
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r11
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rcx
+        add     rax,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm3,xmm3,xmm11
+        xor     r14,rbx
+        add     rax,r13
+        vpaddq  xmm10,xmm3,XMMWORD[((-32))+rbp]
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[48+rsp],xmm10
+        vpalignr        xmm8,xmm5,xmm4,8
+        shrd    r13,r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm1,xmm0,8
+        mov     r12,r9
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r8
+        xor     r12,r10
+        vpaddq  xmm4,xmm4,xmm11
+        shrd    r13,r13,4
+        xor     r14,rax
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r8
+        xor     r13,r8
+        vpsllq  xmm9,xmm8,56
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r10
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rbx
+        add     r11,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm3,6
+        add     rdx,r11
+        add     r11,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rdx
+        add     r14,r11
+        vpsllq  xmm10,xmm3,3
+        shrd    r13,r13,23
+        mov     r11,r14
+        vpaddq  xmm4,xmm4,xmm8
+        mov     r12,r8
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm3,19
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r11
+        vpsllq  xmm10,xmm10,42
+        and     r12,rdx
+        xor     r13,rdx
+        vpxor   xmm11,xmm11,xmm9
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r9
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rax
+        add     r10,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm4,xmm4,xmm11
+        xor     r14,r11
+        add     r10,r13
+        vpaddq  xmm10,xmm4,XMMWORD[rbp]
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[64+rsp],xmm10
+        vpalignr        xmm8,xmm6,xmm5,8
+        shrd    r13,r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm2,xmm1,8
+        mov     r12,rdx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rcx
+        xor     r12,r8
+        vpaddq  xmm5,xmm5,xmm11
+        shrd    r13,r13,4
+        xor     r14,r10
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rcx
+        xor     r13,rcx
+        vpsllq  xmm9,xmm8,56
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r8
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r11
+        add     r9,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm4,6
+        add     rbx,r9
+        add     r9,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rbx
+        add     r14,r9
+        vpsllq  xmm10,xmm4,3
+        shrd    r13,r13,23
+        mov     r9,r14
+        vpaddq  xmm5,xmm5,xmm8
+        mov     r12,rcx
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm4,19
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r9
+        vpsllq  xmm10,xmm10,42
+        and     r12,rbx
+        xor     r13,rbx
+        vpxor   xmm11,xmm11,xmm9
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rdx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r10
+        add     r8,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm5,xmm5,xmm11
+        xor     r14,r9
+        add     r8,r13
+        vpaddq  xmm10,xmm5,XMMWORD[32+rbp]
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[80+rsp],xmm10
+        vpalignr        xmm8,xmm7,xmm6,8
+        shrd    r13,r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm3,xmm2,8
+        mov     r12,rbx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rax
+        xor     r12,rcx
+        vpaddq  xmm6,xmm6,xmm11
+        shrd    r13,r13,4
+        xor     r14,r8
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rax
+        xor     r13,rax
+        vpsllq  xmm9,xmm8,56
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rcx
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r9
+        add     rdx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm5,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r11
+        add     r14,rdx
+        vpsllq  xmm10,xmm5,3
+        shrd    r13,r13,23
+        mov     rdx,r14
+        vpaddq  xmm6,xmm6,xmm8
+        mov     r12,rax
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm5,19
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rdx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r11
+        xor     r13,r11
+        vpxor   xmm11,xmm11,xmm9
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rbx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r8
+        add     rcx,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm6,xmm6,xmm11
+        xor     r14,rdx
+        add     rcx,r13
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        vpalignr        xmm8,xmm0,xmm7,8
+        shrd    r13,r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm4,xmm3,8
+        mov     r12,r11
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r10
+        xor     r12,rax
+        vpaddq  xmm7,xmm7,xmm11
+        shrd    r13,r13,4
+        xor     r14,rcx
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r10
+        xor     r13,r10
+        vpsllq  xmm9,xmm8,56
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rax
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rdx
+        add     rbx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm6,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r9
+        add     r14,rbx
+        vpsllq  xmm10,xmm6,3
+        shrd    r13,r13,23
+        mov     rbx,r14
+        vpaddq  xmm7,xmm7,xmm8
+        mov     r12,r10
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm6,19
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rbx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r9
+        xor     r13,r9
+        vpxor   xmm11,xmm11,xmm9
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r11
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rcx
+        add     rax,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm7,xmm7,xmm11
+        xor     r14,rbx
+        add     rax,r13
+        vpaddq  xmm10,xmm7,XMMWORD[96+rbp]
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[112+rsp],xmm10
+        cmp     BYTE[135+rbp],0
+        jne     NEAR $L$avx_00_47
+        shrd    r13,r13,23
+        mov     rax,r14
+        mov     r12,r9
+        shrd    r14,r14,5
+        xor     r13,r8
+        xor     r12,r10
+        shrd    r13,r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+        xor     r12,r10
+        shrd    r14,r14,6
+        xor     r15,rbx
+        add     r11,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        shrd    r13,r13,23
+        mov     r11,r14
+        mov     r12,r8
+        shrd    r14,r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        shrd    r13,r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        shrd    r14,r14,6
+        xor     rdi,rax
+        add     r10,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        shrd    r13,r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        shrd    r14,r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        shrd    r13,r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        shrd    r14,r14,6
+        xor     r15,r11
+        add     r9,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        shrd    r14,r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        shrd    r13,r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        shrd    r14,r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        shrd    r13,r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        shrd    r14,r14,6
+        xor     rdi,r10
+        add     r8,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        shrd    r13,r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        shrd    r14,r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        shrd    r13,r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        shrd    r14,r14,6
+        xor     r15,r9
+        add     rdx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        shrd    r14,r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        shrd    r13,r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        shrd    r14,r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        shrd    r13,r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        shrd    r14,r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        shrd    r13,r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        shrd    r14,r14,5
+        xor     r13,r10
+        xor     r12,rax
+        shrd    r13,r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        shrd    r14,r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        shrd    r13,r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        shrd    r14,r14,5
+        xor     r13,r9
+        xor     r12,r11
+        shrd    r13,r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        shrd    r14,r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        shrd    r13,r13,23
+        mov     rax,r14
+        mov     r12,r9
+        shrd    r14,r14,5
+        xor     r13,r8
+        xor     r12,r10
+        shrd    r13,r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+        xor     r12,r10
+        shrd    r14,r14,6
+        xor     r15,rbx
+        add     r11,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        shrd    r13,r13,23
+        mov     r11,r14
+        mov     r12,r8
+        shrd    r14,r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        shrd    r13,r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        shrd    r14,r14,6
+        xor     rdi,rax
+        add     r10,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        shrd    r13,r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        shrd    r14,r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        shrd    r13,r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        shrd    r14,r14,6
+        xor     r15,r11
+        add     r9,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        shrd    r14,r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        shrd    r13,r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        shrd    r14,r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        shrd    r13,r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        shrd    r14,r14,6
+        xor     rdi,r10
+        add     r8,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        shrd    r13,r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        shrd    r14,r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        shrd    r13,r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        shrd    r14,r14,6
+        xor     r15,r9
+        add     rdx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        shrd    r14,r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        shrd    r13,r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        shrd    r14,r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        shrd    r13,r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        shrd    r14,r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        shrd    r13,r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        shrd    r14,r14,5
+        xor     r13,r10
+        xor     r12,rax
+        shrd    r13,r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        shrd    r14,r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        shrd    r13,r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        shrd    r14,r14,5
+        xor     r13,r9
+        xor     r12,r11
+        shrd    r13,r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        shrd    r14,r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        mov     rdi,QWORD[((128+0))+rsp]
+        mov     rax,r14
+
+        add     rax,QWORD[rdi]
+        lea     rsi,[128+rsi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+        jb      NEAR $L$loop_avx
+
+        mov     rsi,QWORD[152+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((128+32))+rsp]
+        movaps  xmm7,XMMWORD[((128+48))+rsp]
+        movaps  xmm8,XMMWORD[((128+64))+rsp]
+        movaps  xmm9,XMMWORD[((128+80))+rsp]
+        movaps  xmm10,XMMWORD[((128+96))+rsp]
+        movaps  xmm11,XMMWORD[((128+112))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order_avx:
+
+ALIGN   64
+sha512_block_data_order_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,1408
+        shl     rdx,4
+        and     rsp,-256*8
+        lea     rdx,[rdx*8+rsi]
+        add     rsp,1152
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+        movaps  XMMWORD[(128+32)+rsp],xmm6
+        movaps  XMMWORD[(128+48)+rsp],xmm7
+        movaps  XMMWORD[(128+64)+rsp],xmm8
+        movaps  XMMWORD[(128+80)+rsp],xmm9
+        movaps  XMMWORD[(128+96)+rsp],xmm10
+        movaps  XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx2:
+
+        vzeroupper
+        sub     rsi,-16*8
+        mov     rax,QWORD[rdi]
+        mov     r12,rsi
+        mov     rbx,QWORD[8+rdi]
+        cmp     rsi,rdx
+        mov     rcx,QWORD[16+rdi]
+        cmove   r12,rsp
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$oop_avx2
+ALIGN   16
+$L$oop_avx2:
+        vmovdqu xmm0,XMMWORD[((-128))+rsi]
+        vmovdqu xmm1,XMMWORD[((-128+16))+rsi]
+        vmovdqu xmm2,XMMWORD[((-128+32))+rsi]
+        lea     rbp,[((K512+128))]
+        vmovdqu xmm3,XMMWORD[((-128+48))+rsi]
+        vmovdqu xmm4,XMMWORD[((-128+64))+rsi]
+        vmovdqu xmm5,XMMWORD[((-128+80))+rsi]
+        vmovdqu xmm6,XMMWORD[((-128+96))+rsi]
+        vmovdqu xmm7,XMMWORD[((-128+112))+rsi]
+
+        vmovdqa ymm10,YMMWORD[1152+rbp]
+        vinserti128     ymm0,ymm0,XMMWORD[r12],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r12],1
+        vpshufb ymm0,ymm0,ymm10
+        vinserti128     ymm2,ymm2,XMMWORD[32+r12],1
+        vpshufb ymm1,ymm1,ymm10
+        vinserti128     ymm3,ymm3,XMMWORD[48+r12],1
+        vpshufb ymm2,ymm2,ymm10
+        vinserti128     ymm4,ymm4,XMMWORD[64+r12],1
+        vpshufb ymm3,ymm3,ymm10
+        vinserti128     ymm5,ymm5,XMMWORD[80+r12],1
+        vpshufb ymm4,ymm4,ymm10
+        vinserti128     ymm6,ymm6,XMMWORD[96+r12],1
+        vpshufb ymm5,ymm5,ymm10
+        vinserti128     ymm7,ymm7,XMMWORD[112+r12],1
+
+        vpaddq  ymm8,ymm0,YMMWORD[((-128))+rbp]
+        vpshufb ymm6,ymm6,ymm10
+        vpaddq  ymm9,ymm1,YMMWORD[((-96))+rbp]
+        vpshufb ymm7,ymm7,ymm10
+        vpaddq  ymm10,ymm2,YMMWORD[((-64))+rbp]
+        vpaddq  ymm11,ymm3,YMMWORD[((-32))+rbp]
+        vmovdqa YMMWORD[rsp],ymm8
+        vpaddq  ymm8,ymm4,YMMWORD[rbp]
+        vmovdqa YMMWORD[32+rsp],ymm9
+        vpaddq  ymm9,ymm5,YMMWORD[32+rbp]
+        vmovdqa YMMWORD[64+rsp],ymm10
+        vpaddq  ymm10,ymm6,YMMWORD[64+rbp]
+        vmovdqa YMMWORD[96+rsp],ymm11
+        lea     rsp,[((-128))+rsp]
+        vpaddq  ymm11,ymm7,YMMWORD[96+rbp]
+        vmovdqa YMMWORD[rsp],ymm8
+        xor     r14,r14
+        vmovdqa YMMWORD[32+rsp],ymm9
+        mov     rdi,rbx
+        vmovdqa YMMWORD[64+rsp],ymm10
+        xor     rdi,rcx
+        vmovdqa YMMWORD[96+rsp],ymm11
+        mov     r12,r9
+        add     rbp,16*2*8
+        jmp     NEAR $L$avx2_00_47
+
+ALIGN   16
+$L$avx2_00_47:
+        lea     rsp,[((-128))+rsp]
+        vpalignr        ymm8,ymm1,ymm0,8
+        add     r11,QWORD[((0+256))+rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        vpalignr        ymm11,ymm5,ymm4,8
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        vpaddq  ymm0,ymm0,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        vpsrlq  ymm11,ymm7,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        vpsllq  ymm10,ymm7,3
+        vpaddq  ymm0,ymm0,ymm8
+        add     r10,QWORD[((8+256))+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        vpsrlq  ymm9,ymm7,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        vpaddq  ymm0,ymm0,ymm11
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        vpaddq  ymm10,ymm0,YMMWORD[((-128))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        vmovdqa YMMWORD[rsp],ymm10
+        vpalignr        ymm8,ymm2,ymm1,8
+        add     r9,QWORD[((32+256))+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        vpalignr        ymm11,ymm6,ymm5,8
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        vpaddq  ymm1,ymm1,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        vpsrlq  ymm11,ymm0,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        vpsllq  ymm10,ymm0,3
+        vpaddq  ymm1,ymm1,ymm8
+        add     r8,QWORD[((40+256))+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        vpsrlq  ymm9,ymm0,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        vpaddq  ymm1,ymm1,ymm11
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        vpaddq  ymm10,ymm1,YMMWORD[((-96))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        vmovdqa YMMWORD[32+rsp],ymm10
+        vpalignr        ymm8,ymm3,ymm2,8
+        add     rdx,QWORD[((64+256))+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        vpalignr        ymm11,ymm7,ymm6,8
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        vpaddq  ymm2,ymm2,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        vpsrlq  ymm11,ymm1,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        vpsllq  ymm10,ymm1,3
+        vpaddq  ymm2,ymm2,ymm8
+        add     rcx,QWORD[((72+256))+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        vpsrlq  ymm9,ymm1,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        vpaddq  ymm2,ymm2,ymm11
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        vpaddq  ymm10,ymm2,YMMWORD[((-64))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        vmovdqa YMMWORD[64+rsp],ymm10
+        vpalignr        ymm8,ymm4,ymm3,8
+        add     rbx,QWORD[((96+256))+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        vpalignr        ymm11,ymm0,ymm7,8
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        vpaddq  ymm3,ymm3,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        vpsrlq  ymm11,ymm2,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        vpsllq  ymm10,ymm2,3
+        vpaddq  ymm3,ymm3,ymm8
+        add     rax,QWORD[((104+256))+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        vpsrlq  ymm9,ymm2,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        vpaddq  ymm3,ymm3,ymm11
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        vpaddq  ymm10,ymm3,YMMWORD[((-32))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        vmovdqa YMMWORD[96+rsp],ymm10
+        lea     rsp,[((-128))+rsp]
+        vpalignr        ymm8,ymm5,ymm4,8
+        add     r11,QWORD[((0+256))+rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        vpalignr        ymm11,ymm1,ymm0,8
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        vpaddq  ymm4,ymm4,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        vpsrlq  ymm11,ymm3,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        vpsllq  ymm10,ymm3,3
+        vpaddq  ymm4,ymm4,ymm8
+        add     r10,QWORD[((8+256))+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        vpsrlq  ymm9,ymm3,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        vpaddq  ymm4,ymm4,ymm11
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        vpaddq  ymm10,ymm4,YMMWORD[rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        vmovdqa YMMWORD[rsp],ymm10
+        vpalignr        ymm8,ymm6,ymm5,8
+        add     r9,QWORD[((32+256))+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        vpalignr        ymm11,ymm2,ymm1,8
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        vpaddq  ymm5,ymm5,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        vpsrlq  ymm11,ymm4,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        vpsllq  ymm10,ymm4,3
+        vpaddq  ymm5,ymm5,ymm8
+        add     r8,QWORD[((40+256))+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        vpsrlq  ymm9,ymm4,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        vpaddq  ymm5,ymm5,ymm11
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        vpaddq  ymm10,ymm5,YMMWORD[32+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        vmovdqa YMMWORD[32+rsp],ymm10
+        vpalignr        ymm8,ymm7,ymm6,8
+        add     rdx,QWORD[((64+256))+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        vpalignr        ymm11,ymm3,ymm2,8
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        vpaddq  ymm6,ymm6,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        vpsrlq  ymm11,ymm5,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        vpsllq  ymm10,ymm5,3
+        vpaddq  ymm6,ymm6,ymm8
+        add     rcx,QWORD[((72+256))+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        vpsrlq  ymm9,ymm5,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        vpaddq  ymm6,ymm6,ymm11
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        vpaddq  ymm10,ymm6,YMMWORD[64+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        vmovdqa YMMWORD[64+rsp],ymm10
+        vpalignr        ymm8,ymm0,ymm7,8
+        add     rbx,QWORD[((96+256))+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        vpalignr        ymm11,ymm4,ymm3,8
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        vpaddq  ymm7,ymm7,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        vpsrlq  ymm11,ymm6,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        vpsllq  ymm10,ymm6,3
+        vpaddq  ymm7,ymm7,ymm8
+        add     rax,QWORD[((104+256))+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        vpsrlq  ymm9,ymm6,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        vpaddq  ymm7,ymm7,ymm11
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        vpaddq  ymm10,ymm7,YMMWORD[96+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        vmovdqa YMMWORD[96+rsp],ymm10
+        lea     rbp,[256+rbp]
+        cmp     BYTE[((-121))+rbp],0
+        jne     NEAR $L$avx2_00_47
+        add     r11,QWORD[((0+128))+rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        add     r10,QWORD[((8+128))+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        add     r9,QWORD[((32+128))+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        add     r8,QWORD[((40+128))+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        add     rdx,QWORD[((64+128))+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        add     rcx,QWORD[((72+128))+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        add     rbx,QWORD[((96+128))+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        add     rax,QWORD[((104+128))+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        add     r11,QWORD[rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        add     r10,QWORD[8+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        add     r9,QWORD[32+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        add     r8,QWORD[40+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        add     rdx,QWORD[64+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        add     rcx,QWORD[72+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        add     rbx,QWORD[96+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        add     rax,QWORD[104+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        mov     rdi,QWORD[1280+rsp]
+        add     rax,r14
+
+        lea     rbp,[1152+rsp]
+
+        add     rax,QWORD[rdi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+
+        cmp     rsi,QWORD[144+rbp]
+        je      NEAR $L$done_avx2
+
+        xor     r14,r14
+        mov     rdi,rbx
+        xor     rdi,rcx
+        mov     r12,r9
+        jmp     NEAR $L$ower_avx2
+ALIGN   16
+$L$ower_avx2:
+        add     r11,QWORD[((0+16))+rbp]
+        and     r12,r8
+        rorx    r13,r8,41
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        add     r10,QWORD[((8+16))+rbp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        add     r9,QWORD[((32+16))+rbp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        add     r8,QWORD[((40+16))+rbp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        add     rdx,QWORD[((64+16))+rbp]
+        and     r12,rax
+        rorx    r13,rax,41
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        add     rcx,QWORD[((72+16))+rbp]
+        and     r12,r11
+        rorx    r13,r11,41
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        add     rbx,QWORD[((96+16))+rbp]
+        and     r12,r10
+        rorx    r13,r10,41
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        add     rax,QWORD[((104+16))+rbp]
+        and     r12,r9
+        rorx    r13,r9,41
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        lea     rbp,[((-128))+rbp]
+        cmp     rbp,rsp
+        jae     NEAR $L$ower_avx2
+
+        mov     rdi,QWORD[1280+rsp]
+        add     rax,r14
+
+        lea     rsp,[1152+rsp]
+
+        add     rax,QWORD[rdi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        lea     rsi,[256+rsi]
+        add     r10,QWORD[48+rdi]
+        mov     r12,rsi
+        add     r11,QWORD[56+rdi]
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        cmove   r12,rsp
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+
+        jbe     NEAR $L$oop_avx2
+        lea     rbp,[rsp]
+
+$L$done_avx2:
+        lea     rsp,[rbp]
+        mov     rsi,QWORD[152+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((128+32))+rsp]
+        movaps  xmm7,XMMWORD[((128+48))+rsp]
+        movaps  xmm8,XMMWORD[((128+64))+rsp]
+        movaps  xmm9,XMMWORD[((128+80))+rsp]
+        movaps  xmm10,XMMWORD[((128+96))+rsp]
+        movaps  xmm11,XMMWORD[((128+112))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order_avx2:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+        lea     r10,[$L$avx2_shortcut]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_avx2
+
+        and     rax,-256*8
+        add     rax,1152
+$L$not_in_avx2:
+        mov     rsi,rax
+        mov     rax,QWORD[((128+24))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        lea     rsi,[((128+32))+rsi]
+        lea     rdi,[512+r8]
+        mov     ecx,12
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha512_block_data_order wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order wrt ..imagebase
+        DD      $L$SEH_begin_sha512_block_data_order_xop wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order_xop wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order_xop wrt ..imagebase
+        DD      $L$SEH_begin_sha512_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha512_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha512_block_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_xop:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
new file mode 100644
index 0000000000..2b64a074c3
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
@@ -0,0 +1,472 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+EXTERN  OPENSSL_cpuid_setup
+
+section .CRT$XCU rdata align=8
+                DQ      OPENSSL_cpuid_setup
+
+
+common  OPENSSL_ia32cap_P 16
+
+section .text code align=64
+
+
+global  OPENSSL_atomic_add
+
+ALIGN   16
+OPENSSL_atomic_add:
+        mov     eax,DWORD[rcx]
+$L$spin:        lea     r8,[rax*1+rdx]
+DB      0xf0
+        cmpxchg DWORD[rcx],r8d
+        jne     NEAR $L$spin
+        mov     eax,r8d
+DB      0x48,0x98
+        DB      0F3h,0C3h               ;repret
+
+
+global  OPENSSL_rdtsc
+
+ALIGN   16
+OPENSSL_rdtsc:
+        rdtsc
+        shl     rdx,32
+        or      rax,rdx
+        DB      0F3h,0C3h               ;repret
+
+
+global  OPENSSL_ia32_cpuid
+
+ALIGN   16
+OPENSSL_ia32_cpuid:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_OPENSSL_ia32_cpuid:
+        mov     rdi,rcx
+
+
+
+        mov     r8,rbx
+
+
+        xor     eax,eax
+        mov     QWORD[8+rdi],rax
+        cpuid
+        mov     r11d,eax
+
+        xor     eax,eax
+        cmp     ebx,0x756e6547
+        setne   al
+        mov     r9d,eax
+        cmp     edx,0x49656e69
+        setne   al
+        or      r9d,eax
+        cmp     ecx,0x6c65746e
+        setne   al
+        or      r9d,eax
+        jz      NEAR $L$intel
+
+        cmp     ebx,0x68747541
+        setne   al
+        mov     r10d,eax
+        cmp     edx,0x69746E65
+        setne   al
+        or      r10d,eax
+        cmp     ecx,0x444D4163
+        setne   al
+        or      r10d,eax
+        jnz     NEAR $L$intel
+
+
+        mov     eax,0x80000000
+        cpuid
+        cmp     eax,0x80000001
+        jb      NEAR $L$intel
+        mov     r10d,eax
+        mov     eax,0x80000001
+        cpuid
+        or      r9d,ecx
+        and     r9d,0x00000801
+
+        cmp     r10d,0x80000008
+        jb      NEAR $L$intel
+
+        mov     eax,0x80000008
+        cpuid
+        movzx   r10,cl
+        inc     r10
+
+        mov     eax,1
+        cpuid
+        bt      edx,28
+        jnc     NEAR $L$generic
+        shr     ebx,16
+        cmp     bl,r10b
+        ja      NEAR $L$generic
+        and     edx,0xefffffff
+        jmp     NEAR $L$generic
+
+$L$intel:
+        cmp     r11d,4
+        mov     r10d,-1
+        jb      NEAR $L$nocacheinfo
+
+        mov     eax,4
+        mov     ecx,0
+        cpuid
+        mov     r10d,eax
+        shr     r10d,14
+        and     r10d,0xfff
+
+$L$nocacheinfo:
+        mov     eax,1
+        cpuid
+        movd    xmm0,eax
+        and     edx,0xbfefffff
+        cmp     r9d,0
+        jne     NEAR $L$notintel
+        or      edx,0x40000000
+        and     ah,15
+        cmp     ah,15
+        jne     NEAR $L$notP4
+        or      edx,0x00100000
+$L$notP4:
+        cmp     ah,6
+        jne     NEAR $L$notintel
+        and     eax,0x0fff0ff0
+        cmp     eax,0x00050670
+        je      NEAR $L$knights
+        cmp     eax,0x00080650
+        jne     NEAR $L$notintel
+$L$knights:
+        and     ecx,0xfbffffff
+
+$L$notintel:
+        bt      edx,28
+        jnc     NEAR $L$generic
+        and     edx,0xefffffff
+        cmp     r10d,0
+        je      NEAR $L$generic
+
+        or      edx,0x10000000
+        shr     ebx,16
+        cmp     bl,1
+        ja      NEAR $L$generic
+        and     edx,0xefffffff
+$L$generic:
+        and     r9d,0x00000800
+        and     ecx,0xfffff7ff
+        or      r9d,ecx
+
+        mov     r10d,edx
+
+        cmp     r11d,7
+        jb      NEAR $L$no_extended_info
+        mov     eax,7
+        xor     ecx,ecx
+        cpuid
+        bt      r9d,26
+        jc      NEAR $L$notknights
+        and     ebx,0xfff7ffff
+$L$notknights:
+        movd    eax,xmm0
+        and     eax,0x0fff0ff0
+        cmp     eax,0x00050650
+        jne     NEAR $L$notskylakex
+        and     ebx,0xfffeffff
+
+$L$notskylakex:
+        mov     DWORD[8+rdi],ebx
+        mov     DWORD[12+rdi],ecx
+$L$no_extended_info:
+
+        bt      r9d,27
+        jnc     NEAR $L$clear_avx
+        xor     ecx,ecx
+DB      0x0f,0x01,0xd0
+        and     eax,0xe6
+        cmp     eax,0xe6
+        je      NEAR $L$done
+        and     DWORD[8+rdi],0x3fdeffff
+
+
+
+
+        and     eax,6
+        cmp     eax,6
+        je      NEAR $L$done
+$L$clear_avx:
+        mov     eax,0xefffe7ff
+        and     r9d,eax
+        mov     eax,0x3fdeffdf
+        and     DWORD[8+rdi],eax
+$L$done:
+        shl     r9,32
+        mov     eax,r10d
+        mov     rbx,r8
+
+        or      rax,r9
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_OPENSSL_ia32_cpuid:
+
+global  OPENSSL_cleanse
+
+ALIGN   16
+OPENSSL_cleanse:
+        xor     rax,rax
+        cmp     rdx,15
+        jae     NEAR $L$ot
+        cmp     rdx,0
+        je      NEAR $L$ret
+$L$ittle:
+        mov     BYTE[rcx],al
+        sub     rdx,1
+        lea     rcx,[1+rcx]
+        jnz     NEAR $L$ittle
+$L$ret:
+        DB      0F3h,0C3h               ;repret
+ALIGN   16
+$L$ot:
+        test    rcx,7
+        jz      NEAR $L$aligned
+        mov     BYTE[rcx],al
+        lea     rdx,[((-1))+rdx]
+        lea     rcx,[1+rcx]
+        jmp     NEAR $L$ot
+$L$aligned:
+        mov     QWORD[rcx],rax
+        lea     rdx,[((-8))+rdx]
+        test    rdx,-8
+        lea     rcx,[8+rcx]
+        jnz     NEAR $L$aligned
+        cmp     rdx,0
+        jne     NEAR $L$ittle
+        DB      0F3h,0C3h               ;repret
+
+
+global  CRYPTO_memcmp
+
+ALIGN   16
+CRYPTO_memcmp:
+        xor     rax,rax
+        xor     r10,r10
+        cmp     r8,0
+        je      NEAR $L$no_data
+        cmp     r8,16
+        jne     NEAR $L$oop_cmp
+        mov     r10,QWORD[rcx]
+        mov     r11,QWORD[8+rcx]
+        mov     r8,1
+        xor     r10,QWORD[rdx]
+        xor     r11,QWORD[8+rdx]
+        or      r10,r11
+        cmovnz  rax,r8
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$oop_cmp:
+        mov     r10b,BYTE[rcx]
+        lea     rcx,[1+rcx]
+        xor     r10b,BYTE[rdx]
+        lea     rdx,[1+rdx]
+        or      al,r10b
+        dec     r8
+        jnz     NEAR $L$oop_cmp
+        neg     rax
+        shr     rax,63
+$L$no_data:
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_wipe_cpu
+
+ALIGN   16
+OPENSSL_wipe_cpu:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        xor     rcx,rcx
+        xor     rdx,rdx
+        xor     r8,r8
+        xor     r9,r9
+        xor     r10,r10
+        xor     r11,r11
+        lea     rax,[8+rsp]
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_instrument_bus
+
+ALIGN   16
+OPENSSL_instrument_bus:
+        mov     r10,rcx
+        mov     rcx,rdx
+        mov     r11,rdx
+
+        rdtsc
+        mov     r8d,eax
+        mov     r9d,0
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],r9d
+        jmp     NEAR $L$oop
+ALIGN   16
+$L$oop: rdtsc
+        mov     edx,eax
+        sub     eax,r8d
+        mov     r8d,edx
+        mov     r9d,eax
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],eax
+        lea     r10,[4+r10]
+        sub     rcx,1
+        jnz     NEAR $L$oop
+
+        mov     rax,r11
+        DB      0F3h,0C3h               ;repret
+
+
+global  OPENSSL_instrument_bus2
+
+ALIGN   16
+OPENSSL_instrument_bus2:
+        mov     r10,rcx
+        mov     rcx,rdx
+        mov     r11,r8
+        mov     QWORD[8+rsp],rcx
+
+        rdtsc
+        mov     r8d,eax
+        mov     r9d,0
+
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],r9d
+
+        rdtsc
+        mov     edx,eax
+        sub     eax,r8d
+        mov     r8d,edx
+        mov     r9d,eax
+$L$oop2:
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],eax
+
+        sub     r11,1
+        jz      NEAR $L$done2
+
+        rdtsc
+        mov     edx,eax
+        sub     eax,r8d
+        mov     r8d,edx
+        cmp     eax,r9d
+        mov     r9d,eax
+        mov     edx,0
+        setne   dl
+        sub     rcx,rdx
+        lea     r10,[rdx*4+r10]
+        jnz     NEAR $L$oop2
+
+$L$done2:
+        mov     rax,QWORD[8+rsp]
+        sub     rax,rcx
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_ia32_rdrand_bytes
+
+ALIGN   16
+OPENSSL_ia32_rdrand_bytes:
+        xor     rax,rax
+        cmp     rdx,0
+        je      NEAR $L$done_rdrand_bytes
+
+        mov     r11,8
+$L$oop_rdrand_bytes:
+DB      73,15,199,242
+        jc      NEAR $L$break_rdrand_bytes
+        dec     r11
+        jnz     NEAR $L$oop_rdrand_bytes
+        jmp     NEAR $L$done_rdrand_bytes
+
+ALIGN   16
+$L$break_rdrand_bytes:
+        cmp     rdx,8
+        jb      NEAR $L$tail_rdrand_bytes
+        mov     QWORD[rcx],r10
+        lea     rcx,[8+rcx]
+        add     rax,8
+        sub     rdx,8
+        jz      NEAR $L$done_rdrand_bytes
+        mov     r11,8
+        jmp     NEAR $L$oop_rdrand_bytes
+
+ALIGN   16
+$L$tail_rdrand_bytes:
+        mov     BYTE[rcx],r10b
+        lea     rcx,[1+rcx]
+        inc     rax
+        shr     r10,8
+        dec     rdx
+        jnz     NEAR $L$tail_rdrand_bytes
+
+$L$done_rdrand_bytes:
+        xor     r10,r10
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_ia32_rdseed_bytes
+
+ALIGN   16
+OPENSSL_ia32_rdseed_bytes:
+        xor     rax,rax
+        cmp     rdx,0
+        je      NEAR $L$done_rdseed_bytes
+
+        mov     r11,8
+$L$oop_rdseed_bytes:
+DB      73,15,199,250
+        jc      NEAR $L$break_rdseed_bytes
+        dec     r11
+        jnz     NEAR $L$oop_rdseed_bytes
+        jmp     NEAR $L$done_rdseed_bytes
+
+ALIGN   16
+$L$break_rdseed_bytes:
+        cmp     rdx,8
+        jb      NEAR $L$tail_rdseed_bytes
+        mov     QWORD[rcx],r10
+        lea     rcx,[8+rcx]
+        add     rax,8
+        sub     rdx,8
+        jz      NEAR $L$done_rdseed_bytes
+        mov     r11,8
+        jmp     NEAR $L$oop_rdseed_bytes
+
+ALIGN   16
+$L$tail_rdseed_bytes:
+        mov     BYTE[rcx],r10b
+        lea     rcx,[1+rcx]
+        inc     rax
+        shr     r10,8
+        dec     rdx
+        jnz     NEAR $L$tail_rdseed_bytes
+
+$L$done_rdseed_bytes:
+        xor     r10,r10
+        DB      0F3h,0C3h               ;repret
+
diff --git a/CryptoPkg/Library/OpensslLib/process_files.pl b/CryptoPkg/Library/OpensslLib/process_files.pl
index 4ba25da407..c0a19b99b6 100755
--- a/CryptoPkg/Library/OpensslLib/process_files.pl
+++ b/CryptoPkg/Library/OpensslLib/process_files.pl
@@ -12,6 +12,47 @@
 use strict;
 use Cwd;
 use File::Copy;
+use File::Basename;
+use File::Path qw(make_path remove_tree);
+use Text::Tabs;
+
+#
+# OpenSSL perlasm generator script does not transfer the copyright header
+#
+sub copy_license_header
+{
+    my @args = split / /, shift;    #Separate args by spaces
+    my $source = $args[1];          #Source file is second (after "perl")
+    my $target = pop @args;         #Target file is always last
+    chop ($target);                 #Remove newline char
+
+    my $temp_file_name = "license.tmp";
+    open (my $source_file, "<" . $source) || die $source;
+    open (my $target_file, "<" . $target) || die $target;
+    open (my $temp_file, ">" . $temp_file_name) || die $temp_file_name;
+
+    #Copy source file header to temp file
+    while (my $line = <$source_file>) {
+        next if ($line =~ /#!/);    #Ignore shebang line
+        $line =~ s/#/;/;            #Fix comment character for assembly
+        $line =~ s/\s+$/\r\n/;      #Trim trailing whitespace, fix up line endings
+        print ($temp_file $line);
+        last if ($line =~ /http/);  #Last line of copyright header contains a web link
+    }
+    print ($temp_file "\r\n");      #Add an empty line after the header
+    #Retrieve generated assembly contents
+    while (my $line = <$target_file>) {
+        $line =~ s/\s+$/\r\n/;      #Trim trailing whitespace, fix up line endings
+        print ($temp_file expand ($line));  #expand() replaces tabs with spaces
+    }
+
+    close ($source_file);
+    close ($target_file);
+    close ($temp_file);
+
+    move ($temp_file_name, $target) ||
+        die "Cannot replace \"" . $target . "\"!";
+}
 
 #
 # Find the openssl directory name for use lib. We have to do this
@@ -21,10 +62,39 @@ use File::Copy;
 #
 my $inf_file;
 my $OPENSSL_PATH;
+my $uefi_config;
+my $extension;
+my $arch;
 my @inf;
 
 BEGIN {
     $inf_file = "OpensslLib.inf";
+    $uefi_config = "UEFI";
+    $arch = shift;
+
+    if (defined $arch) {
+        if (lc ($arch) eq lc ("X64")) {
+            $inf_file = "OpensslLibX64.inf";
+            $uefi_config = "UEFI-x86_64";
+            $extension = "nasm";
+        } elsif (lc ($arch) eq lc ("IA32")) {
+            $arch = "Ia32";
+            $inf_file = "OpensslLibIa32.inf";
+            $uefi_config = "UEFI-x86";
+            $extension = "nasm";
+        } else {
+            die "Unsupported architecture \"" . $arch . "\"!";
+        }
+
+        # Prepare assembly folder
+        if (-d $arch) {
+            remove_tree ($arch, {safe => 1}) ||
+                die "Cannot clean assembly folder \"" . $arch . "\"!";
+        } else {
+            mkdir $arch ||
+                die "Cannot create assembly folder \"" . $arch . "\"!";
+        }
+    }
 
     # Read the contents of the inf file
     open( FD, "<" . $inf_file ) ||
@@ -47,9 +117,9 @@ BEGIN {
             # Configure UEFI
             system(
                 "./Configure",
-                "UEFI",
+                "--config=../uefi-asm.conf",
+                "$uefi_config",
                 "no-afalgeng",
-                "no-asm",
                 "no-async",
                 "no-autoerrinit",
                 "no-autoload-config",
@@ -126,22 +196,52 @@ BEGIN {
 # Retrieve file lists from OpenSSL configdata
 #
 use configdata qw/%unified_info/;
+use configdata qw/%config/;
+use configdata qw/%target/;
+
+#
+# Collect build flags from configdata
+#
+my $flags = "";
+foreach my $f (@{$config{lib_defines}}) {
+    $flags .= " -D$f";
+}
 
 my @cryptofilelist = ();
 my @sslfilelist = ();
+my @asmfilelist = ();
+my @asmbuild = ();
 foreach my $product ((@{$unified_info{libraries}},
                       @{$unified_info{engines}})) {
     foreach my $o (@{$unified_info{sources}->{$product}}) {
         foreach my $s (@{$unified_info{sources}->{$o}}) {
-            next if ($unified_info{generate}->{$s});
-            next if $s =~ "crypto/bio/b_print.c";
-
             # No need to add unused files in UEFI.
             # So it can reduce porting time, compile time, library size.
+            next if $s =~ "crypto/bio/b_print.c";
             next if $s =~ "crypto/rand/randfile.c";
             next if $s =~ "crypto/store/";
             next if $s =~ "crypto/err/err_all.c";
 
+            if ($unified_info{generate}->{$s}) {
+                if (defined $arch) {
+                    my $buildstring = "perl";
+                    foreach my $arg (@{$unified_info{generate}->{$s}}) {
+                        if ($arg =~ ".pl") {
+                            $buildstring .= " ./openssl/$arg";
+                        } elsif ($arg =~ "PERLASM_SCHEME") {
+                            $buildstring .= " $target{perlasm_scheme}";
+                        } elsif ($arg =~ "LIB_CFLAGS") {
+                            $buildstring .= "$flags";
+                        }
+                    }
+                    ($s, my $path, undef) = fileparse($s, qr/\.[^.]*/);
+                    $buildstring .= " ./$arch/$path$s.$extension";
+                    make_path ("./$arch/$path");
+                    push @asmbuild, "$buildstring\n";
+                    push @asmfilelist, "  $arch/$path$s.$extension\r\n";
+                }
+                next;
+            }
             if ($product =~ "libssl") {
                 push @sslfilelist, '  $(OPENSSL_PATH)/' . $s . "\r\n";
                 next;
@@ -179,15 +279,31 @@ foreach (@headers){
 }
 
 
+#
+# Generate assembly files
+#
+if (@asmbuild) {
+    print "\n--> Generating assembly files ... ";
+    foreach my $buildstring (@asmbuild) {
+        system ("$buildstring");
+        copy_license_header ($buildstring);
+    }
+    print "Done!";
+}
+
 #
 # Update OpensslLib.inf with autogenerated file list
 #
 my @new_inf = ();
 my $subbing = 0;
-print "\n--> Updating OpensslLib.inf ... ";
+print "\n--> Updating $inf_file ... ";
 foreach (@inf) {
+    if ($_ =~ "DEFINE OPENSSL_FLAGS_CONFIG") {
+        push @new_inf, "  DEFINE OPENSSL_FLAGS_CONFIG    =" . $flags . "\r\n";
+        next;
+    }
     if ( $_ =~ "# Autogenerated files list starts here" ) {
-        push @new_inf, $_, @cryptofilelist, @sslfilelist;
+        push @new_inf, $_, @asmfilelist, @cryptofilelist, @sslfilelist;
         $subbing = 1;
         next;
     }
@@ -212,49 +328,51 @@ rename( $new_inf_file, $inf_file ) ||
     die "rename $inf_file";
 print "Done!";
 
-#
-# Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
-#
-$inf_file = "OpensslLibCrypto.inf";
-
-# Read the contents of the inf file
-@inf = ();
-@new_inf = ();
-open( FD, "<" . $inf_file ) ||
-    die "Cannot open \"" . $inf_file . "\"!";
-@inf = (<FD>);
-close(FD) ||
-    die "Cannot close \"" . $inf_file . "\"!";
+if (!defined $arch) {
+    #
+    # Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
+    #
+    $inf_file = "OpensslLibCrypto.inf";
 
-$subbing = 0;
-print "\n--> Updating OpensslLibCrypto.inf ... ";
-foreach (@inf) {
-    if ( $_ =~ "# Autogenerated files list starts here" ) {
-        push @new_inf, $_, @cryptofilelist;
-        $subbing = 1;
-        next;
-    }
-    if ( $_ =~ "# Autogenerated files list ends here" ) {
-        push @new_inf, $_;
-        $subbing = 0;
-        next;
+    # Read the contents of the inf file
+    @inf = ();
+    @new_inf = ();
+    open( FD, "<" . $inf_file ) ||
+        die "Cannot open \"" . $inf_file . "\"!";
+    @inf = (<FD>);
+    close(FD) ||
+        die "Cannot close \"" . $inf_file . "\"!";
+
+    $subbing = 0;
+    print "\n--> Updating OpensslLibCrypto.inf ... ";
+    foreach (@inf) {
+        if ( $_ =~ "# Autogenerated files list starts here" ) {
+            push @new_inf, $_, @cryptofilelist;
+            $subbing = 1;
+            next;
+        }
+        if ( $_ =~ "# Autogenerated files list ends here" ) {
+            push @new_inf, $_;
+            $subbing = 0;
+            next;
+        }
+
+        push @new_inf, $_
+            unless ($subbing);
     }
 
-    push @new_inf, $_
-        unless ($subbing);
+    $new_inf_file = $inf_file . ".new";
+    open( FD, ">" . $new_inf_file ) ||
+        die $new_inf_file;
+    print( FD @new_inf ) ||
+        die $new_inf_file;
+    close(FD) ||
+        die $new_inf_file;
+    rename( $new_inf_file, $inf_file ) ||
+        die "rename $inf_file";
+    print "Done!";
 }
 
-$new_inf_file = $inf_file . ".new";
-open( FD, ">" . $new_inf_file ) ||
-    die $new_inf_file;
-print( FD @new_inf ) ||
-    die $new_inf_file;
-close(FD) ||
-    die $new_inf_file;
-rename( $new_inf_file, $inf_file ) ||
-    die "rename $inf_file";
-print "Done!";
-
 #
 # Copy opensslconf.h and dso_conf.h generated from OpenSSL Configuration
 #
diff --git a/CryptoPkg/Library/OpensslLib/uefi-asm.conf b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
new file mode 100644
index 0000000000..4fd52c9cf2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
@@ -0,0 +1,14 @@
+## -*- mode: perl; -*-
+## UEFI assembly openssl configuration targets.
+
+my %targets = (
+#### UEFI
+    "UEFI-x86" => {
+        inherit_from     => [ "UEFI",  asm("x86_asm") ],
+        perlasm_scheme   => "win32n",
+    },
+    "UEFI-x86_64" => {
+        inherit_from     => [ "UEFI",  asm("x86_64_asm") ],
+        perlasm_scheme   => "nasm",
+    },
+);
-- 
2.16.2.windows.1


Thread overview: 14+ messages
2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
2020-03-17 10:26 ` Zurcher, Christopher J [this message]
2020-03-26  1:15   ` [edk2-devel] [PATCH 1/1] " Yao, Jiewen
     [not found]   ` <15FFB5A5A94CCE31.23217@groups.io>
2020-03-26  1:23     ` Yao, Jiewen
2020-03-26  2:44       ` Zurcher, Christopher J
2020-03-26  3:05         ` Yao, Jiewen
2020-03-26  3:29           ` Zurcher, Christopher J
2020-03-26  3:58             ` Yao, Jiewen
2020-03-26 18:23               ` Michael D Kinney
2020-03-27  0:52                 ` Zurcher, Christopher J
2020-03-23 12:59 ` [edk2-devel] [PATCH 0/1] " Laszlo Ersek
2020-03-25 18:40 ` Ard Biesheuvel
2020-03-26  1:04   ` [edk2-devel] " Zurcher, Christopher J
2020-03-26  7:49     ` Ard Biesheuvel
