From: "Zurcher, Christopher J" <christopher.j.zurcher@intel.com>
To: devel@edk2.groups.io
Cc: Jian J Wang <jian.j.wang@intel.com>,
Xiaoyu Lu <xiaoyux.lu@intel.com>, Eugene Cohen <eugene@hp.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
Date: Tue, 17 Mar 2020 03:26:56 -0700
Message-ID: <20200317102656.20032-2-christopher.j.zurcher@intel.com>
In-Reply-To: <20200317102656.20032-1-christopher.j.zurcher@intel.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507

Add IA32 and X64 versions of OpensslLib.inf, along with the generated
assembly files they build. The existing OpensslLib.inf and
OpensslLibCrypto.inf now pass -DOPENSSL_NO_ASM, so the C-only
implementation remains the default. This also introduces the
process_files.pl modifications required to generate the new files.
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
Cc: Eugene Cohen <eugene@hp.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christopher J Zurcher <christopher.j.zurcher@intel.com>
---
CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +-
CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +-
CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf | 680 ++
CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 691 ++
CryptoPkg/Library/Include/openssl/opensslconf.h | 3 -
CryptoPkg/Library/OpensslLib/ApiHooks.c | 18 +
CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 34 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm | 3209 ++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm | 648 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm | 1522 ++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm | 1259 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm | 352 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm | 486 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm | 887 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm | 1835 +++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm | 690 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm | 1264 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm | 381 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm | 3977 ++++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm | 6796 ++++++++++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm | 2842 +++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm | 513 ++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 1772 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | 3271 ++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5084 ++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1170 +++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm | 1989 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm | 2242 ++++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm | 432 +
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm | 1479 ++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm | 4033 ++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm | 794 ++
CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | 984 +++
CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | 2077 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm | 1395 ++++
CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm | 784 ++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm | 532 ++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 7581 ++++++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 5773 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | 8262 ++++++++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 5712 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 5668 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 472 ++
CryptoPkg/Library/OpensslLib/process_files.pl | 208 +-
CryptoPkg/Library/OpensslLib/uefi-asm.conf | 14 +
46 files changed, 94478 insertions(+), 50 deletions(-)
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLib.inf b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
index 3519a66885..542507a534 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLib.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
@@ -15,7 +15,7 @@
VERSION_STRING = 1.0
LIBRARY_CLASS = OpensslLib
DEFINE OPENSSL_PATH = openssl
- DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
#
# VALID_ARCHITECTURES = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
index 8a723cb8cd..f0c588284c 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
@@ -15,7 +15,7 @@
VERSION_STRING = 1.0
LIBRARY_CLASS = OpensslLib
DEFINE OPENSSL_PATH = openssl
- DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
#
# VALID_ARCHITECTURES = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
new file mode 100644
index 0000000000..14f4d4ab1a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
@@ -0,0 +1,680 @@
+## @file
+# This module provides OpenSSL Library implementation.
+#
+# Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+# SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+ INF_VERSION = 0x00010005
+ BASE_NAME = OpensslLibIa32
+ MODULE_UNI_FILE = OpensslLib.uni
+ FILE_GUID = 5805D1D4-F8EE-4FBA-BDD8-74465F16A534
+ MODULE_TYPE = BASE
+ VERSION_STRING = 1.0
+ LIBRARY_CLASS = OpensslLib
+ DEFINE OPENSSL_PATH = openssl
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM
+ CONSTRUCTOR = OpensslLibConstructor
+
+#
+# VALID_ARCHITECTURES = IA32
+#
+
+[Sources]
+ OpensslLibConstructor.c
+ $(OPENSSL_PATH)/e_os.h
+ $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+ Ia32/crypto/aes/aesni-x86.nasm
+ Ia32/crypto/aes/vpaes-x86.nasm
+ Ia32/crypto/bn/bn-586.nasm
+ Ia32/crypto/bn/co-586.nasm
+ Ia32/crypto/bn/x86-gf2m.nasm
+ Ia32/crypto/bn/x86-mont.nasm
+ Ia32/crypto/des/crypt586.nasm
+ Ia32/crypto/des/des-586.nasm
+ Ia32/crypto/md5/md5-586.nasm
+ Ia32/crypto/modes/ghash-x86.nasm
+ Ia32/crypto/rc4/rc4-586.nasm
+ Ia32/crypto/sha/sha1-586.nasm
+ Ia32/crypto/sha/sha256-586.nasm
+ Ia32/crypto/sha/sha512-586.nasm
+ Ia32/crypto/x86cpuid.nasm
+ $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_core.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+ $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+ $(OPENSSL_PATH)/crypto/aria/aria.c
+ $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+ $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+ $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_int.c
+ $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_object.c
+ $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+ $(OPENSSL_PATH)/crypto/asn1/a_print.c
+ $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+ $(OPENSSL_PATH)/crypto/asn1/a_time.c
+ $(OPENSSL_PATH)/crypto/asn1/a_type.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+ $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+ $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/f_int.c
+ $(OPENSSL_PATH)/crypto/asn1/f_string.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+ $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+ $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+ $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+ $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+ $(OPENSSL_PATH)/crypto/asn1/x_info.c
+ $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+ $(OPENSSL_PATH)/crypto/asn1/x_long.c
+ $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+ $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/x_val.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+ $(OPENSSL_PATH)/crypto/async/async.c
+ $(OPENSSL_PATH)/crypto/async/async_err.c
+ $(OPENSSL_PATH)/crypto/async/async_wait.c
+ $(OPENSSL_PATH)/crypto/bio/b_addr.c
+ $(OPENSSL_PATH)/crypto/bio/b_dump.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+ $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+ $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+ $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+ $(OPENSSL_PATH)/crypto/bio/bf_null.c
+ $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+ $(OPENSSL_PATH)/crypto/bio/bio_err.c
+ $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+ $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+ $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+ $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+ $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+ $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+ $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+ $(OPENSSL_PATH)/crypto/bio/bss_file.c
+ $(OPENSSL_PATH)/crypto/bio/bss_log.c
+ $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+ $(OPENSSL_PATH)/crypto/bio/bss_null.c
+ $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+ $(OPENSSL_PATH)/crypto/bn/bn_add.c
+ $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+ $(OPENSSL_PATH)/crypto/bn/bn_const.c
+ $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+ $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+ $(OPENSSL_PATH)/crypto/bn/bn_div.c
+ $(OPENSSL_PATH)/crypto/bn/bn_err.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+ $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+ $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+ $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+ $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+ $(OPENSSL_PATH)/crypto/bn/bn_print.c
+ $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+ $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+ $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_word.c
+ $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+ $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+ $(OPENSSL_PATH)/crypto/buffer/buffer.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+ $(OPENSSL_PATH)/crypto/cmac/cmac.c
+ $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+ $(OPENSSL_PATH)/crypto/comp/comp_err.c
+ $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_api.c
+ $(OPENSSL_PATH)/crypto/conf/conf_def.c
+ $(OPENSSL_PATH)/crypto/conf/conf_err.c
+ $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+ $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+ $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+ $(OPENSSL_PATH)/crypto/cpt_err.c
+ $(OPENSSL_PATH)/crypto/cryptlib.c
+ $(OPENSSL_PATH)/crypto/ctype.c
+ $(OPENSSL_PATH)/crypto/cversion.c
+ $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+ $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+ $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+ $(OPENSSL_PATH)/crypto/des/fcrypt.c
+ $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+ $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+ $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+ $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+ $(OPENSSL_PATH)/crypto/des/rand_key.c
+ $(OPENSSL_PATH)/crypto/des/set_key.c
+ $(OPENSSL_PATH)/crypto/des/str2key.c
+ $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+ $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+ $(OPENSSL_PATH)/crypto/dh/dh_check.c
+ $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+ $(OPENSSL_PATH)/crypto/dh/dh_err.c
+ $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+ $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+ $(OPENSSL_PATH)/crypto/dh/dh_key.c
+ $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+ $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+ $(OPENSSL_PATH)/crypto/dso/dso_err.c
+ $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+ $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+ $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+ $(OPENSSL_PATH)/crypto/ebcdic.c
+ $(OPENSSL_PATH)/crypto/err/err.c
+ $(OPENSSL_PATH)/crypto/err/err_prn.c
+ $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+ $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+ $(OPENSSL_PATH)/crypto/evp/bio_md.c
+ $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+ $(OPENSSL_PATH)/crypto/evp/c_allc.c
+ $(OPENSSL_PATH)/crypto/evp/c_alld.c
+ $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+ $(OPENSSL_PATH)/crypto/evp/digest.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+ $(OPENSSL_PATH)/crypto/evp/e_aria.c
+ $(OPENSSL_PATH)/crypto/evp/e_bf.c
+ $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+ $(OPENSSL_PATH)/crypto/evp/e_cast.c
+ $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+ $(OPENSSL_PATH)/crypto/evp/e_des.c
+ $(OPENSSL_PATH)/crypto/evp/e_des3.c
+ $(OPENSSL_PATH)/crypto/evp/e_idea.c
+ $(OPENSSL_PATH)/crypto/evp/e_null.c
+ $(OPENSSL_PATH)/crypto/evp/e_old.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+ $(OPENSSL_PATH)/crypto/evp/e_seed.c
+ $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+ $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+ $(OPENSSL_PATH)/crypto/evp/encode.c
+ $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+ $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+ $(OPENSSL_PATH)/crypto/evp/evp_err.c
+ $(OPENSSL_PATH)/crypto/evp/evp_key.c
+ $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+ $(OPENSSL_PATH)/crypto/evp/m_md2.c
+ $(OPENSSL_PATH)/crypto/evp/m_md4.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+ $(OPENSSL_PATH)/crypto/evp/m_null.c
+ $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+ $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+ $(OPENSSL_PATH)/crypto/evp/m_wp.c
+ $(OPENSSL_PATH)/crypto/evp/names.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+ $(OPENSSL_PATH)/crypto/evp/p_dec.c
+ $(OPENSSL_PATH)/crypto/evp/p_enc.c
+ $(OPENSSL_PATH)/crypto/evp/p_lib.c
+ $(OPENSSL_PATH)/crypto/evp/p_open.c
+ $(OPENSSL_PATH)/crypto/evp/p_seal.c
+ $(OPENSSL_PATH)/crypto/evp/p_sign.c
+ $(OPENSSL_PATH)/crypto/evp/p_verify.c
+ $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+ $(OPENSSL_PATH)/crypto/ex_data.c
+ $(OPENSSL_PATH)/crypto/getenv.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+ $(OPENSSL_PATH)/crypto/hmac/hmac.c
+ $(OPENSSL_PATH)/crypto/init.c
+ $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+ $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+ $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+ $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+ $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+ $(OPENSSL_PATH)/crypto/lhash/lhash.c
+ $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+ $(OPENSSL_PATH)/crypto/md4/md4_one.c
+ $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+ $(OPENSSL_PATH)/crypto/md5/md5_one.c
+ $(OPENSSL_PATH)/crypto/mem.c
+ $(OPENSSL_PATH)/crypto/mem_dbg.c
+ $(OPENSSL_PATH)/crypto/mem_sec.c
+ $(OPENSSL_PATH)/crypto/modes/cbc128.c
+ $(OPENSSL_PATH)/crypto/modes/ccm128.c
+ $(OPENSSL_PATH)/crypto/modes/cfb128.c
+ $(OPENSSL_PATH)/crypto/modes/ctr128.c
+ $(OPENSSL_PATH)/crypto/modes/cts128.c
+ $(OPENSSL_PATH)/crypto/modes/gcm128.c
+ $(OPENSSL_PATH)/crypto/modes/ocb128.c
+ $(OPENSSL_PATH)/crypto/modes/ofb128.c
+ $(OPENSSL_PATH)/crypto/modes/wrap128.c
+ $(OPENSSL_PATH)/crypto/modes/xts128.c
+ $(OPENSSL_PATH)/crypto/o_dir.c
+ $(OPENSSL_PATH)/crypto/o_fips.c
+ $(OPENSSL_PATH)/crypto/o_fopen.c
+ $(OPENSSL_PATH)/crypto/o_init.c
+ $(OPENSSL_PATH)/crypto/o_str.c
+ $(OPENSSL_PATH)/crypto/o_time.c
+ $(OPENSSL_PATH)/crypto/objects/o_names.c
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+ $(OPENSSL_PATH)/crypto/objects/obj_err.c
+ $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+ $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+ $(OPENSSL_PATH)/crypto/pem/pem_all.c
+ $(OPENSSL_PATH)/crypto/pem/pem_err.c
+ $(OPENSSL_PATH)/crypto/pem/pem_info.c
+ $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+ $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+ $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+ $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+ $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+ $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+ $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+ $(OPENSSL_PATH)/crypto/rand/rand_err.c
+ $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+ $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+ $(OPENSSL_PATH)/crypto/rand/rand_win.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+ $(OPENSSL_PATH)/crypto/sha/keccak1600.c
+ $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+ $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+ $(OPENSSL_PATH)/crypto/sha/sha256.c
+ $(OPENSSL_PATH)/crypto/sha/sha512.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+ $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+ $(OPENSSL_PATH)/crypto/sm3/sm3.c
+ $(OPENSSL_PATH)/crypto/sm4/sm4.c
+ $(OPENSSL_PATH)/crypto/stack/stack.c
+ $(OPENSSL_PATH)/crypto/threads_none.c
+ $(OPENSSL_PATH)/crypto/threads_pthread.c
+ $(OPENSSL_PATH)/crypto/threads_win.c
+ $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+ $(OPENSSL_PATH)/crypto/ui/ui_err.c
+ $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+ $(OPENSSL_PATH)/crypto/ui/ui_null.c
+ $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+ $(OPENSSL_PATH)/crypto/ui/ui_util.c
+ $(OPENSSL_PATH)/crypto/uid.c
+ $(OPENSSL_PATH)/crypto/x509/by_dir.c
+ $(OPENSSL_PATH)/crypto/x509/by_file.c
+ $(OPENSSL_PATH)/crypto/x509/t_crl.c
+ $(OPENSSL_PATH)/crypto/x509/t_req.c
+ $(OPENSSL_PATH)/crypto/x509/t_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x509_att.c
+ $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+ $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+ $(OPENSSL_PATH)/crypto/x509/x509_def.c
+ $(OPENSSL_PATH)/crypto/x509/x509_err.c
+ $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+ $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+ $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+ $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+ $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+ $(OPENSSL_PATH)/crypto/x509/x509_req.c
+ $(OPENSSL_PATH)/crypto/x509/x509_set.c
+ $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+ $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+ $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+ $(OPENSSL_PATH)/crypto/x509/x509cset.c
+ $(OPENSSL_PATH)/crypto/x509/x509name.c
+ $(OPENSSL_PATH)/crypto/x509/x509rset.c
+ $(OPENSSL_PATH)/crypto/x509/x509spki.c
+ $(OPENSSL_PATH)/crypto/x509/x509type.c
+ $(OPENSSL_PATH)/crypto/x509/x_all.c
+ $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+ $(OPENSSL_PATH)/crypto/x509/x_crl.c
+ $(OPENSSL_PATH)/crypto/x509/x_exten.c
+ $(OPENSSL_PATH)/crypto/x509/x_name.c
+ $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+ $(OPENSSL_PATH)/crypto/x509/x_req.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+ $(OPENSSL_PATH)/crypto/arm_arch.h
+ $(OPENSSL_PATH)/crypto/mips_arch.h
+ $(OPENSSL_PATH)/crypto/ppc_arch.h
+ $(OPENSSL_PATH)/crypto/s390x_arch.h
+ $(OPENSSL_PATH)/crypto/sparc_arch.h
+ $(OPENSSL_PATH)/crypto/vms_rms.h
+ $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/charmap.h
+ $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+ $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+ $(OPENSSL_PATH)/crypto/async/async_locl.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+ $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+ $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+ $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+ $(OPENSSL_PATH)/crypto/conf/conf_def.h
+ $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+ $(OPENSSL_PATH)/crypto/des/des_locl.h
+ $(OPENSSL_PATH)/crypto/des/spr.h
+ $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+ $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+ $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+ $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+ $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+ $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+ $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+ $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+ $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+ $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+ $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+ $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+ $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+ $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+ $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+ $(OPENSSL_PATH)/crypto/store/store_locl.h
+ $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+ $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+ $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+ $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+ $(OPENSSL_PATH)/ssl/bio_ssl.c
+ $(OPENSSL_PATH)/ssl/d1_lib.c
+ $(OPENSSL_PATH)/ssl/d1_msg.c
+ $(OPENSSL_PATH)/ssl/d1_srtp.c
+ $(OPENSSL_PATH)/ssl/methods.c
+ $(OPENSSL_PATH)/ssl/packet.c
+ $(OPENSSL_PATH)/ssl/pqueue.c
+ $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+ $(OPENSSL_PATH)/ssl/s3_cbc.c
+ $(OPENSSL_PATH)/ssl/s3_enc.c
+ $(OPENSSL_PATH)/ssl/s3_lib.c
+ $(OPENSSL_PATH)/ssl/s3_msg.c
+ $(OPENSSL_PATH)/ssl/ssl_asn1.c
+ $(OPENSSL_PATH)/ssl/ssl_cert.c
+ $(OPENSSL_PATH)/ssl/ssl_ciph.c
+ $(OPENSSL_PATH)/ssl/ssl_conf.c
+ $(OPENSSL_PATH)/ssl/ssl_err.c
+ $(OPENSSL_PATH)/ssl/ssl_init.c
+ $(OPENSSL_PATH)/ssl/ssl_lib.c
+ $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+ $(OPENSSL_PATH)/ssl/ssl_rsa.c
+ $(OPENSSL_PATH)/ssl/ssl_sess.c
+ $(OPENSSL_PATH)/ssl/ssl_stat.c
+ $(OPENSSL_PATH)/ssl/ssl_txt.c
+ $(OPENSSL_PATH)/ssl/ssl_utst.c
+ $(OPENSSL_PATH)/ssl/statem/extensions.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+ $(OPENSSL_PATH)/ssl/statem/statem.c
+ $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+ $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+ $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+ $(OPENSSL_PATH)/ssl/t1_enc.c
+ $(OPENSSL_PATH)/ssl/t1_lib.c
+ $(OPENSSL_PATH)/ssl/t1_trce.c
+ $(OPENSSL_PATH)/ssl/tls13_enc.c
+ $(OPENSSL_PATH)/ssl/tls_srp.c
+ $(OPENSSL_PATH)/ssl/packet_locl.h
+ $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+ $(OPENSSL_PATH)/ssl/ssl_locl.h
+ $(OPENSSL_PATH)/ssl/record/record.h
+ $(OPENSSL_PATH)/ssl/record/record_locl.h
+ $(OPENSSL_PATH)/ssl/statem/statem.h
+ $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+ buildinf.h
+ rand_pool_noise.h
+ ossl_store.c
+ rand_pool.c
+
+[Sources.Ia32]
+ rand_pool_noise_tsc.c
+
+[Packages]
+ MdePkg/MdePkg.dec
+ CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+ BaseLib
+ DebugLib
+ TimerLib
+ PrintLib
+
+[BuildOptions]
+ #
+  # Disable the following Visual Studio compiler warnings introduced by the openssl source,
+  # so we do not break the build with the /WX option:
+ # C4090: 'function' : different 'const' qualifiers
+ # C4132: 'object' : const object should be initialized (tls13_enc.c)
+ # C4210: nonstandard extension used: function given file scope
+ # C4244: conversion from type1 to type2, possible loss of data
+ # C4245: conversion from type1 to type2, signed/unsigned mismatch
+ # C4267: conversion from size_t to type, possible loss of data
+ # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+ # C4310: cast truncates constant value
+ # C4389: 'operator' : signed/unsigned mismatch (xxxx)
+ # C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+ # C4702: unreachable code
+ # C4706: assignment within conditional expression
+ # C4819: The file contains a character that cannot be represented in the current code page
+ #
+ MSFT:*_*_IA32_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4310 /wd4389 /wd4700 /wd4702 /wd4706 /wd4819
+
+ INTEL:*_*_IA32_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+ #
+ # Suppress the following build warnings in openssl so we don't break the build with -Werror
+ # -Werror=maybe-uninitialized: there exist some other paths for which the variable is not initialized.
+ # -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have
+ # types appropriate to the format string specified.
+ # -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration).
+ #
+ GCC:*_*_IA32_CC_FLAGS = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=unused-but-set-variable
+
+  # Suppress the following warnings in openssl so we don't break the build with warnings-as-errors:
+ # 1295: Deprecated declaration <entity> - give arg types
+ # 550: <entity> was set but never used
+ # 1293: assignment in condition
+ # 111: statement is unreachable (invariably "break;" after "return X;" in case statement)
+ # 68: integer conversion resulted in a change of sign ("if (Status == -1)")
+ # 177: <entity> was declared but never referenced
+ # 223: function <entity> declared implicitly
+ # 144: a value of type <type> cannot be used to initialize an entity of type <type>
+ # 513: a value of type <type> cannot be assigned to an entity of type <type>
+ # 188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+ # 1296: Extended constant initialiser used
+ # 128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+ # from the function that evaluates to true at compile time
+ # 546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+ # variable is never referenced after the jump
+ # 1: ignore "#1-D: last line of file ends without a newline"
+ # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+ # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+ XCODE:*_*_IA32_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
new file mode 100644
index 0000000000..fcebc6d6de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
@@ -0,0 +1,691 @@
+## @file
+# This module provides OpenSSL Library implementation.
+#
+# Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+# SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+ INF_VERSION = 0x00010005
+ BASE_NAME = OpensslLibX64
+ MODULE_UNI_FILE = OpensslLib.uni
+ FILE_GUID = 18125E50-0117-4DD0-BE54-4784AD995FEF
+ MODULE_TYPE = BASE
+ VERSION_STRING = 1.0
+ LIBRARY_CLASS = OpensslLib
+ DEFINE OPENSSL_PATH = openssl
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM
+ CONSTRUCTOR = OpensslLibConstructor
+
+#
+# VALID_ARCHITECTURES = X64
+#
+
+[Sources]
+ OpensslLibConstructor.c
+ $(OPENSSL_PATH)/e_os.h
+ $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+ X64/crypto/aes/aesni-mb-x86_64.nasm
+ X64/crypto/aes/aesni-sha1-x86_64.nasm
+ X64/crypto/aes/aesni-sha256-x86_64.nasm
+ X64/crypto/aes/aesni-x86_64.nasm
+ X64/crypto/aes/vpaes-x86_64.nasm
+ X64/crypto/bn/rsaz-avx2.nasm
+ X64/crypto/bn/rsaz-x86_64.nasm
+ X64/crypto/bn/x86_64-gf2m.nasm
+ X64/crypto/bn/x86_64-mont.nasm
+ X64/crypto/bn/x86_64-mont5.nasm
+ X64/crypto/md5/md5-x86_64.nasm
+ X64/crypto/modes/aesni-gcm-x86_64.nasm
+ X64/crypto/modes/ghash-x86_64.nasm
+ X64/crypto/rc4/rc4-md5-x86_64.nasm
+ X64/crypto/rc4/rc4-x86_64.nasm
+ X64/crypto/sha/keccak1600-x86_64.nasm
+ X64/crypto/sha/sha1-mb-x86_64.nasm
+ X64/crypto/sha/sha1-x86_64.nasm
+ X64/crypto/sha/sha256-mb-x86_64.nasm
+ X64/crypto/sha/sha256-x86_64.nasm
+ X64/crypto/sha/sha512-x86_64.nasm
+ X64/crypto/x86_64cpuid.nasm
+ $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_core.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+ $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+ $(OPENSSL_PATH)/crypto/aria/aria.c
+ $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+ $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+ $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_int.c
+ $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_object.c
+ $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+ $(OPENSSL_PATH)/crypto/asn1/a_print.c
+ $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+ $(OPENSSL_PATH)/crypto/asn1/a_time.c
+ $(OPENSSL_PATH)/crypto/asn1/a_type.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+ $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+ $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/f_int.c
+ $(OPENSSL_PATH)/crypto/asn1/f_string.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+ $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+ $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+ $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+ $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+ $(OPENSSL_PATH)/crypto/asn1/x_info.c
+ $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+ $(OPENSSL_PATH)/crypto/asn1/x_long.c
+ $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+ $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/x_val.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+ $(OPENSSL_PATH)/crypto/async/async.c
+ $(OPENSSL_PATH)/crypto/async/async_err.c
+ $(OPENSSL_PATH)/crypto/async/async_wait.c
+ $(OPENSSL_PATH)/crypto/bio/b_addr.c
+ $(OPENSSL_PATH)/crypto/bio/b_dump.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+ $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+ $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+ $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+ $(OPENSSL_PATH)/crypto/bio/bf_null.c
+ $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+ $(OPENSSL_PATH)/crypto/bio/bio_err.c
+ $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+ $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+ $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+ $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+ $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+ $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+ $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+ $(OPENSSL_PATH)/crypto/bio/bss_file.c
+ $(OPENSSL_PATH)/crypto/bio/bss_log.c
+ $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+ $(OPENSSL_PATH)/crypto/bio/bss_null.c
+ $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+ $(OPENSSL_PATH)/crypto/bn/asm/x86_64-gcc.c
+ $(OPENSSL_PATH)/crypto/bn/bn_add.c
+ $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+ $(OPENSSL_PATH)/crypto/bn/bn_const.c
+ $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+ $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+ $(OPENSSL_PATH)/crypto/bn/bn_div.c
+ $(OPENSSL_PATH)/crypto/bn/bn_err.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+ $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+ $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+ $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+ $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+ $(OPENSSL_PATH)/crypto/bn/bn_print.c
+ $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+ $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+ $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_word.c
+ $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+ $(OPENSSL_PATH)/crypto/bn/rsaz_exp.c
+ $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+ $(OPENSSL_PATH)/crypto/buffer/buffer.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+ $(OPENSSL_PATH)/crypto/cmac/cmac.c
+ $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+ $(OPENSSL_PATH)/crypto/comp/comp_err.c
+ $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_api.c
+ $(OPENSSL_PATH)/crypto/conf/conf_def.c
+ $(OPENSSL_PATH)/crypto/conf/conf_err.c
+ $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+ $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+ $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+ $(OPENSSL_PATH)/crypto/cpt_err.c
+ $(OPENSSL_PATH)/crypto/cryptlib.c
+ $(OPENSSL_PATH)/crypto/ctype.c
+ $(OPENSSL_PATH)/crypto/cversion.c
+ $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+ $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+ $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+ $(OPENSSL_PATH)/crypto/des/des_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+ $(OPENSSL_PATH)/crypto/des/fcrypt.c
+ $(OPENSSL_PATH)/crypto/des/fcrypt_b.c
+ $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+ $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+ $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+ $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+ $(OPENSSL_PATH)/crypto/des/rand_key.c
+ $(OPENSSL_PATH)/crypto/des/set_key.c
+ $(OPENSSL_PATH)/crypto/des/str2key.c
+ $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+ $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+ $(OPENSSL_PATH)/crypto/dh/dh_check.c
+ $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+ $(OPENSSL_PATH)/crypto/dh/dh_err.c
+ $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+ $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+ $(OPENSSL_PATH)/crypto/dh/dh_key.c
+ $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+ $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+ $(OPENSSL_PATH)/crypto/dso/dso_err.c
+ $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+ $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+ $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+ $(OPENSSL_PATH)/crypto/ebcdic.c
+ $(OPENSSL_PATH)/crypto/err/err.c
+ $(OPENSSL_PATH)/crypto/err/err_prn.c
+ $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+ $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+ $(OPENSSL_PATH)/crypto/evp/bio_md.c
+ $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+ $(OPENSSL_PATH)/crypto/evp/c_allc.c
+ $(OPENSSL_PATH)/crypto/evp/c_alld.c
+ $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+ $(OPENSSL_PATH)/crypto/evp/digest.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+ $(OPENSSL_PATH)/crypto/evp/e_aria.c
+ $(OPENSSL_PATH)/crypto/evp/e_bf.c
+ $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+ $(OPENSSL_PATH)/crypto/evp/e_cast.c
+ $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+ $(OPENSSL_PATH)/crypto/evp/e_des.c
+ $(OPENSSL_PATH)/crypto/evp/e_des3.c
+ $(OPENSSL_PATH)/crypto/evp/e_idea.c
+ $(OPENSSL_PATH)/crypto/evp/e_null.c
+ $(OPENSSL_PATH)/crypto/evp/e_old.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+ $(OPENSSL_PATH)/crypto/evp/e_seed.c
+ $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+ $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+ $(OPENSSL_PATH)/crypto/evp/encode.c
+ $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+ $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+ $(OPENSSL_PATH)/crypto/evp/evp_err.c
+ $(OPENSSL_PATH)/crypto/evp/evp_key.c
+ $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+ $(OPENSSL_PATH)/crypto/evp/m_md2.c
+ $(OPENSSL_PATH)/crypto/evp/m_md4.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+ $(OPENSSL_PATH)/crypto/evp/m_null.c
+ $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+ $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+ $(OPENSSL_PATH)/crypto/evp/m_wp.c
+ $(OPENSSL_PATH)/crypto/evp/names.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+ $(OPENSSL_PATH)/crypto/evp/p_dec.c
+ $(OPENSSL_PATH)/crypto/evp/p_enc.c
+ $(OPENSSL_PATH)/crypto/evp/p_lib.c
+ $(OPENSSL_PATH)/crypto/evp/p_open.c
+ $(OPENSSL_PATH)/crypto/evp/p_seal.c
+ $(OPENSSL_PATH)/crypto/evp/p_sign.c
+ $(OPENSSL_PATH)/crypto/evp/p_verify.c
+ $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+ $(OPENSSL_PATH)/crypto/ex_data.c
+ $(OPENSSL_PATH)/crypto/getenv.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+ $(OPENSSL_PATH)/crypto/hmac/hmac.c
+ $(OPENSSL_PATH)/crypto/init.c
+ $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+ $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+ $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+ $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+ $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+ $(OPENSSL_PATH)/crypto/lhash/lhash.c
+ $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+ $(OPENSSL_PATH)/crypto/md4/md4_one.c
+ $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+ $(OPENSSL_PATH)/crypto/md5/md5_one.c
+ $(OPENSSL_PATH)/crypto/mem.c
+ $(OPENSSL_PATH)/crypto/mem_dbg.c
+ $(OPENSSL_PATH)/crypto/mem_sec.c
+ $(OPENSSL_PATH)/crypto/modes/cbc128.c
+ $(OPENSSL_PATH)/crypto/modes/ccm128.c
+ $(OPENSSL_PATH)/crypto/modes/cfb128.c
+ $(OPENSSL_PATH)/crypto/modes/ctr128.c
+ $(OPENSSL_PATH)/crypto/modes/cts128.c
+ $(OPENSSL_PATH)/crypto/modes/gcm128.c
+ $(OPENSSL_PATH)/crypto/modes/ocb128.c
+ $(OPENSSL_PATH)/crypto/modes/ofb128.c
+ $(OPENSSL_PATH)/crypto/modes/wrap128.c
+ $(OPENSSL_PATH)/crypto/modes/xts128.c
+ $(OPENSSL_PATH)/crypto/o_dir.c
+ $(OPENSSL_PATH)/crypto/o_fips.c
+ $(OPENSSL_PATH)/crypto/o_fopen.c
+ $(OPENSSL_PATH)/crypto/o_init.c
+ $(OPENSSL_PATH)/crypto/o_str.c
+ $(OPENSSL_PATH)/crypto/o_time.c
+ $(OPENSSL_PATH)/crypto/objects/o_names.c
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+ $(OPENSSL_PATH)/crypto/objects/obj_err.c
+ $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+ $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+ $(OPENSSL_PATH)/crypto/pem/pem_all.c
+ $(OPENSSL_PATH)/crypto/pem/pem_err.c
+ $(OPENSSL_PATH)/crypto/pem/pem_info.c
+ $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+ $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+ $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+ $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+ $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+ $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+ $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+ $(OPENSSL_PATH)/crypto/rand/rand_err.c
+ $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+ $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+ $(OPENSSL_PATH)/crypto/rand/rand_win.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+ $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+ $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+ $(OPENSSL_PATH)/crypto/sha/sha256.c
+ $(OPENSSL_PATH)/crypto/sha/sha512.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+ $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+ $(OPENSSL_PATH)/crypto/sm3/sm3.c
+ $(OPENSSL_PATH)/crypto/sm4/sm4.c
+ $(OPENSSL_PATH)/crypto/stack/stack.c
+ $(OPENSSL_PATH)/crypto/threads_none.c
+ $(OPENSSL_PATH)/crypto/threads_pthread.c
+ $(OPENSSL_PATH)/crypto/threads_win.c
+ $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+ $(OPENSSL_PATH)/crypto/ui/ui_err.c
+ $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+ $(OPENSSL_PATH)/crypto/ui/ui_null.c
+ $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+ $(OPENSSL_PATH)/crypto/ui/ui_util.c
+ $(OPENSSL_PATH)/crypto/uid.c
+ $(OPENSSL_PATH)/crypto/x509/by_dir.c
+ $(OPENSSL_PATH)/crypto/x509/by_file.c
+ $(OPENSSL_PATH)/crypto/x509/t_crl.c
+ $(OPENSSL_PATH)/crypto/x509/t_req.c
+ $(OPENSSL_PATH)/crypto/x509/t_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x509_att.c
+ $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+ $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+ $(OPENSSL_PATH)/crypto/x509/x509_def.c
+ $(OPENSSL_PATH)/crypto/x509/x509_err.c
+ $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+ $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+ $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+ $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+ $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+ $(OPENSSL_PATH)/crypto/x509/x509_req.c
+ $(OPENSSL_PATH)/crypto/x509/x509_set.c
+ $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+ $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+ $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+ $(OPENSSL_PATH)/crypto/x509/x509cset.c
+ $(OPENSSL_PATH)/crypto/x509/x509name.c
+ $(OPENSSL_PATH)/crypto/x509/x509rset.c
+ $(OPENSSL_PATH)/crypto/x509/x509spki.c
+ $(OPENSSL_PATH)/crypto/x509/x509type.c
+ $(OPENSSL_PATH)/crypto/x509/x_all.c
+ $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+ $(OPENSSL_PATH)/crypto/x509/x_crl.c
+ $(OPENSSL_PATH)/crypto/x509/x_exten.c
+ $(OPENSSL_PATH)/crypto/x509/x_name.c
+ $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+ $(OPENSSL_PATH)/crypto/x509/x_req.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+ $(OPENSSL_PATH)/crypto/arm_arch.h
+ $(OPENSSL_PATH)/crypto/mips_arch.h
+ $(OPENSSL_PATH)/crypto/ppc_arch.h
+ $(OPENSSL_PATH)/crypto/s390x_arch.h
+ $(OPENSSL_PATH)/crypto/sparc_arch.h
+ $(OPENSSL_PATH)/crypto/vms_rms.h
+ $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/charmap.h
+ $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+ $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+ $(OPENSSL_PATH)/crypto/async/async_locl.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+ $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+ $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+ $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+ $(OPENSSL_PATH)/crypto/conf/conf_def.h
+ $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+ $(OPENSSL_PATH)/crypto/des/des_locl.h
+ $(OPENSSL_PATH)/crypto/des/spr.h
+ $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+ $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+ $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+ $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+ $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+ $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+ $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+ $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+ $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+ $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+ $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+ $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+ $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+ $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+ $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+ $(OPENSSL_PATH)/crypto/store/store_locl.h
+ $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+ $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+ $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+ $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+ $(OPENSSL_PATH)/ssl/bio_ssl.c
+ $(OPENSSL_PATH)/ssl/d1_lib.c
+ $(OPENSSL_PATH)/ssl/d1_msg.c
+ $(OPENSSL_PATH)/ssl/d1_srtp.c
+ $(OPENSSL_PATH)/ssl/methods.c
+ $(OPENSSL_PATH)/ssl/packet.c
+ $(OPENSSL_PATH)/ssl/pqueue.c
+ $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+ $(OPENSSL_PATH)/ssl/s3_cbc.c
+ $(OPENSSL_PATH)/ssl/s3_enc.c
+ $(OPENSSL_PATH)/ssl/s3_lib.c
+ $(OPENSSL_PATH)/ssl/s3_msg.c
+ $(OPENSSL_PATH)/ssl/ssl_asn1.c
+ $(OPENSSL_PATH)/ssl/ssl_cert.c
+ $(OPENSSL_PATH)/ssl/ssl_ciph.c
+ $(OPENSSL_PATH)/ssl/ssl_conf.c
+ $(OPENSSL_PATH)/ssl/ssl_err.c
+ $(OPENSSL_PATH)/ssl/ssl_init.c
+ $(OPENSSL_PATH)/ssl/ssl_lib.c
+ $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+ $(OPENSSL_PATH)/ssl/ssl_rsa.c
+ $(OPENSSL_PATH)/ssl/ssl_sess.c
+ $(OPENSSL_PATH)/ssl/ssl_stat.c
+ $(OPENSSL_PATH)/ssl/ssl_txt.c
+ $(OPENSSL_PATH)/ssl/ssl_utst.c
+ $(OPENSSL_PATH)/ssl/statem/extensions.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+ $(OPENSSL_PATH)/ssl/statem/statem.c
+ $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+ $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+ $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+ $(OPENSSL_PATH)/ssl/t1_enc.c
+ $(OPENSSL_PATH)/ssl/t1_lib.c
+ $(OPENSSL_PATH)/ssl/t1_trce.c
+ $(OPENSSL_PATH)/ssl/tls13_enc.c
+ $(OPENSSL_PATH)/ssl/tls_srp.c
+ $(OPENSSL_PATH)/ssl/packet_locl.h
+ $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+ $(OPENSSL_PATH)/ssl/ssl_locl.h
+ $(OPENSSL_PATH)/ssl/record/record.h
+ $(OPENSSL_PATH)/ssl/record/record_locl.h
+ $(OPENSSL_PATH)/ssl/statem/statem.h
+ $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+ buildinf.h
+ rand_pool_noise.h
+ ossl_store.c
+ rand_pool.c
+
+[Sources.X64]
+ ApiHooks.c
+ rand_pool_noise_tsc.c
+
+[Packages]
+ MdePkg/MdePkg.dec
+ CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+ BaseLib
+ DebugLib
+ TimerLib
+ PrintLib
+
+[BuildOptions]
+ #
+ # Disable the following Visual Studio compiler warnings introduced by the openssl source,
+ # so that we do not break the build with the /WX option:
+ # C4090: 'function' : different 'const' qualifiers
+ # C4132: 'object' : const object should be initialized (tls13_enc.c)
+ # C4210: nonstandard extension used: function given file scope
+ # C4244: conversion from type1 to type2, possible loss of data
+ # C4245: conversion from type1 to type2, signed/unsigned mismatch
+ # C4267: conversion from size_t to type, possible loss of data
+ # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+ # C4310: cast truncates constant value
+ # C4389: 'operator' : signed/unsigned mismatch (xxxx)
+ # C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+ # C4702: unreachable code
+ # C4706: assignment within conditional expression
+ # C4819: The file contains a character that cannot be represented in the current code page
+ #
+ MSFT:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4389 /wd4700 /wd4702 /wd4706 /wd4819
+
+ INTEL:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+ #
+ # Suppress the following build warnings in openssl so we don't break the build with -Werror
+ # -Werror=maybe-uninitialized: there exist code paths on which the variable is used without being initialized.
+ # -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have
+ # types appropriate to the format string specified.
+ # -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration).
+ #
+ GCC:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable -DNO_MSABI_VA_FUNCS
+
+ # Suppress the following warnings in openssl so we don't break the build with warnings-as-errors:
+ # 1295: Deprecated declaration <entity> - give arg types
+ # 550: <entity> was set but never used
+ # 1293: assignment in condition
+ # 111: statement is unreachable (invariably "break;" after "return X;" in case statement)
+ # 68: integer conversion resulted in a change of sign ("if (Status == -1)")
+ # 177: <entity> was declared but never referenced
+ # 223: function <entity> declared implicitly
+ # 144: a value of type <type> cannot be used to initialize an entity of type <type>
+ # 513: a value of type <type> cannot be assigned to an entity of type <type>
+ # 188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+ # 1296: Extended constant initialiser used
+ # 128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+ # from the function that evaluates to true at compile time
+ # 546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+ # variable is never referenced after the jump
+ # 1: ignore "#1-D: last line of file ends without a newline"
+ # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+ # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+ XCODE:*_*_X64_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
diff --git a/CryptoPkg/Library/Include/openssl/opensslconf.h b/CryptoPkg/Library/Include/openssl/opensslconf.h
index bd34e53ef2..20f32cc6fe 100644
--- a/CryptoPkg/Library/Include/openssl/opensslconf.h
+++ b/CryptoPkg/Library/Include/openssl/opensslconf.h
@@ -103,9 +103,6 @@ extern "C" {
#ifndef OPENSSL_NO_ASAN
# define OPENSSL_NO_ASAN
#endif
-#ifndef OPENSSL_NO_ASM
-# define OPENSSL_NO_ASM
-#endif
#ifndef OPENSSL_NO_ASYNC
# define OPENSSL_NO_ASYNC
#endif
diff --git a/CryptoPkg/Library/OpensslLib/ApiHooks.c b/CryptoPkg/Library/OpensslLib/ApiHooks.c
new file mode 100644
index 0000000000..58cff16838
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/ApiHooks.c
@@ -0,0 +1,18 @@
+/** @file
+ OpenSSL Library API hooks.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
+VOID *
+__imp_RtlVirtualUnwind (
+ VOID * Args
+ )
+{
+ return NULL;
+}
+
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
new file mode 100644
index 0000000000..ef20d2b84e
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
@@ -0,0 +1,34 @@
+/** @file
+ Constructor to initialize CPUID data for OpenSSL assembly operations.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
+extern void OPENSSL_cpuid_setup (void);
+
+/**
+ Constructor routine for OpensslLib.
+
+ The constructor calls an internal OpenSSL function which fetches a local copy
+ of the hardware capability flags, used to enable native crypto instructions.
+
+ @param None
+
+ @retval EFI_SUCCESS The construction succeeded.
+
+**/
+EFI_STATUS
+EFIAPI
+OpensslLibConstructor (
+ VOID
+ )
+{
+ OPENSSL_cpuid_setup ();
+
+ return EFI_SUCCESS;
+}
+
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
new file mode 100644
index 0000000000..30879d3cf5
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
@@ -0,0 +1,3209 @@
+; Copyright 2009-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _aesni_encrypt
+align 16
+_aesni_encrypt:
+L$_aesni_encrypt_begin:
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [12+esp]
+ movups xmm2,[eax]
+ mov ecx,DWORD [240+edx]
+ mov eax,DWORD [8+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$000enc1_loop_1:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$000enc1_loop_1
+db 102,15,56,221,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups [eax],xmm2
+ pxor xmm2,xmm2
+ ret
+global _aesni_decrypt
+align 16
+_aesni_decrypt:
+L$_aesni_decrypt_begin:
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [12+esp]
+ movups xmm2,[eax]
+ mov ecx,DWORD [240+edx]
+ mov eax,DWORD [8+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$001dec1_loop_2:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$001dec1_loop_2
+db 102,15,56,223,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups [eax],xmm2
+ pxor xmm2,xmm2
+ ret
+align 16
+__aesni_encrypt2:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$002enc2_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$002enc2_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,221,208
+db 102,15,56,221,216
+ ret
+align 16
+__aesni_decrypt2:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$003dec2_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$003dec2_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,223,208
+db 102,15,56,223,216
+ ret
+align 16
+__aesni_encrypt3:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$004enc3_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+db 102,15,56,220,224
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$004enc3_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,221,208
+db 102,15,56,221,216
+db 102,15,56,221,224
+ ret
+align 16
+__aesni_decrypt3:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$005dec3_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+db 102,15,56,222,224
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$005dec3_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,223,208
+db 102,15,56,223,216
+db 102,15,56,223,224
+ ret
+align 16
+__aesni_encrypt4:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ shl ecx,4
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 15,31,64,0
+ add ecx,16
+L$006enc4_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+db 102,15,56,220,224
+db 102,15,56,220,232
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$006enc4_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,221,208
+db 102,15,56,221,216
+db 102,15,56,221,224
+db 102,15,56,221,232
+ ret
+align 16
+__aesni_decrypt4:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ shl ecx,4
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 15,31,64,0
+ add ecx,16
+L$007dec4_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+db 102,15,56,222,224
+db 102,15,56,222,232
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$007dec4_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,223,208
+db 102,15,56,223,216
+db 102,15,56,223,224
+db 102,15,56,223,232
+ ret
+align 16
+__aesni_encrypt6:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+db 102,15,56,220,209
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+db 102,15,56,220,217
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 102,15,56,220,225
+ pxor xmm7,xmm0
+ movups xmm0,[ecx*1+edx]
+ add ecx,16
+ jmp NEAR L$008_aesni_encrypt6_inner
+align 16
+L$009enc6_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+L$008_aesni_encrypt6_inner:
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+L$_aesni_encrypt6_enter:
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+db 102,15,56,220,224
+db 102,15,56,220,232
+db 102,15,56,220,240
+db 102,15,56,220,248
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$009enc6_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+db 102,15,56,221,208
+db 102,15,56,221,216
+db 102,15,56,221,224
+db 102,15,56,221,232
+db 102,15,56,221,240
+db 102,15,56,221,248
+ ret
+align 16
+__aesni_decrypt6:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+db 102,15,56,222,209
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+db 102,15,56,222,217
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 102,15,56,222,225
+ pxor xmm7,xmm0
+ movups xmm0,[ecx*1+edx]
+ add ecx,16
+ jmp NEAR L$010_aesni_decrypt6_inner
+align 16
+L$011dec6_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+L$010_aesni_decrypt6_inner:
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+L$_aesni_decrypt6_enter:
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+db 102,15,56,222,224
+db 102,15,56,222,232
+db 102,15,56,222,240
+db 102,15,56,222,248
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$011dec6_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+db 102,15,56,223,208
+db 102,15,56,223,216
+db 102,15,56,223,224
+db 102,15,56,223,232
+db 102,15,56,223,240
+db 102,15,56,223,248
+ ret
+global _aesni_ecb_encrypt
+align 16
+_aesni_ecb_encrypt:
+L$_aesni_ecb_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ and eax,-16
+ jz NEAR L$012ecb_ret
+ mov ecx,DWORD [240+edx]
+ test ebx,ebx
+ jz NEAR L$013ecb_decrypt
+ mov ebp,edx
+ mov ebx,ecx
+ cmp eax,96
+ jb NEAR L$014ecb_enc_tail
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ sub eax,96
+ jmp NEAR L$015ecb_enc_loop6_enter
+align 16
+L$016ecb_enc_loop6:
+ movups [edi],xmm2
+ movdqu xmm2,[esi]
+ movups [16+edi],xmm3
+ movdqu xmm3,[16+esi]
+ movups [32+edi],xmm4
+ movdqu xmm4,[32+esi]
+ movups [48+edi],xmm5
+ movdqu xmm5,[48+esi]
+ movups [64+edi],xmm6
+ movdqu xmm6,[64+esi]
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+L$015ecb_enc_loop6_enter:
+ call __aesni_encrypt6
+ mov edx,ebp
+ mov ecx,ebx
+ sub eax,96
+ jnc NEAR L$016ecb_enc_loop6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ add eax,96
+ jz NEAR L$012ecb_ret
+L$014ecb_enc_tail:
+ movups xmm2,[esi]
+ cmp eax,32
+ jb NEAR L$017ecb_enc_one
+ movups xmm3,[16+esi]
+ je NEAR L$018ecb_enc_two
+ movups xmm4,[32+esi]
+ cmp eax,64
+ jb NEAR L$019ecb_enc_three
+ movups xmm5,[48+esi]
+ je NEAR L$020ecb_enc_four
+ movups xmm6,[64+esi]
+ xorps xmm7,xmm7
+ call __aesni_encrypt6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ jmp NEAR L$012ecb_ret
+align 16
+L$017ecb_enc_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$021enc1_loop_3:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$021enc1_loop_3
+db 102,15,56,221,209
+ movups [edi],xmm2
+ jmp NEAR L$012ecb_ret
+align 16
+L$018ecb_enc_two:
+ call __aesni_encrypt2
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ jmp NEAR L$012ecb_ret
+align 16
+L$019ecb_enc_three:
+ call __aesni_encrypt3
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ jmp NEAR L$012ecb_ret
+align 16
+L$020ecb_enc_four:
+ call __aesni_encrypt4
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ jmp NEAR L$012ecb_ret
+align 16
+L$013ecb_decrypt:
+ mov ebp,edx
+ mov ebx,ecx
+ cmp eax,96
+ jb NEAR L$022ecb_dec_tail
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ sub eax,96
+ jmp NEAR L$023ecb_dec_loop6_enter
+align 16
+L$024ecb_dec_loop6:
+ movups [edi],xmm2
+ movdqu xmm2,[esi]
+ movups [16+edi],xmm3
+ movdqu xmm3,[16+esi]
+ movups [32+edi],xmm4
+ movdqu xmm4,[32+esi]
+ movups [48+edi],xmm5
+ movdqu xmm5,[48+esi]
+ movups [64+edi],xmm6
+ movdqu xmm6,[64+esi]
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+L$023ecb_dec_loop6_enter:
+ call __aesni_decrypt6
+ mov edx,ebp
+ mov ecx,ebx
+ sub eax,96
+ jnc NEAR L$024ecb_dec_loop6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ add eax,96
+ jz NEAR L$012ecb_ret
+L$022ecb_dec_tail:
+ movups xmm2,[esi]
+ cmp eax,32
+ jb NEAR L$025ecb_dec_one
+ movups xmm3,[16+esi]
+ je NEAR L$026ecb_dec_two
+ movups xmm4,[32+esi]
+ cmp eax,64
+ jb NEAR L$027ecb_dec_three
+ movups xmm5,[48+esi]
+ je NEAR L$028ecb_dec_four
+ movups xmm6,[64+esi]
+ xorps xmm7,xmm7
+ call __aesni_decrypt6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ jmp NEAR L$012ecb_ret
+align 16
+L$025ecb_dec_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$029dec1_loop_4:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$029dec1_loop_4
+db 102,15,56,223,209
+ movups [edi],xmm2
+ jmp NEAR L$012ecb_ret
+align 16
+L$026ecb_dec_two:
+ call __aesni_decrypt2
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ jmp NEAR L$012ecb_ret
+align 16
+L$027ecb_dec_three:
+ call __aesni_decrypt3
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ jmp NEAR L$012ecb_ret
+align 16
+L$028ecb_dec_four:
+ call __aesni_decrypt4
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+L$012ecb_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ccm64_encrypt_blocks
+align 16
+_aesni_ccm64_encrypt_blocks:
+L$_aesni_ccm64_encrypt_blocks_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ mov ecx,DWORD [40+esp]
+ mov ebp,esp
+ sub esp,60
+ and esp,-16
+ mov DWORD [48+esp],ebp
+ movdqu xmm7,[ebx]
+ movdqu xmm3,[ecx]
+ mov ecx,DWORD [240+edx]
+ mov DWORD [esp],202182159
+ mov DWORD [4+esp],134810123
+ mov DWORD [8+esp],67438087
+ mov DWORD [12+esp],66051
+ mov ebx,1
+ xor ebp,ebp
+ mov DWORD [16+esp],ebx
+ mov DWORD [20+esp],ebp
+ mov DWORD [24+esp],ebp
+ mov DWORD [28+esp],ebp
+ shl ecx,4
+ mov ebx,16
+ lea ebp,[edx]
+ movdqa xmm5,[esp]
+ movdqa xmm2,xmm7
+ lea edx,[32+ecx*1+edx]
+ sub ebx,ecx
+db 102,15,56,0,253
+L$030ccm64_enc_outer:
+ movups xmm0,[ebp]
+ mov ecx,ebx
+ movups xmm6,[esi]
+ xorps xmm2,xmm0
+ movups xmm1,[16+ebp]
+ xorps xmm0,xmm6
+ xorps xmm3,xmm0
+ movups xmm0,[32+ebp]
+L$031ccm64_enc2_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$031ccm64_enc2_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+ paddq xmm7,[16+esp]
+ dec eax
+db 102,15,56,221,208
+db 102,15,56,221,216
+ lea esi,[16+esi]
+ xorps xmm6,xmm2
+ movdqa xmm2,xmm7
+ movups [edi],xmm6
+db 102,15,56,0,213
+ lea edi,[16+edi]
+ jnz NEAR L$030ccm64_enc_outer
+ mov esp,DWORD [48+esp]
+ mov edi,DWORD [40+esp]
+ movups [edi],xmm3
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ccm64_decrypt_blocks
+align 16
+_aesni_ccm64_decrypt_blocks:
+L$_aesni_ccm64_decrypt_blocks_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ mov ecx,DWORD [40+esp]
+ mov ebp,esp
+ sub esp,60
+ and esp,-16
+ mov DWORD [48+esp],ebp
+ movdqu xmm7,[ebx]
+ movdqu xmm3,[ecx]
+ mov ecx,DWORD [240+edx]
+ mov DWORD [esp],202182159
+ mov DWORD [4+esp],134810123
+ mov DWORD [8+esp],67438087
+ mov DWORD [12+esp],66051
+ mov ebx,1
+ xor ebp,ebp
+ mov DWORD [16+esp],ebx
+ mov DWORD [20+esp],ebp
+ mov DWORD [24+esp],ebp
+ mov DWORD [28+esp],ebp
+ movdqa xmm5,[esp]
+ movdqa xmm2,xmm7
+ mov ebp,edx
+ mov ebx,ecx
+db 102,15,56,0,253
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$032enc1_loop_5:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$032enc1_loop_5
+db 102,15,56,221,209
+ shl ebx,4
+ mov ecx,16
+ movups xmm6,[esi]
+ paddq xmm7,[16+esp]
+ lea esi,[16+esi]
+ sub ecx,ebx
+ lea edx,[32+ebx*1+ebp]
+ mov ebx,ecx
+ jmp NEAR L$033ccm64_dec_outer
+align 16
+L$033ccm64_dec_outer:
+ xorps xmm6,xmm2
+ movdqa xmm2,xmm7
+ movups [edi],xmm6
+ lea edi,[16+edi]
+db 102,15,56,0,213
+ sub eax,1
+ jz NEAR L$034ccm64_dec_break
+ movups xmm0,[ebp]
+ mov ecx,ebx
+ movups xmm1,[16+ebp]
+ xorps xmm6,xmm0
+ xorps xmm2,xmm0
+ xorps xmm3,xmm6
+ movups xmm0,[32+ebp]
+L$035ccm64_dec2_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$035ccm64_dec2_loop
+ movups xmm6,[esi]
+ paddq xmm7,[16+esp]
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,221,208
+db 102,15,56,221,216
+ lea esi,[16+esi]
+ jmp NEAR L$033ccm64_dec_outer
+align 16
+L$034ccm64_dec_break:
+ mov ecx,DWORD [240+ebp]
+ mov edx,ebp
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ xorps xmm6,xmm0
+ lea edx,[32+edx]
+ xorps xmm3,xmm6
+L$036enc1_loop_6:
+db 102,15,56,220,217
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$036enc1_loop_6
+db 102,15,56,221,217
+ mov esp,DWORD [48+esp]
+ mov edi,DWORD [40+esp]
+ movups [edi],xmm3
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ctr32_encrypt_blocks
+align 16
+_aesni_ctr32_encrypt_blocks:
+L$_aesni_ctr32_encrypt_blocks_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ mov ebp,esp
+ sub esp,88
+ and esp,-16
+ mov DWORD [80+esp],ebp
+ cmp eax,1
+ je NEAR L$037ctr32_one_shortcut
+ movdqu xmm7,[ebx]
+ mov DWORD [esp],202182159
+ mov DWORD [4+esp],134810123
+ mov DWORD [8+esp],67438087
+ mov DWORD [12+esp],66051
+ mov ecx,6
+ xor ebp,ebp
+ mov DWORD [16+esp],ecx
+ mov DWORD [20+esp],ecx
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],ebp
+db 102,15,58,22,251,3
+db 102,15,58,34,253,3
+ mov ecx,DWORD [240+edx]
+ bswap ebx
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movdqa xmm2,[esp]
+db 102,15,58,34,195,0
+ lea ebp,[3+ebx]
+db 102,15,58,34,205,0
+ inc ebx
+db 102,15,58,34,195,1
+ inc ebp
+db 102,15,58,34,205,1
+ inc ebx
+db 102,15,58,34,195,2
+ inc ebp
+db 102,15,58,34,205,2
+ movdqa [48+esp],xmm0
+db 102,15,56,0,194
+ movdqu xmm6,[edx]
+ movdqa [64+esp],xmm1
+db 102,15,56,0,202
+ pshufd xmm2,xmm0,192
+ pshufd xmm3,xmm0,128
+ cmp eax,6
+ jb NEAR L$038ctr32_tail
+ pxor xmm7,xmm6
+ shl ecx,4
+ mov ebx,16
+ movdqa [32+esp],xmm7
+ mov ebp,edx
+ sub ebx,ecx
+ lea edx,[32+ecx*1+edx]
+ sub eax,6
+ jmp NEAR L$039ctr32_loop6
+align 16
+L$039ctr32_loop6:
+ pshufd xmm4,xmm0,64
+ movdqa xmm0,[32+esp]
+ pshufd xmm5,xmm1,192
+ pxor xmm2,xmm0
+ pshufd xmm6,xmm1,128
+ pxor xmm3,xmm0
+ pshufd xmm7,xmm1,64
+ movups xmm1,[16+ebp]
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+db 102,15,56,220,209
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+db 102,15,56,220,217
+ movups xmm0,[32+ebp]
+ mov ecx,ebx
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ call L$_aesni_encrypt6_enter
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm3,xmm0
+ movups [edi],xmm2
+ movdqa xmm0,[16+esp]
+ xorps xmm4,xmm1
+ movdqa xmm1,[64+esp]
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ paddd xmm1,xmm0
+ paddd xmm0,[48+esp]
+ movdqa xmm2,[esp]
+ movups xmm3,[48+esi]
+ movups xmm4,[64+esi]
+ xorps xmm5,xmm3
+ movups xmm3,[80+esi]
+ lea esi,[96+esi]
+ movdqa [48+esp],xmm0
+db 102,15,56,0,194
+ xorps xmm6,xmm4
+ movups [48+edi],xmm5
+ xorps xmm7,xmm3
+ movdqa [64+esp],xmm1
+db 102,15,56,0,202
+ movups [64+edi],xmm6
+ pshufd xmm2,xmm0,192
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ pshufd xmm3,xmm0,128
+ sub eax,6
+ jnc NEAR L$039ctr32_loop6
+ add eax,6
+ jz NEAR L$040ctr32_ret
+ movdqu xmm7,[ebp]
+ mov edx,ebp
+ pxor xmm7,[32+esp]
+ mov ecx,DWORD [240+ebp]
+L$038ctr32_tail:
+ por xmm2,xmm7
+ cmp eax,2
+ jb NEAR L$041ctr32_one
+ pshufd xmm4,xmm0,64
+ por xmm3,xmm7
+ je NEAR L$042ctr32_two
+ pshufd xmm5,xmm1,192
+ por xmm4,xmm7
+ cmp eax,4
+ jb NEAR L$043ctr32_three
+ pshufd xmm6,xmm1,128
+ por xmm5,xmm7
+ je NEAR L$044ctr32_four
+ por xmm6,xmm7
+ call __aesni_encrypt6
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm3,xmm0
+ movups xmm0,[48+esi]
+ xorps xmm4,xmm1
+ movups xmm1,[64+esi]
+ xorps xmm5,xmm0
+ movups [edi],xmm2
+ xorps xmm6,xmm1
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ jmp NEAR L$040ctr32_ret
+align 16
+L$037ctr32_one_shortcut:
+ movups xmm2,[ebx]
+ mov ecx,DWORD [240+edx]
+L$041ctr32_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$045enc1_loop_7:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$045enc1_loop_7
+db 102,15,56,221,209
+ movups xmm6,[esi]
+ xorps xmm6,xmm2
+ movups [edi],xmm6
+ jmp NEAR L$040ctr32_ret
+align 16
+L$042ctr32_two:
+ call __aesni_encrypt2
+ movups xmm5,[esi]
+ movups xmm6,[16+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ jmp NEAR L$040ctr32_ret
+align 16
+L$043ctr32_three:
+ call __aesni_encrypt3
+ movups xmm5,[esi]
+ movups xmm6,[16+esi]
+ xorps xmm2,xmm5
+ movups xmm7,[32+esi]
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ xorps xmm4,xmm7
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ jmp NEAR L$040ctr32_ret
+align 16
+L$044ctr32_four:
+ call __aesni_encrypt4
+ movups xmm6,[esi]
+ movups xmm7,[16+esi]
+ movups xmm1,[32+esi]
+ xorps xmm2,xmm6
+ movups xmm0,[48+esi]
+ xorps xmm3,xmm7
+ movups [edi],xmm2
+ xorps xmm4,xmm1
+ movups [16+edi],xmm3
+ xorps xmm5,xmm0
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+L$040ctr32_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ movdqa [32+esp],xmm0
+ pxor xmm5,xmm5
+ movdqa [48+esp],xmm0
+ pxor xmm6,xmm6
+ movdqa [64+esp],xmm0
+ pxor xmm7,xmm7
+ mov esp,DWORD [80+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_xts_encrypt
+align 16
+_aesni_xts_encrypt:
+L$_aesni_xts_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edx,DWORD [36+esp]
+ mov esi,DWORD [40+esp]
+ mov ecx,DWORD [240+edx]
+ movups xmm2,[esi]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$046enc1_loop_8:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$046enc1_loop_8
+db 102,15,56,221,209
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebp,esp
+ sub esp,120
+ mov ecx,DWORD [240+edx]
+ and esp,-16
+ mov DWORD [96+esp],135
+ mov DWORD [100+esp],0
+ mov DWORD [104+esp],1
+ mov DWORD [108+esp],0
+ mov DWORD [112+esp],eax
+ mov DWORD [116+esp],ebp
+ movdqa xmm1,xmm2
+ pxor xmm0,xmm0
+ movdqa xmm3,[96+esp]
+ pcmpgtd xmm0,xmm1
+ and eax,-16
+ mov ebp,edx
+ mov ebx,ecx
+ sub eax,96
+ jc NEAR L$047xts_enc_short
+ shl ecx,4
+ mov ebx,16
+ sub ebx,ecx
+ lea edx,[32+ecx*1+edx]
+ jmp NEAR L$048xts_enc_loop6
+align 16
+L$048xts_enc_loop6:
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [16+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [32+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm7,xmm0,19
+ movdqa [64+esp],xmm1
+ paddq xmm1,xmm1
+ movups xmm0,[ebp]
+ pand xmm7,xmm3
+ movups xmm2,[esi]
+ pxor xmm7,xmm1
+ mov ecx,ebx
+ movdqu xmm3,[16+esi]
+ xorps xmm2,xmm0
+ movdqu xmm4,[32+esi]
+ pxor xmm3,xmm0
+ movdqu xmm5,[48+esi]
+ pxor xmm4,xmm0
+ movdqu xmm6,[64+esi]
+ pxor xmm5,xmm0
+ movdqu xmm1,[80+esi]
+ pxor xmm6,xmm0
+ lea esi,[96+esi]
+ pxor xmm2,[esp]
+ movdqa [80+esp],xmm7
+ pxor xmm7,xmm1
+ movups xmm1,[16+ebp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+db 102,15,56,220,209
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+db 102,15,56,220,217
+ pxor xmm7,xmm0
+ movups xmm0,[32+ebp]
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ call L$_aesni_encrypt6_enter
+ movdqa xmm1,[80+esp]
+ pxor xmm0,xmm0
+ xorps xmm2,[esp]
+ pcmpgtd xmm0,xmm1
+ xorps xmm3,[16+esp]
+ movups [edi],xmm2
+ xorps xmm4,[32+esp]
+ movups [16+edi],xmm3
+ xorps xmm5,[48+esp]
+ movups [32+edi],xmm4
+ xorps xmm6,[64+esp]
+ movups [48+edi],xmm5
+ xorps xmm7,xmm1
+ movups [64+edi],xmm6
+ pshufd xmm2,xmm0,19
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqa xmm3,[96+esp]
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ sub eax,96
+ jnc NEAR L$048xts_enc_loop6
+ mov ecx,DWORD [240+ebp]
+ mov edx,ebp
+ mov ebx,ecx
+L$047xts_enc_short:
+ add eax,96
+ jz NEAR L$049xts_enc_done6x
+ movdqa xmm5,xmm1
+ cmp eax,32
+ jb NEAR L$050xts_enc_one
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ je NEAR L$051xts_enc_two
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm6,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ cmp eax,64
+ jb NEAR L$052xts_enc_three
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm7,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ movdqa [esp],xmm5
+ movdqa [16+esp],xmm6
+ je NEAR L$053xts_enc_four
+ movdqa [32+esp],xmm7
+ pshufd xmm7,xmm0,19
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm7,xmm3
+ pxor xmm7,xmm1
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ pxor xmm2,[esp]
+ movdqu xmm5,[48+esi]
+ pxor xmm3,[16+esp]
+ movdqu xmm6,[64+esi]
+ pxor xmm4,[32+esp]
+ lea esi,[80+esi]
+ pxor xmm5,[48+esp]
+ movdqa [64+esp],xmm7
+ pxor xmm6,xmm7
+ call __aesni_encrypt6
+ movaps xmm1,[64+esp]
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,[32+esp]
+ movups [edi],xmm2
+ xorps xmm5,[48+esp]
+ movups [16+edi],xmm3
+ xorps xmm6,xmm1
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ lea edi,[80+edi]
+ jmp NEAR L$054xts_enc_done
+align 16
+L$050xts_enc_one:
+ movups xmm2,[esi]
+ lea esi,[16+esi]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$055enc1_loop_9:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$055enc1_loop_9
+db 102,15,56,221,209
+ xorps xmm2,xmm5
+ movups [edi],xmm2
+ lea edi,[16+edi]
+ movdqa xmm1,xmm5
+ jmp NEAR L$054xts_enc_done
+align 16
+L$051xts_enc_two:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ lea esi,[32+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ call __aesni_encrypt2
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ lea edi,[32+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$054xts_enc_done
+align 16
+L$052xts_enc_three:
+ movaps xmm7,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ lea esi,[48+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ call __aesni_encrypt3
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ lea edi,[48+edi]
+ movdqa xmm1,xmm7
+ jmp NEAR L$054xts_enc_done
+align 16
+L$053xts_enc_four:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ xorps xmm2,[esp]
+ movups xmm5,[48+esi]
+ lea esi,[64+esi]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ xorps xmm5,xmm6
+ call __aesni_encrypt4
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ xorps xmm5,xmm6
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ lea edi,[64+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$054xts_enc_done
+align 16
+L$049xts_enc_done6x:
+ mov eax,DWORD [112+esp]
+ and eax,15
+ jz NEAR L$056xts_enc_ret
+ movdqa xmm5,xmm1
+ mov DWORD [112+esp],eax
+ jmp NEAR L$057xts_enc_steal
+align 16
+L$054xts_enc_done:
+ mov eax,DWORD [112+esp]
+ pxor xmm0,xmm0
+ and eax,15
+ jz NEAR L$056xts_enc_ret
+ pcmpgtd xmm0,xmm1
+ mov DWORD [112+esp],eax
+ pshufd xmm5,xmm0,19
+ paddq xmm1,xmm1
+ pand xmm5,[96+esp]
+ pxor xmm5,xmm1
+L$057xts_enc_steal:
+ movzx ecx,BYTE [esi]
+ movzx edx,BYTE [edi-16]
+ lea esi,[1+esi]
+ mov BYTE [edi-16],cl
+ mov BYTE [edi],dl
+ lea edi,[1+edi]
+ sub eax,1
+ jnz NEAR L$057xts_enc_steal
+ sub edi,DWORD [112+esp]
+ mov edx,ebp
+ mov ecx,ebx
+ movups xmm2,[edi-16]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$058enc1_loop_10:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$058enc1_loop_10
+db 102,15,56,221,209
+ xorps xmm2,xmm5
+ movups [edi-16],xmm2
+L$056xts_enc_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movdqa [esp],xmm0
+ pxor xmm3,xmm3
+ movdqa [16+esp],xmm0
+ pxor xmm4,xmm4
+ movdqa [32+esp],xmm0
+ pxor xmm5,xmm5
+ movdqa [48+esp],xmm0
+ pxor xmm6,xmm6
+ movdqa [64+esp],xmm0
+ pxor xmm7,xmm7
+ movdqa [80+esp],xmm0
+ mov esp,DWORD [116+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_xts_decrypt
+align 16
+_aesni_xts_decrypt:
+L$_aesni_xts_decrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edx,DWORD [36+esp]
+ mov esi,DWORD [40+esp]
+ mov ecx,DWORD [240+edx]
+ movups xmm2,[esi]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$059enc1_loop_11:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$059enc1_loop_11
+db 102,15,56,221,209
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebp,esp
+ sub esp,120
+ and esp,-16
+ xor ebx,ebx
+ test eax,15
+ setnz bl
+ shl ebx,4
+ sub eax,ebx
+ mov DWORD [96+esp],135
+ mov DWORD [100+esp],0
+ mov DWORD [104+esp],1
+ mov DWORD [108+esp],0
+ mov DWORD [112+esp],eax
+ mov DWORD [116+esp],ebp
+ mov ecx,DWORD [240+edx]
+ mov ebp,edx
+ mov ebx,ecx
+ movdqa xmm1,xmm2
+ pxor xmm0,xmm0
+ movdqa xmm3,[96+esp]
+ pcmpgtd xmm0,xmm1
+ and eax,-16
+ sub eax,96
+ jc NEAR L$060xts_dec_short
+ shl ecx,4
+ mov ebx,16
+ sub ebx,ecx
+ lea edx,[32+ecx*1+edx]
+ jmp NEAR L$061xts_dec_loop6
+align 16
+L$061xts_dec_loop6:
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [16+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [32+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm7,xmm0,19
+ movdqa [64+esp],xmm1
+ paddq xmm1,xmm1
+ movups xmm0,[ebp]
+ pand xmm7,xmm3
+ movups xmm2,[esi]
+ pxor xmm7,xmm1
+ mov ecx,ebx
+ movdqu xmm3,[16+esi]
+ xorps xmm2,xmm0
+ movdqu xmm4,[32+esi]
+ pxor xmm3,xmm0
+ movdqu xmm5,[48+esi]
+ pxor xmm4,xmm0
+ movdqu xmm6,[64+esi]
+ pxor xmm5,xmm0
+ movdqu xmm1,[80+esi]
+ pxor xmm6,xmm0
+ lea esi,[96+esi]
+ pxor xmm2,[esp]
+ movdqa [80+esp],xmm7
+ pxor xmm7,xmm1
+ movups xmm1,[16+ebp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+db 102,15,56,222,209
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+db 102,15,56,222,217
+ pxor xmm7,xmm0
+ movups xmm0,[32+ebp]
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+ call L$_aesni_decrypt6_enter
+ movdqa xmm1,[80+esp]
+ pxor xmm0,xmm0
+ xorps xmm2,[esp]
+ pcmpgtd xmm0,xmm1
+ xorps xmm3,[16+esp]
+ movups [edi],xmm2
+ xorps xmm4,[32+esp]
+ movups [16+edi],xmm3
+ xorps xmm5,[48+esp]
+ movups [32+edi],xmm4
+ xorps xmm6,[64+esp]
+ movups [48+edi],xmm5
+ xorps xmm7,xmm1
+ movups [64+edi],xmm6
+ pshufd xmm2,xmm0,19
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqa xmm3,[96+esp]
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ sub eax,96
+ jnc NEAR L$061xts_dec_loop6
+ mov ecx,DWORD [240+ebp]
+ mov edx,ebp
+ mov ebx,ecx
+L$060xts_dec_short:
+ add eax,96
+ jz NEAR L$062xts_dec_done6x
+ movdqa xmm5,xmm1
+ cmp eax,32
+ jb NEAR L$063xts_dec_one
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ je NEAR L$064xts_dec_two
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm6,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ cmp eax,64
+ jb NEAR L$065xts_dec_three
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm7,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ movdqa [esp],xmm5
+ movdqa [16+esp],xmm6
+ je NEAR L$066xts_dec_four
+ movdqa [32+esp],xmm7
+ pshufd xmm7,xmm0,19
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm7,xmm3
+ pxor xmm7,xmm1
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ pxor xmm2,[esp]
+ movdqu xmm5,[48+esi]
+ pxor xmm3,[16+esp]
+ movdqu xmm6,[64+esi]
+ pxor xmm4,[32+esp]
+ lea esi,[80+esi]
+ pxor xmm5,[48+esp]
+ movdqa [64+esp],xmm7
+ pxor xmm6,xmm7
+ call __aesni_decrypt6
+ movaps xmm1,[64+esp]
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,[32+esp]
+ movups [edi],xmm2
+ xorps xmm5,[48+esp]
+ movups [16+edi],xmm3
+ xorps xmm6,xmm1
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ lea edi,[80+edi]
+ jmp NEAR L$067xts_dec_done
+align 16
+L$063xts_dec_one:
+ movups xmm2,[esi]
+ lea esi,[16+esi]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$068dec1_loop_12:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$068dec1_loop_12
+db 102,15,56,223,209
+ xorps xmm2,xmm5
+ movups [edi],xmm2
+ lea edi,[16+edi]
+ movdqa xmm1,xmm5
+ jmp NEAR L$067xts_dec_done
+align 16
+L$064xts_dec_two:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ lea esi,[32+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ call __aesni_decrypt2
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ lea edi,[32+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$067xts_dec_done
+align 16
+L$065xts_dec_three:
+ movaps xmm7,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ lea esi,[48+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ call __aesni_decrypt3
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ lea edi,[48+edi]
+ movdqa xmm1,xmm7
+ jmp NEAR L$067xts_dec_done
+align 16
+L$066xts_dec_four:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ xorps xmm2,[esp]
+ movups xmm5,[48+esi]
+ lea esi,[64+esi]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ xorps xmm5,xmm6
+ call __aesni_decrypt4
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ xorps xmm5,xmm6
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ lea edi,[64+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$067xts_dec_done
+align 16
+L$062xts_dec_done6x:
+ mov eax,DWORD [112+esp]
+ and eax,15
+ jz NEAR L$069xts_dec_ret
+ mov DWORD [112+esp],eax
+ jmp NEAR L$070xts_dec_only_one_more
+align 16
+L$067xts_dec_done:
+ mov eax,DWORD [112+esp]
+ pxor xmm0,xmm0
+ and eax,15
+ jz NEAR L$069xts_dec_ret
+ pcmpgtd xmm0,xmm1
+ mov DWORD [112+esp],eax
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm3,[96+esp]
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+L$070xts_dec_only_one_more:
+ pshufd xmm5,xmm0,19
+ movdqa xmm6,xmm1
+ paddq xmm1,xmm1
+ pand xmm5,xmm3
+ pxor xmm5,xmm1
+ mov edx,ebp
+ mov ecx,ebx
+ movups xmm2,[esi]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$071dec1_loop_13:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$071dec1_loop_13
+db 102,15,56,223,209
+ xorps xmm2,xmm5
+ movups [edi],xmm2
+L$072xts_dec_steal:
+ movzx ecx,BYTE [16+esi]
+ movzx edx,BYTE [edi]
+ lea esi,[1+esi]
+ mov BYTE [edi],cl
+ mov BYTE [16+edi],dl
+ lea edi,[1+edi]
+ sub eax,1
+ jnz NEAR L$072xts_dec_steal
+ sub edi,DWORD [112+esp]
+ mov edx,ebp
+ mov ecx,ebx
+ movups xmm2,[edi]
+ xorps xmm2,xmm6
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$073dec1_loop_14:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$073dec1_loop_14
+db 102,15,56,223,209
+ xorps xmm2,xmm6
+ movups [edi],xmm2
+L$069xts_dec_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movdqa [esp],xmm0
+ pxor xmm3,xmm3
+ movdqa [16+esp],xmm0
+ pxor xmm4,xmm4
+ movdqa [32+esp],xmm0
+ pxor xmm5,xmm5
+ movdqa [48+esp],xmm0
+ pxor xmm6,xmm6
+ movdqa [64+esp],xmm0
+ pxor xmm7,xmm7
+ movdqa [80+esp],xmm0
+ mov esp,DWORD [116+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ocb_encrypt
+align 16
+_aesni_ocb_encrypt:
+L$_aesni_ocb_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ movdqu xmm0,[ecx]
+ mov ebp,DWORD [36+esp]
+ movdqu xmm1,[ebx]
+ mov ebx,DWORD [44+esp]
+ mov ecx,esp
+ sub esp,132
+ and esp,-16
+ sub edi,esi
+ shl eax,4
+ lea eax,[eax*1+esi-96]
+ mov DWORD [120+esp],edi
+ mov DWORD [124+esp],eax
+ mov DWORD [128+esp],ecx
+ mov ecx,DWORD [240+edx]
+ test ebp,1
+ jnz NEAR L$074odd
+ bsf eax,ebp
+ add ebp,1
+ shl eax,4
+ movdqu xmm7,[eax*1+ebx]
+ mov eax,edx
+ movdqu xmm2,[esi]
+ lea esi,[16+esi]
+ pxor xmm7,xmm0
+ pxor xmm1,xmm2
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$075enc1_loop_15:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$075enc1_loop_15
+db 102,15,56,221,209
+ xorps xmm2,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,xmm6
+ movups [esi*1+edi-16],xmm2
+ mov ecx,DWORD [240+eax]
+ mov edx,eax
+ mov eax,DWORD [124+esp]
+L$074odd:
+ shl ecx,4
+ mov edi,16
+ sub edi,ecx
+ mov DWORD [112+esp],edx
+ lea edx,[32+ecx*1+edx]
+ mov DWORD [116+esp],edi
+ cmp esi,eax
+ ja NEAR L$076short
+ jmp NEAR L$077grandloop
+align 32
+L$077grandloop:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ lea edi,[5+ebp]
+ add ebp,6
+ bsf ecx,ecx
+ bsf eax,eax
+ bsf edi,edi
+ shl ecx,4
+ shl eax,4
+ shl edi,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ movdqu xmm7,[edi*1+ebx]
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movdqa [80+esp],xmm7
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ pxor xmm1,xmm2
+ pxor xmm2,xmm0
+ pxor xmm1,xmm3
+ pxor xmm3,xmm0
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ pxor xmm1,xmm5
+ pxor xmm5,xmm0
+ pxor xmm1,xmm6
+ pxor xmm6,xmm0
+ pxor xmm1,xmm7
+ pxor xmm7,xmm0
+ movdqa [96+esp],xmm1
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,[80+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ mov edi,DWORD [120+esp]
+ mov eax,DWORD [124+esp]
+ call L$_aesni_encrypt6_enter
+ movdqa xmm0,[80+esp]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,xmm0
+ movdqa xmm1,[96+esp]
+ movdqu [esi*1+edi-96],xmm2
+ movdqu [esi*1+edi-80],xmm3
+ movdqu [esi*1+edi-64],xmm4
+ movdqu [esi*1+edi-48],xmm5
+ movdqu [esi*1+edi-32],xmm6
+ movdqu [esi*1+edi-16],xmm7
+ cmp esi,eax
+ jb NEAR L$077grandloop
+L$076short:
+ add eax,96
+ sub eax,esi
+ jz NEAR L$078done
+ cmp eax,32
+ jb NEAR L$079one
+ je NEAR L$080two
+ cmp eax,64
+ jb NEAR L$081three
+ je NEAR L$082four
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ shl ecx,4
+ shl eax,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ pxor xmm7,xmm7
+ pxor xmm1,xmm2
+ pxor xmm2,xmm0
+ pxor xmm1,xmm3
+ pxor xmm3,xmm0
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ pxor xmm1,xmm5
+ pxor xmm5,xmm0
+ pxor xmm1,xmm6
+ pxor xmm6,xmm0
+ movdqa [96+esp],xmm1
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ mov edi,DWORD [120+esp]
+ call L$_aesni_encrypt6_enter
+ movdqa xmm0,[64+esp]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,xmm0
+ movdqa xmm1,[96+esp]
+ movdqu [esi*1+edi],xmm2
+ movdqu [16+esi*1+edi],xmm3
+ movdqu [32+esi*1+edi],xmm4
+ movdqu [48+esi*1+edi],xmm5
+ movdqu [64+esi*1+edi],xmm6
+ jmp NEAR L$078done
+align 16
+L$079one:
+ movdqu xmm7,[ebx]
+ mov edx,DWORD [112+esp]
+ movdqu xmm2,[esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm7,xmm0
+ pxor xmm1,xmm2
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ mov edi,DWORD [120+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$083enc1_loop_16:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$083enc1_loop_16
+db 102,15,56,221,209
+ xorps xmm2,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,xmm6
+ movups [esi*1+edi],xmm2
+ jmp NEAR L$078done
+align 16
+L$080two:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm6,[ebx]
+ movdqu xmm7,[ecx*1+ebx]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm6,xmm0
+ pxor xmm7,xmm6
+ pxor xmm1,xmm2
+ pxor xmm2,xmm6
+ pxor xmm1,xmm3
+ pxor xmm3,xmm7
+ movdqa xmm5,xmm1
+ mov edi,DWORD [120+esp]
+ call __aesni_encrypt2
+ xorps xmm2,xmm6
+ xorps xmm3,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,xmm5
+ movups [esi*1+edi],xmm2
+ movups [16+esi*1+edi],xmm3
+ jmp NEAR L$078done
+align 16
+L$081three:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm5,[ebx]
+ movdqu xmm6,[ecx*1+ebx]
+ movdqa xmm7,xmm5
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm5,xmm0
+ pxor xmm6,xmm5
+ pxor xmm7,xmm6
+ pxor xmm1,xmm2
+ pxor xmm2,xmm5
+ pxor xmm1,xmm3
+ pxor xmm3,xmm6
+ pxor xmm1,xmm4
+ pxor xmm4,xmm7
+ movdqa [96+esp],xmm1
+ mov edi,DWORD [120+esp]
+ call __aesni_encrypt3
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,[96+esp]
+ movups [esi*1+edi],xmm2
+ movups [16+esi*1+edi],xmm3
+ movups [32+esi*1+edi],xmm4
+ jmp NEAR L$078done
+align 16
+L$082four:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ mov edx,DWORD [112+esp]
+ shl ecx,4
+ shl eax,4
+ movdqu xmm4,[ebx]
+ movdqu xmm5,[ecx*1+ebx]
+ movdqa xmm6,xmm4
+ movdqu xmm7,[eax*1+ebx]
+ pxor xmm4,xmm0
+ movdqu xmm2,[esi]
+ pxor xmm5,xmm4
+ movdqu xmm3,[16+esi]
+ pxor xmm6,xmm5
+ movdqa [esp],xmm4
+ pxor xmm7,xmm6
+ movdqa [16+esp],xmm5
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm1,xmm2
+ pxor xmm2,[esp]
+ pxor xmm1,xmm3
+ pxor xmm3,[16+esp]
+ pxor xmm1,xmm4
+ pxor xmm4,xmm6
+ pxor xmm1,xmm5
+ pxor xmm5,xmm7
+ movdqa [96+esp],xmm1
+ mov edi,DWORD [120+esp]
+ call __aesni_encrypt4
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm6
+ movups [esi*1+edi],xmm2
+ xorps xmm5,xmm7
+ movups [16+esi*1+edi],xmm3
+ movdqa xmm0,xmm7
+ movups [32+esi*1+edi],xmm4
+ movdqa xmm1,[96+esp]
+ movups [48+esi*1+edi],xmm5
+L$078done:
+ mov edx,DWORD [128+esp]
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ movdqa [esp],xmm2
+ pxor xmm4,xmm4
+ movdqa [16+esp],xmm2
+ pxor xmm5,xmm5
+ movdqa [32+esp],xmm2
+ pxor xmm6,xmm6
+ movdqa [48+esp],xmm2
+ pxor xmm7,xmm7
+ movdqa [64+esp],xmm2
+ movdqa [80+esp],xmm2
+ movdqa [96+esp],xmm2
+ lea esp,[edx]
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ movdqu [ecx],xmm0
+ pxor xmm0,xmm0
+ movdqu [ebx],xmm1
+ pxor xmm1,xmm1
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ocb_decrypt
+align 16
+_aesni_ocb_decrypt:
+L$_aesni_ocb_decrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ movdqu xmm0,[ecx]
+ mov ebp,DWORD [36+esp]
+ movdqu xmm1,[ebx]
+ mov ebx,DWORD [44+esp]
+ mov ecx,esp
+ sub esp,132
+ and esp,-16
+ sub edi,esi
+ shl eax,4
+ lea eax,[eax*1+esi-96]
+ mov DWORD [120+esp],edi
+ mov DWORD [124+esp],eax
+ mov DWORD [128+esp],ecx
+ mov ecx,DWORD [240+edx]
+ test ebp,1
+ jnz NEAR L$084odd
+ bsf eax,ebp
+ add ebp,1
+ shl eax,4
+ movdqu xmm7,[eax*1+ebx]
+ mov eax,edx
+ movdqu xmm2,[esi]
+ lea esi,[16+esi]
+ pxor xmm7,xmm0
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$085dec1_loop_17:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$085dec1_loop_17
+db 102,15,56,223,209
+ xorps xmm2,xmm7
+ movaps xmm1,xmm6
+ movdqa xmm0,xmm7
+ xorps xmm1,xmm2
+ movups [esi*1+edi-16],xmm2
+ mov ecx,DWORD [240+eax]
+ mov edx,eax
+ mov eax,DWORD [124+esp]
+L$084odd:
+ shl ecx,4
+ mov edi,16
+ sub edi,ecx
+ mov DWORD [112+esp],edx
+ lea edx,[32+ecx*1+edx]
+ mov DWORD [116+esp],edi
+ cmp esi,eax
+ ja NEAR L$086short
+ jmp NEAR L$087grandloop
+align 32
+L$087grandloop:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ lea edi,[5+ebp]
+ add ebp,6
+ bsf ecx,ecx
+ bsf eax,eax
+ bsf edi,edi
+ shl ecx,4
+ shl eax,4
+ shl edi,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ movdqu xmm7,[edi*1+ebx]
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movdqa [80+esp],xmm7
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ movdqa [96+esp],xmm1
+ pxor xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,[80+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+ mov edi,DWORD [120+esp]
+ mov eax,DWORD [124+esp]
+ call L$_aesni_decrypt6_enter
+ movdqa xmm0,[80+esp]
+ pxor xmm2,[esp]
+ movdqa xmm1,[96+esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,xmm0
+ pxor xmm1,xmm2
+ movdqu [esi*1+edi-96],xmm2
+ pxor xmm1,xmm3
+ movdqu [esi*1+edi-80],xmm3
+ pxor xmm1,xmm4
+ movdqu [esi*1+edi-64],xmm4
+ pxor xmm1,xmm5
+ movdqu [esi*1+edi-48],xmm5
+ pxor xmm1,xmm6
+ movdqu [esi*1+edi-32],xmm6
+ pxor xmm1,xmm7
+ movdqu [esi*1+edi-16],xmm7
+ cmp esi,eax
+ jb NEAR L$087grandloop
+L$086short:
+ add eax,96
+ sub eax,esi
+ jz NEAR L$088done
+ cmp eax,32
+ jb NEAR L$089one
+ je NEAR L$090two
+ cmp eax,64
+ jb NEAR L$091three
+ je NEAR L$092four
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ shl ecx,4
+ shl eax,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ pxor xmm7,xmm7
+ movdqa [96+esp],xmm1
+ pxor xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+ mov edi,DWORD [120+esp]
+ call L$_aesni_decrypt6_enter
+ movdqa xmm0,[64+esp]
+ pxor xmm2,[esp]
+ movdqa xmm1,[96+esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,xmm0
+ pxor xmm1,xmm2
+ movdqu [esi*1+edi],xmm2
+ pxor xmm1,xmm3
+ movdqu [16+esi*1+edi],xmm3
+ pxor xmm1,xmm4
+ movdqu [32+esi*1+edi],xmm4
+ pxor xmm1,xmm5
+ movdqu [48+esi*1+edi],xmm5
+ pxor xmm1,xmm6
+ movdqu [64+esi*1+edi],xmm6
+ jmp NEAR L$088done
+align 16
+L$089one:
+ movdqu xmm7,[ebx]
+ mov edx,DWORD [112+esp]
+ movdqu xmm2,[esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm7,xmm0
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ mov edi,DWORD [120+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$093dec1_loop_18:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$093dec1_loop_18
+db 102,15,56,223,209
+ xorps xmm2,xmm7
+ movaps xmm1,xmm6
+ movdqa xmm0,xmm7
+ xorps xmm1,xmm2
+ movups [esi*1+edi],xmm2
+ jmp NEAR L$088done
+align 16
+L$090two:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm6,[ebx]
+ movdqu xmm7,[ecx*1+ebx]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ mov ecx,DWORD [240+edx]
+ movdqa xmm5,xmm1
+ pxor xmm6,xmm0
+ pxor xmm7,xmm6
+ pxor xmm2,xmm6
+ pxor xmm3,xmm7
+ mov edi,DWORD [120+esp]
+ call __aesni_decrypt2
+ xorps xmm2,xmm6
+ xorps xmm3,xmm7
+ movdqa xmm0,xmm7
+ xorps xmm5,xmm2
+ movups [esi*1+edi],xmm2
+ xorps xmm5,xmm3
+ movups [16+esi*1+edi],xmm3
+ movaps xmm1,xmm5
+ jmp NEAR L$088done
+align 16
+L$091three:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm5,[ebx]
+ movdqu xmm6,[ecx*1+ebx]
+ movdqa xmm7,xmm5
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ mov ecx,DWORD [240+edx]
+ movdqa [96+esp],xmm1
+ pxor xmm5,xmm0
+ pxor xmm6,xmm5
+ pxor xmm7,xmm6
+ pxor xmm2,xmm5
+ pxor xmm3,xmm6
+ pxor xmm4,xmm7
+ mov edi,DWORD [120+esp]
+ call __aesni_decrypt3
+ movdqa xmm1,[96+esp]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movups [esi*1+edi],xmm2
+ pxor xmm1,xmm2
+ movdqa xmm0,xmm7
+ movups [16+esi*1+edi],xmm3
+ pxor xmm1,xmm3
+ movups [32+esi*1+edi],xmm4
+ pxor xmm1,xmm4
+ jmp NEAR L$088done
+align 16
+L$092four:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ mov edx,DWORD [112+esp]
+ shl ecx,4
+ shl eax,4
+ movdqu xmm4,[ebx]
+ movdqu xmm5,[ecx*1+ebx]
+ movdqa xmm6,xmm4
+ movdqu xmm7,[eax*1+ebx]
+ pxor xmm4,xmm0
+ movdqu xmm2,[esi]
+ pxor xmm5,xmm4
+ movdqu xmm3,[16+esi]
+ pxor xmm6,xmm5
+ movdqa [esp],xmm4
+ pxor xmm7,xmm6
+ movdqa [16+esp],xmm5
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ mov ecx,DWORD [240+edx]
+ movdqa [96+esp],xmm1
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,xmm6
+ pxor xmm5,xmm7
+ mov edi,DWORD [120+esp]
+ call __aesni_decrypt4
+ movdqa xmm1,[96+esp]
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm6
+ movups [esi*1+edi],xmm2
+ pxor xmm1,xmm2
+ xorps xmm5,xmm7
+ movups [16+esi*1+edi],xmm3
+ pxor xmm1,xmm3
+ movdqa xmm0,xmm7
+ movups [32+esi*1+edi],xmm4
+ pxor xmm1,xmm4
+ movups [48+esi*1+edi],xmm5
+ pxor xmm1,xmm5
+L$088done:
+ mov edx,DWORD [128+esp]
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ movdqa [esp],xmm2
+ pxor xmm4,xmm4
+ movdqa [16+esp],xmm2
+ pxor xmm5,xmm5
+ movdqa [32+esp],xmm2
+ pxor xmm6,xmm6
+ movdqa [48+esp],xmm2
+ pxor xmm7,xmm7
+ movdqa [64+esp],xmm2
+ movdqa [80+esp],xmm2
+ movdqa [96+esp],xmm2
+ lea esp,[edx]
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ movdqu [ecx],xmm0
+ pxor xmm0,xmm0
+ movdqu [ebx],xmm1
+ pxor xmm1,xmm1
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_cbc_encrypt
+align 16
+_aesni_cbc_encrypt:
+L$_aesni_cbc_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov ebx,esp
+ mov edi,DWORD [24+esp]
+ sub ebx,24
+ mov eax,DWORD [28+esp]
+ and ebx,-16
+ mov edx,DWORD [32+esp]
+ mov ebp,DWORD [36+esp]
+ test eax,eax
+ jz NEAR L$094cbc_abort
+ cmp DWORD [40+esp],0
+ xchg ebx,esp
+ movups xmm7,[ebp]
+ mov ecx,DWORD [240+edx]
+ mov ebp,edx
+ mov DWORD [16+esp],ebx
+ mov ebx,ecx
+ je NEAR L$095cbc_decrypt
+ movaps xmm2,xmm7
+ cmp eax,16
+ jb NEAR L$096cbc_enc_tail
+ sub eax,16
+ jmp NEAR L$097cbc_enc_loop
+align 16
+L$097cbc_enc_loop:
+ movups xmm7,[esi]
+ lea esi,[16+esi]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ xorps xmm7,xmm0
+ lea edx,[32+edx]
+ xorps xmm2,xmm7
+L$098enc1_loop_19:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$098enc1_loop_19
+db 102,15,56,221,209
+ mov ecx,ebx
+ mov edx,ebp
+ movups [edi],xmm2
+ lea edi,[16+edi]
+ sub eax,16
+ jnc NEAR L$097cbc_enc_loop
+ add eax,16
+ jnz NEAR L$096cbc_enc_tail
+ movaps xmm7,xmm2
+ pxor xmm2,xmm2
+ jmp NEAR L$099cbc_ret
+L$096cbc_enc_tail:
+ mov ecx,eax
+dd 2767451785
+ mov ecx,16
+ sub ecx,eax
+ xor eax,eax
+dd 2868115081
+ lea edi,[edi-16]
+ mov ecx,ebx
+ mov esi,edi
+ mov edx,ebp
+ jmp NEAR L$097cbc_enc_loop
+align 16
+L$095cbc_decrypt:
+ cmp eax,80
+ jbe NEAR L$100cbc_dec_tail
+ movaps [esp],xmm7
+ sub eax,80
+ jmp NEAR L$101cbc_dec_loop6_enter
+align 16
+L$102cbc_dec_loop6:
+ movaps [esp],xmm0
+ movups [edi],xmm7
+ lea edi,[16+edi]
+L$101cbc_dec_loop6_enter:
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ call __aesni_decrypt6
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,[esp]
+ xorps xmm3,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm4,xmm0
+ movups xmm0,[48+esi]
+ xorps xmm5,xmm1
+ movups xmm1,[64+esi]
+ xorps xmm6,xmm0
+ movups xmm0,[80+esi]
+ xorps xmm7,xmm1
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ lea esi,[96+esi]
+ movups [32+edi],xmm4
+ mov ecx,ebx
+ movups [48+edi],xmm5
+ mov edx,ebp
+ movups [64+edi],xmm6
+ lea edi,[80+edi]
+ sub eax,96
+ ja NEAR L$102cbc_dec_loop6
+ movaps xmm2,xmm7
+ movaps xmm7,xmm0
+ add eax,80
+ jle NEAR L$103cbc_dec_clear_tail_collected
+ movups [edi],xmm2
+ lea edi,[16+edi]
+L$100cbc_dec_tail:
+ movups xmm2,[esi]
+ movaps xmm6,xmm2
+ cmp eax,16
+ jbe NEAR L$104cbc_dec_one
+ movups xmm3,[16+esi]
+ movaps xmm5,xmm3
+ cmp eax,32
+ jbe NEAR L$105cbc_dec_two
+ movups xmm4,[32+esi]
+ cmp eax,48
+ jbe NEAR L$106cbc_dec_three
+ movups xmm5,[48+esi]
+ cmp eax,64
+ jbe NEAR L$107cbc_dec_four
+ movups xmm6,[64+esi]
+ movaps [esp],xmm7
+ movups xmm2,[esi]
+ xorps xmm7,xmm7
+ call __aesni_decrypt6
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,[esp]
+ xorps xmm3,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm4,xmm0
+ movups xmm0,[48+esi]
+ xorps xmm5,xmm1
+ movups xmm7,[64+esi]
+ xorps xmm6,xmm0
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ pxor xmm3,xmm3
+ movups [32+edi],xmm4
+ pxor xmm4,xmm4
+ movups [48+edi],xmm5
+ pxor xmm5,xmm5
+ lea edi,[64+edi]
+ movaps xmm2,xmm6
+ pxor xmm6,xmm6
+ sub eax,80
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$104cbc_dec_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$109dec1_loop_20:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$109dec1_loop_20
+db 102,15,56,223,209
+ xorps xmm2,xmm7
+ movaps xmm7,xmm6
+ sub eax,16
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$105cbc_dec_two:
+ call __aesni_decrypt2
+ xorps xmm2,xmm7
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movaps xmm2,xmm3
+ pxor xmm3,xmm3
+ lea edi,[16+edi]
+ movaps xmm7,xmm5
+ sub eax,32
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$106cbc_dec_three:
+ call __aesni_decrypt3
+ xorps xmm2,xmm7
+ xorps xmm3,xmm6
+ xorps xmm4,xmm5
+ movups [edi],xmm2
+ movaps xmm2,xmm4
+ pxor xmm4,xmm4
+ movups [16+edi],xmm3
+ pxor xmm3,xmm3
+ lea edi,[32+edi]
+ movups xmm7,[32+esi]
+ sub eax,48
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$107cbc_dec_four:
+ call __aesni_decrypt4
+ movups xmm1,[16+esi]
+ movups xmm0,[32+esi]
+ xorps xmm2,xmm7
+ movups xmm7,[48+esi]
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ xorps xmm4,xmm1
+ movups [16+edi],xmm3
+ pxor xmm3,xmm3
+ xorps xmm5,xmm0
+ movups [32+edi],xmm4
+ pxor xmm4,xmm4
+ lea edi,[48+edi]
+ movaps xmm2,xmm5
+ pxor xmm5,xmm5
+ sub eax,64
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$103cbc_dec_clear_tail_collected:
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+L$108cbc_dec_tail_collected:
+ and eax,15
+ jnz NEAR L$110cbc_dec_tail_partial
+ movups [edi],xmm2
+ pxor xmm0,xmm0
+ jmp NEAR L$099cbc_ret
+align 16
+L$110cbc_dec_tail_partial:
+ movaps [esp],xmm2
+ pxor xmm0,xmm0
+ mov ecx,16
+ mov esi,esp
+ sub ecx,eax
+dd 2767451785
+ movdqa [esp],xmm2
+L$099cbc_ret:
+ mov esp,DWORD [16+esp]
+ mov ebp,DWORD [36+esp]
+ pxor xmm2,xmm2
+ pxor xmm1,xmm1
+ movups [ebp],xmm7
+ pxor xmm7,xmm7
+L$094cbc_abort:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__aesni_set_encrypt_key:
+ push ebp
+ push ebx
+ test eax,eax
+ jz NEAR L$111bad_pointer
+ test edx,edx
+ jz NEAR L$111bad_pointer
+ call L$112pic
+L$112pic:
+ pop ebx
+ lea ebx,[(L$key_const-L$112pic)+ebx]
+ lea ebp,[_OPENSSL_ia32cap_P]
+ movups xmm0,[eax]
+ xorps xmm4,xmm4
+ mov ebp,DWORD [4+ebp]
+ lea edx,[16+edx]
+ and ebp,268437504
+ cmp ecx,256
+ je NEAR L$11314rounds
+ cmp ecx,192
+ je NEAR L$11412rounds
+ cmp ecx,128
+ jne NEAR L$115bad_keybits
+align 16
+L$11610rounds:
+ cmp ebp,268435456
+ je NEAR L$11710rounds_alt
+ mov ecx,9
+ movups [edx-16],xmm0
+db 102,15,58,223,200,1
+ call L$118key_128_cold
+db 102,15,58,223,200,2
+ call L$119key_128
+db 102,15,58,223,200,4
+ call L$119key_128
+db 102,15,58,223,200,8
+ call L$119key_128
+db 102,15,58,223,200,16
+ call L$119key_128
+db 102,15,58,223,200,32
+ call L$119key_128
+db 102,15,58,223,200,64
+ call L$119key_128
+db 102,15,58,223,200,128
+ call L$119key_128
+db 102,15,58,223,200,27
+ call L$119key_128
+db 102,15,58,223,200,54
+ call L$119key_128
+ movups [edx],xmm0
+ mov DWORD [80+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$119key_128:
+ movups [edx],xmm0
+ lea edx,[16+edx]
+L$118key_128_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ ret
+align 16
+L$11710rounds_alt:
+ movdqa xmm5,[ebx]
+ mov ecx,8
+ movdqa xmm4,[32+ebx]
+ movdqa xmm2,xmm0
+ movdqu [edx-16],xmm0
+L$121loop_key128:
+db 102,15,56,0,197
+db 102,15,56,221,196
+ pslld xmm4,1
+ lea edx,[16+edx]
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+ pxor xmm0,xmm2
+ movdqu [edx-16],xmm0
+ movdqa xmm2,xmm0
+ dec ecx
+ jnz NEAR L$121loop_key128
+ movdqa xmm4,[48+ebx]
+db 102,15,56,0,197
+db 102,15,56,221,196
+ pslld xmm4,1
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+ pxor xmm0,xmm2
+ movdqu [edx],xmm0
+ movdqa xmm2,xmm0
+db 102,15,56,0,197
+db 102,15,56,221,196
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+ pxor xmm0,xmm2
+ movdqu [16+edx],xmm0
+ mov ecx,9
+ mov DWORD [96+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$11412rounds:
+ movq xmm2,[16+eax]
+ cmp ebp,268435456
+ je NEAR L$12212rounds_alt
+ mov ecx,11
+ movups [edx-16],xmm0
+db 102,15,58,223,202,1
+ call L$123key_192a_cold
+db 102,15,58,223,202,2
+ call L$124key_192b
+db 102,15,58,223,202,4
+ call L$125key_192a
+db 102,15,58,223,202,8
+ call L$124key_192b
+db 102,15,58,223,202,16
+ call L$125key_192a
+db 102,15,58,223,202,32
+ call L$124key_192b
+db 102,15,58,223,202,64
+ call L$125key_192a
+db 102,15,58,223,202,128
+ call L$124key_192b
+ movups [edx],xmm0
+ mov DWORD [48+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$125key_192a:
+ movups [edx],xmm0
+ lea edx,[16+edx]
+align 16
+L$123key_192a_cold:
+ movaps xmm5,xmm2
+L$126key_192b_warm:
+ shufps xmm4,xmm0,16
+ movdqa xmm3,xmm2
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ pslldq xmm3,4
+ xorps xmm0,xmm4
+ pshufd xmm1,xmm1,85
+ pxor xmm2,xmm3
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm0,255
+ pxor xmm2,xmm3
+ ret
+align 16
+L$124key_192b:
+ movaps xmm3,xmm0
+ shufps xmm5,xmm0,68
+ movups [edx],xmm5
+ shufps xmm3,xmm2,78
+ movups [16+edx],xmm3
+ lea edx,[32+edx]
+ jmp NEAR L$126key_192b_warm
+align 16
+L$12212rounds_alt:
+ movdqa xmm5,[16+ebx]
+ movdqa xmm4,[32+ebx]
+ mov ecx,8
+ movdqu [edx-16],xmm0
+L$127loop_key192:
+ movq [edx],xmm2
+ movdqa xmm1,xmm2
+db 102,15,56,0,213
+db 102,15,56,221,212
+ pslld xmm4,1
+ lea edx,[24+edx]
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+ pshufd xmm3,xmm0,255
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pxor xmm0,xmm2
+ pxor xmm2,xmm3
+ movdqu [edx-16],xmm0
+ dec ecx
+ jnz NEAR L$127loop_key192
+ mov ecx,11
+ mov DWORD [32+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$11314rounds:
+ movups xmm2,[16+eax]
+ lea edx,[16+edx]
+ cmp ebp,268435456
+ je NEAR L$12814rounds_alt
+ mov ecx,13
+ movups [edx-32],xmm0
+ movups [edx-16],xmm2
+db 102,15,58,223,202,1
+ call L$129key_256a_cold
+db 102,15,58,223,200,1
+ call L$130key_256b
+db 102,15,58,223,202,2
+ call L$131key_256a
+db 102,15,58,223,200,2
+ call L$130key_256b
+db 102,15,58,223,202,4
+ call L$131key_256a
+db 102,15,58,223,200,4
+ call L$130key_256b
+db 102,15,58,223,202,8
+ call L$131key_256a
+db 102,15,58,223,200,8
+ call L$130key_256b
+db 102,15,58,223,202,16
+ call L$131key_256a
+db 102,15,58,223,200,16
+ call L$130key_256b
+db 102,15,58,223,202,32
+ call L$131key_256a
+db 102,15,58,223,200,32
+ call L$130key_256b
+db 102,15,58,223,202,64
+ call L$131key_256a
+ movups [edx],xmm0
+ mov DWORD [16+edx],ecx
+ xor eax,eax
+ jmp NEAR L$120good_key
+align 16
+L$131key_256a:
+ movups [edx],xmm2
+ lea edx,[16+edx]
+L$129key_256a_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ ret
+align 16
+L$130key_256b:
+ movups [edx],xmm0
+ lea edx,[16+edx]
+ shufps xmm4,xmm2,16
+ xorps xmm2,xmm4
+ shufps xmm4,xmm2,140
+ xorps xmm2,xmm4
+ shufps xmm1,xmm1,170
+ xorps xmm2,xmm1
+ ret
+align 16
+L$12814rounds_alt:
+ movdqa xmm5,[ebx]
+ movdqa xmm4,[32+ebx]
+ mov ecx,7
+ movdqu [edx-32],xmm0
+ movdqa xmm1,xmm2
+ movdqu [edx-16],xmm2
+L$132loop_key256:
+db 102,15,56,0,213
+db 102,15,56,221,212
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+ pslld xmm4,1
+ pxor xmm0,xmm2
+ movdqu [edx],xmm0
+ dec ecx
+ jz NEAR L$133done_key256
+ pshufd xmm2,xmm0,255
+ pxor xmm3,xmm3
+db 102,15,56,221,211
+ movdqa xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm1,xmm3
+ pxor xmm2,xmm1
+ movdqu [16+edx],xmm2
+ lea edx,[32+edx]
+ movdqa xmm1,xmm2
+ jmp NEAR L$132loop_key256
+L$133done_key256:
+ mov ecx,13
+ mov DWORD [16+edx],ecx
+L$120good_key:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ xor eax,eax
+ pop ebx
+ pop ebp
+ ret
+align 4
+L$111bad_pointer:
+ mov eax,-1
+ pop ebx
+ pop ebp
+ ret
+align 4
+L$115bad_keybits:
+ pxor xmm0,xmm0
+ mov eax,-2
+ pop ebx
+ pop ebp
+ ret
+global _aesni_set_encrypt_key
+align 16
+_aesni_set_encrypt_key:
+L$_aesni_set_encrypt_key_begin:
+ mov eax,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ call __aesni_set_encrypt_key
+ ret
+global _aesni_set_decrypt_key
+align 16
+_aesni_set_decrypt_key:
+L$_aesni_set_decrypt_key_begin:
+ mov eax,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ call __aesni_set_encrypt_key
+ mov edx,DWORD [12+esp]
+ shl ecx,4
+ test eax,eax
+ jnz NEAR L$134dec_key_ret
+ lea eax,[16+ecx*1+edx]
+ movups xmm0,[edx]
+ movups xmm1,[eax]
+ movups [eax],xmm0
+ movups [edx],xmm1
+ lea edx,[16+edx]
+ lea eax,[eax-16]
+L$135dec_key_inverse:
+ movups xmm0,[edx]
+ movups xmm1,[eax]
+db 102,15,56,219,192
+db 102,15,56,219,201
+ lea edx,[16+edx]
+ lea eax,[eax-16]
+ movups [16+eax],xmm0
+ movups [edx-16],xmm1
+ cmp eax,edx
+ ja NEAR L$135dec_key_inverse
+ movups xmm0,[edx]
+db 102,15,56,219,192
+ movups [edx],xmm0
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ xor eax,eax
+L$134dec_key_ret:
+ ret
+align 64
+L$key_const:
+dd 202313229,202313229,202313229,202313229
+dd 67569157,67569157,67569157,67569157
+dd 1,1,1,1
+dd 27,27,27,27
+db 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+db 83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+db 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+db 115,108,46,111,114,103,62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
new file mode 100644
index 0000000000..5eecfdba3d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
@@ -0,0 +1,648 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+align 64
+L$_vpaes_consts:
+dd 218628480,235210255,168496130,67568393
+dd 252381056,17041926,33884169,51187212
+dd 252645135,252645135,252645135,252645135
+dd 1512730624,3266504856,1377990664,3401244816
+dd 830229760,1275146365,2969422977,3447763452
+dd 3411033600,2979783055,338359620,2782886510
+dd 4209124096,907596821,221174255,1006095553
+dd 191964160,3799684038,3164090317,1589111125
+dd 182528256,1777043520,2877432650,3265356744
+dd 1874708224,3503451415,3305285752,363511674
+dd 1606117888,3487855781,1093350906,2384367825
+dd 197121,67569157,134941193,202313229
+dd 67569157,134941193,202313229,197121
+dd 134941193,202313229,197121,67569157
+dd 202313229,197121,67569157,134941193
+dd 33619971,100992007,168364043,235736079
+dd 235736079,33619971,100992007,168364043
+dd 168364043,235736079,33619971,100992007
+dd 100992007,168364043,235736079,33619971
+dd 50462976,117835012,185207048,252579084
+dd 252314880,51251460,117574920,184942860
+dd 184682752,252054788,50987272,118359308
+dd 118099200,185467140,251790600,50727180
+dd 2946363062,528716217,1300004225,1881839624
+dd 1532713819,1532713819,1532713819,1532713819
+dd 3602276352,4288629033,3737020424,4153884961
+dd 1354558464,32357713,2958822624,3775749553
+dd 1201988352,132424512,1572796698,503232858
+dd 2213177600,1597421020,4103937655,675398315
+dd 2749646592,4273543773,1511898873,121693092
+dd 3040248576,1103263732,2871565598,1608280554
+dd 2236667136,2588920351,482954393,64377734
+dd 3069987328,291237287,2117370568,3650299247
+dd 533321216,3573750986,2572112006,1401264716
+dd 1339849704,2721158661,548607111,3445553514
+dd 2128193280,3054596040,2183486460,1257083700
+dd 655635200,1165381986,3923443150,2344132524
+dd 190078720,256924420,290342170,357187870
+dd 1610966272,2263057382,4103205268,309794674
+dd 2592527872,2233205587,1335446729,3402964816
+dd 3973531904,3225098121,3002836325,1918774430
+dd 3870401024,2102906079,2284471353,4117666579
+dd 617007872,1021508343,366931923,691083277
+dd 2528395776,3491914898,2968704004,1613121270
+dd 3445188352,3247741094,844474987,4093578302
+dd 651481088,1190302358,1689581232,574775300
+dd 4289380608,206939853,2555985458,2489840491
+dd 2130264064,327674451,3566485037,3349835193
+dd 2470714624,316102159,3636825756,3393945945
+db 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+db 111,110,32,65,69,83,32,102,111,114,32,120,56,54,47,83
+db 83,83,69,51,44,32,77,105,107,101,32,72,97,109,98,117
+db 114,103,32,40,83,116,97,110,102,111,114,100,32,85,110,105
+db 118,101,114,115,105,116,121,41,0
+align 64
+align 16
+__vpaes_preheat:
+ add ebp,DWORD [esp]
+ movdqa xmm7,[ebp-48]
+ movdqa xmm6,[ebp-16]
+ ret
+align 16
+__vpaes_encrypt_core:
+ mov ecx,16
+ mov eax,DWORD [240+edx]
+ movdqa xmm1,xmm6
+ movdqa xmm2,[ebp]
+ pandn xmm1,xmm0
+ pand xmm0,xmm6
+ movdqu xmm5,[edx]
+db 102,15,56,0,208
+ movdqa xmm0,[16+ebp]
+ pxor xmm2,xmm5
+ psrld xmm1,4
+ add edx,16
+db 102,15,56,0,193
+ lea ebx,[192+ebp]
+ pxor xmm0,xmm2
+ jmp NEAR L$000enc_entry
+align 16
+L$001enc_loop:
+ movdqa xmm4,[32+ebp]
+ movdqa xmm0,[48+ebp]
+db 102,15,56,0,226
+db 102,15,56,0,195
+ pxor xmm4,xmm5
+ movdqa xmm5,[64+ebp]
+ pxor xmm0,xmm4
+ movdqa xmm1,[ecx*1+ebx-64]
+db 102,15,56,0,234
+ movdqa xmm2,[80+ebp]
+ movdqa xmm4,[ecx*1+ebx]
+db 102,15,56,0,211
+ movdqa xmm3,xmm0
+ pxor xmm2,xmm5
+db 102,15,56,0,193
+ add edx,16
+ pxor xmm0,xmm2
+db 102,15,56,0,220
+ add ecx,16
+ pxor xmm3,xmm0
+db 102,15,56,0,193
+ and ecx,48
+ sub eax,1
+ pxor xmm0,xmm3
+L$000enc_entry:
+ movdqa xmm1,xmm6
+ movdqa xmm5,[ebp-32]
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm6
+db 102,15,56,0,232
+ movdqa xmm3,xmm7
+ pxor xmm0,xmm1
+db 102,15,56,0,217
+ movdqa xmm4,xmm7
+ pxor xmm3,xmm5
+db 102,15,56,0,224
+ movdqa xmm2,xmm7
+ pxor xmm4,xmm5
+db 102,15,56,0,211
+ movdqa xmm3,xmm7
+ pxor xmm2,xmm0
+db 102,15,56,0,220
+ movdqu xmm5,[edx]
+ pxor xmm3,xmm1
+ jnz NEAR L$001enc_loop
+ movdqa xmm4,[96+ebp]
+ movdqa xmm0,[112+ebp]
+db 102,15,56,0,226
+ pxor xmm4,xmm5
+db 102,15,56,0,195
+ movdqa xmm1,[64+ecx*1+ebx]
+ pxor xmm0,xmm4
+db 102,15,56,0,193
+ ret
+align 16
+__vpaes_decrypt_core:
+ lea ebx,[608+ebp]
+ mov eax,DWORD [240+edx]
+ movdqa xmm1,xmm6
+ movdqa xmm2,[ebx-64]
+ pandn xmm1,xmm0
+ mov ecx,eax
+ psrld xmm1,4
+ movdqu xmm5,[edx]
+ shl ecx,4
+ pand xmm0,xmm6
+db 102,15,56,0,208
+ movdqa xmm0,[ebx-48]
+ xor ecx,48
+db 102,15,56,0,193
+ and ecx,48
+ pxor xmm2,xmm5
+ movdqa xmm5,[176+ebp]
+ pxor xmm0,xmm2
+ add edx,16
+ lea ecx,[ecx*1+ebx-352]
+ jmp NEAR L$002dec_entry
+align 16
+L$003dec_loop:
+ movdqa xmm4,[ebx-32]
+ movdqa xmm1,[ebx-16]
+db 102,15,56,0,226
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,[ebx]
+ pxor xmm0,xmm1
+ movdqa xmm1,[16+ebx]
+db 102,15,56,0,226
+db 102,15,56,0,197
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,[32+ebx]
+ pxor xmm0,xmm1
+ movdqa xmm1,[48+ebx]
+db 102,15,56,0,226
+db 102,15,56,0,197
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,[64+ebx]
+ pxor xmm0,xmm1
+ movdqa xmm1,[80+ebx]
+db 102,15,56,0,226
+db 102,15,56,0,197
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ add edx,16
+db 102,15,58,15,237,12
+ pxor xmm0,xmm1
+ sub eax,1
+L$002dec_entry:
+ movdqa xmm1,xmm6
+ movdqa xmm2,[ebp-32]
+ pandn xmm1,xmm0
+ pand xmm0,xmm6
+ psrld xmm1,4
+db 102,15,56,0,208
+ movdqa xmm3,xmm7
+ pxor xmm0,xmm1
+db 102,15,56,0,217
+ movdqa xmm4,xmm7
+ pxor xmm3,xmm2
+db 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm7
+db 102,15,56,0,211
+ movdqa xmm3,xmm7
+ pxor xmm2,xmm0
+db 102,15,56,0,220
+ movdqu xmm0,[edx]
+ pxor xmm3,xmm1
+ jnz NEAR L$003dec_loop
+ movdqa xmm4,[96+ebx]
+db 102,15,56,0,226
+ pxor xmm4,xmm0
+ movdqa xmm0,[112+ebx]
+ movdqa xmm2,[ecx]
+db 102,15,56,0,195
+ pxor xmm0,xmm4
+db 102,15,56,0,194
+ ret
+align 16
+__vpaes_schedule_core:
+ add ebp,DWORD [esp]
+ movdqu xmm0,[esi]
+ movdqa xmm2,[320+ebp]
+ movdqa xmm3,xmm0
+ lea ebx,[ebp]
+ movdqa [4+esp],xmm2
+ call __vpaes_schedule_transform
+ movdqa xmm7,xmm0
+ test edi,edi
+ jnz NEAR L$004schedule_am_decrypting
+ movdqu [edx],xmm0
+ jmp NEAR L$005schedule_go
+L$004schedule_am_decrypting:
+ movdqa xmm1,[256+ecx*1+ebp]
+db 102,15,56,0,217
+ movdqu [edx],xmm3
+ xor ecx,48
+L$005schedule_go:
+ cmp eax,192
+ ja NEAR L$006schedule_256
+ je NEAR L$007schedule_192
+L$008schedule_128:
+ mov eax,10
+L$009loop_schedule_128:
+ call __vpaes_schedule_round
+ dec eax
+ jz NEAR L$010schedule_mangle_last
+ call __vpaes_schedule_mangle
+ jmp NEAR L$009loop_schedule_128
+align 16
+L$007schedule_192:
+ movdqu xmm0,[8+esi]
+ call __vpaes_schedule_transform
+ movdqa xmm6,xmm0
+ pxor xmm4,xmm4
+ movhlps xmm6,xmm4
+ mov eax,4
+L$011loop_schedule_192:
+ call __vpaes_schedule_round
+db 102,15,58,15,198,8
+ call __vpaes_schedule_mangle
+ call __vpaes_schedule_192_smear
+ call __vpaes_schedule_mangle
+ call __vpaes_schedule_round
+ dec eax
+ jz NEAR L$010schedule_mangle_last
+ call __vpaes_schedule_mangle
+ call __vpaes_schedule_192_smear
+ jmp NEAR L$011loop_schedule_192
+align 16
+L$006schedule_256:
+ movdqu xmm0,[16+esi]
+ call __vpaes_schedule_transform
+ mov eax,7
+L$012loop_schedule_256:
+ call __vpaes_schedule_mangle
+ movdqa xmm6,xmm0
+ call __vpaes_schedule_round
+ dec eax
+ jz NEAR L$010schedule_mangle_last
+ call __vpaes_schedule_mangle
+ pshufd xmm0,xmm0,255
+ movdqa [20+esp],xmm7
+ movdqa xmm7,xmm6
+ call L$_vpaes_schedule_low_round
+ movdqa xmm7,[20+esp]
+ jmp NEAR L$012loop_schedule_256
+align 16
+L$010schedule_mangle_last:
+ lea ebx,[384+ebp]
+ test edi,edi
+ jnz NEAR L$013schedule_mangle_last_dec
+ movdqa xmm1,[256+ecx*1+ebp]
+db 102,15,56,0,193
+ lea ebx,[352+ebp]
+ add edx,32
+L$013schedule_mangle_last_dec:
+ add edx,-16
+ pxor xmm0,[336+ebp]
+ call __vpaes_schedule_transform
+ movdqu [edx],xmm0
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ ret
+align 16
+__vpaes_schedule_192_smear:
+ pshufd xmm1,xmm6,128
+ pshufd xmm0,xmm7,254
+ pxor xmm6,xmm1
+ pxor xmm1,xmm1
+ pxor xmm6,xmm0
+ movdqa xmm0,xmm6
+ movhlps xmm6,xmm1
+ ret
+align 16
+__vpaes_schedule_round:
+ movdqa xmm2,[8+esp]
+ pxor xmm1,xmm1
+db 102,15,58,15,202,15
+db 102,15,58,15,210,15
+ pxor xmm7,xmm1
+ pshufd xmm0,xmm0,255
+db 102,15,58,15,192,1
+ movdqa [8+esp],xmm2
+L$_vpaes_schedule_low_round:
+ movdqa xmm1,xmm7
+ pslldq xmm7,4
+ pxor xmm7,xmm1
+ movdqa xmm1,xmm7
+ pslldq xmm7,8
+ pxor xmm7,xmm1
+ pxor xmm7,[336+ebp]
+ movdqa xmm4,[ebp-16]
+ movdqa xmm5,[ebp-48]
+ movdqa xmm1,xmm4
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm4
+ movdqa xmm2,[ebp-32]
+db 102,15,56,0,208
+ pxor xmm0,xmm1
+ movdqa xmm3,xmm5
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+ movdqa xmm4,xmm5
+db 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm5
+db 102,15,56,0,211
+ pxor xmm2,xmm0
+ movdqa xmm3,xmm5
+db 102,15,56,0,220
+ pxor xmm3,xmm1
+ movdqa xmm4,[32+ebp]
+db 102,15,56,0,226
+ movdqa xmm0,[48+ebp]
+db 102,15,56,0,195
+ pxor xmm0,xmm4
+ pxor xmm0,xmm7
+ movdqa xmm7,xmm0
+ ret
+align 16
+__vpaes_schedule_transform:
+ movdqa xmm2,[ebp-16]
+ movdqa xmm1,xmm2
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm2
+ movdqa xmm2,[ebx]
+db 102,15,56,0,208
+ movdqa xmm0,[16+ebx]
+db 102,15,56,0,193
+ pxor xmm0,xmm2
+ ret
+align 16
+__vpaes_schedule_mangle:
+ movdqa xmm4,xmm0
+ movdqa xmm5,[128+ebp]
+ test edi,edi
+ jnz NEAR L$014schedule_mangle_dec
+ add edx,16
+ pxor xmm4,[336+ebp]
+db 102,15,56,0,229
+ movdqa xmm3,xmm4
+db 102,15,56,0,229
+ pxor xmm3,xmm4
+db 102,15,56,0,229
+ pxor xmm3,xmm4
+ jmp NEAR L$015schedule_mangle_both
+align 16
+L$014schedule_mangle_dec:
+ movdqa xmm2,[ebp-16]
+ lea esi,[416+ebp]
+ movdqa xmm1,xmm2
+ pandn xmm1,xmm4
+ psrld xmm1,4
+ pand xmm4,xmm2
+ movdqa xmm2,[esi]
+db 102,15,56,0,212
+ movdqa xmm3,[16+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+db 102,15,56,0,221
+ movdqa xmm2,[32+esi]
+db 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,[48+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+db 102,15,56,0,221
+ movdqa xmm2,[64+esi]
+db 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,[80+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+db 102,15,56,0,221
+ movdqa xmm2,[96+esi]
+db 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,[112+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+ add edx,-16
+L$015schedule_mangle_both:
+ movdqa xmm1,[256+ecx*1+ebp]
+db 102,15,56,0,217
+ add ecx,-16
+ and ecx,48
+ movdqu [edx],xmm3
+ ret
+global _vpaes_set_encrypt_key
+align 16
+_vpaes_set_encrypt_key:
+L$_vpaes_set_encrypt_key_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov eax,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ mov ebx,eax
+ shr ebx,5
+ add ebx,5
+ mov DWORD [240+edx],ebx
+ mov ecx,48
+ mov edi,0
+ lea ebp,[(L$_vpaes_consts+0x30-L$016pic_point)]
+ call __vpaes_schedule_core
+L$016pic_point:
+ mov esp,DWORD [48+esp]
+ xor eax,eax
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_set_decrypt_key
+align 16
+_vpaes_set_decrypt_key:
+L$_vpaes_set_decrypt_key_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov eax,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ mov ebx,eax
+ shr ebx,5
+ add ebx,5
+ mov DWORD [240+edx],ebx
+ shl ebx,4
+ lea edx,[16+ebx*1+edx]
+ mov edi,1
+ mov ecx,eax
+ shr ecx,1
+ and ecx,32
+ xor ecx,32
+ lea ebp,[(L$_vpaes_consts+0x30-L$017pic_point)]
+ call __vpaes_schedule_core
+L$017pic_point:
+ mov esp,DWORD [48+esp]
+ xor eax,eax
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_encrypt
+align 16
+_vpaes_encrypt:
+L$_vpaes_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ lea ebp,[(L$_vpaes_consts+0x30-L$018pic_point)]
+ call __vpaes_preheat
+L$018pic_point:
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov edi,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ movdqu xmm0,[esi]
+ call __vpaes_encrypt_core
+ movdqu [edi],xmm0
+ mov esp,DWORD [48+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_decrypt
+align 16
+_vpaes_decrypt:
+L$_vpaes_decrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ lea ebp,[(L$_vpaes_consts+0x30-L$019pic_point)]
+ call __vpaes_preheat
+L$019pic_point:
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov edi,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ movdqu xmm0,[esi]
+ call __vpaes_decrypt_core
+ movdqu [edi],xmm0
+ mov esp,DWORD [48+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_cbc_encrypt
+align 16
+_vpaes_cbc_encrypt:
+L$_vpaes_cbc_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ sub eax,16
+ jc NEAR L$020cbc_abort
+ lea ebx,[esp-56]
+ mov ebp,DWORD [36+esp]
+ and ebx,-16
+ mov ecx,DWORD [40+esp]
+ xchg ebx,esp
+ movdqu xmm1,[ebp]
+ sub edi,esi
+ mov DWORD [48+esp],ebx
+ mov DWORD [esp],edi
+ mov DWORD [4+esp],edx
+ mov DWORD [8+esp],ebp
+ mov edi,eax
+ lea ebp,[(L$_vpaes_consts+0x30-L$021pic_point)]
+ call __vpaes_preheat
+L$021pic_point:
+ cmp ecx,0
+ je NEAR L$022cbc_dec_loop
+ jmp NEAR L$023cbc_enc_loop
+align 16
+L$023cbc_enc_loop:
+ movdqu xmm0,[esi]
+ pxor xmm0,xmm1
+ call __vpaes_encrypt_core
+ mov ebx,DWORD [esp]
+ mov edx,DWORD [4+esp]
+ movdqa xmm1,xmm0
+ movdqu [esi*1+ebx],xmm0
+ lea esi,[16+esi]
+ sub edi,16
+ jnc NEAR L$023cbc_enc_loop
+ jmp NEAR L$024cbc_done
+align 16
+L$022cbc_dec_loop:
+ movdqu xmm0,[esi]
+ movdqa [16+esp],xmm1
+ movdqa [32+esp],xmm0
+ call __vpaes_decrypt_core
+ mov ebx,DWORD [esp]
+ mov edx,DWORD [4+esp]
+ pxor xmm0,[16+esp]
+ movdqa xmm1,[32+esp]
+ movdqu [esi*1+ebx],xmm0
+ lea esi,[16+esi]
+ sub edi,16
+ jnc NEAR L$022cbc_dec_loop
+L$024cbc_done:
+ mov ebx,DWORD [8+esp]
+ mov esp,DWORD [48+esp]
+ movdqu [ebx],xmm1
+L$020cbc_abort:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
new file mode 100644
index 0000000000..75bba13387
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
@@ -0,0 +1,1522 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _bn_mul_add_words
+align 16
+_bn_mul_add_words:
+L$_bn_mul_add_words_begin:
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$000maw_non_sse2
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+ movd mm0,DWORD [16+esp]
+ pxor mm1,mm1
+ jmp NEAR L$001maw_sse2_entry
+align 16
+L$002maw_sse2_unrolled:
+ movd mm3,DWORD [eax]
+ paddq mm1,mm3
+ movd mm2,DWORD [edx]
+ pmuludq mm2,mm0
+ movd mm4,DWORD [4+edx]
+ pmuludq mm4,mm0
+ movd mm6,DWORD [8+edx]
+ pmuludq mm6,mm0
+ movd mm7,DWORD [12+edx]
+ pmuludq mm7,mm0
+ paddq mm1,mm2
+ movd mm3,DWORD [4+eax]
+ paddq mm3,mm4
+ movd mm5,DWORD [8+eax]
+ paddq mm5,mm6
+ movd mm4,DWORD [12+eax]
+ paddq mm7,mm4
+ movd DWORD [eax],mm1
+ movd mm2,DWORD [16+edx]
+ pmuludq mm2,mm0
+ psrlq mm1,32
+ movd mm4,DWORD [20+edx]
+ pmuludq mm4,mm0
+ paddq mm1,mm3
+ movd mm6,DWORD [24+edx]
+ pmuludq mm6,mm0
+ movd DWORD [4+eax],mm1
+ psrlq mm1,32
+ movd mm3,DWORD [28+edx]
+ add edx,32
+ pmuludq mm3,mm0
+ paddq mm1,mm5
+ movd mm5,DWORD [16+eax]
+ paddq mm2,mm5
+ movd DWORD [8+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm7
+ movd mm5,DWORD [20+eax]
+ paddq mm4,mm5
+ movd DWORD [12+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm2
+ movd mm5,DWORD [24+eax]
+ paddq mm6,mm5
+ movd DWORD [16+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm4
+ movd mm5,DWORD [28+eax]
+ paddq mm3,mm5
+ movd DWORD [20+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm6
+ movd DWORD [24+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm3
+ movd DWORD [28+eax],mm1
+ lea eax,[32+eax]
+ psrlq mm1,32
+ sub ecx,8
+ jz NEAR L$003maw_sse2_exit
+L$001maw_sse2_entry:
+ test ecx,4294967288
+ jnz NEAR L$002maw_sse2_unrolled
+align 4
+L$004maw_sse2_loop:
+ movd mm2,DWORD [edx]
+ movd mm3,DWORD [eax]
+ pmuludq mm2,mm0
+ lea edx,[4+edx]
+ paddq mm1,mm3
+ paddq mm1,mm2
+ movd DWORD [eax],mm1
+ sub ecx,1
+ psrlq mm1,32
+ lea eax,[4+eax]
+ jnz NEAR L$004maw_sse2_loop
+L$003maw_sse2_exit:
+ movd eax,mm1
+ emms
+ ret
+align 16
+L$000maw_non_sse2:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ xor esi,esi
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [28+esp]
+ mov ebx,DWORD [24+esp]
+ and ecx,4294967288
+ mov ebp,DWORD [32+esp]
+ push ecx
+ jz NEAR L$005maw_finish
+align 16
+L$006maw_loop:
+ ; Round 0
+ mov eax,DWORD [ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [edi]
+ adc edx,0
+ mov DWORD [edi],eax
+ mov esi,edx
+ ; Round 4
+ mov eax,DWORD [4+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [4+edi]
+ adc edx,0
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ ; Round 8
+ mov eax,DWORD [8+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [8+edi]
+ adc edx,0
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ ; Round 12
+ mov eax,DWORD [12+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [12+edi]
+ adc edx,0
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ ; Round 16
+ mov eax,DWORD [16+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [16+edi]
+ adc edx,0
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ ; Round 20
+ mov eax,DWORD [20+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [20+edi]
+ adc edx,0
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ ; Round 24
+ mov eax,DWORD [24+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [24+edi]
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+ ; Round 28
+ mov eax,DWORD [28+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [28+edi]
+ adc edx,0
+ mov DWORD [28+edi],eax
+ mov esi,edx
+ ;
+ sub ecx,8
+ lea ebx,[32+ebx]
+ lea edi,[32+edi]
+ jnz NEAR L$006maw_loop
+L$005maw_finish:
+ mov ecx,DWORD [32+esp]
+ and ecx,7
+ jnz NEAR L$007maw_finish2
+ jmp NEAR L$008maw_end
+L$007maw_finish2:
+ ; Tail Round 0
+ mov eax,DWORD [ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 1
+ mov eax,DWORD [4+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [4+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 2
+ mov eax,DWORD [8+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [8+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 3
+ mov eax,DWORD [12+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [12+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 4
+ mov eax,DWORD [16+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [16+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 5
+ mov eax,DWORD [20+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [20+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 6
+ mov eax,DWORD [24+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [24+edi]
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+L$008maw_end:
+ mov eax,esi
+ pop ecx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_mul_words
+align 16
+_bn_mul_words:
+L$_bn_mul_words_begin:
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$009mw_non_sse2
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+ movd mm0,DWORD [16+esp]
+ pxor mm1,mm1
+align 16
+L$010mw_sse2_loop:
+ movd mm2,DWORD [edx]
+ pmuludq mm2,mm0
+ lea edx,[4+edx]
+ paddq mm1,mm2
+ movd DWORD [eax],mm1
+ sub ecx,1
+ psrlq mm1,32
+ lea eax,[4+eax]
+ jnz NEAR L$010mw_sse2_loop
+ movd eax,mm1
+ emms
+ ret
+align 16
+L$009mw_non_sse2:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ xor esi,esi
+ mov edi,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ mov ebp,DWORD [28+esp]
+ mov ecx,DWORD [32+esp]
+ and ebp,4294967288
+ jz NEAR L$011mw_finish
+L$012mw_loop:
+ ; Round 0
+ mov eax,DWORD [ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [edi],eax
+ mov esi,edx
+ ; Round 4
+ mov eax,DWORD [4+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ ; Round 8
+ mov eax,DWORD [8+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ ; Round 12
+ mov eax,DWORD [12+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ ; Round 16
+ mov eax,DWORD [16+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ ; Round 20
+ mov eax,DWORD [20+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ ; Round 24
+ mov eax,DWORD [24+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+ ; Round 28
+ mov eax,DWORD [28+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [28+edi],eax
+ mov esi,edx
+ ;
+ add ebx,32
+ add edi,32
+ sub ebp,8
+ jz NEAR L$011mw_finish
+ jmp NEAR L$012mw_loop
+L$011mw_finish:
+ mov ebp,DWORD [28+esp]
+ and ebp,7
+ jnz NEAR L$013mw_finish2
+ jmp NEAR L$014mw_end
+L$013mw_finish2:
+ ; Tail Round 0
+ mov eax,DWORD [ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 1
+ mov eax,DWORD [4+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 2
+ mov eax,DWORD [8+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 3
+ mov eax,DWORD [12+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 4
+ mov eax,DWORD [16+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 5
+ mov eax,DWORD [20+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 6
+ mov eax,DWORD [24+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+L$014mw_end:
+ mov eax,esi
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_sqr_words
+align 16
+_bn_sqr_words:
+L$_bn_sqr_words_begin:
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$015sqr_non_sse2
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+align 16
+L$016sqr_sse2_loop:
+ movd mm0,DWORD [edx]
+ pmuludq mm0,mm0
+ lea edx,[4+edx]
+ movq [eax],mm0
+ sub ecx,1
+ lea eax,[8+eax]
+ jnz NEAR L$016sqr_sse2_loop
+ emms
+ ret
+align 16
+L$015sqr_non_sse2:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov ebx,DWORD [28+esp]
+ and ebx,4294967288
+ jz NEAR L$017sw_finish
+L$018sw_loop:
+ ; Round 0
+ mov eax,DWORD [edi]
+ mul eax
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],edx
+ ; Round 4
+ mov eax,DWORD [4+edi]
+ mul eax
+ mov DWORD [8+esi],eax
+ mov DWORD [12+esi],edx
+ ; Round 8
+ mov eax,DWORD [8+edi]
+ mul eax
+ mov DWORD [16+esi],eax
+ mov DWORD [20+esi],edx
+ ; Round 12
+ mov eax,DWORD [12+edi]
+ mul eax
+ mov DWORD [24+esi],eax
+ mov DWORD [28+esi],edx
+ ; Round 16
+ mov eax,DWORD [16+edi]
+ mul eax
+ mov DWORD [32+esi],eax
+ mov DWORD [36+esi],edx
+ ; Round 20
+ mov eax,DWORD [20+edi]
+ mul eax
+ mov DWORD [40+esi],eax
+ mov DWORD [44+esi],edx
+ ; Round 24
+ mov eax,DWORD [24+edi]
+ mul eax
+ mov DWORD [48+esi],eax
+ mov DWORD [52+esi],edx
+ ; Round 28
+ mov eax,DWORD [28+edi]
+ mul eax
+ mov DWORD [56+esi],eax
+ mov DWORD [60+esi],edx
+ ;
+ add edi,32
+ add esi,64
+ sub ebx,8
+ jnz NEAR L$018sw_loop
+L$017sw_finish:
+ mov ebx,DWORD [28+esp]
+ and ebx,7
+ jz NEAR L$019sw_end
+ ; Tail Round 0
+ mov eax,DWORD [edi]
+ mul eax
+ mov DWORD [esi],eax
+ dec ebx
+ mov DWORD [4+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 1
+ mov eax,DWORD [4+edi]
+ mul eax
+ mov DWORD [8+esi],eax
+ dec ebx
+ mov DWORD [12+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 2
+ mov eax,DWORD [8+edi]
+ mul eax
+ mov DWORD [16+esi],eax
+ dec ebx
+ mov DWORD [20+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 3
+ mov eax,DWORD [12+edi]
+ mul eax
+ mov DWORD [24+esi],eax
+ dec ebx
+ mov DWORD [28+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 4
+ mov eax,DWORD [16+edi]
+ mul eax
+ mov DWORD [32+esi],eax
+ dec ebx
+ mov DWORD [36+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 5
+ mov eax,DWORD [20+edi]
+ mul eax
+ mov DWORD [40+esi],eax
+ dec ebx
+ mov DWORD [44+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 6
+ mov eax,DWORD [24+edi]
+ mul eax
+ mov DWORD [48+esi],eax
+ mov DWORD [52+esi],edx
+L$019sw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_div_words
+align 16
+_bn_div_words:
+L$_bn_div_words_begin:
+ mov edx,DWORD [4+esp]
+ mov eax,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+ div ecx
+ ret
+global _bn_add_words
+align 16
+_bn_add_words:
+L$_bn_add_words_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov ebx,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov edi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ and ebp,4294967288
+ jz NEAR L$020aw_finish
+L$021aw_loop:
+ ; Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; Round 7
+ mov ecx,DWORD [28+esi]
+ mov edx,DWORD [28+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add esi,32
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$021aw_loop
+L$020aw_finish:
+ mov ebp,DWORD [32+esp]
+ and ebp,7
+ jz NEAR L$022aw_end
+ ; Tail Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [4+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [8+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [12+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [16+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [20+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+L$022aw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_sub_words
+align 16
+_bn_sub_words:
+L$_bn_sub_words_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov ebx,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov edi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ and ebp,4294967288
+ jz NEAR L$023aw_finish
+L$024aw_loop:
+ ; Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; Round 7
+ mov ecx,DWORD [28+esi]
+ mov edx,DWORD [28+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add esi,32
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$024aw_loop
+L$023aw_finish:
+ mov ebp,DWORD [32+esp]
+ and ebp,7
+ jz NEAR L$025aw_end
+ ; Tail Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [4+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [8+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [12+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [16+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [20+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+L$025aw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_sub_part_words
+align 16
+_bn_sub_part_words:
+L$_bn_sub_part_words_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov ebx,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov edi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ and ebp,4294967288
+ jz NEAR L$026aw_finish
+L$027aw_loop:
+ ; Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; Round 7
+ mov ecx,DWORD [28+esi]
+ mov edx,DWORD [28+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add esi,32
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$027aw_loop
+L$026aw_finish:
+ mov ebp,DWORD [32+esp]
+ and ebp,7
+ jz NEAR L$028aw_end
+ ; Tail Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 1
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 2
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 3
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 4
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 5
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 6
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+L$028aw_end:
+ cmp DWORD [36+esp],0
+ je NEAR L$029pw_end
+ mov ebp,DWORD [36+esp]
+ cmp ebp,0
+ je NEAR L$029pw_end
+ jge NEAR L$030pw_pos
+ ; pw_neg
+ mov edx,0
+ sub edx,ebp
+ mov ebp,edx
+ and ebp,4294967288
+ jz NEAR L$031pw_neg_finish
+L$032pw_neg_loop:
+ ; dl<0 Round 0
+ mov ecx,0
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; dl<0 Round 1
+ mov ecx,0
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; dl<0 Round 2
+ mov ecx,0
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; dl<0 Round 3
+ mov ecx,0
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; dl<0 Round 4
+ mov ecx,0
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; dl<0 Round 5
+ mov ecx,0
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; dl<0 Round 6
+ mov ecx,0
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; dl<0 Round 7
+ mov ecx,0
+ mov edx,DWORD [28+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$032pw_neg_loop
+L$031pw_neg_finish:
+ mov edx,DWORD [36+esp]
+ mov ebp,0
+ sub ebp,edx
+ and ebp,7
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 0
+ mov ecx,0
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 1
+ mov ecx,0
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [4+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 2
+ mov ecx,0
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [8+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 3
+ mov ecx,0
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [12+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 4
+ mov ecx,0
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [16+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 5
+ mov ecx,0
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [20+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 6
+ mov ecx,0
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ jmp NEAR L$029pw_end
+L$030pw_pos:
+ and ebp,4294967288
+ jz NEAR L$033pw_pos_finish
+L$034pw_pos_loop:
+ ; dl>0 Round 0
+ mov ecx,DWORD [esi]
+ sub ecx,eax
+ mov DWORD [ebx],ecx
+ jnc NEAR L$035pw_nc0
+ ; dl>0 Round 1
+ mov ecx,DWORD [4+esi]
+ sub ecx,eax
+ mov DWORD [4+ebx],ecx
+ jnc NEAR L$036pw_nc1
+ ; dl>0 Round 2
+ mov ecx,DWORD [8+esi]
+ sub ecx,eax
+ mov DWORD [8+ebx],ecx
+ jnc NEAR L$037pw_nc2
+ ; dl>0 Round 3
+ mov ecx,DWORD [12+esi]
+ sub ecx,eax
+ mov DWORD [12+ebx],ecx
+ jnc NEAR L$038pw_nc3
+ ; dl>0 Round 4
+ mov ecx,DWORD [16+esi]
+ sub ecx,eax
+ mov DWORD [16+ebx],ecx
+ jnc NEAR L$039pw_nc4
+ ; dl>0 Round 5
+ mov ecx,DWORD [20+esi]
+ sub ecx,eax
+ mov DWORD [20+ebx],ecx
+ jnc NEAR L$040pw_nc5
+ ; dl>0 Round 6
+ mov ecx,DWORD [24+esi]
+ sub ecx,eax
+ mov DWORD [24+ebx],ecx
+ jnc NEAR L$041pw_nc6
+ ; dl>0 Round 7
+ mov ecx,DWORD [28+esi]
+ sub ecx,eax
+ mov DWORD [28+ebx],ecx
+ jnc NEAR L$042pw_nc7
+ ;
+ add esi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$034pw_pos_loop
+L$033pw_pos_finish:
+ mov ebp,DWORD [36+esp]
+ and ebp,7
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 0
+ mov ecx,DWORD [esi]
+ sub ecx,eax
+ mov DWORD [ebx],ecx
+ jnc NEAR L$043pw_tail_nc0
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 1
+ mov ecx,DWORD [4+esi]
+ sub ecx,eax
+ mov DWORD [4+ebx],ecx
+ jnc NEAR L$044pw_tail_nc1
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 2
+ mov ecx,DWORD [8+esi]
+ sub ecx,eax
+ mov DWORD [8+ebx],ecx
+ jnc NEAR L$045pw_tail_nc2
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 3
+ mov ecx,DWORD [12+esi]
+ sub ecx,eax
+ mov DWORD [12+ebx],ecx
+ jnc NEAR L$046pw_tail_nc3
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 4
+ mov ecx,DWORD [16+esi]
+ sub ecx,eax
+ mov DWORD [16+ebx],ecx
+ jnc NEAR L$047pw_tail_nc4
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 5
+ mov ecx,DWORD [20+esi]
+ sub ecx,eax
+ mov DWORD [20+ebx],ecx
+ jnc NEAR L$048pw_tail_nc5
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 6
+ mov ecx,DWORD [24+esi]
+ sub ecx,eax
+ mov DWORD [24+ebx],ecx
+ jnc NEAR L$049pw_tail_nc6
+ mov eax,1
+ jmp NEAR L$029pw_end
+L$050pw_nc_loop:
+ mov ecx,DWORD [esi]
+ mov DWORD [ebx],ecx
+L$035pw_nc0:
+ mov ecx,DWORD [4+esi]
+ mov DWORD [4+ebx],ecx
+L$036pw_nc1:
+ mov ecx,DWORD [8+esi]
+ mov DWORD [8+ebx],ecx
+L$037pw_nc2:
+ mov ecx,DWORD [12+esi]
+ mov DWORD [12+ebx],ecx
+L$038pw_nc3:
+ mov ecx,DWORD [16+esi]
+ mov DWORD [16+ebx],ecx
+L$039pw_nc4:
+ mov ecx,DWORD [20+esi]
+ mov DWORD [20+ebx],ecx
+L$040pw_nc5:
+ mov ecx,DWORD [24+esi]
+ mov DWORD [24+ebx],ecx
+L$041pw_nc6:
+ mov ecx,DWORD [28+esi]
+ mov DWORD [28+ebx],ecx
+L$042pw_nc7:
+ ;
+ add esi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$050pw_nc_loop
+ mov ebp,DWORD [36+esp]
+ and ebp,7
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [esi]
+ mov DWORD [ebx],ecx
+L$043pw_tail_nc0:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [4+esi]
+ mov DWORD [4+ebx],ecx
+L$044pw_tail_nc1:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [8+esi]
+ mov DWORD [8+ebx],ecx
+L$045pw_tail_nc2:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [12+esi]
+ mov DWORD [12+ebx],ecx
+L$046pw_tail_nc3:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [16+esi]
+ mov DWORD [16+ebx],ecx
+L$047pw_tail_nc4:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [20+esi]
+ mov DWORD [20+ebx],ecx
+L$048pw_tail_nc5:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [24+esi]
+ mov DWORD [24+ebx],ecx
+L$049pw_tail_nc6:
+L$051pw_nc_end:
+ mov eax,0
+L$029pw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
new file mode 100644
index 0000000000..08eb9fe372
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
@@ -0,0 +1,1259 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _bn_mul_comba8
+align 16
+_bn_mul_comba8:
+L$_bn_mul_comba8_begin:
+ push esi
+ mov esi,DWORD [12+esp]
+ push edi
+ mov edi,DWORD [20+esp]
+ push ebp
+ push ebx
+ xor ebx,ebx
+ mov eax,DWORD [esi]
+ xor ecx,ecx
+ mov edx,DWORD [edi]
+ ; ################## Calculate word 0
+ xor ebp,ebp
+ ; mul a[0]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [eax],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ################## Calculate word 1
+ xor ebx,ebx
+ ; mul a[1]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[0]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [edi]
+ adc ebx,0
+ mov DWORD [4+eax],ecx
+ mov eax,DWORD [8+esi]
+ ; saved r[1]
+ ; ################## Calculate word 2
+ xor ecx,ecx
+ ; mul a[2]*b[0]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [4+edi]
+ adc ecx,0
+ ; mul a[1]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[0]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [edi]
+ adc ecx,0
+ mov DWORD [8+eax],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ################## Calculate word 3
+ xor ebp,ebp
+ ; mul a[3]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ ; mul a[2]*b[1]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [4+esi]
+ adc ecx,edx
+ mov edx,DWORD [8+edi]
+ adc ebp,0
+ ; mul a[1]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[0]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [12+eax],ebx
+ mov eax,DWORD [16+esi]
+ ; saved r[3]
+ ; ################## Calculate word 4
+ xor ebx,ebx
+ ; mul a[4]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [12+esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[3]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [8+esi]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ ; mul a[2]*b[2]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [4+esi]
+ adc ebp,edx
+ mov edx,DWORD [12+edi]
+ adc ebx,0
+ ; mul a[1]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ ; mul a[0]*b[4]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [edi]
+ adc ebx,0
+ mov DWORD [16+eax],ecx
+ mov eax,DWORD [20+esi]
+ ; saved r[4]
+ ; ################## Calculate word 5
+ xor ecx,ecx
+ ; mul a[5]*b[0]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [16+esi]
+ adc ebx,edx
+ mov edx,DWORD [4+edi]
+ adc ecx,0
+ ; mul a[4]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [12+esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[3]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [8+esi]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ ; mul a[2]*b[3]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [16+edi]
+ adc ecx,0
+ ; mul a[1]*b[4]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [esi]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ ; mul a[0]*b[5]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [edi]
+ adc ecx,0
+ mov DWORD [20+eax],ebp
+ mov eax,DWORD [24+esi]
+ ; saved r[5]
+ ; ################## Calculate word 6
+ xor ebp,ebp
+ ; mul a[6]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esi]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ ; mul a[5]*b[1]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [16+esi]
+ adc ecx,edx
+ mov edx,DWORD [8+edi]
+ adc ebp,0
+ ; mul a[4]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [12+esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[3]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [16+edi]
+ adc ebp,0
+ ; mul a[2]*b[4]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [4+esi]
+ adc ecx,edx
+ mov edx,DWORD [20+edi]
+ adc ebp,0
+ ; mul a[1]*b[5]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [esi]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ ; mul a[0]*b[6]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [24+eax],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[6]
+ ; ################## Calculate word 7
+ xor ebx,ebx
+ ; mul a[7]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [24+esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[6]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esi]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ ; mul a[5]*b[2]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [16+esi]
+ adc ebp,edx
+ mov edx,DWORD [12+edi]
+ adc ebx,0
+ ; mul a[4]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [12+esi]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ ; mul a[3]*b[4]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [8+esi]
+ adc ebp,edx
+ mov edx,DWORD [20+edi]
+ adc ebx,0
+ ; mul a[2]*b[5]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [4+esi]
+ adc ebp,edx
+ mov edx,DWORD [24+edi]
+ adc ebx,0
+ ; mul a[1]*b[6]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ ; mul a[0]*b[7]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ mov DWORD [28+eax],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[7]
+ ; ################## Calculate word 8
+ xor ecx,ecx
+ ; mul a[7]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [24+esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[6]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esi]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ ; mul a[5]*b[3]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [16+esi]
+ adc ebx,edx
+ mov edx,DWORD [16+edi]
+ adc ecx,0
+ ; mul a[4]*b[4]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [12+esi]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ ; mul a[3]*b[5]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [8+esi]
+ adc ebx,edx
+ mov edx,DWORD [24+edi]
+ adc ecx,0
+ ; mul a[2]*b[6]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [28+edi]
+ adc ecx,0
+ ; mul a[1]*b[7]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ mov DWORD [32+eax],ebp
+ mov eax,DWORD [28+esi]
+ ; saved r[8]
+ ; ################## Calculate word 9
+ xor ebp,ebp
+ ; mul a[7]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [24+esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[6]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esi]
+ adc ecx,edx
+ mov edx,DWORD [16+edi]
+ adc ebp,0
+ ; mul a[5]*b[4]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [16+esi]
+ adc ecx,edx
+ mov edx,DWORD [20+edi]
+ adc ebp,0
+ ; mul a[4]*b[5]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [12+esi]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ ; mul a[3]*b[6]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [28+edi]
+ adc ebp,0
+ ; mul a[2]*b[7]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ mov DWORD [36+eax],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[9]
+ ; ################## Calculate word 10
+ xor ebx,ebx
+ ; mul a[7]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [24+esi]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ ; mul a[6]*b[4]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esi]
+ adc ebp,edx
+ mov edx,DWORD [20+edi]
+ adc ebx,0
+ ; mul a[5]*b[5]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [16+esi]
+ adc ebp,edx
+ mov edx,DWORD [24+edi]
+ adc ebx,0
+ ; mul a[4]*b[6]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [12+esi]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ ; mul a[3]*b[7]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ mov DWORD [40+eax],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[10]
+ ; ################## Calculate word 11
+ xor ecx,ecx
+ ; mul a[7]*b[4]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [24+esi]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ ; mul a[6]*b[5]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esi]
+ adc ebx,edx
+ mov edx,DWORD [24+edi]
+ adc ecx,0
+ ; mul a[5]*b[6]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [16+esi]
+ adc ebx,edx
+ mov edx,DWORD [28+edi]
+ adc ecx,0
+ ; mul a[4]*b[7]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ mov DWORD [44+eax],ebp
+ mov eax,DWORD [28+esi]
+ ; saved r[11]
+ ; ################## Calculate word 12
+ xor ebp,ebp
+ ; mul a[7]*b[5]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [24+esi]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ ; mul a[6]*b[6]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esi]
+ adc ecx,edx
+ mov edx,DWORD [28+edi]
+ adc ebp,0
+ ; mul a[5]*b[7]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ mov DWORD [48+eax],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[12]
+ ; ################## Calculate word 13
+ xor ebx,ebx
+ ; mul a[7]*b[6]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [24+esi]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ ; mul a[6]*b[7]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ mov DWORD [52+eax],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[13]
+ ; ################## Calculate word 14
+ xor ecx,ecx
+ ; mul a[7]*b[7]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ adc ecx,0
+ mov DWORD [56+eax],ebp
+ ; saved r[14]
+ ; save r[15]
+ mov DWORD [60+eax],ebx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
+global _bn_mul_comba4
+align 16
+_bn_mul_comba4:
+L$_bn_mul_comba4_begin:
+ push esi
+ mov esi,DWORD [12+esp]
+ push edi
+ mov edi,DWORD [20+esp]
+ push ebp
+ push ebx
+ xor ebx,ebx
+ mov eax,DWORD [esi]
+ xor ecx,ecx
+ mov edx,DWORD [edi]
+ ; ################## Calculate word 0
+ xor ebp,ebp
+ ; mul a[0]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [eax],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ################## Calculate word 1
+ xor ebx,ebx
+ ; mul a[1]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[0]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [edi]
+ adc ebx,0
+ mov DWORD [4+eax],ecx
+ mov eax,DWORD [8+esi]
+ ; saved r[1]
+ ; ################## Calculate word 2
+ xor ecx,ecx
+ ; mul a[2]*b[0]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [4+edi]
+ adc ecx,0
+ ; mul a[1]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[0]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [edi]
+ adc ecx,0
+ mov DWORD [8+eax],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ################## Calculate word 3
+ xor ebp,ebp
+ ; mul a[3]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ ; mul a[2]*b[1]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [4+esi]
+ adc ecx,edx
+ mov edx,DWORD [8+edi]
+ adc ebp,0
+ ; mul a[1]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[0]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ mov DWORD [12+eax],ebx
+ mov eax,DWORD [12+esi]
+ ; saved r[3]
+ ; ################## Calculate word 4
+ xor ebx,ebx
+ ; mul a[3]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [8+esi]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ ; mul a[2]*b[2]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [4+esi]
+ adc ebp,edx
+ mov edx,DWORD [12+edi]
+ adc ebx,0
+ ; mul a[1]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ mov DWORD [16+eax],ecx
+ mov eax,DWORD [12+esi]
+ ; saved r[4]
+ ; ################## Calculate word 5
+ xor ecx,ecx
+ ; mul a[3]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [8+esi]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ ; mul a[2]*b[3]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ mov DWORD [20+eax],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[5]
+ ; ################## Calculate word 6
+ xor ebp,ebp
+ ; mul a[3]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ adc ebp,0
+ mov DWORD [24+eax],ebx
+ ; saved r[6]
+ ; save r[7]
+ mov DWORD [28+eax],ecx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
+global _bn_sqr_comba8
+align 16
+_bn_sqr_comba8:
+L$_bn_sqr_comba8_begin:
+ push esi
+ push edi
+ push ebp
+ push ebx
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ xor ebx,ebx
+ xor ecx,ecx
+ mov eax,DWORD [esi]
+ ; ############### Calculate word 0
+ xor ebp,ebp
+ ; sqr a[0]*a[0]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [esi]
+ adc ebp,0
+ mov DWORD [edi],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ############### Calculate word 1
+ xor ebx,ebx
+ ; sqr a[1]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ mov DWORD [4+edi],ecx
+ mov edx,DWORD [esi]
+ ; saved r[1]
+ ; ############### Calculate word 2
+ xor ecx,ecx
+ ; sqr a[2]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [4+esi]
+ adc ecx,0
+ ; sqr a[1]*a[1]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ mov edx,DWORD [esi]
+ adc ecx,0
+ mov DWORD [8+edi],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ############### Calculate word 3
+ xor ebp,ebp
+ ; sqr a[3]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [8+esi]
+ adc ebp,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[2]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [16+esi]
+ adc ebp,0
+ mov DWORD [12+edi],ebx
+ mov edx,DWORD [esi]
+ ; saved r[3]
+ ; ############### Calculate word 4
+ xor ebx,ebx
+ ; sqr a[4]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [12+esi]
+ adc ebx,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[3]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ ; sqr a[2]*a[2]
+ mul eax
+ add ecx,eax
+ adc ebp,edx
+ mov edx,DWORD [esi]
+ adc ebx,0
+ mov DWORD [16+edi],ecx
+ mov eax,DWORD [20+esi]
+ ; saved r[4]
+ ; ############### Calculate word 5
+ xor ecx,ecx
+ ; sqr a[5]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [16+esi]
+ adc ecx,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[4]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [12+esi]
+ adc ecx,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[3]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [24+esi]
+ adc ecx,0
+ mov DWORD [20+edi],ebp
+ mov edx,DWORD [esi]
+ ; saved r[5]
+ ; ############### Calculate word 6
+ xor ebp,ebp
+ ; sqr a[6]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [20+esi]
+ adc ebp,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[5]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [16+esi]
+ adc ebp,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[4]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [12+esi]
+ adc ebp,0
+ ; sqr a[3]*a[3]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [esi]
+ adc ebp,0
+ mov DWORD [24+edi],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[6]
+ ; ############### Calculate word 7
+ xor ebx,ebx
+ ; sqr a[7]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [24+esi]
+ adc ebx,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[6]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [20+esi]
+ adc ebx,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[5]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [16+esi]
+ adc ebx,0
+ mov edx,DWORD [12+esi]
+ ; sqr a[4]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [28+esi]
+ adc ebx,0
+ mov DWORD [28+edi],ecx
+ mov edx,DWORD [4+esi]
+ ; saved r[7]
+ ; ############### Calculate word 8
+ xor ecx,ecx
+ ; sqr a[7]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [24+esi]
+ adc ecx,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[6]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [20+esi]
+ adc ecx,0
+ mov edx,DWORD [12+esi]
+ ; sqr a[5]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [16+esi]
+ adc ecx,0
+ ; sqr a[4]*a[4]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ mov edx,DWORD [8+esi]
+ adc ecx,0
+ mov DWORD [32+edi],ebp
+ mov eax,DWORD [28+esi]
+ ; saved r[8]
+ ; ############### Calculate word 9
+ xor ebp,ebp
+ ; sqr a[7]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [24+esi]
+ adc ebp,0
+ mov edx,DWORD [12+esi]
+ ; sqr a[6]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [20+esi]
+ adc ebp,0
+ mov edx,DWORD [16+esi]
+ ; sqr a[5]*a[4]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [28+esi]
+ adc ebp,0
+ mov DWORD [36+edi],ebx
+ mov edx,DWORD [12+esi]
+ ; saved r[9]
+ ; ############### Calculate word 10
+ xor ebx,ebx
+ ; sqr a[7]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [24+esi]
+ adc ebx,0
+ mov edx,DWORD [16+esi]
+ ; sqr a[6]*a[4]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [20+esi]
+ adc ebx,0
+ ; sqr a[5]*a[5]
+ mul eax
+ add ecx,eax
+ adc ebp,edx
+ mov edx,DWORD [16+esi]
+ adc ebx,0
+ mov DWORD [40+edi],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[10]
+ ; ############### Calculate word 11
+ xor ecx,ecx
+ ; sqr a[7]*a[4]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [24+esi]
+ adc ecx,0
+ mov edx,DWORD [20+esi]
+ ; sqr a[6]*a[5]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [28+esi]
+ adc ecx,0
+ mov DWORD [44+edi],ebp
+ mov edx,DWORD [20+esi]
+ ; saved r[11]
+ ; ############### Calculate word 12
+ xor ebp,ebp
+ ; sqr a[7]*a[5]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [24+esi]
+ adc ebp,0
+ ; sqr a[6]*a[6]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [24+esi]
+ adc ebp,0
+ mov DWORD [48+edi],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[12]
+ ; ############### Calculate word 13
+ xor ebx,ebx
+ ; sqr a[7]*a[6]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [28+esi]
+ adc ebx,0
+ mov DWORD [52+edi],ecx
+ ; saved r[13]
+ ; ############### Calculate word 14
+ xor ecx,ecx
+ ; sqr a[7]*a[7]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ adc ecx,0
+ mov DWORD [56+edi],ebp
+ ; saved r[14]
+ mov DWORD [60+edi],ebx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
+global _bn_sqr_comba4
+align 16
+_bn_sqr_comba4:
+L$_bn_sqr_comba4_begin:
+ push esi
+ push edi
+ push ebp
+ push ebx
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ xor ebx,ebx
+ xor ecx,ecx
+ mov eax,DWORD [esi]
+ ; ############### Calculate word 0
+ xor ebp,ebp
+ ; sqr a[0]*a[0]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [esi]
+ adc ebp,0
+ mov DWORD [edi],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ############### Calculate word 1
+ xor ebx,ebx
+ ; sqr a[1]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ mov DWORD [4+edi],ecx
+ mov edx,DWORD [esi]
+ ; saved r[1]
+ ; ############### Calculate word 2
+ xor ecx,ecx
+ ; sqr a[2]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [4+esi]
+ adc ecx,0
+ ; sqr a[1]*a[1]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ mov edx,DWORD [esi]
+ adc ecx,0
+ mov DWORD [8+edi],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ############### Calculate word 3
+ xor ebp,ebp
+ ; sqr a[3]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [8+esi]
+ adc ebp,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[2]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [12+esi]
+ adc ebp,0
+ mov DWORD [12+edi],ebx
+ mov edx,DWORD [4+esi]
+ ; saved r[3]
+ ; ############### Calculate word 4
+ xor ebx,ebx
+ ; sqr a[3]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ ; sqr a[2]*a[2]
+ mul eax
+ add ecx,eax
+ adc ebp,edx
+ mov edx,DWORD [8+esi]
+ adc ebx,0
+ mov DWORD [16+edi],ecx
+ mov eax,DWORD [12+esi]
+ ; saved r[4]
+ ; ############### Calculate word 5
+ xor ecx,ecx
+ ; sqr a[3]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [12+esi]
+ adc ecx,0
+ mov DWORD [20+edi],ebp
+ ; saved r[5]
+ ; ############### Calculate word 6
+ xor ebp,ebp
+ ; sqr a[3]*a[3]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ adc ebp,0
+ mov DWORD [24+edi],ebx
+ ; saved r[6]
+ mov DWORD [28+edi],ecx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
new file mode 100644
index 0000000000..5f2f4f65de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
@@ -0,0 +1,352 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+align 16
+__mul_1x1_mmx:
+ sub esp,36
+ mov ecx,eax
+ lea edx,[eax*1+eax]
+ and ecx,1073741823
+ lea ebp,[edx*1+edx]
+ mov DWORD [esp],0
+ and edx,2147483647
+ movd mm2,eax
+ movd mm3,ebx
+ mov DWORD [4+esp],ecx
+ xor ecx,edx
+ pxor mm5,mm5
+ pxor mm4,mm4
+ mov DWORD [8+esp],edx
+ xor edx,ebp
+ mov DWORD [12+esp],ecx
+ pcmpgtd mm5,mm2
+ paddd mm2,mm2
+ xor ecx,edx
+ mov DWORD [16+esp],ebp
+ xor ebp,edx
+ pand mm5,mm3
+ pcmpgtd mm4,mm2
+ mov DWORD [20+esp],ecx
+ xor ebp,ecx
+ psllq mm5,31
+ pand mm4,mm3
+ mov DWORD [24+esp],edx
+ mov esi,7
+ mov DWORD [28+esp],ebp
+ mov ebp,esi
+ and esi,ebx
+ shr ebx,3
+ mov edi,ebp
+ psllq mm4,30
+ and edi,ebx
+ shr ebx,3
+ movd mm0,DWORD [esi*4+esp]
+ mov esi,ebp
+ and esi,ebx
+ shr ebx,3
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,3
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,6
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,9
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,12
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,15
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,18
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,21
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,24
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ pxor mm0,mm4
+ psllq mm2,27
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ pxor mm0,mm5
+ psllq mm1,30
+ add esp,36
+ pxor mm0,mm1
+ ret
+align 16
+__mul_1x1_ialu:
+ sub esp,36
+ mov ecx,eax
+ lea edx,[eax*1+eax]
+ lea ebp,[eax*4]
+ and ecx,1073741823
+ lea edi,[eax*1+eax]
+ sar eax,31
+ mov DWORD [esp],0
+ and edx,2147483647
+ mov DWORD [4+esp],ecx
+ xor ecx,edx
+ mov DWORD [8+esp],edx
+ xor edx,ebp
+ mov DWORD [12+esp],ecx
+ xor ecx,edx
+ mov DWORD [16+esp],ebp
+ xor ebp,edx
+ mov DWORD [20+esp],ecx
+ xor ebp,ecx
+ sar edi,31
+ and eax,ebx
+ mov DWORD [24+esp],edx
+ and edi,ebx
+ mov DWORD [28+esp],ebp
+ mov edx,eax
+ shl eax,31
+ mov ecx,edi
+ shr edx,1
+ mov esi,7
+ shl edi,30
+ and esi,ebx
+ shr ecx,2
+ xor eax,edi
+ shr ebx,3
+ mov edi,7
+ and edi,ebx
+ shr ebx,3
+ xor edx,ecx
+ xor eax,DWORD [esi*4+esp]
+ mov esi,7
+ and esi,ebx
+ shr ebx,3
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,3
+ and edi,ebx
+ shr ecx,29
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,6
+ and esi,ebx
+ shr ebp,26
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,9
+ and edi,ebx
+ shr ecx,23
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,12
+ and esi,ebx
+ shr ebp,20
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,15
+ and edi,ebx
+ shr ecx,17
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,18
+ and esi,ebx
+ shr ebp,14
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,21
+ and edi,ebx
+ shr ecx,11
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,24
+ and esi,ebx
+ shr ebp,8
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov ecx,ebp
+ shl ebp,27
+ mov edi,DWORD [esi*4+esp]
+ shr ecx,5
+ mov esi,edi
+ xor eax,ebp
+ shl edi,30
+ xor edx,ecx
+ shr esi,2
+ xor eax,edi
+ xor edx,esi
+ add esp,36
+ ret
+global _bn_GF2m_mul_2x2
+align 16
+_bn_GF2m_mul_2x2:
+L$_bn_GF2m_mul_2x2_begin:
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov eax,DWORD [edx]
+ mov edx,DWORD [4+edx]
+ test eax,8388608
+ jz NEAR L$000ialu
+ test eax,16777216
+ jz NEAR L$001mmx
+ test edx,2
+ jz NEAR L$001mmx
+ movups xmm0,[8+esp]
+ shufps xmm0,xmm0,177
+db 102,15,58,68,192,1
+ mov eax,DWORD [4+esp]
+ movups [eax],xmm0
+ ret
+align 16
+L$001mmx:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [32+esp]
+ call __mul_1x1_mmx
+ movq mm7,mm0
+ mov eax,DWORD [28+esp]
+ mov ebx,DWORD [36+esp]
+ call __mul_1x1_mmx
+ movq mm6,mm0
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [32+esp]
+ xor eax,DWORD [28+esp]
+ xor ebx,DWORD [36+esp]
+ call __mul_1x1_mmx
+ pxor mm0,mm7
+ mov eax,DWORD [20+esp]
+ pxor mm0,mm6
+ movq mm2,mm0
+ psllq mm0,32
+ pop edi
+ psrlq mm2,32
+ pop esi
+ pxor mm0,mm6
+ pop ebx
+ pxor mm2,mm7
+ movq [eax],mm0
+ pop ebp
+ movq [8+eax],mm2
+ emms
+ ret
+align 16
+L$000ialu:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ sub esp,20
+ mov eax,DWORD [44+esp]
+ mov ebx,DWORD [52+esp]
+ call __mul_1x1_ialu
+ mov DWORD [8+esp],eax
+ mov DWORD [12+esp],edx
+ mov eax,DWORD [48+esp]
+ mov ebx,DWORD [56+esp]
+ call __mul_1x1_ialu
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],edx
+ mov eax,DWORD [44+esp]
+ mov ebx,DWORD [52+esp]
+ xor eax,DWORD [48+esp]
+ xor ebx,DWORD [56+esp]
+ call __mul_1x1_ialu
+ mov ebp,DWORD [40+esp]
+ mov ebx,DWORD [esp]
+ mov ecx,DWORD [4+esp]
+ mov edi,DWORD [8+esp]
+ mov esi,DWORD [12+esp]
+ xor eax,edx
+ xor edx,ecx
+ xor eax,ebx
+ mov DWORD [ebp],ebx
+ xor edx,edi
+ mov DWORD [12+ebp],esi
+ xor eax,esi
+ add esp,20
+ xor edx,esi
+ pop edi
+ xor eax,edx
+ pop esi
+ mov DWORD [8+ebp],edx
+ pop ebx
+ mov DWORD [4+ebp],eax
+ pop ebp
+ ret
+db 71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+db 99,97,116,105,111,110,32,102,111,114,32,120,56,54,44,32
+db 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db 62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
new file mode 100644
index 0000000000..904526ffbf
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
@@ -0,0 +1,486 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _bn_mul_mont
+align 16
+_bn_mul_mont:
+L$_bn_mul_mont_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ xor eax,eax
+ mov edi,DWORD [40+esp]
+ cmp edi,4
+ jl NEAR L$000just_leave
+ lea esi,[20+esp]
+ lea edx,[24+esp]
+ add edi,2
+ neg edi
+ lea ebp,[edi*4+esp-32]
+ neg edi
+ mov eax,ebp
+ sub eax,edx
+ and eax,2047
+ sub ebp,eax
+ xor edx,ebp
+ and edx,2048
+ xor edx,2048
+ sub ebp,edx
+ and ebp,-64
+ mov eax,esp
+ sub eax,ebp
+ and eax,-4096
+ mov edx,esp
+ lea esp,[eax*1+ebp]
+ mov eax,DWORD [esp]
+ cmp esp,ebp
+ ja NEAR L$001page_walk
+ jmp NEAR L$002page_walk_done
+align 16
+L$001page_walk:
+ lea esp,[esp-4096]
+ mov eax,DWORD [esp]
+ cmp esp,ebp
+ ja NEAR L$001page_walk
+L$002page_walk_done:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov ebp,DWORD [12+esi]
+ mov esi,DWORD [16+esi]
+ mov esi,DWORD [esi]
+ mov DWORD [4+esp],eax
+ mov DWORD [8+esp],ebx
+ mov DWORD [12+esp],ecx
+ mov DWORD [16+esp],ebp
+ mov DWORD [20+esp],esi
+ lea ebx,[edi-3]
+ mov DWORD [24+esp],edx
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$003non_sse2
+ mov eax,-1
+ movd mm7,eax
+ mov esi,DWORD [8+esp]
+ mov edi,DWORD [12+esp]
+ mov ebp,DWORD [16+esp]
+ xor edx,edx
+ xor ecx,ecx
+ movd mm4,DWORD [edi]
+ movd mm5,DWORD [esi]
+ movd mm3,DWORD [ebp]
+ pmuludq mm5,mm4
+ movq mm2,mm5
+ movq mm0,mm5
+ pand mm0,mm7
+ pmuludq mm5,[20+esp]
+ pmuludq mm3,mm5
+ paddq mm3,mm0
+ movd mm1,DWORD [4+ebp]
+ movd mm0,DWORD [4+esi]
+ psrlq mm2,32
+ psrlq mm3,32
+ inc ecx
+align 16
+L$0041st:
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ pand mm0,mm7
+ movd mm1,DWORD [4+ecx*4+ebp]
+ paddq mm3,mm0
+ movd mm0,DWORD [4+ecx*4+esi]
+ psrlq mm2,32
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm3,32
+ lea ecx,[1+ecx]
+ cmp ecx,ebx
+ jl NEAR L$0041st
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ pand mm0,mm7
+ paddq mm3,mm0
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm2,32
+ psrlq mm3,32
+ paddq mm3,mm2
+ movq [32+ebx*4+esp],mm3
+ inc edx
+L$005outer:
+ xor ecx,ecx
+ movd mm4,DWORD [edx*4+edi]
+ movd mm5,DWORD [esi]
+ movd mm6,DWORD [32+esp]
+ movd mm3,DWORD [ebp]
+ pmuludq mm5,mm4
+ paddq mm5,mm6
+ movq mm0,mm5
+ movq mm2,mm5
+ pand mm0,mm7
+ pmuludq mm5,[20+esp]
+ pmuludq mm3,mm5
+ paddq mm3,mm0
+ movd mm6,DWORD [36+esp]
+ movd mm1,DWORD [4+ebp]
+ movd mm0,DWORD [4+esi]
+ psrlq mm2,32
+ psrlq mm3,32
+ paddq mm2,mm6
+ inc ecx
+ dec ebx
+L$006inner:
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ movd mm6,DWORD [36+ecx*4+esp]
+ pand mm0,mm7
+ movd mm1,DWORD [4+ecx*4+ebp]
+ paddq mm3,mm0
+ movd mm0,DWORD [4+ecx*4+esi]
+ psrlq mm2,32
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm3,32
+ paddq mm2,mm6
+ dec ebx
+ lea ecx,[1+ecx]
+ jnz NEAR L$006inner
+ mov ebx,ecx
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ pand mm0,mm7
+ paddq mm3,mm0
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm2,32
+ psrlq mm3,32
+ movd mm6,DWORD [36+ebx*4+esp]
+ paddq mm3,mm2
+ paddq mm3,mm6
+ movq [32+ebx*4+esp],mm3
+ lea edx,[1+edx]
+ cmp edx,ebx
+ jle NEAR L$005outer
+ emms
+ jmp NEAR L$007common_tail
+align 16
+L$003non_sse2:
+ mov esi,DWORD [8+esp]
+ lea ebp,[1+ebx]
+ mov edi,DWORD [12+esp]
+ xor ecx,ecx
+ mov edx,esi
+ and ebp,1
+ sub edx,edi
+ lea eax,[4+ebx*4+edi]
+ or ebp,edx
+ mov edi,DWORD [edi]
+ jz NEAR L$008bn_sqr_mont
+ mov DWORD [28+esp],eax
+ mov eax,DWORD [esi]
+ xor edx,edx
+align 16
+L$009mull:
+ mov ebp,edx
+ mul edi
+ add ebp,eax
+ lea ecx,[1+ecx]
+ adc edx,0
+ mov eax,DWORD [ecx*4+esi]
+ cmp ecx,ebx
+ mov DWORD [28+ecx*4+esp],ebp
+ jl NEAR L$009mull
+ mov ebp,edx
+ mul edi
+ mov edi,DWORD [20+esp]
+ add eax,ebp
+ mov esi,DWORD [16+esp]
+ adc edx,0
+ imul edi,DWORD [32+esp]
+ mov DWORD [32+ebx*4+esp],eax
+ xor ecx,ecx
+ mov DWORD [36+ebx*4+esp],edx
+ mov DWORD [40+ebx*4+esp],ecx
+ mov eax,DWORD [esi]
+ mul edi
+ add eax,DWORD [32+esp]
+ mov eax,DWORD [4+esi]
+ adc edx,0
+ inc ecx
+ jmp NEAR L$0102ndmadd
+align 16
+L$0111stmadd:
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ecx*4+esp]
+ lea ecx,[1+ecx]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [ecx*4+esi]
+ adc edx,0
+ cmp ecx,ebx
+ mov DWORD [28+ecx*4+esp],ebp
+ jl NEAR L$0111stmadd
+ mov ebp,edx
+ mul edi
+ add eax,DWORD [32+ebx*4+esp]
+ mov edi,DWORD [20+esp]
+ adc edx,0
+ mov esi,DWORD [16+esp]
+ add ebp,eax
+ adc edx,0
+ imul edi,DWORD [32+esp]
+ xor ecx,ecx
+ add edx,DWORD [36+ebx*4+esp]
+ mov DWORD [32+ebx*4+esp],ebp
+ adc ecx,0
+ mov eax,DWORD [esi]
+ mov DWORD [36+ebx*4+esp],edx
+ mov DWORD [40+ebx*4+esp],ecx
+ mul edi
+ add eax,DWORD [32+esp]
+ mov eax,DWORD [4+esi]
+ adc edx,0
+ mov ecx,1
+align 16
+L$0102ndmadd:
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ecx*4+esp]
+ lea ecx,[1+ecx]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [ecx*4+esi]
+ adc edx,0
+ cmp ecx,ebx
+ mov DWORD [24+ecx*4+esp],ebp
+ jl NEAR L$0102ndmadd
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ebx*4+esp]
+ adc edx,0
+ add ebp,eax
+ adc edx,0
+ mov DWORD [28+ebx*4+esp],ebp
+ xor eax,eax
+ mov ecx,DWORD [12+esp]
+ add edx,DWORD [36+ebx*4+esp]
+ adc eax,DWORD [40+ebx*4+esp]
+ lea ecx,[4+ecx]
+ mov DWORD [32+ebx*4+esp],edx
+ cmp ecx,DWORD [28+esp]
+ mov DWORD [36+ebx*4+esp],eax
+ je NEAR L$007common_tail
+ mov edi,DWORD [ecx]
+ mov esi,DWORD [8+esp]
+ mov DWORD [12+esp],ecx
+ xor ecx,ecx
+ xor edx,edx
+ mov eax,DWORD [esi]
+ jmp NEAR L$0111stmadd
+align 16
+L$008bn_sqr_mont:
+ mov DWORD [esp],ebx
+ mov DWORD [12+esp],ecx
+ mov eax,edi
+ mul edi
+ mov DWORD [32+esp],eax
+ mov ebx,edx
+ shr edx,1
+ and ebx,1
+ inc ecx
+align 16
+L$012sqr:
+ mov eax,DWORD [ecx*4+esi]
+ mov ebp,edx
+ mul edi
+ add eax,ebp
+ lea ecx,[1+ecx]
+ adc edx,0
+ lea ebp,[eax*2+ebx]
+ shr eax,31
+ cmp ecx,DWORD [esp]
+ mov ebx,eax
+ mov DWORD [28+ecx*4+esp],ebp
+ jl NEAR L$012sqr
+ mov eax,DWORD [ecx*4+esi]
+ mov ebp,edx
+ mul edi
+ add eax,ebp
+ mov edi,DWORD [20+esp]
+ adc edx,0
+ mov esi,DWORD [16+esp]
+ lea ebp,[eax*2+ebx]
+ imul edi,DWORD [32+esp]
+ shr eax,31
+ mov DWORD [32+ecx*4+esp],ebp
+ lea ebp,[edx*2+eax]
+ mov eax,DWORD [esi]
+ shr edx,31
+ mov DWORD [36+ecx*4+esp],ebp
+ mov DWORD [40+ecx*4+esp],edx
+ mul edi
+ add eax,DWORD [32+esp]
+ mov ebx,ecx
+ adc edx,0
+ mov eax,DWORD [4+esi]
+ mov ecx,1
+align 16
+L$0133rdmadd:
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ecx*4+esp]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [4+ecx*4+esi]
+ adc edx,0
+ mov DWORD [28+ecx*4+esp],ebp
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [36+ecx*4+esp]
+ lea ecx,[2+ecx]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [ecx*4+esi]
+ adc edx,0
+ cmp ecx,ebx
+ mov DWORD [24+ecx*4+esp],ebp
+ jl NEAR L$0133rdmadd
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ebx*4+esp]
+ adc edx,0
+ add ebp,eax
+ adc edx,0
+ mov DWORD [28+ebx*4+esp],ebp
+ mov ecx,DWORD [12+esp]
+ xor eax,eax
+ mov esi,DWORD [8+esp]
+ add edx,DWORD [36+ebx*4+esp]
+ adc eax,DWORD [40+ebx*4+esp]
+ mov DWORD [32+ebx*4+esp],edx
+ cmp ecx,ebx
+ mov DWORD [36+ebx*4+esp],eax
+ je NEAR L$007common_tail
+ mov edi,DWORD [4+ecx*4+esi]
+ lea ecx,[1+ecx]
+ mov eax,edi
+ mov DWORD [12+esp],ecx
+ mul edi
+ add eax,DWORD [32+ecx*4+esp]
+ adc edx,0
+ mov DWORD [32+ecx*4+esp],eax
+ xor ebp,ebp
+ cmp ecx,ebx
+ lea ecx,[1+ecx]
+ je NEAR L$014sqrlast
+ mov ebx,edx
+ shr edx,1
+ and ebx,1
+align 16
+L$015sqradd:
+ mov eax,DWORD [ecx*4+esi]
+ mov ebp,edx
+ mul edi
+ add eax,ebp
+ lea ebp,[eax*1+eax]
+ adc edx,0
+ shr eax,31
+ add ebp,DWORD [32+ecx*4+esp]
+ lea ecx,[1+ecx]
+ adc eax,0
+ add ebp,ebx
+ adc eax,0
+ cmp ecx,DWORD [esp]
+ mov DWORD [28+ecx*4+esp],ebp
+ mov ebx,eax
+ jle NEAR L$015sqradd
+ mov ebp,edx
+ add edx,edx
+ shr ebp,31
+ add edx,ebx
+ adc ebp,0
+L$014sqrlast:
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [16+esp]
+ imul edi,DWORD [32+esp]
+ add edx,DWORD [32+ecx*4+esp]
+ mov eax,DWORD [esi]
+ adc ebp,0
+ mov DWORD [32+ecx*4+esp],edx
+ mov DWORD [36+ecx*4+esp],ebp
+ mul edi
+ add eax,DWORD [32+esp]
+ lea ebx,[ecx-1]
+ adc edx,0
+ mov ecx,1
+ mov eax,DWORD [4+esi]
+ jmp NEAR L$0133rdmadd
+align 16
+L$007common_tail:
+ mov ebp,DWORD [16+esp]
+ mov edi,DWORD [4+esp]
+ lea esi,[32+esp]
+ mov eax,DWORD [esi]
+ mov ecx,ebx
+ xor edx,edx
+align 16
+L$016sub:
+ sbb eax,DWORD [edx*4+ebp]
+ mov DWORD [edx*4+edi],eax
+ dec ecx
+ mov eax,DWORD [4+edx*4+esi]
+ lea edx,[1+edx]
+ jge NEAR L$016sub
+ sbb eax,0
+ mov edx,-1
+ xor edx,eax
+ jmp NEAR L$017copy
+align 16
+L$017copy:
+ mov esi,DWORD [32+ebx*4+esp]
+ mov ebp,DWORD [ebx*4+edi]
+ mov DWORD [32+ebx*4+esp],ecx
+ and esi,eax
+ and ebp,edx
+ or ebp,esi
+ mov DWORD [ebx*4+edi],ebp
+ dec ebx
+ jge NEAR L$017copy
+ mov esp,DWORD [24+esp]
+ mov eax,1
+L$000just_leave:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+db 77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+db 112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+db 54,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+db 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+db 111,114,103,62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
new file mode 100644
index 0000000000..dd69f436c4
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
@@ -0,0 +1,887 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+extern _DES_SPtrans
+global _fcrypt_body
+align 16
+_fcrypt_body:
+L$_fcrypt_body_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ ; Load the 2 words
+ xor edi,edi
+ xor esi,esi
+ lea edx,[_DES_SPtrans]
+ push edx
+ mov ebp,DWORD [28+esp]
+ push DWORD 25
+L$000start:
+ ;
+ ; Round 0
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [ebp]
+ xor eax,ebx
+ mov ecx,DWORD [4+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 1
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [8+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [12+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 2
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [16+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [20+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 3
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [24+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [28+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 4
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [32+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [36+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 5
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [40+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [44+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 6
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [48+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [52+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 7
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [56+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [60+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 8
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [64+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [68+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 9
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [72+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [76+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 10
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [80+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [84+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 11
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [88+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [92+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 12
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [96+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [100+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 13
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [104+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [108+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 14
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [112+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [116+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 15
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [120+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [124+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ mov ebx,DWORD [esp]
+ mov eax,edi
+ dec ebx
+ mov edi,esi
+ mov esi,eax
+ mov DWORD [esp],ebx
+ jnz NEAR L$000start
+ ;
+ ; FP
+ mov edx,DWORD [28+esp]
+ ror edi,1
+ mov eax,esi
+ xor esi,edi
+ and esi,0xaaaaaaaa
+ xor eax,esi
+ xor edi,esi
+ ;
+ rol eax,23
+ mov esi,eax
+ xor eax,edi
+ and eax,0x03fc03fc
+ xor esi,eax
+ xor edi,eax
+ ;
+ rol esi,10
+ mov eax,esi
+ xor esi,edi
+ and esi,0x33333333
+ xor eax,esi
+ xor edi,esi
+ ;
+ rol edi,18
+ mov esi,edi
+ xor edi,eax
+ and edi,0xfff0000f
+ xor esi,edi
+ xor eax,edi
+ ;
+ rol esi,12
+ mov edi,esi
+ xor esi,eax
+ and esi,0xf0f0f0f0
+ xor edi,esi
+ xor eax,esi
+ ;
+ ror eax,4
+ mov DWORD [edx],eax
+ mov DWORD [4+edx],edi
+ add esp,8
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
new file mode 100644
index 0000000000..980d488316
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
@@ -0,0 +1,1835 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _DES_SPtrans
+align 16
+__x86_DES_encrypt:
+ push ecx
+ ; Round 0
+ mov eax,DWORD [ecx]
+ xor ebx,ebx
+ mov edx,DWORD [4+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 1
+ mov eax,DWORD [8+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [12+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 2
+ mov eax,DWORD [16+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [20+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 3
+ mov eax,DWORD [24+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [28+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 4
+ mov eax,DWORD [32+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [36+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 5
+ mov eax,DWORD [40+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [44+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 6
+ mov eax,DWORD [48+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [52+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 7
+ mov eax,DWORD [56+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [60+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 8
+ mov eax,DWORD [64+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [68+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 9
+ mov eax,DWORD [72+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [76+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 10
+ mov eax,DWORD [80+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [84+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 11
+ mov eax,DWORD [88+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [92+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 12
+ mov eax,DWORD [96+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [100+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 13
+ mov eax,DWORD [104+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [108+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 14
+ mov eax,DWORD [112+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [116+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 15
+ mov eax,DWORD [120+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [124+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ add esp,4
+ ret
+align 16
+__x86_DES_decrypt:
+ push ecx
+ ; Round 15
+ mov eax,DWORD [120+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [124+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 14
+ mov eax,DWORD [112+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [116+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 13
+ mov eax,DWORD [104+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [108+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 12
+ mov eax,DWORD [96+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [100+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 11
+ mov eax,DWORD [88+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [92+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 10
+ mov eax,DWORD [80+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [84+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 9
+ mov eax,DWORD [72+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [76+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 8
+ mov eax,DWORD [64+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [68+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 7
+ mov eax,DWORD [56+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [60+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 6
+ mov eax,DWORD [48+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [52+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 5
+ mov eax,DWORD [40+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [44+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 4
+ mov eax,DWORD [32+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [36+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 3
+ mov eax,DWORD [24+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [28+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 2
+ mov eax,DWORD [16+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [20+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 1
+ mov eax,DWORD [8+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [12+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 0
+ mov eax,DWORD [ecx]
+ xor ebx,ebx
+ mov edx,DWORD [4+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ add esp,4
+ ret
+global _DES_encrypt1
+align 16
+_DES_encrypt1:
+L$_DES_encrypt1_begin:
+ push esi
+ push edi
+ ;
+ ; Load the 2 words
+ mov esi,DWORD [12+esp]
+ xor ecx,ecx
+ push ebx
+ push ebp
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [28+esp]
+ mov edi,DWORD [4+esi]
+ ;
+ ; IP
+ rol eax,4
+ mov esi,eax
+ xor eax,edi
+ and eax,0xf0f0f0f0
+ xor esi,eax
+ xor edi,eax
+ ;
+ rol edi,20
+ mov eax,edi
+ xor edi,esi
+ and edi,0xfff0000f
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,14
+ mov edi,eax
+ xor eax,esi
+ and eax,0x33333333
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol esi,22
+ mov eax,esi
+ xor esi,edi
+ and esi,0x03fc03fc
+ xor eax,esi
+ xor edi,esi
+ ;
+ rol eax,9
+ mov esi,eax
+ xor eax,edi
+ and eax,0xaaaaaaaa
+ xor esi,eax
+ xor edi,eax
+ ;
+ rol edi,1
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea ebp,[(L$des_sptrans-L$000pic_point)+ebp]
+ mov ecx,DWORD [24+esp]
+ cmp ebx,0
+ je NEAR L$001decrypt
+ call __x86_DES_encrypt
+ jmp NEAR L$002done
+L$001decrypt:
+ call __x86_DES_decrypt
+L$002done:
+ ;
+ ; FP
+ mov edx,DWORD [20+esp]
+ ror esi,1
+ mov eax,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,23
+ mov edi,eax
+ xor eax,esi
+ and eax,0x03fc03fc
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol edi,10
+ mov eax,edi
+ xor edi,esi
+ and edi,0x33333333
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol esi,18
+ mov edi,esi
+ xor esi,eax
+ and esi,0xfff0000f
+ xor edi,esi
+ xor eax,esi
+ ;
+ rol edi,12
+ mov esi,edi
+ xor edi,eax
+ and edi,0xf0f0f0f0
+ xor esi,edi
+ xor eax,edi
+ ;
+ ror eax,4
+ mov DWORD [edx],eax
+ mov DWORD [4+edx],esi
+ pop ebp
+ pop ebx
+ pop edi
+ pop esi
+ ret
+global _DES_encrypt2
+align 16
+_DES_encrypt2:
+L$_DES_encrypt2_begin:
+ push esi
+ push edi
+ ;
+ ; Load the 2 words
+ mov eax,DWORD [12+esp]
+ xor ecx,ecx
+ push ebx
+ push ebp
+ mov esi,DWORD [eax]
+ mov ebx,DWORD [28+esp]
+ rol esi,3
+ mov edi,DWORD [4+eax]
+ rol edi,3
+ call L$003pic_point
+L$003pic_point:
+ pop ebp
+ lea ebp,[(L$des_sptrans-L$003pic_point)+ebp]
+ mov ecx,DWORD [24+esp]
+ cmp ebx,0
+ je NEAR L$004decrypt
+ call __x86_DES_encrypt
+ jmp NEAR L$005done
+L$004decrypt:
+ call __x86_DES_decrypt
+L$005done:
+ ;
+ ; Fixup
+ ror edi,3
+ mov eax,DWORD [20+esp]
+ ror esi,3
+ mov DWORD [eax],edi
+ mov DWORD [4+eax],esi
+ pop ebp
+ pop ebx
+ pop edi
+ pop esi
+ ret
+global _DES_encrypt3
+align 16
+_DES_encrypt3:
+L$_DES_encrypt3_begin:
+ push ebx
+ mov ebx,DWORD [8+esp]
+ push ebp
+ push esi
+ push edi
+ ;
+ ; Load the data words
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ sub esp,12
+ ;
+ ; IP
+ rol edi,4
+ mov edx,edi
+ xor edi,esi
+ and edi,0xf0f0f0f0
+ xor edx,edi
+ xor esi,edi
+ ;
+ rol esi,20
+ mov edi,esi
+ xor esi,edx
+ and esi,0xfff0000f
+ xor edi,esi
+ xor edx,esi
+ ;
+ rol edi,14
+ mov esi,edi
+ xor edi,edx
+ and edi,0x33333333
+ xor esi,edi
+ xor edx,edi
+ ;
+ rol edx,22
+ mov edi,edx
+ xor edx,esi
+ and edx,0x03fc03fc
+ xor edi,edx
+ xor esi,edx
+ ;
+ rol edi,9
+ mov edx,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor edx,edi
+ xor esi,edi
+ ;
+ ror edx,3
+ ror esi,2
+ mov DWORD [4+ebx],esi
+ mov eax,DWORD [36+esp]
+ mov DWORD [ebx],edx
+ mov edi,DWORD [40+esp]
+ mov esi,DWORD [44+esp]
+ mov DWORD [8+esp],DWORD 1
+ mov DWORD [4+esp],eax
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 0
+ mov DWORD [4+esp],edi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 1
+ mov DWORD [4+esp],esi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ add esp,12
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ ;
+ ; FP
+ rol esi,2
+ rol edi,3
+ mov eax,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,23
+ mov edi,eax
+ xor eax,esi
+ and eax,0x03fc03fc
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol edi,10
+ mov eax,edi
+ xor edi,esi
+ and edi,0x33333333
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol esi,18
+ mov edi,esi
+ xor esi,eax
+ and esi,0xfff0000f
+ xor edi,esi
+ xor eax,esi
+ ;
+ rol edi,12
+ mov esi,edi
+ xor edi,eax
+ and edi,0xf0f0f0f0
+ xor esi,edi
+ xor eax,edi
+ ;
+ ror eax,4
+ mov DWORD [ebx],eax
+ mov DWORD [4+ebx],esi
+ pop edi
+ pop esi
+ pop ebp
+ pop ebx
+ ret
+global _DES_decrypt3
+align 16
+_DES_decrypt3:
+L$_DES_decrypt3_begin:
+ push ebx
+ mov ebx,DWORD [8+esp]
+ push ebp
+ push esi
+ push edi
+ ;
+ ; Load the data words
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ sub esp,12
+ ;
+ ; IP
+ rol edi,4
+ mov edx,edi
+ xor edi,esi
+ and edi,0xf0f0f0f0
+ xor edx,edi
+ xor esi,edi
+ ;
+ rol esi,20
+ mov edi,esi
+ xor esi,edx
+ and esi,0xfff0000f
+ xor edi,esi
+ xor edx,esi
+ ;
+ rol edi,14
+ mov esi,edi
+ xor edi,edx
+ and edi,0x33333333
+ xor esi,edi
+ xor edx,edi
+ ;
+ rol edx,22
+ mov edi,edx
+ xor edx,esi
+ and edx,0x03fc03fc
+ xor edi,edx
+ xor esi,edx
+ ;
+ rol edi,9
+ mov edx,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor edx,edi
+ xor esi,edi
+ ;
+ ror edx,3
+ ror esi,2
+ mov DWORD [4+ebx],esi
+ mov esi,DWORD [36+esp]
+ mov DWORD [ebx],edx
+ mov edi,DWORD [40+esp]
+ mov eax,DWORD [44+esp]
+ mov DWORD [8+esp],DWORD 0
+ mov DWORD [4+esp],eax
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 1
+ mov DWORD [4+esp],edi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 0
+ mov DWORD [4+esp],esi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ add esp,12
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ ;
+ ; FP
+ rol esi,2
+ rol edi,3
+ mov eax,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,23
+ mov edi,eax
+ xor eax,esi
+ and eax,0x03fc03fc
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol edi,10
+ mov eax,edi
+ xor edi,esi
+ and edi,0x33333333
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol esi,18
+ mov edi,esi
+ xor esi,eax
+ and esi,0xfff0000f
+ xor edi,esi
+ xor eax,esi
+ ;
+ rol edi,12
+ mov esi,edi
+ xor edi,eax
+ and edi,0xf0f0f0f0
+ xor esi,edi
+ xor eax,edi
+ ;
+ ror eax,4
+ mov DWORD [ebx],eax
+ mov DWORD [4+ebx],esi
+ pop edi
+ pop esi
+ pop ebp
+ pop ebx
+ ret
+global _DES_ncbc_encrypt
+align 16
+_DES_ncbc_encrypt:
+L$_DES_ncbc_encrypt_begin:
+ ;
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ebp,DWORD [28+esp]
+ ; getting iv ptr from parameter 4
+ mov ebx,DWORD [36+esp]
+ mov esi,DWORD [ebx]
+ mov edi,DWORD [4+ebx]
+ push edi
+ push esi
+ push edi
+ push esi
+ mov ebx,esp
+ mov esi,DWORD [36+esp]
+ mov edi,DWORD [40+esp]
+ ; getting encrypt flag from parameter 5
+ mov ecx,DWORD [56+esp]
+ ; get and push parameter 5
+ push ecx
+ ; get and push parameter 3
+ mov eax,DWORD [52+esp]
+ push eax
+ push ebx
+ cmp ecx,0
+ jz NEAR L$006decrypt
+ and ebp,4294967288
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ jz NEAR L$007encrypt_finish
+L$008encrypt_loop:
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [4+esi]
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$008encrypt_loop
+L$007encrypt_finish:
+ mov ebp,DWORD [56+esp]
+ and ebp,7
+ jz NEAR L$009finish
+ call L$010PIC_point
+L$010PIC_point:
+ pop edx
+ lea ecx,[(L$011cbc_enc_jmp_table-L$010PIC_point)+edx]
+ mov ebp,DWORD [ebp*4+ecx]
+ add ebp,edx
+ xor ecx,ecx
+ xor edx,edx
+ jmp ebp
+L$012ej7:
+ mov dh,BYTE [6+esi]
+ shl edx,8
+L$013ej6:
+ mov dh,BYTE [5+esi]
+L$014ej5:
+ mov dl,BYTE [4+esi]
+L$015ej4:
+ mov ecx,DWORD [esi]
+ jmp NEAR L$016ejend
+L$017ej3:
+ mov ch,BYTE [2+esi]
+ shl ecx,8
+L$018ej2:
+ mov ch,BYTE [1+esi]
+L$019ej1:
+ mov cl,BYTE [esi]
+L$016ejend:
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ jmp NEAR L$009finish
+L$006decrypt:
+ and ebp,4294967288
+ mov eax,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ jz NEAR L$020decrypt_finish
+L$021decrypt_loop:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [20+esp],eax
+ mov DWORD [24+esp],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$021decrypt_loop
+L$020decrypt_finish:
+ mov ebp,DWORD [56+esp]
+ and ebp,7
+ jz NEAR L$009finish
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+L$022dj7:
+ ror edx,16
+ mov BYTE [6+edi],dl
+ shr edx,16
+L$023dj6:
+ mov BYTE [5+edi],dh
+L$024dj5:
+ mov BYTE [4+edi],dl
+L$025dj4:
+ mov DWORD [edi],ecx
+ jmp NEAR L$026djend
+L$027dj3:
+ ror ecx,16
+ mov BYTE [2+edi],cl
+ shl ecx,16
+L$028dj2:
+ mov BYTE [1+esi],ch
+L$029dj1:
+ mov BYTE [esi],cl
+L$026djend:
+ jmp NEAR L$009finish
+L$009finish:
+ mov ecx,DWORD [64+esp]
+ add esp,28
+ mov DWORD [ecx],eax
+ mov DWORD [4+ecx],ebx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$011cbc_enc_jmp_table:
+dd 0
+dd L$019ej1-L$010PIC_point
+dd L$018ej2-L$010PIC_point
+dd L$017ej3-L$010PIC_point
+dd L$015ej4-L$010PIC_point
+dd L$014ej5-L$010PIC_point
+dd L$013ej6-L$010PIC_point
+dd L$012ej7-L$010PIC_point
+align 64
+global _DES_ede3_cbc_encrypt
+align 16
+_DES_ede3_cbc_encrypt:
+L$_DES_ede3_cbc_encrypt_begin:
+ ;
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ebp,DWORD [28+esp]
+ ; getting iv ptr from parameter 6
+ mov ebx,DWORD [44+esp]
+ mov esi,DWORD [ebx]
+ mov edi,DWORD [4+ebx]
+ push edi
+ push esi
+ push edi
+ push esi
+ mov ebx,esp
+ mov esi,DWORD [36+esp]
+ mov edi,DWORD [40+esp]
+ ; getting encrypt flag from parameter 7
+ mov ecx,DWORD [64+esp]
+ ; get and push parameter 5
+ mov eax,DWORD [56+esp]
+ push eax
+ ; get and push parameter 4
+ mov eax,DWORD [56+esp]
+ push eax
+ ; get and push parameter 3
+ mov eax,DWORD [56+esp]
+ push eax
+ push ebx
+ cmp ecx,0
+ jz NEAR L$030decrypt
+ and ebp,4294967288
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ jz NEAR L$031encrypt_finish
+L$032encrypt_loop:
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [4+esi]
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_encrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$032encrypt_loop
+L$031encrypt_finish:
+ mov ebp,DWORD [60+esp]
+ and ebp,7
+ jz NEAR L$033finish
+ call L$034PIC_point
+L$034PIC_point:
+ pop edx
+ lea ecx,[(L$035cbc_enc_jmp_table-L$034PIC_point)+edx]
+ mov ebp,DWORD [ebp*4+ecx]
+ add ebp,edx
+ xor ecx,ecx
+ xor edx,edx
+ jmp ebp
+L$036ej7:
+ mov dh,BYTE [6+esi]
+ shl edx,8
+L$037ej6:
+ mov dh,BYTE [5+esi]
+L$038ej5:
+ mov dl,BYTE [4+esi]
+L$039ej4:
+ mov ecx,DWORD [esi]
+ jmp NEAR L$040ejend
+L$041ej3:
+ mov ch,BYTE [2+esi]
+ shl ecx,8
+L$042ej2:
+ mov ch,BYTE [1+esi]
+L$043ej1:
+ mov cl,BYTE [esi]
+L$040ejend:
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_encrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ jmp NEAR L$033finish
+L$030decrypt:
+ and ebp,4294967288
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [28+esp]
+ jz NEAR L$044decrypt_finish
+L$045decrypt_loop:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_decrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [24+esp],eax
+ mov DWORD [28+esp],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$045decrypt_loop
+L$044decrypt_finish:
+ mov ebp,DWORD [60+esp]
+ and ebp,7
+ jz NEAR L$033finish
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_decrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+L$046dj7:
+ ror edx,16
+ mov BYTE [6+edi],dl
+ shr edx,16
+L$047dj6:
+ mov BYTE [5+edi],dh
+L$048dj5:
+ mov BYTE [4+edi],dl
+L$049dj4:
+ mov DWORD [edi],ecx
+ jmp NEAR L$050djend
+L$051dj3:
+ ror ecx,16
+ mov BYTE [2+edi],cl
+ shl ecx,16
+L$052dj2:
+ mov BYTE [1+esi],ch
+L$053dj1:
+ mov BYTE [esi],cl
+L$050djend:
+ jmp NEAR L$033finish
+L$033finish:
+ mov ecx,DWORD [76+esp]
+ add esp,32
+ mov DWORD [ecx],eax
+ mov DWORD [4+ecx],ebx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$035cbc_enc_jmp_table:
+dd 0
+dd L$043ej1-L$034PIC_point
+dd L$042ej2-L$034PIC_point
+dd L$041ej3-L$034PIC_point
+dd L$039ej4-L$034PIC_point
+dd L$038ej5-L$034PIC_point
+dd L$037ej6-L$034PIC_point
+dd L$036ej7-L$034PIC_point
+align 64
+align 64
+_DES_SPtrans:
+L$des_sptrans:
+dd 34080768,524288,33554434,34080770
+dd 33554432,526338,524290,33554434
+dd 526338,34080768,34078720,2050
+dd 33556482,33554432,0,524290
+dd 524288,2,33556480,526336
+dd 34080770,34078720,2050,33556480
+dd 2,2048,526336,34078722
+dd 2048,33556482,34078722,0
+dd 0,34080770,33556480,524290
+dd 34080768,524288,2050,33556480
+dd 34078722,2048,526336,33554434
+dd 526338,2,33554434,34078720
+dd 34080770,526336,34078720,33556482
+dd 33554432,2050,524290,0
+dd 524288,33554432,33556482,34080768
+dd 2,34078722,2048,526338
+dd 1074823184,0,1081344,1074790400
+dd 1073741840,32784,1073774592,1081344
+dd 32768,1074790416,16,1073774592
+dd 1048592,1074823168,1074790400,16
+dd 1048576,1073774608,1074790416,32768
+dd 1081360,1073741824,0,1048592
+dd 1073774608,1081360,1074823168,1073741840
+dd 1073741824,1048576,32784,1074823184
+dd 1048592,1074823168,1073774592,1081360
+dd 1074823184,1048592,1073741840,0
+dd 1073741824,32784,1048576,1074790416
+dd 32768,1073741824,1081360,1073774608
+dd 1074823168,32768,0,1073741840
+dd 16,1074823184,1081344,1074790400
+dd 1074790416,1048576,32784,1073774592
+dd 1073774608,16,1074790400,1081344
+dd 67108865,67371264,256,67109121
+dd 262145,67108864,67109121,262400
+dd 67109120,262144,67371008,1
+dd 67371265,257,1,67371009
+dd 0,262145,67371264,256
+dd 257,67371265,262144,67108865
+dd 67371009,67109120,262401,67371008
+dd 262400,0,67108864,262401
+dd 67371264,256,1,262144
+dd 257,262145,67371008,67109121
+dd 0,67371264,262400,67371009
+dd 262145,67108864,67371265,1
+dd 262401,67108865,67108864,67371265
+dd 262144,67109120,67109121,262400
+dd 67109120,0,67371009,257
+dd 67108865,262401,256,67371008
+dd 4198408,268439552,8,272633864
+dd 0,272629760,268439560,4194312
+dd 272633856,268435464,268435456,4104
+dd 268435464,4198408,4194304,268435456
+dd 272629768,4198400,4096,8
+dd 4198400,268439560,272629760,4096
+dd 4104,0,4194312,272633856
+dd 268439552,272629768,272633864,4194304
+dd 272629768,4104,4194304,268435464
+dd 4198400,268439552,8,272629760
+dd 268439560,0,4096,4194312
+dd 0,272629768,272633856,4096
+dd 268435456,272633864,4198408,4194304
+dd 272633864,8,268439552,4198408
+dd 4194312,4198400,272629760,268439560
+dd 4104,268435456,268435464,272633856
+dd 134217728,65536,1024,134284320
+dd 134283296,134218752,66592,134283264
+dd 65536,32,134217760,66560
+dd 134218784,134283296,134284288,0
+dd 66560,134217728,65568,1056
+dd 134218752,66592,0,134217760
+dd 32,134218784,134284320,65568
+dd 134283264,1024,1056,134284288
+dd 134284288,134218784,65568,134283264
+dd 65536,32,134217760,134218752
+dd 134217728,66560,134284320,0
+dd 66592,134217728,1024,65568
+dd 134218784,1024,0,134284320
+dd 134283296,134284288,1056,65536
+dd 66560,134283296,134218752,1056
+dd 32,66592,134283264,134217760
+dd 2147483712,2097216,0,2149588992
+dd 2097216,8192,2147491904,2097152
+dd 8256,2149589056,2105344,2147483648
+dd 2147491840,2147483712,2149580800,2105408
+dd 2097152,2147491904,2149580864,0
+dd 8192,64,2149588992,2149580864
+dd 2149589056,2149580800,2147483648,8256
+dd 64,2105344,2105408,2147491840
+dd 8256,2147483648,2147491840,2105408
+dd 2149588992,2097216,0,2147491840
+dd 2147483648,8192,2149580864,2097152
+dd 2097216,2149589056,2105344,64
+dd 2149589056,2105344,2097152,2147491904
+dd 2147483712,2149580800,2105408,0
+dd 8192,2147483712,2147491904,2149588992
+dd 2149580800,8256,64,2149580864
+dd 16384,512,16777728,16777220
+dd 16794116,16388,16896,0
+dd 16777216,16777732,516,16793600
+dd 4,16794112,16793600,516
+dd 16777732,16384,16388,16794116
+dd 0,16777728,16777220,16896
+dd 16793604,16900,16794112,4
+dd 16900,16793604,512,16777216
+dd 16900,16793600,16793604,516
+dd 16384,512,16777216,16793604
+dd 16777732,16900,16896,0
+dd 512,16777220,4,16777728
+dd 0,16777732,16777728,16896
+dd 516,16384,16794116,16777216
+dd 16794112,4,16388,16794116
+dd 16777220,16794112,16793600,16388
+dd 545259648,545390592,131200,0
+dd 537001984,8388736,545259520,545390720
+dd 128,536870912,8519680,131200
+dd 8519808,537002112,536871040,545259520
+dd 131072,8519808,8388736,537001984
+dd 545390720,536871040,0,8519680
+dd 536870912,8388608,537002112,545259648
+dd 8388608,131072,545390592,128
+dd 8388608,131072,536871040,545390720
+dd 131200,536870912,0,8519680
+dd 545259648,537002112,537001984,8388736
+dd 545390592,128,8388736,537001984
+dd 545390720,8388608,545259520,536871040
+dd 8519680,131200,537002112,545259520
+dd 128,545390592,8519808,0
+dd 536870912,545259648,131072,8519808
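(Reviewer's note on the generated DES code above, outside the diff proper: the `L$036ej7`..`L$043ej1` labels reached through `L$035cbc_enc_jmp_table` pack the final 1-7 plaintext bytes of a CBC encrypt into two little-endian 32-bit words, zero-padded, before the last `DES_encrypt3` call. A minimal Python sketch of that packing, for illustration only — `pack_tail` is a hypothetical name, not part of the patch:)

```python
def pack_tail(tail: bytes) -> tuple[int, int]:
    """Model the ej1..ej7 jump-table paths: bytes 0..3 of the
    remainder land in the low word (ecx), bytes 4..6 in the high
    word (edx), both little-endian and zero-padded."""
    assert 1 <= len(tail) <= 7
    c = int.from_bytes(tail[:4].ljust(4, b"\x00"), "little")  # ej4..ej1
    d = int.from_bytes(tail[4:].ljust(4, b"\x00"), "little")  # ej7..ej5
    return c, d
```

(For a 7-byte remainder `01 02 03 04 05 06 07` this yields `c = 0x04030201`, `d = 0x00070605`, matching the byte loads into `cl`/`ch` and `dl`/`dh` with the interleaved shifts.)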
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
new file mode 100644
index 0000000000..83e4e77e6a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
@@ -0,0 +1,690 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _md5_block_asm_data_order
+align 16
+_md5_block_asm_data_order:
+L$_md5_block_asm_data_order_begin:
+ push esi
+ push edi
+ mov edi,DWORD [12+esp]
+ mov esi,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ push ebp
+ shl ecx,6
+ push ebx
+ add ecx,esi
+ sub ecx,64
+ mov eax,DWORD [edi]
+ push ecx
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+L$000start:
+ ;
+ ; R0 section
+ mov edi,ecx
+ mov ebp,DWORD [esi]
+ ; R0 0
+ xor edi,edx
+ and edi,ebx
+ lea eax,[3614090360+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [4+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 1
+ xor edi,ecx
+ and edi,eax
+ lea edx,[3905402710+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [8+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 2
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[606105819+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [12+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 3
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[3250441966+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [16+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ; R0 4
+ xor edi,edx
+ and edi,ebx
+ lea eax,[4118548399+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [20+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 5
+ xor edi,ecx
+ and edi,eax
+ lea edx,[1200080426+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [24+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 6
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[2821735955+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [28+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 7
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[4249261313+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [32+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ; R0 8
+ xor edi,edx
+ and edi,ebx
+ lea eax,[1770035416+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [36+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 9
+ xor edi,ecx
+ and edi,eax
+ lea edx,[2336552879+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [40+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 10
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[4294925233+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [44+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 11
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[2304563134+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [48+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ; R0 12
+ xor edi,edx
+ and edi,ebx
+ lea eax,[1804603682+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [52+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 13
+ xor edi,ecx
+ and edi,eax
+ lea edx,[4254626195+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [56+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 14
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[2792965006+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [60+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 15
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[1236535329+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [4+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ;
+ ; R1 section
+ ; R1 16
+ xor edi,ebx
+ and edi,edx
+ lea eax,[4129170786+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [24+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 17
+ xor edi,eax
+ and edi,ecx
+ lea edx,[3225465664+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [44+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 18
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[643717713+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 19
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[3921069994+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [20+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ; R1 20
+ xor edi,ebx
+ and edi,edx
+ lea eax,[3593408605+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [40+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 21
+ xor edi,eax
+ and edi,ecx
+ lea edx,[38016083+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [60+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 22
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[3634488961+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [16+esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 23
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[3889429448+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [36+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ; R1 24
+ xor edi,ebx
+ and edi,edx
+ lea eax,[568446438+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [56+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 25
+ xor edi,eax
+ and edi,ecx
+ lea edx,[3275163606+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [12+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 26
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[4107603335+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [32+esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 27
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[1163531501+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [52+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ; R1 28
+ xor edi,ebx
+ and edi,edx
+ lea eax,[2850285829+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [8+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 29
+ xor edi,eax
+ and edi,ecx
+ lea edx,[4243563512+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [28+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 30
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[1735328473+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [48+esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 31
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[2368359562+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [20+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ;
+ ; R2 section
+ ; R2 32
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[4294588738+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [32+esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 33
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[2272392833+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [44+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 34
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[1839030562+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [56+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 35
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[4259657740+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [4+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,23
+ add ebx,ecx
+ ; R2 36
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[2763975236+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [16+esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 37
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[1272893353+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [28+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 38
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[4139469664+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [40+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 39
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[3200236656+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [52+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,23
+ add ebx,ecx
+ ; R2 40
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[681279174+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 41
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[3936430074+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [12+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 42
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[3572445317+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [24+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 43
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[76029189+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [36+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,23
+ add ebx,ecx
+ ; R2 44
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[3654602809+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [48+esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 45
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[3873151461+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [60+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 46
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[530742520+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [8+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 47
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[3299628645+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,23
+ add ebx,ecx
+ ;
+ ; R3 section
+ ; R3 48
+ xor edi,edx
+ or edi,ebx
+ lea eax,[4096336452+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [28+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 49
+ or edi,eax
+ lea edx,[1126891415+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [56+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 50
+ or edi,edx
+ lea ecx,[2878612391+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [20+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 51
+ or edi,ecx
+ lea ebx,[4237533241+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [48+esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,21
+ xor edi,edx
+ add ebx,ecx
+ ; R3 52
+ or edi,ebx
+ lea eax,[1700485571+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [12+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 53
+ or edi,eax
+ lea edx,[2399980690+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [40+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 54
+ or edi,edx
+ lea ecx,[4293915773+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [4+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 55
+ or edi,ecx
+ lea ebx,[2240044497+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [32+esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,21
+ xor edi,edx
+ add ebx,ecx
+ ; R3 56
+ or edi,ebx
+ lea eax,[1873313359+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [60+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 57
+ or edi,eax
+ lea edx,[4264355552+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [24+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 58
+ or edi,edx
+ lea ecx,[2734768916+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [52+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 59
+ or edi,ecx
+ lea ebx,[1309151649+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [16+esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,21
+ xor edi,edx
+ add ebx,ecx
+ ; R3 60
+ or edi,ebx
+ lea eax,[4149444226+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [44+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 61
+ or edi,eax
+ lea edx,[3174756917+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [8+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 62
+ or edi,edx
+ lea ecx,[718787259+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [36+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 63
+ or edi,ecx
+ lea ebx,[3951481745+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [24+esp]
+ add ebx,edi
+ add esi,64
+ rol ebx,21
+ mov edi,DWORD [ebp]
+ add ebx,ecx
+ add eax,edi
+ mov edi,DWORD [4+ebp]
+ add ebx,edi
+ mov edi,DWORD [8+ebp]
+ add ecx,edi
+ mov edi,DWORD [12+ebp]
+ add edx,edi
+ mov DWORD [ebp],eax
+ mov DWORD [4+ebp],ebx
+ mov edi,DWORD [esp]
+ mov DWORD [8+ebp],ecx
+ mov DWORD [12+ebp],edx
+ cmp edi,esi
+ jae NEAR L$000start
+ pop eax
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
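(Reviewer's note on `md5-586.nasm` above, outside the diff proper: each unrolled "R0 n" block computes the MD5 selector function F(b,c,d) = (b AND c) OR (NOT b AND d) with the three-instruction trick `edi = ((c ^ d) & b) ^ d`, then performs the usual add/rotate/add. A small Python sketch of one R0 step as the assembly sequences it — `md5_r0_step` is an illustrative name, not from the patch; the constant 3614090360 in the first block is MD5's T[1] = 0xd76aa478:)

```python
def rotl32(x: int, s: int) -> int:
    """32-bit left rotate, matching the `rol` instruction."""
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

def md5_r0_step(a: int, b: int, c: int, d: int, x: int, t: int, s: int) -> int:
    # edi starts as c, then: xor edi,edx / and edi,ebx / xor edi,edx
    f = ((c ^ d) & b) ^ d          # == (b & c) | (~b & d)
    # lea a,[t+x+a] / add a,edi / rol a,s / add a,b
    return (rotl32((a + f + x + t) & 0xFFFFFFFF, s) + b) & 0xFFFFFFFF
```

(The xor/and/xor form saves a register over the textbook and/andn/or form, which is why every round block reuses `edi` this way.)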
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
new file mode 100644
index 0000000000..57649ad22b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
@@ -0,0 +1,1264 @@
+; Copyright 2010-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _gcm_gmult_4bit_x86
+align 16
+_gcm_gmult_4bit_x86:
+L$_gcm_gmult_4bit_x86_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ sub esp,84
+ mov edi,DWORD [104+esp]
+ mov esi,DWORD [108+esp]
+ mov ebp,DWORD [edi]
+ mov edx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov ebx,DWORD [12+edi]
+ mov DWORD [16+esp],0
+ mov DWORD [20+esp],471859200
+ mov DWORD [24+esp],943718400
+ mov DWORD [28+esp],610271232
+ mov DWORD [32+esp],1887436800
+ mov DWORD [36+esp],1822425088
+ mov DWORD [40+esp],1220542464
+ mov DWORD [44+esp],1423966208
+ mov DWORD [48+esp],3774873600
+ mov DWORD [52+esp],4246732800
+ mov DWORD [56+esp],3644850176
+ mov DWORD [60+esp],3311403008
+ mov DWORD [64+esp],2441084928
+ mov DWORD [68+esp],2376073216
+ mov DWORD [72+esp],2847932416
+ mov DWORD [76+esp],3051356160
+ mov DWORD [esp],ebp
+ mov DWORD [4+esp],edx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],ebx
+ shr ebx,20
+ and ebx,240
+ mov ebp,DWORD [4+ebx*1+esi]
+ mov edx,DWORD [ebx*1+esi]
+ mov ecx,DWORD [12+ebx*1+esi]
+ mov ebx,DWORD [8+ebx*1+esi]
+ xor eax,eax
+ mov edi,15
+ jmp NEAR L$000x86_loop
+align 16
+L$000x86_loop:
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ and al,240
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ dec edi
+ js NEAR L$001x86_break
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ shl al,4
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ jmp NEAR L$000x86_loop
+align 16
+L$001x86_break:
+ bswap ebx
+ bswap ecx
+ bswap edx
+ bswap ebp
+ mov edi,DWORD [104+esp]
+ mov DWORD [12+edi],ebx
+ mov DWORD [8+edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [edi],ebp
+ add esp,84
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _gcm_ghash_4bit_x86
+align 16
+_gcm_ghash_4bit_x86:
+L$_gcm_ghash_4bit_x86_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ sub esp,84
+ mov ebx,DWORD [104+esp]
+ mov esi,DWORD [108+esp]
+ mov edi,DWORD [112+esp]
+ mov ecx,DWORD [116+esp]
+ add ecx,edi
+ mov DWORD [116+esp],ecx
+ mov ebp,DWORD [ebx]
+ mov edx,DWORD [4+ebx]
+ mov ecx,DWORD [8+ebx]
+ mov ebx,DWORD [12+ebx]
+ mov DWORD [16+esp],0
+ mov DWORD [20+esp],471859200
+ mov DWORD [24+esp],943718400
+ mov DWORD [28+esp],610271232
+ mov DWORD [32+esp],1887436800
+ mov DWORD [36+esp],1822425088
+ mov DWORD [40+esp],1220542464
+ mov DWORD [44+esp],1423966208
+ mov DWORD [48+esp],3774873600
+ mov DWORD [52+esp],4246732800
+ mov DWORD [56+esp],3644850176
+ mov DWORD [60+esp],3311403008
+ mov DWORD [64+esp],2441084928
+ mov DWORD [68+esp],2376073216
+ mov DWORD [72+esp],2847932416
+ mov DWORD [76+esp],3051356160
+align 16
+L$002x86_outer_loop:
+ xor ebx,DWORD [12+edi]
+ xor ecx,DWORD [8+edi]
+ xor edx,DWORD [4+edi]
+ xor ebp,DWORD [edi]
+ mov DWORD [12+esp],ebx
+ mov DWORD [8+esp],ecx
+ mov DWORD [4+esp],edx
+ mov DWORD [esp],ebp
+ shr ebx,20
+ and ebx,240
+ mov ebp,DWORD [4+ebx*1+esi]
+ mov edx,DWORD [ebx*1+esi]
+ mov ecx,DWORD [12+ebx*1+esi]
+ mov ebx,DWORD [8+ebx*1+esi]
+ xor eax,eax
+ mov edi,15
+ jmp NEAR L$003x86_loop
+align 16
+L$003x86_loop:
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ and al,240
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ dec edi
+ js NEAR L$004x86_break
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ shl al,4
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ jmp NEAR L$003x86_loop
+align 16
+L$004x86_break:
+ bswap ebx
+ bswap ecx
+ bswap edx
+ bswap ebp
+ mov edi,DWORD [112+esp]
+ lea edi,[16+edi]
+ cmp edi,DWORD [116+esp]
+ mov DWORD [112+esp],edi
+ jb NEAR L$002x86_outer_loop
+ mov edi,DWORD [104+esp]
+ mov DWORD [12+edi],ebx
+ mov DWORD [8+edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [edi],ebp
+ add esp,84
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _gcm_gmult_4bit_mmx
+align 16
+_gcm_gmult_4bit_mmx:
+L$_gcm_gmult_4bit_mmx_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ call L$005pic_point
+L$005pic_point:
+ pop eax
+ lea eax,[(L$rem_4bit-L$005pic_point)+eax]
+ movzx ebx,BYTE [15+edi]
+ xor ecx,ecx
+ mov edx,ebx
+ mov cl,dl
+ mov ebp,14
+ shl cl,4
+ and edx,240
+ movq mm0,[8+ecx*1+esi]
+ movq mm1,[ecx*1+esi]
+ movd ebx,mm0
+ jmp NEAR L$006mmx_loop
+align 16
+L$006mmx_loop:
+ psrlq mm0,4
+ and ebx,15
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+edx*1+esi]
+ mov cl,BYTE [ebp*1+edi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ dec ebp
+ movd ebx,mm0
+ pxor mm1,[edx*1+esi]
+ mov edx,ecx
+ pxor mm0,mm2
+ js NEAR L$007mmx_break
+ shl cl,4
+ and ebx,15
+ psrlq mm0,4
+ and edx,240
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+ecx*1+esi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ movd ebx,mm0
+ pxor mm1,[ecx*1+esi]
+ pxor mm0,mm2
+ jmp NEAR L$006mmx_loop
+align 16
+L$007mmx_break:
+ shl cl,4
+ and ebx,15
+ psrlq mm0,4
+ and edx,240
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+ecx*1+esi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ movd ebx,mm0
+ pxor mm1,[ecx*1+esi]
+ pxor mm0,mm2
+ psrlq mm0,4
+ and ebx,15
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+edx*1+esi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ movd ebx,mm0
+ pxor mm1,[edx*1+esi]
+ pxor mm0,mm2
+ psrlq mm0,32
+ movd edx,mm1
+ psrlq mm1,32
+ movd ecx,mm0
+ movd ebp,mm1
+ bswap ebx
+ bswap edx
+ bswap ecx
+ bswap ebp
+ emms
+ mov DWORD [12+edi],ebx
+ mov DWORD [4+edi],edx
+ mov DWORD [8+edi],ecx
+ mov DWORD [edi],ebp
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _gcm_ghash_4bit_mmx
+align 16
+_gcm_ghash_4bit_mmx:
+L$_gcm_ghash_4bit_mmx_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ mov ecx,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebp,esp
+ call L$008pic_point
+L$008pic_point:
+ pop esi
+ lea esi,[(L$rem_8bit-L$008pic_point)+esi]
+ sub esp,544
+ and esp,-64
+ sub esp,16
+ add edx,ecx
+ mov DWORD [544+esp],eax
+ mov DWORD [552+esp],edx
+ mov DWORD [556+esp],ebp
+ add ebx,128
+ lea edi,[144+esp]
+ lea ebp,[400+esp]
+ mov edx,DWORD [ebx-120]
+ movq mm0,[ebx-120]
+ movq mm3,[ebx-128]
+ shl edx,4
+ mov BYTE [esp],dl
+ mov edx,DWORD [ebx-104]
+ movq mm2,[ebx-104]
+ movq mm5,[ebx-112]
+ movq [edi-128],mm0
+ psrlq mm0,4
+ movq [edi],mm3
+ movq mm7,mm3
+ psrlq mm3,4
+ shl edx,4
+ mov BYTE [1+esp],dl
+ mov edx,DWORD [ebx-88]
+ movq mm1,[ebx-88]
+ psllq mm7,60
+ movq mm4,[ebx-96]
+ por mm0,mm7
+ movq [edi-120],mm2
+ psrlq mm2,4
+ movq [8+edi],mm5
+ movq mm6,mm5
+ movq [ebp-128],mm0
+ psrlq mm5,4
+ movq [ebp],mm3
+ shl edx,4
+ mov BYTE [2+esp],dl
+ mov edx,DWORD [ebx-72]
+ movq mm0,[ebx-72]
+ psllq mm6,60
+ movq mm3,[ebx-80]
+ por mm2,mm6
+ movq [edi-112],mm1
+ psrlq mm1,4
+ movq [16+edi],mm4
+ movq mm7,mm4
+ movq [ebp-120],mm2
+ psrlq mm4,4
+ movq [8+ebp],mm5
+ shl edx,4
+ mov BYTE [3+esp],dl
+ mov edx,DWORD [ebx-56]
+ movq mm2,[ebx-56]
+ psllq mm7,60
+ movq mm5,[ebx-64]
+ por mm1,mm7
+ movq [edi-104],mm0
+ psrlq mm0,4
+ movq [24+edi],mm3
+ movq mm6,mm3
+ movq [ebp-112],mm1
+ psrlq mm3,4
+ movq [16+ebp],mm4
+ shl edx,4
+ mov BYTE [4+esp],dl
+ mov edx,DWORD [ebx-40]
+ movq mm1,[ebx-40]
+ psllq mm6,60
+ movq mm4,[ebx-48]
+ por mm0,mm6
+ movq [edi-96],mm2
+ psrlq mm2,4
+ movq [32+edi],mm5
+ movq mm7,mm5
+ movq [ebp-104],mm0
+ psrlq mm5,4
+ movq [24+ebp],mm3
+ shl edx,4
+ mov BYTE [5+esp],dl
+ mov edx,DWORD [ebx-24]
+ movq mm0,[ebx-24]
+ psllq mm7,60
+ movq mm3,[ebx-32]
+ por mm2,mm7
+ movq [edi-88],mm1
+ psrlq mm1,4
+ movq [40+edi],mm4
+ movq mm6,mm4
+ movq [ebp-96],mm2
+ psrlq mm4,4
+ movq [32+ebp],mm5
+ shl edx,4
+ mov BYTE [6+esp],dl
+ mov edx,DWORD [ebx-8]
+ movq mm2,[ebx-8]
+ psllq mm6,60
+ movq mm5,[ebx-16]
+ por mm1,mm6
+ movq [edi-80],mm0
+ psrlq mm0,4
+ movq [48+edi],mm3
+ movq mm7,mm3
+ movq [ebp-88],mm1
+ psrlq mm3,4
+ movq [40+ebp],mm4
+ shl edx,4
+ mov BYTE [7+esp],dl
+ mov edx,DWORD [8+ebx]
+ movq mm1,[8+ebx]
+ psllq mm7,60
+ movq mm4,[ebx]
+ por mm0,mm7
+ movq [edi-72],mm2
+ psrlq mm2,4
+ movq [56+edi],mm5
+ movq mm6,mm5
+ movq [ebp-80],mm0
+ psrlq mm5,4
+ movq [48+ebp],mm3
+ shl edx,4
+ mov BYTE [8+esp],dl
+ mov edx,DWORD [24+ebx]
+ movq mm0,[24+ebx]
+ psllq mm6,60
+ movq mm3,[16+ebx]
+ por mm2,mm6
+ movq [edi-64],mm1
+ psrlq mm1,4
+ movq [64+edi],mm4
+ movq mm7,mm4
+ movq [ebp-72],mm2
+ psrlq mm4,4
+ movq [56+ebp],mm5
+ shl edx,4
+ mov BYTE [9+esp],dl
+ mov edx,DWORD [40+ebx]
+ movq mm2,[40+ebx]
+ psllq mm7,60
+ movq mm5,[32+ebx]
+ por mm1,mm7
+ movq [edi-56],mm0
+ psrlq mm0,4
+ movq [72+edi],mm3
+ movq mm6,mm3
+ movq [ebp-64],mm1
+ psrlq mm3,4
+ movq [64+ebp],mm4
+ shl edx,4
+ mov BYTE [10+esp],dl
+ mov edx,DWORD [56+ebx]
+ movq mm1,[56+ebx]
+ psllq mm6,60
+ movq mm4,[48+ebx]
+ por mm0,mm6
+ movq [edi-48],mm2
+ psrlq mm2,4
+ movq [80+edi],mm5
+ movq mm7,mm5
+ movq [ebp-56],mm0
+ psrlq mm5,4
+ movq [72+ebp],mm3
+ shl edx,4
+ mov BYTE [11+esp],dl
+ mov edx,DWORD [72+ebx]
+ movq mm0,[72+ebx]
+ psllq mm7,60
+ movq mm3,[64+ebx]
+ por mm2,mm7
+ movq [edi-40],mm1
+ psrlq mm1,4
+ movq [88+edi],mm4
+ movq mm6,mm4
+ movq [ebp-48],mm2
+ psrlq mm4,4
+ movq [80+ebp],mm5
+ shl edx,4
+ mov BYTE [12+esp],dl
+ mov edx,DWORD [88+ebx]
+ movq mm2,[88+ebx]
+ psllq mm6,60
+ movq mm5,[80+ebx]
+ por mm1,mm6
+ movq [edi-32],mm0
+ psrlq mm0,4
+ movq [96+edi],mm3
+ movq mm7,mm3
+ movq [ebp-40],mm1
+ psrlq mm3,4
+ movq [88+ebp],mm4
+ shl edx,4
+ mov BYTE [13+esp],dl
+ mov edx,DWORD [104+ebx]
+ movq mm1,[104+ebx]
+ psllq mm7,60
+ movq mm4,[96+ebx]
+ por mm0,mm7
+ movq [edi-24],mm2
+ psrlq mm2,4
+ movq [104+edi],mm5
+ movq mm6,mm5
+ movq [ebp-32],mm0
+ psrlq mm5,4
+ movq [96+ebp],mm3
+ shl edx,4
+ mov BYTE [14+esp],dl
+ mov edx,DWORD [120+ebx]
+ movq mm0,[120+ebx]
+ psllq mm6,60
+ movq mm3,[112+ebx]
+ por mm2,mm6
+ movq [edi-16],mm1
+ psrlq mm1,4
+ movq [112+edi],mm4
+ movq mm7,mm4
+ movq [ebp-24],mm2
+ psrlq mm4,4
+ movq [104+ebp],mm5
+ shl edx,4
+ mov BYTE [15+esp],dl
+ psllq mm7,60
+ por mm1,mm7
+ movq [edi-8],mm0
+ psrlq mm0,4
+ movq [120+edi],mm3
+ movq mm6,mm3
+ movq [ebp-16],mm1
+ psrlq mm3,4
+ movq [112+ebp],mm4
+ psllq mm6,60
+ por mm0,mm6
+ movq [ebp-8],mm0
+ movq [120+ebp],mm3
+ movq mm6,[eax]
+ mov ebx,DWORD [8+eax]
+ mov edx,DWORD [12+eax]
+align 16
+L$009outer:
+ xor edx,DWORD [12+ecx]
+ xor ebx,DWORD [8+ecx]
+ pxor mm6,[ecx]
+ lea ecx,[16+ecx]
+ mov DWORD [536+esp],ebx
+ movq [528+esp],mm6
+ mov DWORD [548+esp],ecx
+ xor eax,eax
+ rol edx,8
+ mov al,dl
+ mov ebp,eax
+ and al,15
+ shr ebp,4
+ pxor mm0,mm0
+ rol edx,8
+ pxor mm1,mm1
+ pxor mm2,mm2
+ movq mm7,[16+eax*8+esp]
+ movq mm6,[144+eax*8+esp]
+ mov al,dl
+ movd ebx,mm7
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ shr edi,4
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ shr ebp,4
+ pinsrw mm2,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [536+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr edi,4
+ pinsrw mm1,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr ebp,4
+ pinsrw mm0,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr edi,4
+ pinsrw mm2,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr ebp,4
+ pinsrw mm1,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [532+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr edi,4
+ pinsrw mm0,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr ebp,4
+ pinsrw mm2,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr edi,4
+ pinsrw mm1,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr ebp,4
+ pinsrw mm0,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [528+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr edi,4
+ pinsrw mm2,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr ebp,4
+ pinsrw mm1,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr edi,4
+ pinsrw mm0,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr ebp,4
+ pinsrw mm2,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [524+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr edi,4
+ pinsrw mm1,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ pxor mm6,[144+eax*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ movzx ebx,bl
+ pxor mm2,mm2
+ psllq mm1,4
+ movd ecx,mm7
+ psrlq mm7,4
+ movq mm3,mm6
+ psrlq mm6,4
+ shl ecx,4
+ pxor mm7,[16+edi*8+esp]
+ psllq mm3,60
+ movzx ecx,cl
+ pxor mm7,mm3
+ pxor mm6,[144+edi*8+esp]
+ pinsrw mm0,WORD [ebx*2+esi],2
+ pxor mm6,mm1
+ movd edx,mm7
+ pinsrw mm2,WORD [ecx*2+esi],3
+ psllq mm0,12
+ pxor mm6,mm0
+ psrlq mm7,32
+ pxor mm6,mm2
+ mov ecx,DWORD [548+esp]
+ movd ebx,mm7
+ movq mm3,mm6
+ psllw mm6,8
+ psrlw mm3,8
+ por mm6,mm3
+ bswap edx
+ pshufw mm6,mm6,27
+ bswap ebx
+ cmp ecx,DWORD [552+esp]
+ jne NEAR L$009outer
+ mov eax,DWORD [544+esp]
+ mov DWORD [12+eax],edx
+ mov DWORD [8+eax],ebx
+ movq [eax],mm6
+ mov esp,DWORD [556+esp]
+ emms
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _gcm_init_clmul
+align 16
+_gcm_init_clmul:
+L$_gcm_init_clmul_begin:
+ mov edx,DWORD [4+esp]
+ mov eax,DWORD [8+esp]
+ call L$010pic
+L$010pic:
+ pop ecx
+ lea ecx,[(L$bswap-L$010pic)+ecx]
+ movdqu xmm2,[eax]
+ pshufd xmm2,xmm2,78
+ pshufd xmm4,xmm2,255
+ movdqa xmm3,xmm2
+ psllq xmm2,1
+ pxor xmm5,xmm5
+ psrlq xmm3,63
+ pcmpgtd xmm5,xmm4
+ pslldq xmm3,8
+ por xmm2,xmm3
+ pand xmm5,[16+ecx]
+ pxor xmm2,xmm5
+ movdqa xmm0,xmm2
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pshufd xmm4,xmm2,78
+ pxor xmm3,xmm0
+ pxor xmm4,xmm2
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,220,0
+ xorps xmm3,xmm0
+ xorps xmm3,xmm1
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm2,78
+ pshufd xmm4,xmm0,78
+ pxor xmm3,xmm2
+ movdqu [edx],xmm2
+ pxor xmm4,xmm0
+ movdqu [16+edx],xmm0
+db 102,15,58,15,227,8
+ movdqu [32+edx],xmm4
+ ret
+global _gcm_gmult_clmul
+align 16
+_gcm_gmult_clmul:
+L$_gcm_gmult_clmul_begin:
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ call L$011pic
+L$011pic:
+ pop ecx
+ lea ecx,[(L$bswap-L$011pic)+ecx]
+ movdqu xmm0,[eax]
+ movdqa xmm5,[ecx]
+ movups xmm2,[edx]
+db 102,15,56,0,197
+ movups xmm4,[32+edx]
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,220,0
+ xorps xmm3,xmm0
+ xorps xmm3,xmm1
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+db 102,15,56,0,197
+ movdqu [eax],xmm0
+ ret
+global _gcm_ghash_clmul
+align 16
+_gcm_ghash_clmul:
+L$_gcm_ghash_clmul_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ mov esi,DWORD [28+esp]
+ mov ebx,DWORD [32+esp]
+ call L$012pic
+L$012pic:
+ pop ecx
+ lea ecx,[(L$bswap-L$012pic)+ecx]
+ movdqu xmm0,[eax]
+ movdqa xmm5,[ecx]
+ movdqu xmm2,[edx]
+db 102,15,56,0,197
+ sub ebx,16
+ jz NEAR L$013odd_tail
+ movdqu xmm3,[esi]
+ movdqu xmm6,[16+esi]
+db 102,15,56,0,221
+db 102,15,56,0,245
+ movdqu xmm5,[32+edx]
+ pxor xmm0,xmm3
+ pshufd xmm3,xmm6,78
+ movdqa xmm7,xmm6
+ pxor xmm3,xmm6
+ lea esi,[32+esi]
+db 102,15,58,68,242,0
+db 102,15,58,68,250,17
+db 102,15,58,68,221,0
+ movups xmm2,[16+edx]
+ nop
+ sub ebx,32
+ jbe NEAR L$014even_tail
+ jmp NEAR L$015mod_loop
+align 32
+L$015mod_loop:
+ pshufd xmm4,xmm0,78
+ movdqa xmm1,xmm0
+ pxor xmm4,xmm0
+ nop
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,229,16
+ movups xmm2,[edx]
+ xorps xmm0,xmm6
+ movdqa xmm5,[ecx]
+ xorps xmm1,xmm7
+ movdqu xmm7,[esi]
+ pxor xmm3,xmm0
+ movdqu xmm6,[16+esi]
+ pxor xmm3,xmm1
+db 102,15,56,0,253
+ pxor xmm4,xmm3
+ movdqa xmm3,xmm4
+ psrldq xmm4,8
+ pslldq xmm3,8
+ pxor xmm1,xmm4
+ pxor xmm0,xmm3
+db 102,15,56,0,245
+ pxor xmm1,xmm7
+ movdqa xmm7,xmm6
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+db 102,15,58,68,242,0
+ movups xmm5,[32+edx]
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ pshufd xmm3,xmm7,78
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm3,xmm7
+ pxor xmm1,xmm4
+db 102,15,58,68,250,17
+ movups xmm2,[16+edx]
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+db 102,15,58,68,221,0
+ lea esi,[32+esi]
+ sub ebx,32
+ ja NEAR L$015mod_loop
+L$014even_tail:
+ pshufd xmm4,xmm0,78
+ movdqa xmm1,xmm0
+ pxor xmm4,xmm0
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,229,16
+ movdqa xmm5,[ecx]
+ xorps xmm0,xmm6
+ xorps xmm1,xmm7
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+ pxor xmm4,xmm3
+ movdqa xmm3,xmm4
+ psrldq xmm4,8
+ pslldq xmm3,8
+ pxor xmm1,xmm4
+ pxor xmm0,xmm3
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ test ebx,ebx
+ jnz NEAR L$016done
+ movups xmm2,[edx]
+L$013odd_tail:
+ movdqu xmm3,[esi]
+db 102,15,56,0,221
+ pxor xmm0,xmm3
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pshufd xmm4,xmm2,78
+ pxor xmm3,xmm0
+ pxor xmm4,xmm2
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,220,0
+ xorps xmm3,xmm0
+ xorps xmm3,xmm1
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+L$016done:
+db 102,15,56,0,197
+ movdqu [eax],xmm0
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$bswap:
+db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,194
+align 64
+L$rem_8bit:
+dw 0,450,900,582,1800,1738,1164,1358
+dw 3600,4050,3476,3158,2328,2266,2716,2910
+dw 7200,7650,8100,7782,6952,6890,6316,6510
+dw 4656,5106,4532,4214,5432,5370,5820,6014
+dw 14400,14722,15300,14854,16200,16010,15564,15630
+dw 13904,14226,13780,13334,12632,12442,13020,13086
+dw 9312,9634,10212,9766,9064,8874,8428,8494
+dw 10864,11186,10740,10294,11640,11450,12028,12094
+dw 28800,28994,29444,29382,30600,30282,29708,30158
+dw 32400,32594,32020,31958,31128,30810,31260,31710
+dw 27808,28002,28452,28390,27560,27242,26668,27118
+dw 25264,25458,24884,24822,26040,25722,26172,26622
+dw 18624,18690,19268,19078,20424,19978,19532,19854
+dw 18128,18194,17748,17558,16856,16410,16988,17310
+dw 21728,21794,22372,22182,21480,21034,20588,20910
+dw 23280,23346,22900,22710,24056,23610,24188,24510
+dw 57600,57538,57988,58182,58888,59338,58764,58446
+dw 61200,61138,60564,60758,59416,59866,60316,59998
+dw 64800,64738,65188,65382,64040,64490,63916,63598
+dw 62256,62194,61620,61814,62520,62970,63420,63102
+dw 55616,55426,56004,56070,56904,57226,56780,56334
+dw 55120,54930,54484,54550,53336,53658,54236,53790
+dw 50528,50338,50916,50982,49768,50090,49644,49198
+dw 52080,51890,51444,51510,52344,52666,53244,52798
+dw 37248,36930,37380,37830,38536,38730,38156,38094
+dw 40848,40530,39956,40406,39064,39258,39708,39646
+dw 36256,35938,36388,36838,35496,35690,35116,35054
+dw 33712,33394,32820,33270,33976,34170,34620,34558
+dw 43456,43010,43588,43910,44744,44810,44364,44174
+dw 42960,42514,42068,42390,41176,41242,41820,41630
+dw 46560,46114,46692,47014,45800,45866,45420,45230
+dw 48112,47666,47220,47542,48376,48442,49020,48830
+align 64
+L$rem_4bit:
+dd 0,0,0,471859200,0,943718400,0,610271232
+dd 0,1887436800,0,1822425088,0,1220542464,0,1423966208
+dd 0,3774873600,0,4246732800,0,3644850176,0,3311403008
+dd 0,2441084928,0,2376073216,0,2847932416,0,3051356160
+db 71,72,65,83,72,32,102,111,114,32,120,56,54,44,32,67
+db 82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112
+db 112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62
+db 0
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
new file mode 100644
index 0000000000..e78222ee9d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
@@ -0,0 +1,381 @@
+; Copyright 1998-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _RC4
+align 16
+_RC4:
+L$_RC4_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edi,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ mov esi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ xor ebx,ebx
+ cmp edx,0
+ je NEAR L$000abort
+ mov al,BYTE [edi]
+ mov bl,BYTE [4+edi]
+ add edi,8
+ lea ecx,[edx*1+esi]
+ sub ebp,esi
+ mov DWORD [24+esp],ecx
+ inc al
+ cmp DWORD [256+edi],-1
+ je NEAR L$001RC4_CHAR
+ mov ecx,DWORD [eax*4+edi]
+ and edx,-4
+ jz NEAR L$002loop1
+ mov DWORD [32+esp],ebp
+ test edx,-8
+ jz NEAR L$003go4loop4
+ lea ebp,[_OPENSSL_ia32cap_P]
+ bt DWORD [ebp],26
+ jnc NEAR L$003go4loop4
+ mov ebp,DWORD [32+esp]
+ and edx,-8
+ lea edx,[edx*1+esi-8]
+ mov DWORD [edi-4],edx
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ movq mm0,[esi]
+ mov ecx,DWORD [eax*4+edi]
+ movd mm2,DWORD [edx*4+edi]
+ jmp NEAR L$004loop_mmx_enter
+align 16
+L$005loop_mmx:
+ add bl,cl
+ psllq mm1,56
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ movq mm0,[esi]
+ movq [esi*1+ebp-8],mm2
+ mov ecx,DWORD [eax*4+edi]
+ movd mm2,DWORD [edx*4+edi]
+L$004loop_mmx_enter:
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm0
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,8
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,16
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,24
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,32
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,40
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,48
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ mov edx,ebx
+ xor ebx,ebx
+ mov bl,dl
+ cmp esi,DWORD [edi-4]
+ lea esi,[8+esi]
+ jb NEAR L$005loop_mmx
+ psllq mm1,56
+ pxor mm2,mm1
+ movq [esi*1+ebp-8],mm2
+ emms
+ cmp esi,DWORD [24+esp]
+ je NEAR L$006done
+ jmp NEAR L$002loop1
+align 16
+L$003go4loop4:
+ lea edx,[edx*1+esi-4]
+ mov DWORD [28+esp],edx
+L$007loop4:
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ mov ecx,DWORD [eax*4+edi]
+ mov ebp,DWORD [edx*4+edi]
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ ror ebp,8
+ mov ecx,DWORD [eax*4+edi]
+ or ebp,DWORD [edx*4+edi]
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ ror ebp,8
+ mov ecx,DWORD [eax*4+edi]
+ or ebp,DWORD [edx*4+edi]
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ ror ebp,8
+ mov ecx,DWORD [32+esp]
+ or ebp,DWORD [edx*4+edi]
+ ror ebp,8
+ xor ebp,DWORD [esi]
+ cmp esi,DWORD [28+esp]
+ mov DWORD [esi*1+ecx],ebp
+ lea esi,[4+esi]
+ mov ecx,DWORD [eax*4+edi]
+ jb NEAR L$007loop4
+ cmp esi,DWORD [24+esp]
+ je NEAR L$006done
+ mov ebp,DWORD [32+esp]
+align 16
+L$002loop1:
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ mov edx,DWORD [edx*4+edi]
+ xor dl,BYTE [esi]
+ lea esi,[1+esi]
+ mov ecx,DWORD [eax*4+edi]
+ cmp esi,DWORD [24+esp]
+ mov BYTE [esi*1+ebp-1],dl
+ jb NEAR L$002loop1
+ jmp NEAR L$006done
+align 16
+L$001RC4_CHAR:
+ movzx ecx,BYTE [eax*1+edi]
+L$008cloop1:
+ add bl,cl
+ movzx edx,BYTE [ebx*1+edi]
+ mov BYTE [ebx*1+edi],cl
+ mov BYTE [eax*1+edi],dl
+ add dl,cl
+ movzx edx,BYTE [edx*1+edi]
+ add al,1
+ xor dl,BYTE [esi]
+ lea esi,[1+esi]
+ movzx ecx,BYTE [eax*1+edi]
+ cmp esi,DWORD [24+esp]
+ mov BYTE [esi*1+ebp-1],dl
+ jb NEAR L$008cloop1
+L$006done:
+ dec al
+ mov DWORD [edi-4],ebx
+ mov BYTE [edi-8],al
+L$000abort:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _RC4_set_key
+align 16
+_RC4_set_key:
+L$_RC4_set_key_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edi,DWORD [20+esp]
+ mov ebp,DWORD [24+esp]
+ mov esi,DWORD [28+esp]
+ lea edx,[_OPENSSL_ia32cap_P]
+ lea edi,[8+edi]
+ lea esi,[ebp*1+esi]
+ neg ebp
+ xor eax,eax
+ mov DWORD [edi-4],ebp
+ bt DWORD [edx],20
+ jc NEAR L$009c1stloop
+align 16
+L$010w1stloop:
+ mov DWORD [eax*4+edi],eax
+ add al,1
+ jnc NEAR L$010w1stloop
+ xor ecx,ecx
+ xor edx,edx
+align 16
+L$011w2ndloop:
+ mov eax,DWORD [ecx*4+edi]
+ add dl,BYTE [ebp*1+esi]
+ add dl,al
+ add ebp,1
+ mov ebx,DWORD [edx*4+edi]
+ jnz NEAR L$012wnowrap
+ mov ebp,DWORD [edi-4]
+L$012wnowrap:
+ mov DWORD [edx*4+edi],eax
+ mov DWORD [ecx*4+edi],ebx
+ add cl,1
+ jnc NEAR L$011w2ndloop
+ jmp NEAR L$013exit
+align 16
+L$009c1stloop:
+ mov BYTE [eax*1+edi],al
+ add al,1
+ jnc NEAR L$009c1stloop
+ xor ecx,ecx
+ xor edx,edx
+ xor ebx,ebx
+align 16
+L$014c2ndloop:
+ mov al,BYTE [ecx*1+edi]
+ add dl,BYTE [ebp*1+esi]
+ add dl,al
+ add ebp,1
+ mov bl,BYTE [edx*1+edi]
+ jnz NEAR L$015cnowrap
+ mov ebp,DWORD [edi-4]
+L$015cnowrap:
+ mov BYTE [edx*1+edi],al
+ mov BYTE [ecx*1+edi],bl
+ add cl,1
+ jnc NEAR L$014c2ndloop
+ mov DWORD [256+edi],-1
+L$013exit:
+ xor eax,eax
+ mov DWORD [edi-8],eax
+ mov DWORD [edi-4],eax
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _RC4_options
+align 16
+_RC4_options:
+L$_RC4_options_begin:
+ call L$016pic_point
+L$016pic_point:
+ pop eax
+ lea eax,[(L$017opts-L$016pic_point)+eax]
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov edx,DWORD [edx]
+ bt edx,20
+ jc NEAR L$0181xchar
+ bt edx,26
+ jnc NEAR L$019ret
+ add eax,25
+ ret
+L$0181xchar:
+ add eax,12
+L$019ret:
+ ret
+align 64
+L$017opts:
+db 114,99,52,40,52,120,44,105,110,116,41,0
+db 114,99,52,40,49,120,44,99,104,97,114,41,0
+db 114,99,52,40,56,120,44,109,109,120,41,0
+db 82,67,52,32,102,111,114,32,120,56,54,44,32,67,82,89
+db 80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114
+db 111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+align 64
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
new file mode 100644
index 0000000000..4a893333d8
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
@@ -0,0 +1,3977 @@
+; Copyright 1998-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _sha1_block_data_order
+align 16
+_sha1_block_data_order:
+L$_sha1_block_data_order_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea esi,[_OPENSSL_ia32cap_P]
+ lea ebp,[(L$K_XX_XX-L$000pic_point)+ebp]
+ mov eax,DWORD [esi]
+ mov edx,DWORD [4+esi]
+ test edx,512
+ jz NEAR L$001x86
+ mov ecx,DWORD [8+esi]
+ test eax,16777216
+ jz NEAR L$001x86
+ test ecx,536870912
+ jnz NEAR L$shaext_shortcut
+ and edx,268435456
+ and eax,1073741824
+ or eax,edx
+ cmp eax,1342177280
+ je NEAR L$avx_shortcut
+ jmp NEAR L$ssse3_shortcut
+align 16
+L$001x86:
+ mov ebp,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ sub esp,76
+ shl eax,6
+ add eax,esi
+ mov DWORD [104+esp],eax
+ mov edi,DWORD [16+ebp]
+ jmp NEAR L$002loop
+align 16
+L$002loop:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [12+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edx
+ mov eax,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [28+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],edx
+ mov eax,DWORD [32+esi]
+ mov ebx,DWORD [36+esi]
+ mov ecx,DWORD [40+esi]
+ mov edx,DWORD [44+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [32+esp],eax
+ mov DWORD [36+esp],ebx
+ mov DWORD [40+esp],ecx
+ mov DWORD [44+esp],edx
+ mov eax,DWORD [48+esi]
+ mov ebx,DWORD [52+esi]
+ mov ecx,DWORD [56+esi]
+ mov edx,DWORD [60+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [48+esp],eax
+ mov DWORD [52+esp],ebx
+ mov DWORD [56+esp],ecx
+ mov DWORD [60+esp],edx
+ mov DWORD [100+esp],esi
+ mov eax,DWORD [ebp]
+ mov ebx,DWORD [4+ebp]
+ mov ecx,DWORD [8+ebp]
+ mov edx,DWORD [12+ebp]
+ ; 00_15 0
+ mov esi,ecx
+ mov ebp,eax
+ rol ebp,5
+ xor esi,edx
+ add ebp,edi
+ mov edi,DWORD [esp]
+ and esi,ebx
+ ror ebx,2
+ xor esi,edx
+ lea ebp,[1518500249+edi*1+ebp]
+ add ebp,esi
+ ; 00_15 1
+ mov edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ xor edi,ecx
+ add ebp,edx
+ mov edx,DWORD [4+esp]
+ and edi,eax
+ ror eax,2
+ xor edi,ecx
+ lea ebp,[1518500249+edx*1+ebp]
+ add ebp,edi
+ ; 00_15 2
+ mov edx,eax
+ mov edi,ebp
+ rol ebp,5
+ xor edx,ebx
+ add ebp,ecx
+ mov ecx,DWORD [8+esp]
+ and edx,esi
+ ror esi,2
+ xor edx,ebx
+ lea ebp,[1518500249+ecx*1+ebp]
+ add ebp,edx
+ ; 00_15 3
+ mov ecx,esi
+ mov edx,ebp
+ rol ebp,5
+ xor ecx,eax
+ add ebp,ebx
+ mov ebx,DWORD [12+esp]
+ and ecx,edi
+ ror edi,2
+ xor ecx,eax
+ lea ebp,[1518500249+ebx*1+ebp]
+ add ebp,ecx
+ ; 00_15 4
+ mov ebx,edi
+ mov ecx,ebp
+ rol ebp,5
+ xor ebx,esi
+ add ebp,eax
+ mov eax,DWORD [16+esp]
+ and ebx,edx
+ ror edx,2
+ xor ebx,esi
+ lea ebp,[1518500249+eax*1+ebp]
+ add ebp,ebx
+ ; 00_15 5
+ mov eax,edx
+ mov ebx,ebp
+ rol ebp,5
+ xor eax,edi
+ add ebp,esi
+ mov esi,DWORD [20+esp]
+ and eax,ecx
+ ror ecx,2
+ xor eax,edi
+ lea ebp,[1518500249+esi*1+ebp]
+ add ebp,eax
+ ; 00_15 6
+ mov esi,ecx
+ mov eax,ebp
+ rol ebp,5
+ xor esi,edx
+ add ebp,edi
+ mov edi,DWORD [24+esp]
+ and esi,ebx
+ ror ebx,2
+ xor esi,edx
+ lea ebp,[1518500249+edi*1+ebp]
+ add ebp,esi
+ ; 00_15 7
+ mov edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ xor edi,ecx
+ add ebp,edx
+ mov edx,DWORD [28+esp]
+ and edi,eax
+ ror eax,2
+ xor edi,ecx
+ lea ebp,[1518500249+edx*1+ebp]
+ add ebp,edi
+ ; 00_15 8
+ mov edx,eax
+ mov edi,ebp
+ rol ebp,5
+ xor edx,ebx
+ add ebp,ecx
+ mov ecx,DWORD [32+esp]
+ and edx,esi
+ ror esi,2
+ xor edx,ebx
+ lea ebp,[1518500249+ecx*1+ebp]
+ add ebp,edx
+ ; 00_15 9
+ mov ecx,esi
+ mov edx,ebp
+ rol ebp,5
+ xor ecx,eax
+ add ebp,ebx
+ mov ebx,DWORD [36+esp]
+ and ecx,edi
+ ror edi,2
+ xor ecx,eax
+ lea ebp,[1518500249+ebx*1+ebp]
+ add ebp,ecx
+ ; 00_15 10
+ mov ebx,edi
+ mov ecx,ebp
+ rol ebp,5
+ xor ebx,esi
+ add ebp,eax
+ mov eax,DWORD [40+esp]
+ and ebx,edx
+ ror edx,2
+ xor ebx,esi
+ lea ebp,[1518500249+eax*1+ebp]
+ add ebp,ebx
+ ; 00_15 11
+ mov eax,edx
+ mov ebx,ebp
+ rol ebp,5
+ xor eax,edi
+ add ebp,esi
+ mov esi,DWORD [44+esp]
+ and eax,ecx
+ ror ecx,2
+ xor eax,edi
+ lea ebp,[1518500249+esi*1+ebp]
+ add ebp,eax
+ ; 00_15 12
+ mov esi,ecx
+ mov eax,ebp
+ rol ebp,5
+ xor esi,edx
+ add ebp,edi
+ mov edi,DWORD [48+esp]
+ and esi,ebx
+ ror ebx,2
+ xor esi,edx
+ lea ebp,[1518500249+edi*1+ebp]
+ add ebp,esi
+ ; 00_15 13
+ mov edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ xor edi,ecx
+ add ebp,edx
+ mov edx,DWORD [52+esp]
+ and edi,eax
+ ror eax,2
+ xor edi,ecx
+ lea ebp,[1518500249+edx*1+ebp]
+ add ebp,edi
+ ; 00_15 14
+ mov edx,eax
+ mov edi,ebp
+ rol ebp,5
+ xor edx,ebx
+ add ebp,ecx
+ mov ecx,DWORD [56+esp]
+ and edx,esi
+ ror esi,2
+ xor edx,ebx
+ lea ebp,[1518500249+ecx*1+ebp]
+ add ebp,edx
+ ; 00_15 15
+ mov ecx,esi
+ mov edx,ebp
+ rol ebp,5
+ xor ecx,eax
+ add ebp,ebx
+ mov ebx,DWORD [60+esp]
+ and ecx,edi
+ ror edi,2
+ xor ecx,eax
+ lea ebp,[1518500249+ebx*1+ebp]
+ mov ebx,DWORD [esp]
+ add ecx,ebp
+ ; 16_19 16
+ mov ebp,edi
+ xor ebx,DWORD [8+esp]
+ xor ebp,esi
+ xor ebx,DWORD [32+esp]
+ and ebp,edx
+ xor ebx,DWORD [52+esp]
+ rol ebx,1
+ xor ebp,esi
+ add eax,ebp
+ mov ebp,ecx
+ ror edx,2
+ mov DWORD [esp],ebx
+ rol ebp,5
+ lea ebx,[1518500249+eax*1+ebx]
+ mov eax,DWORD [4+esp]
+ add ebx,ebp
+ ; 16_19 17
+ mov ebp,edx
+ xor eax,DWORD [12+esp]
+ xor ebp,edi
+ xor eax,DWORD [36+esp]
+ and ebp,ecx
+ xor eax,DWORD [56+esp]
+ rol eax,1
+ xor ebp,edi
+ add esi,ebp
+ mov ebp,ebx
+ ror ecx,2
+ mov DWORD [4+esp],eax
+ rol ebp,5
+ lea eax,[1518500249+esi*1+eax]
+ mov esi,DWORD [8+esp]
+ add eax,ebp
+ ; 16_19 18
+ mov ebp,ecx
+ xor esi,DWORD [16+esp]
+ xor ebp,edx
+ xor esi,DWORD [40+esp]
+ and ebp,ebx
+ xor esi,DWORD [60+esp]
+ rol esi,1
+ xor ebp,edx
+ add edi,ebp
+ mov ebp,eax
+ ror ebx,2
+ mov DWORD [8+esp],esi
+ rol ebp,5
+ lea esi,[1518500249+edi*1+esi]
+ mov edi,DWORD [12+esp]
+ add esi,ebp
+ ; 16_19 19
+ mov ebp,ebx
+ xor edi,DWORD [20+esp]
+ xor ebp,ecx
+ xor edi,DWORD [44+esp]
+ and ebp,eax
+ xor edi,DWORD [esp]
+ rol edi,1
+ xor ebp,ecx
+ add edx,ebp
+ mov ebp,esi
+ ror eax,2
+ mov DWORD [12+esp],edi
+ rol ebp,5
+ lea edi,[1518500249+edx*1+edi]
+ mov edx,DWORD [16+esp]
+ add edi,ebp
+ ; 20_39 20
+ mov ebp,esi
+ xor edx,DWORD [24+esp]
+ xor ebp,eax
+ xor edx,DWORD [48+esp]
+ xor ebp,ebx
+ xor edx,DWORD [4+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [16+esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [20+esp]
+ add edx,ebp
+ ; 20_39 21
+ mov ebp,edi
+ xor ecx,DWORD [28+esp]
+ xor ebp,esi
+ xor ecx,DWORD [52+esp]
+ xor ebp,eax
+ xor ecx,DWORD [8+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [20+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [24+esp]
+ add ecx,ebp
+ ; 20_39 22
+ mov ebp,edx
+ xor ebx,DWORD [32+esp]
+ xor ebp,edi
+ xor ebx,DWORD [56+esp]
+ xor ebp,esi
+ xor ebx,DWORD [12+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [24+esp],ebx
+ lea ebx,[1859775393+eax*1+ebx]
+ mov eax,DWORD [28+esp]
+ add ebx,ebp
+ ; 20_39 23
+ mov ebp,ecx
+ xor eax,DWORD [36+esp]
+ xor ebp,edx
+ xor eax,DWORD [60+esp]
+ xor ebp,edi
+ xor eax,DWORD [16+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [28+esp],eax
+ lea eax,[1859775393+esi*1+eax]
+ mov esi,DWORD [32+esp]
+ add eax,ebp
+ ; 20_39 24
+ mov ebp,ebx
+ xor esi,DWORD [40+esp]
+ xor ebp,ecx
+ xor esi,DWORD [esp]
+ xor ebp,edx
+ xor esi,DWORD [20+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [32+esp],esi
+ lea esi,[1859775393+edi*1+esi]
+ mov edi,DWORD [36+esp]
+ add esi,ebp
+ ; 20_39 25
+ mov ebp,eax
+ xor edi,DWORD [44+esp]
+ xor ebp,ebx
+ xor edi,DWORD [4+esp]
+ xor ebp,ecx
+ xor edi,DWORD [24+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [36+esp],edi
+ lea edi,[1859775393+edx*1+edi]
+ mov edx,DWORD [40+esp]
+ add edi,ebp
+ ; 20_39 26
+ mov ebp,esi
+ xor edx,DWORD [48+esp]
+ xor ebp,eax
+ xor edx,DWORD [8+esp]
+ xor ebp,ebx
+ xor edx,DWORD [28+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [40+esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [44+esp]
+ add edx,ebp
+ ; 20_39 27
+ mov ebp,edi
+ xor ecx,DWORD [52+esp]
+ xor ebp,esi
+ xor ecx,DWORD [12+esp]
+ xor ebp,eax
+ xor ecx,DWORD [32+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [44+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [48+esp]
+ add ecx,ebp
+ ; 20_39 28
+ mov ebp,edx
+ xor ebx,DWORD [56+esp]
+ xor ebp,edi
+ xor ebx,DWORD [16+esp]
+ xor ebp,esi
+ xor ebx,DWORD [36+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [48+esp],ebx
+ lea ebx,[1859775393+eax*1+ebx]
+ mov eax,DWORD [52+esp]
+ add ebx,ebp
+ ; 20_39 29
+ mov ebp,ecx
+ xor eax,DWORD [60+esp]
+ xor ebp,edx
+ xor eax,DWORD [20+esp]
+ xor ebp,edi
+ xor eax,DWORD [40+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [52+esp],eax
+ lea eax,[1859775393+esi*1+eax]
+ mov esi,DWORD [56+esp]
+ add eax,ebp
+ ; 20_39 30
+ mov ebp,ebx
+ xor esi,DWORD [esp]
+ xor ebp,ecx
+ xor esi,DWORD [24+esp]
+ xor ebp,edx
+ xor esi,DWORD [44+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [56+esp],esi
+ lea esi,[1859775393+edi*1+esi]
+ mov edi,DWORD [60+esp]
+ add esi,ebp
+ ; 20_39 31
+ mov ebp,eax
+ xor edi,DWORD [4+esp]
+ xor ebp,ebx
+ xor edi,DWORD [28+esp]
+ xor ebp,ecx
+ xor edi,DWORD [48+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [60+esp],edi
+ lea edi,[1859775393+edx*1+edi]
+ mov edx,DWORD [esp]
+ add edi,ebp
+ ; 20_39 32
+ mov ebp,esi
+ xor edx,DWORD [8+esp]
+ xor ebp,eax
+ xor edx,DWORD [32+esp]
+ xor ebp,ebx
+ xor edx,DWORD [52+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [4+esp]
+ add edx,ebp
+ ; 20_39 33
+ mov ebp,edi
+ xor ecx,DWORD [12+esp]
+ xor ebp,esi
+ xor ecx,DWORD [36+esp]
+ xor ebp,eax
+ xor ecx,DWORD [56+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [4+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [8+esp]
+ add ecx,ebp
+ ; 20_39 34
+ mov ebp,edx
+ xor ebx,DWORD [16+esp]
+ xor ebp,edi
+ xor ebx,DWORD [40+esp]
+ xor ebp,esi
+ xor ebx,DWORD [60+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [8+esp],ebx
+ lea ebx,[1859775393+eax*1+ebx]
+ mov eax,DWORD [12+esp]
+ add ebx,ebp
+ ; 20_39 35
+ mov ebp,ecx
+ xor eax,DWORD [20+esp]
+ xor ebp,edx
+ xor eax,DWORD [44+esp]
+ xor ebp,edi
+ xor eax,DWORD [esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [12+esp],eax
+ lea eax,[1859775393+esi*1+eax]
+ mov esi,DWORD [16+esp]
+ add eax,ebp
+ ; 20_39 36
+ mov ebp,ebx
+ xor esi,DWORD [24+esp]
+ xor ebp,ecx
+ xor esi,DWORD [48+esp]
+ xor ebp,edx
+ xor esi,DWORD [4+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [16+esp],esi
+ lea esi,[1859775393+edi*1+esi]
+ mov edi,DWORD [20+esp]
+ add esi,ebp
+ ; 20_39 37
+ mov ebp,eax
+ xor edi,DWORD [28+esp]
+ xor ebp,ebx
+ xor edi,DWORD [52+esp]
+ xor ebp,ecx
+ xor edi,DWORD [8+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [20+esp],edi
+ lea edi,[1859775393+edx*1+edi]
+ mov edx,DWORD [24+esp]
+ add edi,ebp
+ ; 20_39 38
+ mov ebp,esi
+ xor edx,DWORD [32+esp]
+ xor ebp,eax
+ xor edx,DWORD [56+esp]
+ xor ebp,ebx
+ xor edx,DWORD [12+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [24+esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [28+esp]
+ add edx,ebp
+ ; 20_39 39
+ mov ebp,edi
+ xor ecx,DWORD [36+esp]
+ xor ebp,esi
+ xor ecx,DWORD [60+esp]
+ xor ebp,eax
+ xor ecx,DWORD [16+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [28+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [32+esp]
+ add ecx,ebp
+ ; 40_59 40
+ mov ebp,edi
+ xor ebx,DWORD [40+esp]
+ xor ebp,esi
+ xor ebx,DWORD [esp]
+ and ebp,edx
+ xor ebx,DWORD [20+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [32+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [36+esp]
+ add ebx,ebp
+ ; 40_59 41
+ mov ebp,edx
+ xor eax,DWORD [44+esp]
+ xor ebp,edi
+ xor eax,DWORD [4+esp]
+ and ebp,ecx
+ xor eax,DWORD [24+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [36+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [40+esp]
+ add eax,ebp
+ ; 40_59 42
+ mov ebp,ecx
+ xor esi,DWORD [48+esp]
+ xor ebp,edx
+ xor esi,DWORD [8+esp]
+ and ebp,ebx
+ xor esi,DWORD [28+esp]
+ rol esi,1
+ add ebp,edi
+ ror ebx,2
+ mov edi,eax
+ rol edi,5
+ mov DWORD [40+esp],esi
+ lea esi,[2400959708+ebp*1+esi]
+ mov ebp,ecx
+ add esi,edi
+ and ebp,edx
+ mov edi,DWORD [44+esp]
+ add esi,ebp
+ ; 40_59 43
+ mov ebp,ebx
+ xor edi,DWORD [52+esp]
+ xor ebp,ecx
+ xor edi,DWORD [12+esp]
+ and ebp,eax
+ xor edi,DWORD [32+esp]
+ rol edi,1
+ add ebp,edx
+ ror eax,2
+ mov edx,esi
+ rol edx,5
+ mov DWORD [44+esp],edi
+ lea edi,[2400959708+ebp*1+edi]
+ mov ebp,ebx
+ add edi,edx
+ and ebp,ecx
+ mov edx,DWORD [48+esp]
+ add edi,ebp
+ ; 40_59 44
+ mov ebp,eax
+ xor edx,DWORD [56+esp]
+ xor ebp,ebx
+ xor edx,DWORD [16+esp]
+ and ebp,esi
+ xor edx,DWORD [36+esp]
+ rol edx,1
+ add ebp,ecx
+ ror esi,2
+ mov ecx,edi
+ rol ecx,5
+ mov DWORD [48+esp],edx
+ lea edx,[2400959708+ebp*1+edx]
+ mov ebp,eax
+ add edx,ecx
+ and ebp,ebx
+ mov ecx,DWORD [52+esp]
+ add edx,ebp
+ ; 40_59 45
+ mov ebp,esi
+ xor ecx,DWORD [60+esp]
+ xor ebp,eax
+ xor ecx,DWORD [20+esp]
+ and ebp,edi
+ xor ecx,DWORD [40+esp]
+ rol ecx,1
+ add ebp,ebx
+ ror edi,2
+ mov ebx,edx
+ rol ebx,5
+ mov DWORD [52+esp],ecx
+ lea ecx,[2400959708+ebp*1+ecx]
+ mov ebp,esi
+ add ecx,ebx
+ and ebp,eax
+ mov ebx,DWORD [56+esp]
+ add ecx,ebp
+ ; 40_59 46
+ mov ebp,edi
+ xor ebx,DWORD [esp]
+ xor ebp,esi
+ xor ebx,DWORD [24+esp]
+ and ebp,edx
+ xor ebx,DWORD [44+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [56+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [60+esp]
+ add ebx,ebp
+ ; 40_59 47
+ mov ebp,edx
+ xor eax,DWORD [4+esp]
+ xor ebp,edi
+ xor eax,DWORD [28+esp]
+ and ebp,ecx
+ xor eax,DWORD [48+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [60+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [esp]
+ add eax,ebp
+ ; 40_59 48
+ mov ebp,ecx
+ xor esi,DWORD [8+esp]
+ xor ebp,edx
+ xor esi,DWORD [32+esp]
+ and ebp,ebx
+ xor esi,DWORD [52+esp]
+ rol esi,1
+ add ebp,edi
+ ror ebx,2
+ mov edi,eax
+ rol edi,5
+ mov DWORD [esp],esi
+ lea esi,[2400959708+ebp*1+esi]
+ mov ebp,ecx
+ add esi,edi
+ and ebp,edx
+ mov edi,DWORD [4+esp]
+ add esi,ebp
+ ; 40_59 49
+ mov ebp,ebx
+ xor edi,DWORD [12+esp]
+ xor ebp,ecx
+ xor edi,DWORD [36+esp]
+ and ebp,eax
+ xor edi,DWORD [56+esp]
+ rol edi,1
+ add ebp,edx
+ ror eax,2
+ mov edx,esi
+ rol edx,5
+ mov DWORD [4+esp],edi
+ lea edi,[2400959708+ebp*1+edi]
+ mov ebp,ebx
+ add edi,edx
+ and ebp,ecx
+ mov edx,DWORD [8+esp]
+ add edi,ebp
+ ; 40_59 50
+ mov ebp,eax
+ xor edx,DWORD [16+esp]
+ xor ebp,ebx
+ xor edx,DWORD [40+esp]
+ and ebp,esi
+ xor edx,DWORD [60+esp]
+ rol edx,1
+ add ebp,ecx
+ ror esi,2
+ mov ecx,edi
+ rol ecx,5
+ mov DWORD [8+esp],edx
+ lea edx,[2400959708+ebp*1+edx]
+ mov ebp,eax
+ add edx,ecx
+ and ebp,ebx
+ mov ecx,DWORD [12+esp]
+ add edx,ebp
+ ; 40_59 51
+ mov ebp,esi
+ xor ecx,DWORD [20+esp]
+ xor ebp,eax
+ xor ecx,DWORD [44+esp]
+ and ebp,edi
+ xor ecx,DWORD [esp]
+ rol ecx,1
+ add ebp,ebx
+ ror edi,2
+ mov ebx,edx
+ rol ebx,5
+ mov DWORD [12+esp],ecx
+ lea ecx,[2400959708+ebp*1+ecx]
+ mov ebp,esi
+ add ecx,ebx
+ and ebp,eax
+ mov ebx,DWORD [16+esp]
+ add ecx,ebp
+ ; 40_59 52
+ mov ebp,edi
+ xor ebx,DWORD [24+esp]
+ xor ebp,esi
+ xor ebx,DWORD [48+esp]
+ and ebp,edx
+ xor ebx,DWORD [4+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [16+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [20+esp]
+ add ebx,ebp
+ ; 40_59 53
+ mov ebp,edx
+ xor eax,DWORD [28+esp]
+ xor ebp,edi
+ xor eax,DWORD [52+esp]
+ and ebp,ecx
+ xor eax,DWORD [8+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [20+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [24+esp]
+ add eax,ebp
+ ; 40_59 54
+ mov ebp,ecx
+ xor esi,DWORD [32+esp]
+ xor ebp,edx
+ xor esi,DWORD [56+esp]
+ and ebp,ebx
+ xor esi,DWORD [12+esp]
+ rol esi,1
+ add ebp,edi
+ ror ebx,2
+ mov edi,eax
+ rol edi,5
+ mov DWORD [24+esp],esi
+ lea esi,[2400959708+ebp*1+esi]
+ mov ebp,ecx
+ add esi,edi
+ and ebp,edx
+ mov edi,DWORD [28+esp]
+ add esi,ebp
+ ; 40_59 55
+ mov ebp,ebx
+ xor edi,DWORD [36+esp]
+ xor ebp,ecx
+ xor edi,DWORD [60+esp]
+ and ebp,eax
+ xor edi,DWORD [16+esp]
+ rol edi,1
+ add ebp,edx
+ ror eax,2
+ mov edx,esi
+ rol edx,5
+ mov DWORD [28+esp],edi
+ lea edi,[2400959708+ebp*1+edi]
+ mov ebp,ebx
+ add edi,edx
+ and ebp,ecx
+ mov edx,DWORD [32+esp]
+ add edi,ebp
+ ; 40_59 56
+ mov ebp,eax
+ xor edx,DWORD [40+esp]
+ xor ebp,ebx
+ xor edx,DWORD [esp]
+ and ebp,esi
+ xor edx,DWORD [20+esp]
+ rol edx,1
+ add ebp,ecx
+ ror esi,2
+ mov ecx,edi
+ rol ecx,5
+ mov DWORD [32+esp],edx
+ lea edx,[2400959708+ebp*1+edx]
+ mov ebp,eax
+ add edx,ecx
+ and ebp,ebx
+ mov ecx,DWORD [36+esp]
+ add edx,ebp
+ ; 40_59 57
+ mov ebp,esi
+ xor ecx,DWORD [44+esp]
+ xor ebp,eax
+ xor ecx,DWORD [4+esp]
+ and ebp,edi
+ xor ecx,DWORD [24+esp]
+ rol ecx,1
+ add ebp,ebx
+ ror edi,2
+ mov ebx,edx
+ rol ebx,5
+ mov DWORD [36+esp],ecx
+ lea ecx,[2400959708+ebp*1+ecx]
+ mov ebp,esi
+ add ecx,ebx
+ and ebp,eax
+ mov ebx,DWORD [40+esp]
+ add ecx,ebp
+ ; 40_59 58
+ mov ebp,edi
+ xor ebx,DWORD [48+esp]
+ xor ebp,esi
+ xor ebx,DWORD [8+esp]
+ and ebp,edx
+ xor ebx,DWORD [28+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [40+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [44+esp]
+ add ebx,ebp
+ ; 40_59 59
+ mov ebp,edx
+ xor eax,DWORD [52+esp]
+ xor ebp,edi
+ xor eax,DWORD [12+esp]
+ and ebp,ecx
+ xor eax,DWORD [32+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [44+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [48+esp]
+ add eax,ebp
+ ; 20_39 60
+ mov ebp,ebx
+ xor esi,DWORD [56+esp]
+ xor ebp,ecx
+ xor esi,DWORD [16+esp]
+ xor ebp,edx
+ xor esi,DWORD [36+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [48+esp],esi
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [52+esp]
+ add esi,ebp
+ ; 20_39 61
+ mov ebp,eax
+ xor edi,DWORD [60+esp]
+ xor ebp,ebx
+ xor edi,DWORD [20+esp]
+ xor ebp,ecx
+ xor edi,DWORD [40+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [52+esp],edi
+ lea edi,[3395469782+edx*1+edi]
+ mov edx,DWORD [56+esp]
+ add edi,ebp
+ ; 20_39 62
+ mov ebp,esi
+ xor edx,DWORD [esp]
+ xor ebp,eax
+ xor edx,DWORD [24+esp]
+ xor ebp,ebx
+ xor edx,DWORD [44+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [56+esp],edx
+ lea edx,[3395469782+ecx*1+edx]
+ mov ecx,DWORD [60+esp]
+ add edx,ebp
+ ; 20_39 63
+ mov ebp,edi
+ xor ecx,DWORD [4+esp]
+ xor ebp,esi
+ xor ecx,DWORD [28+esp]
+ xor ebp,eax
+ xor ecx,DWORD [48+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [60+esp],ecx
+ lea ecx,[3395469782+ebx*1+ecx]
+ mov ebx,DWORD [esp]
+ add ecx,ebp
+ ; 20_39 64
+ mov ebp,edx
+ xor ebx,DWORD [8+esp]
+ xor ebp,edi
+ xor ebx,DWORD [32+esp]
+ xor ebp,esi
+ xor ebx,DWORD [52+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [esp],ebx
+ lea ebx,[3395469782+eax*1+ebx]
+ mov eax,DWORD [4+esp]
+ add ebx,ebp
+ ; 20_39 65
+ mov ebp,ecx
+ xor eax,DWORD [12+esp]
+ xor ebp,edx
+ xor eax,DWORD [36+esp]
+ xor ebp,edi
+ xor eax,DWORD [56+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [4+esp],eax
+ lea eax,[3395469782+esi*1+eax]
+ mov esi,DWORD [8+esp]
+ add eax,ebp
+ ; 20_39 66
+ mov ebp,ebx
+ xor esi,DWORD [16+esp]
+ xor ebp,ecx
+ xor esi,DWORD [40+esp]
+ xor ebp,edx
+ xor esi,DWORD [60+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [8+esp],esi
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [12+esp]
+ add esi,ebp
+ ; 20_39 67
+ mov ebp,eax
+ xor edi,DWORD [20+esp]
+ xor ebp,ebx
+ xor edi,DWORD [44+esp]
+ xor ebp,ecx
+ xor edi,DWORD [esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [12+esp],edi
+ lea edi,[3395469782+edx*1+edi]
+ mov edx,DWORD [16+esp]
+ add edi,ebp
+ ; 20_39 68
+ mov ebp,esi
+ xor edx,DWORD [24+esp]
+ xor ebp,eax
+ xor edx,DWORD [48+esp]
+ xor ebp,ebx
+ xor edx,DWORD [4+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [16+esp],edx
+ lea edx,[3395469782+ecx*1+edx]
+ mov ecx,DWORD [20+esp]
+ add edx,ebp
+ ; 20_39 69
+ mov ebp,edi
+ xor ecx,DWORD [28+esp]
+ xor ebp,esi
+ xor ecx,DWORD [52+esp]
+ xor ebp,eax
+ xor ecx,DWORD [8+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [20+esp],ecx
+ lea ecx,[3395469782+ebx*1+ecx]
+ mov ebx,DWORD [24+esp]
+ add ecx,ebp
+ ; 20_39 70
+ mov ebp,edx
+ xor ebx,DWORD [32+esp]
+ xor ebp,edi
+ xor ebx,DWORD [56+esp]
+ xor ebp,esi
+ xor ebx,DWORD [12+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [24+esp],ebx
+ lea ebx,[3395469782+eax*1+ebx]
+ mov eax,DWORD [28+esp]
+ add ebx,ebp
+ ; 20_39 71
+ mov ebp,ecx
+ xor eax,DWORD [36+esp]
+ xor ebp,edx
+ xor eax,DWORD [60+esp]
+ xor ebp,edi
+ xor eax,DWORD [16+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [28+esp],eax
+ lea eax,[3395469782+esi*1+eax]
+ mov esi,DWORD [32+esp]
+ add eax,ebp
+ ; 20_39 72
+ mov ebp,ebx
+ xor esi,DWORD [40+esp]
+ xor ebp,ecx
+ xor esi,DWORD [esp]
+ xor ebp,edx
+ xor esi,DWORD [20+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [32+esp],esi
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [36+esp]
+ add esi,ebp
+ ; 20_39 73
+ mov ebp,eax
+ xor edi,DWORD [44+esp]
+ xor ebp,ebx
+ xor edi,DWORD [4+esp]
+ xor ebp,ecx
+ xor edi,DWORD [24+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [36+esp],edi
+ lea edi,[3395469782+edx*1+edi]
+ mov edx,DWORD [40+esp]
+ add edi,ebp
+ ; 20_39 74
+ mov ebp,esi
+ xor edx,DWORD [48+esp]
+ xor ebp,eax
+ xor edx,DWORD [8+esp]
+ xor ebp,ebx
+ xor edx,DWORD [28+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [40+esp],edx
+ lea edx,[3395469782+ecx*1+edx]
+ mov ecx,DWORD [44+esp]
+ add edx,ebp
+ ; 20_39 75
+ mov ebp,edi
+ xor ecx,DWORD [52+esp]
+ xor ebp,esi
+ xor ecx,DWORD [12+esp]
+ xor ebp,eax
+ xor ecx,DWORD [32+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [44+esp],ecx
+ lea ecx,[3395469782+ebx*1+ecx]
+ mov ebx,DWORD [48+esp]
+ add ecx,ebp
+ ; 20_39 76
+ mov ebp,edx
+ xor ebx,DWORD [56+esp]
+ xor ebp,edi
+ xor ebx,DWORD [16+esp]
+ xor ebp,esi
+ xor ebx,DWORD [36+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [48+esp],ebx
+ lea ebx,[3395469782+eax*1+ebx]
+ mov eax,DWORD [52+esp]
+ add ebx,ebp
+ ; 20_39 77
+ mov ebp,ecx
+ xor eax,DWORD [60+esp]
+ xor ebp,edx
+ xor eax,DWORD [20+esp]
+ xor ebp,edi
+ xor eax,DWORD [40+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ lea eax,[3395469782+esi*1+eax]
+ mov esi,DWORD [56+esp]
+ add eax,ebp
+ ; 20_39 78
+ mov ebp,ebx
+ xor esi,DWORD [esp]
+ xor ebp,ecx
+ xor esi,DWORD [24+esp]
+ xor ebp,edx
+ xor esi,DWORD [44+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [60+esp]
+ add esi,ebp
+ ; 20_39 79
+ mov ebp,eax
+ xor edi,DWORD [4+esp]
+ xor ebp,ebx
+ xor edi,DWORD [28+esp]
+ xor ebp,ecx
+ xor edi,DWORD [48+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ lea edi,[3395469782+edx*1+edi]
+ add edi,ebp
+ mov ebp,DWORD [96+esp]
+ mov edx,DWORD [100+esp]
+ add edi,DWORD [ebp]
+ add esi,DWORD [4+ebp]
+ add eax,DWORD [8+ebp]
+ add ebx,DWORD [12+ebp]
+ add ecx,DWORD [16+ebp]
+ mov DWORD [ebp],edi
+ add edx,64
+ mov DWORD [4+ebp],esi
+ cmp edx,DWORD [104+esp]
+ mov DWORD [8+ebp],eax
+ mov edi,ecx
+ mov DWORD [12+ebp],ebx
+ mov esi,edx
+ mov DWORD [16+ebp],ecx
+ jb NEAR L$002loop
+ add esp,76
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__sha1_block_data_order_shaext:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$003pic_point
+L$003pic_point:
+ pop ebp
+ lea ebp,[(L$K_XX_XX-L$003pic_point)+ebp]
+L$shaext_shortcut:
+ mov edi,DWORD [20+esp]
+ mov ebx,esp
+ mov esi,DWORD [24+esp]
+ mov ecx,DWORD [28+esp]
+ sub esp,32
+ movdqu xmm0,[edi]
+ movd xmm1,DWORD [16+edi]
+ and esp,-32
+ movdqa xmm3,[80+ebp]
+ movdqu xmm4,[esi]
+ pshufd xmm0,xmm0,27
+ movdqu xmm5,[16+esi]
+ pshufd xmm1,xmm1,27
+ movdqu xmm6,[32+esi]
+db 102,15,56,0,227
+ movdqu xmm7,[48+esi]
+db 102,15,56,0,235
+db 102,15,56,0,243
+db 102,15,56,0,251
+ jmp NEAR L$004loop_shaext
+align 16
+L$004loop_shaext:
+ dec ecx
+ lea eax,[64+esi]
+ movdqa [esp],xmm1
+ paddd xmm1,xmm4
+ cmovne esi,eax
+ movdqa [16+esp],xmm0
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,0
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,0
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,0
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,0
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,0
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,1
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,1
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,1
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,1
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,1
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,2
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,2
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,2
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,2
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,2
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,3
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+ movdqu xmm4,[esi]
+ movdqa xmm2,xmm0
+db 15,58,204,193,3
+db 15,56,200,213
+ movdqu xmm5,[16+esi]
+db 102,15,56,0,227
+ movdqa xmm1,xmm0
+db 15,58,204,194,3
+db 15,56,200,206
+ movdqu xmm6,[32+esi]
+db 102,15,56,0,235
+ movdqa xmm2,xmm0
+db 15,58,204,193,3
+db 15,56,200,215
+ movdqu xmm7,[48+esi]
+db 102,15,56,0,243
+ movdqa xmm1,xmm0
+db 15,58,204,194,3
+ movdqa xmm2,[esp]
+db 102,15,56,0,251
+db 15,56,200,202
+ paddd xmm0,[16+esp]
+ jnz NEAR L$004loop_shaext
+ pshufd xmm0,xmm0,27
+ pshufd xmm1,xmm1,27
+ movdqu [edi],xmm0
+ movd DWORD [16+edi],xmm1
+ mov esp,ebx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__sha1_block_data_order_ssse3:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$005pic_point
+L$005pic_point:
+ pop ebp
+ lea ebp,[(L$K_XX_XX-L$005pic_point)+ebp]
+L$ssse3_shortcut:
+ movdqa xmm7,[ebp]
+ movdqa xmm0,[16+ebp]
+ movdqa xmm1,[32+ebp]
+ movdqa xmm2,[48+ebp]
+ movdqa xmm6,[64+ebp]
+ mov edi,DWORD [20+esp]
+ mov ebp,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ mov esi,esp
+ sub esp,208
+ and esp,-64
+ movdqa [112+esp],xmm0
+ movdqa [128+esp],xmm1
+ movdqa [144+esp],xmm2
+ shl edx,6
+ movdqa [160+esp],xmm7
+ add edx,ebp
+ movdqa [176+esp],xmm6
+ add ebp,64
+ mov DWORD [192+esp],edi
+ mov DWORD [196+esp],ebp
+ mov DWORD [200+esp],edx
+ mov DWORD [204+esp],esi
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+ mov edi,DWORD [16+edi]
+ mov esi,ebx
+ movdqu xmm0,[ebp-64]
+ movdqu xmm1,[ebp-48]
+ movdqu xmm2,[ebp-32]
+ movdqu xmm3,[ebp-16]
+db 102,15,56,0,198
+db 102,15,56,0,206
+db 102,15,56,0,214
+ movdqa [96+esp],xmm7
+db 102,15,56,0,222
+ paddd xmm0,xmm7
+ paddd xmm1,xmm7
+ paddd xmm2,xmm7
+ movdqa [esp],xmm0
+ psubd xmm0,xmm7
+ movdqa [16+esp],xmm1
+ psubd xmm1,xmm7
+ movdqa [32+esp],xmm2
+ mov ebp,ecx
+ psubd xmm2,xmm7
+ xor ebp,edx
+ pshufd xmm4,xmm0,238
+ and esi,ebp
+ jmp NEAR L$006loop
+align 16
+L$006loop:
+ ror ebx,2
+ xor esi,edx
+ mov ebp,eax
+ punpcklqdq xmm4,xmm1
+ movdqa xmm6,xmm3
+ add edi,DWORD [esp]
+ xor ebx,ecx
+ paddd xmm7,xmm3
+ movdqa [64+esp],xmm0
+ rol eax,5
+ add edi,esi
+ psrldq xmm6,4
+ and ebp,ebx
+ xor ebx,ecx
+ pxor xmm4,xmm0
+ add edi,eax
+ ror eax,7
+ pxor xmm6,xmm2
+ xor ebp,ecx
+ mov esi,edi
+ add edx,DWORD [4+esp]
+ pxor xmm4,xmm6
+ xor eax,ebx
+ rol edi,5
+ movdqa [48+esp],xmm7
+ add edx,ebp
+ and esi,eax
+ movdqa xmm0,xmm4
+ xor eax,ebx
+ add edx,edi
+ ror edi,7
+ movdqa xmm6,xmm4
+ xor esi,ebx
+ pslldq xmm0,12
+ paddd xmm4,xmm4
+ mov ebp,edx
+ add ecx,DWORD [8+esp]
+ psrld xmm6,31
+ xor edi,eax
+ rol edx,5
+ movdqa xmm7,xmm0
+ add ecx,esi
+ and ebp,edi
+ xor edi,eax
+ psrld xmm0,30
+ add ecx,edx
+ ror edx,7
+ por xmm4,xmm6
+ xor ebp,eax
+ mov esi,ecx
+ add ebx,DWORD [12+esp]
+ pslld xmm7,2
+ xor edx,edi
+ rol ecx,5
+ pxor xmm4,xmm0
+ movdqa xmm0,[96+esp]
+ add ebx,ebp
+ and esi,edx
+ pxor xmm4,xmm7
+ pshufd xmm5,xmm1,238
+ xor edx,edi
+ add ebx,ecx
+ ror ecx,7
+ xor esi,edi
+ mov ebp,ebx
+ punpcklqdq xmm5,xmm2
+ movdqa xmm7,xmm4
+ add eax,DWORD [16+esp]
+ xor ecx,edx
+ paddd xmm0,xmm4
+ movdqa [80+esp],xmm1
+ rol ebx,5
+ add eax,esi
+ psrldq xmm7,4
+ and ebp,ecx
+ xor ecx,edx
+ pxor xmm5,xmm1
+ add eax,ebx
+ ror ebx,7
+ pxor xmm7,xmm3
+ xor ebp,edx
+ mov esi,eax
+ add edi,DWORD [20+esp]
+ pxor xmm5,xmm7
+ xor ebx,ecx
+ rol eax,5
+ movdqa [esp],xmm0
+ add edi,ebp
+ and esi,ebx
+ movdqa xmm1,xmm5
+ xor ebx,ecx
+ add edi,eax
+ ror eax,7
+ movdqa xmm7,xmm5
+ xor esi,ecx
+ pslldq xmm1,12
+ paddd xmm5,xmm5
+ mov ebp,edi
+ add edx,DWORD [24+esp]
+ psrld xmm7,31
+ xor eax,ebx
+ rol edi,5
+ movdqa xmm0,xmm1
+ add edx,esi
+ and ebp,eax
+ xor eax,ebx
+ psrld xmm1,30
+ add edx,edi
+ ror edi,7
+ por xmm5,xmm7
+ xor ebp,ebx
+ mov esi,edx
+ add ecx,DWORD [28+esp]
+ pslld xmm0,2
+ xor edi,eax
+ rol edx,5
+ pxor xmm5,xmm1
+ movdqa xmm1,[112+esp]
+ add ecx,ebp
+ and esi,edi
+ pxor xmm5,xmm0
+ pshufd xmm6,xmm2,238
+ xor edi,eax
+ add ecx,edx
+ ror edx,7
+ xor esi,eax
+ mov ebp,ecx
+ punpcklqdq xmm6,xmm3
+ movdqa xmm0,xmm5
+ add ebx,DWORD [32+esp]
+ xor edx,edi
+ paddd xmm1,xmm5
+ movdqa [96+esp],xmm2
+ rol ecx,5
+ add ebx,esi
+ psrldq xmm0,4
+ and ebp,edx
+ xor edx,edi
+ pxor xmm6,xmm2
+ add ebx,ecx
+ ror ecx,7
+ pxor xmm0,xmm4
+ xor ebp,edi
+ mov esi,ebx
+ add eax,DWORD [36+esp]
+ pxor xmm6,xmm0
+ xor ecx,edx
+ rol ebx,5
+ movdqa [16+esp],xmm1
+ add eax,ebp
+ and esi,ecx
+ movdqa xmm2,xmm6
+ xor ecx,edx
+ add eax,ebx
+ ror ebx,7
+ movdqa xmm0,xmm6
+ xor esi,edx
+ pslldq xmm2,12
+ paddd xmm6,xmm6
+ mov ebp,eax
+ add edi,DWORD [40+esp]
+ psrld xmm0,31
+ xor ebx,ecx
+ rol eax,5
+ movdqa xmm1,xmm2
+ add edi,esi
+ and ebp,ebx
+ xor ebx,ecx
+ psrld xmm2,30
+ add edi,eax
+ ror eax,7
+ por xmm6,xmm0
+ xor ebp,ecx
+ movdqa xmm0,[64+esp]
+ mov esi,edi
+ add edx,DWORD [44+esp]
+ pslld xmm1,2
+ xor eax,ebx
+ rol edi,5
+ pxor xmm6,xmm2
+ movdqa xmm2,[112+esp]
+ add edx,ebp
+ and esi,eax
+ pxor xmm6,xmm1
+ pshufd xmm7,xmm3,238
+ xor eax,ebx
+ add edx,edi
+ ror edi,7
+ xor esi,ebx
+ mov ebp,edx
+ punpcklqdq xmm7,xmm4
+ movdqa xmm1,xmm6
+ add ecx,DWORD [48+esp]
+ xor edi,eax
+ paddd xmm2,xmm6
+ movdqa [64+esp],xmm3
+ rol edx,5
+ add ecx,esi
+ psrldq xmm1,4
+ and ebp,edi
+ xor edi,eax
+ pxor xmm7,xmm3
+ add ecx,edx
+ ror edx,7
+ pxor xmm1,xmm5
+ xor ebp,eax
+ mov esi,ecx
+ add ebx,DWORD [52+esp]
+ pxor xmm7,xmm1
+ xor edx,edi
+ rol ecx,5
+ movdqa [32+esp],xmm2
+ add ebx,ebp
+ and esi,edx
+ movdqa xmm3,xmm7
+ xor edx,edi
+ add ebx,ecx
+ ror ecx,7
+ movdqa xmm1,xmm7
+ xor esi,edi
+ pslldq xmm3,12
+ paddd xmm7,xmm7
+ mov ebp,ebx
+ add eax,DWORD [56+esp]
+ psrld xmm1,31
+ xor ecx,edx
+ rol ebx,5
+ movdqa xmm2,xmm3
+ add eax,esi
+ and ebp,ecx
+ xor ecx,edx
+ psrld xmm3,30
+ add eax,ebx
+ ror ebx,7
+ por xmm7,xmm1
+ xor ebp,edx
+ movdqa xmm1,[80+esp]
+ mov esi,eax
+ add edi,DWORD [60+esp]
+ pslld xmm2,2
+ xor ebx,ecx
+ rol eax,5
+ pxor xmm7,xmm3
+ movdqa xmm3,[112+esp]
+ add edi,ebp
+ and esi,ebx
+ pxor xmm7,xmm2
+ pshufd xmm2,xmm6,238
+ xor ebx,ecx
+ add edi,eax
+ ror eax,7
+ pxor xmm0,xmm4
+ punpcklqdq xmm2,xmm7
+ xor esi,ecx
+ mov ebp,edi
+ add edx,DWORD [esp]
+ pxor xmm0,xmm1
+ movdqa [80+esp],xmm4
+ xor eax,ebx
+ rol edi,5
+ movdqa xmm4,xmm3
+ add edx,esi
+ paddd xmm3,xmm7
+ and ebp,eax
+ pxor xmm0,xmm2
+ xor eax,ebx
+ add edx,edi
+ ror edi,7
+ xor ebp,ebx
+ movdqa xmm2,xmm0
+ movdqa [48+esp],xmm3
+ mov esi,edx
+ add ecx,DWORD [4+esp]
+ xor edi,eax
+ rol edx,5
+ pslld xmm0,2
+ add ecx,ebp
+ and esi,edi
+ psrld xmm2,30
+ xor edi,eax
+ add ecx,edx
+ ror edx,7
+ xor esi,eax
+ mov ebp,ecx
+ add ebx,DWORD [8+esp]
+ xor edx,edi
+ rol ecx,5
+ por xmm0,xmm2
+ add ebx,esi
+ and ebp,edx
+ movdqa xmm2,[96+esp]
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [12+esp]
+ xor ebp,edi
+ mov esi,ebx
+ pshufd xmm3,xmm7,238
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [16+esp]
+ pxor xmm1,xmm5
+ punpcklqdq xmm3,xmm0
+ xor esi,ecx
+ mov ebp,eax
+ rol eax,5
+ pxor xmm1,xmm2
+ movdqa [96+esp],xmm5
+ add edi,esi
+ xor ebp,ecx
+ movdqa xmm5,xmm4
+ ror ebx,7
+ paddd xmm4,xmm0
+ add edi,eax
+ pxor xmm1,xmm3
+ add edx,DWORD [20+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ movdqa xmm3,xmm1
+ movdqa [esp],xmm4
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ pslld xmm1,2
+ add ecx,DWORD [24+esp]
+ xor esi,eax
+ psrld xmm3,30
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+ add ecx,edx
+ por xmm1,xmm3
+ add ebx,DWORD [28+esp]
+ xor ebp,edi
+ movdqa xmm3,[64+esp]
+ mov esi,ecx
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ pshufd xmm4,xmm0,238
+ add ebx,ecx
+ add eax,DWORD [32+esp]
+ pxor xmm2,xmm6
+ punpcklqdq xmm4,xmm1
+ xor esi,edx
+ mov ebp,ebx
+ rol ebx,5
+ pxor xmm2,xmm3
+ movdqa [64+esp],xmm6
+ add eax,esi
+ xor ebp,edx
+ movdqa xmm6,[128+esp]
+ ror ecx,7
+ paddd xmm5,xmm1
+ add eax,ebx
+ pxor xmm2,xmm4
+ add edi,DWORD [36+esp]
+ xor ebp,ecx
+ mov esi,eax
+ rol eax,5
+ movdqa xmm4,xmm2
+ movdqa [16+esp],xmm5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ pslld xmm2,2
+ add edx,DWORD [40+esp]
+ xor esi,ebx
+ psrld xmm4,30
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+ add edx,edi
+ por xmm2,xmm4
+ add ecx,DWORD [44+esp]
+ xor ebp,eax
+ movdqa xmm4,[80+esp]
+ mov esi,edx
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ pshufd xmm5,xmm1,238
+ add ecx,edx
+ add ebx,DWORD [48+esp]
+ pxor xmm3,xmm7
+ punpcklqdq xmm5,xmm2
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ pxor xmm3,xmm4
+ movdqa [80+esp],xmm7
+ add ebx,esi
+ xor ebp,edi
+ movdqa xmm7,xmm6
+ ror edx,7
+ paddd xmm6,xmm2
+ add ebx,ecx
+ pxor xmm3,xmm5
+ add eax,DWORD [52+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ movdqa xmm5,xmm3
+ movdqa [32+esp],xmm6
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ pslld xmm3,2
+ add edi,DWORD [56+esp]
+ xor esi,ecx
+ psrld xmm5,30
+ mov ebp,eax
+ rol eax,5
+ add edi,esi
+ xor ebp,ecx
+ ror ebx,7
+ add edi,eax
+ por xmm3,xmm5
+ add edx,DWORD [60+esp]
+ xor ebp,ebx
+ movdqa xmm5,[96+esp]
+ mov esi,edi
+ rol edi,5
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ pshufd xmm6,xmm2,238
+ add edx,edi
+ add ecx,DWORD [esp]
+ pxor xmm4,xmm0
+ punpcklqdq xmm6,xmm3
+ xor esi,eax
+ mov ebp,edx
+ rol edx,5
+ pxor xmm4,xmm5
+ movdqa [96+esp],xmm0
+ add ecx,esi
+ xor ebp,eax
+ movdqa xmm0,xmm7
+ ror edi,7
+ paddd xmm7,xmm3
+ add ecx,edx
+ pxor xmm4,xmm6
+ add ebx,DWORD [4+esp]
+ xor ebp,edi
+ mov esi,ecx
+ rol ecx,5
+ movdqa xmm6,xmm4
+ movdqa [48+esp],xmm7
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ add ebx,ecx
+ pslld xmm4,2
+ add eax,DWORD [8+esp]
+ xor esi,edx
+ psrld xmm6,30
+ mov ebp,ebx
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ add eax,ebx
+ por xmm4,xmm6
+ add edi,DWORD [12+esp]
+ xor ebp,ecx
+ movdqa xmm6,[64+esp]
+ mov esi,eax
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ pshufd xmm7,xmm3,238
+ add edi,eax
+ add edx,DWORD [16+esp]
+ pxor xmm5,xmm1
+ punpcklqdq xmm7,xmm4
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ pxor xmm5,xmm6
+ movdqa [64+esp],xmm1
+ add edx,esi
+ xor ebp,ebx
+ movdqa xmm1,xmm0
+ ror eax,7
+ paddd xmm0,xmm4
+ add edx,edi
+ pxor xmm5,xmm7
+ add ecx,DWORD [20+esp]
+ xor ebp,eax
+ mov esi,edx
+ rol edx,5
+ movdqa xmm7,xmm5
+ movdqa [esp],xmm0
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ add ecx,edx
+ pslld xmm5,2
+ add ebx,DWORD [24+esp]
+ xor esi,edi
+ psrld xmm7,30
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ por xmm5,xmm7
+ add eax,DWORD [28+esp]
+ movdqa xmm7,[80+esp]
+ ror ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ rol ebx,5
+ pshufd xmm0,xmm4,238
+ add eax,ebp
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [32+esp]
+ pxor xmm6,xmm2
+ punpcklqdq xmm0,xmm5
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ pxor xmm6,xmm7
+ movdqa [80+esp],xmm2
+ mov ebp,eax
+ xor esi,ecx
+ rol eax,5
+ movdqa xmm2,xmm1
+ add edi,esi
+ paddd xmm1,xmm5
+ xor ebp,ebx
+ pxor xmm6,xmm0
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [36+esp]
+ and ebp,ebx
+ movdqa xmm0,xmm6
+ movdqa [16+esp],xmm1
+ xor ebx,ecx
+ ror eax,7
+ mov esi,edi
+ xor ebp,ebx
+ rol edi,5
+ pslld xmm6,2
+ add edx,ebp
+ xor esi,eax
+ psrld xmm0,30
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [40+esp]
+ and esi,eax
+ xor eax,ebx
+ ror edi,7
+ por xmm6,xmm0
+ mov ebp,edx
+ xor esi,eax
+ movdqa xmm0,[96+esp]
+ rol edx,5
+ add ecx,esi
+ xor ebp,edi
+ xor edi,eax
+ add ecx,edx
+ pshufd xmm1,xmm5,238
+ add ebx,DWORD [44+esp]
+ and ebp,edi
+ xor edi,eax
+ ror edx,7
+ mov esi,ecx
+ xor ebp,edi
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [48+esp]
+ pxor xmm7,xmm3
+ punpcklqdq xmm1,xmm6
+ and esi,edx
+ xor edx,edi
+ ror ecx,7
+ pxor xmm7,xmm0
+ movdqa [96+esp],xmm3
+ mov ebp,ebx
+ xor esi,edx
+ rol ebx,5
+ movdqa xmm3,[144+esp]
+ add eax,esi
+ paddd xmm2,xmm6
+ xor ebp,ecx
+ pxor xmm7,xmm1
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [52+esp]
+ and ebp,ecx
+ movdqa xmm1,xmm7
+ movdqa [32+esp],xmm2
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor ebp,ecx
+ rol eax,5
+ pslld xmm7,2
+ add edi,ebp
+ xor esi,ebx
+ psrld xmm1,30
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [56+esp]
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ por xmm7,xmm1
+ mov ebp,edi
+ xor esi,ebx
+ movdqa xmm1,[64+esp]
+ rol edi,5
+ add edx,esi
+ xor ebp,eax
+ xor eax,ebx
+ add edx,edi
+ pshufd xmm2,xmm6,238
+ add ecx,DWORD [60+esp]
+ and ebp,eax
+ xor eax,ebx
+ ror edi,7
+ mov esi,edx
+ xor ebp,eax
+ rol edx,5
+ add ecx,ebp
+ xor esi,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [esp]
+ pxor xmm0,xmm4
+ punpcklqdq xmm2,xmm7
+ and esi,edi
+ xor edi,eax
+ ror edx,7
+ pxor xmm0,xmm1
+ movdqa [64+esp],xmm4
+ mov ebp,ecx
+ xor esi,edi
+ rol ecx,5
+ movdqa xmm4,xmm3
+ add ebx,esi
+ paddd xmm3,xmm7
+ xor ebp,edx
+ pxor xmm0,xmm2
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [4+esp]
+ and ebp,edx
+ movdqa xmm2,xmm0
+ movdqa [48+esp],xmm3
+ xor edx,edi
+ ror ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ rol ebx,5
+ pslld xmm0,2
+ add eax,ebp
+ xor esi,ecx
+ psrld xmm2,30
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [8+esp]
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ por xmm0,xmm2
+ mov ebp,eax
+ xor esi,ecx
+ movdqa xmm2,[80+esp]
+ rol eax,5
+ add edi,esi
+ xor ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ pshufd xmm3,xmm7,238
+ add edx,DWORD [12+esp]
+ and ebp,ebx
+ xor ebx,ecx
+ ror eax,7
+ mov esi,edi
+ xor ebp,ebx
+ rol edi,5
+ add edx,ebp
+ xor esi,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [16+esp]
+ pxor xmm1,xmm5
+ punpcklqdq xmm3,xmm0
+ and esi,eax
+ xor eax,ebx
+ ror edi,7
+ pxor xmm1,xmm2
+ movdqa [80+esp],xmm5
+ mov ebp,edx
+ xor esi,eax
+ rol edx,5
+ movdqa xmm5,xmm4
+ add ecx,esi
+ paddd xmm4,xmm0
+ xor ebp,edi
+ pxor xmm1,xmm3
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [20+esp]
+ and ebp,edi
+ movdqa xmm3,xmm1
+ movdqa [esp],xmm4
+ xor edi,eax
+ ror edx,7
+ mov esi,ecx
+ xor ebp,edi
+ rol ecx,5
+ pslld xmm1,2
+ add ebx,ebp
+ xor esi,edx
+ psrld xmm3,30
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [24+esp]
+ and esi,edx
+ xor edx,edi
+ ror ecx,7
+ por xmm1,xmm3
+ mov ebp,ebx
+ xor esi,edx
+ movdqa xmm3,[96+esp]
+ rol ebx,5
+ add eax,esi
+ xor ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ pshufd xmm4,xmm0,238
+ add edi,DWORD [28+esp]
+ and ebp,ecx
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor ebp,ecx
+ rol eax,5
+ add edi,ebp
+ xor esi,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [32+esp]
+ pxor xmm2,xmm6
+ punpcklqdq xmm4,xmm1
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ pxor xmm2,xmm3
+ movdqa [96+esp],xmm6
+ mov ebp,edi
+ xor esi,ebx
+ rol edi,5
+ movdqa xmm6,xmm5
+ add edx,esi
+ paddd xmm5,xmm1
+ xor ebp,eax
+ pxor xmm2,xmm4
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [36+esp]
+ and ebp,eax
+ movdqa xmm4,xmm2
+ movdqa [16+esp],xmm5
+ xor eax,ebx
+ ror edi,7
+ mov esi,edx
+ xor ebp,eax
+ rol edx,5
+ pslld xmm2,2
+ add ecx,ebp
+ xor esi,edi
+ psrld xmm4,30
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [40+esp]
+ and esi,edi
+ xor edi,eax
+ ror edx,7
+ por xmm2,xmm4
+ mov ebp,ecx
+ xor esi,edi
+ movdqa xmm4,[64+esp]
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ pshufd xmm5,xmm1,238
+ add eax,DWORD [44+esp]
+ and ebp,edx
+ xor edx,edi
+ ror ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ add eax,ebx
+ add edi,DWORD [48+esp]
+ pxor xmm3,xmm7
+ punpcklqdq xmm5,xmm2
+ xor esi,ecx
+ mov ebp,eax
+ rol eax,5
+ pxor xmm3,xmm4
+ movdqa [64+esp],xmm7
+ add edi,esi
+ xor ebp,ecx
+ movdqa xmm7,xmm6
+ ror ebx,7
+ paddd xmm6,xmm2
+ add edi,eax
+ pxor xmm3,xmm5
+ add edx,DWORD [52+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ movdqa xmm5,xmm3
+ movdqa [32+esp],xmm6
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ pslld xmm3,2
+ add ecx,DWORD [56+esp]
+ xor esi,eax
+ psrld xmm5,30
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+ add ecx,edx
+ por xmm3,xmm5
+ add ebx,DWORD [60+esp]
+ xor ebp,edi
+ mov esi,ecx
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [esp]
+ xor esi,edx
+ mov ebp,ebx
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ paddd xmm7,xmm3
+ add eax,ebx
+ add edi,DWORD [4+esp]
+ xor ebp,ecx
+ mov esi,eax
+ movdqa [48+esp],xmm7
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [8+esp]
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [12+esp]
+ xor ebp,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ add ecx,edx
+ mov ebp,DWORD [196+esp]
+ cmp ebp,DWORD [200+esp]
+ je NEAR L$007done
+ movdqa xmm7,[160+esp]
+ movdqa xmm6,[176+esp]
+ movdqu xmm0,[ebp]
+ movdqu xmm1,[16+ebp]
+ movdqu xmm2,[32+ebp]
+ movdqu xmm3,[48+ebp]
+ add ebp,64
+db 102,15,56,0,198
+ mov DWORD [196+esp],ebp
+ movdqa [96+esp],xmm7
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+db 102,15,56,0,206
+ add ebx,ecx
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ paddd xmm0,xmm7
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ movdqa [esp],xmm0
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ psubd xmm0,xmm7
+ rol eax,5
+ add edi,esi
+ xor ebp,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+db 102,15,56,0,214
+ add ecx,edx
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ paddd xmm1,xmm7
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ movdqa [16+esp],xmm1
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ psubd xmm1,xmm7
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+db 102,15,56,0,222
+ add edx,edi
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ paddd xmm2,xmm7
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ movdqa [32+esp],xmm2
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ psubd xmm2,xmm7
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,ebp
+ ror ecx,7
+ add eax,ebx
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov DWORD [8+ebp],ecx
+ mov ebx,ecx
+ mov DWORD [12+ebp],edx
+ xor ebx,edx
+ mov DWORD [16+ebp],edi
+ mov ebp,esi
+ pshufd xmm4,xmm0,238
+ and esi,ebx
+ mov ebx,ebp
+ jmp NEAR L$006loop
+align 16
+L$007done:
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ rol eax,5
+ add edi,esi
+ xor ebp,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+ add ecx,edx
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,ebp
+ ror ecx,7
+ add eax,ebx
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ mov esp,DWORD [204+esp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov DWORD [8+ebp],ecx
+ mov DWORD [12+ebp],edx
+ mov DWORD [16+ebp],edi
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__sha1_block_data_order_avx:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$008pic_point
+L$008pic_point:
+ pop ebp
+ lea ebp,[(L$K_XX_XX-L$008pic_point)+ebp]
+L$avx_shortcut:
+ vzeroall
+ vmovdqa xmm7,[ebp]
+ vmovdqa xmm0,[16+ebp]
+ vmovdqa xmm1,[32+ebp]
+ vmovdqa xmm2,[48+ebp]
+ vmovdqa xmm6,[64+ebp]
+ mov edi,DWORD [20+esp]
+ mov ebp,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ mov esi,esp
+ sub esp,208
+ and esp,-64
+ vmovdqa [112+esp],xmm0
+ vmovdqa [128+esp],xmm1
+ vmovdqa [144+esp],xmm2
+ shl edx,6
+ vmovdqa [160+esp],xmm7
+ add edx,ebp
+ vmovdqa [176+esp],xmm6
+ add ebp,64
+ mov DWORD [192+esp],edi
+ mov DWORD [196+esp],ebp
+ mov DWORD [200+esp],edx
+ mov DWORD [204+esp],esi
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+ mov edi,DWORD [16+edi]
+ mov esi,ebx
+ vmovdqu xmm0,[ebp-64]
+ vmovdqu xmm1,[ebp-48]
+ vmovdqu xmm2,[ebp-32]
+ vmovdqu xmm3,[ebp-16]
+ vpshufb xmm0,xmm0,xmm6
+ vpshufb xmm1,xmm1,xmm6
+ vpshufb xmm2,xmm2,xmm6
+ vmovdqa [96+esp],xmm7
+ vpshufb xmm3,xmm3,xmm6
+ vpaddd xmm4,xmm0,xmm7
+ vpaddd xmm5,xmm1,xmm7
+ vpaddd xmm6,xmm2,xmm7
+ vmovdqa [esp],xmm4
+ mov ebp,ecx
+ vmovdqa [16+esp],xmm5
+ xor ebp,edx
+ vmovdqa [32+esp],xmm6
+ and esi,ebp
+ jmp NEAR L$009loop
+align 16
+L$009loop:
+ shrd ebx,ebx,2
+ xor esi,edx
+ vpalignr xmm4,xmm1,xmm0,8
+ mov ebp,eax
+ add edi,DWORD [esp]
+ vpaddd xmm7,xmm7,xmm3
+ vmovdqa [64+esp],xmm0
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrldq xmm6,xmm3,4
+ add edi,esi
+ and ebp,ebx
+ vpxor xmm4,xmm4,xmm0
+ xor ebx,ecx
+ add edi,eax
+ vpxor xmm6,xmm6,xmm2
+ shrd eax,eax,7
+ xor ebp,ecx
+ vmovdqa [48+esp],xmm7
+ mov esi,edi
+ add edx,DWORD [4+esp]
+ vpxor xmm4,xmm4,xmm6
+ xor eax,ebx
+ shld edi,edi,5
+ add edx,ebp
+ and esi,eax
+ vpsrld xmm6,xmm4,31
+ xor eax,ebx
+ add edx,edi
+ shrd edi,edi,7
+ xor esi,ebx
+ vpslldq xmm0,xmm4,12
+ vpaddd xmm4,xmm4,xmm4
+ mov ebp,edx
+ add ecx,DWORD [8+esp]
+ xor edi,eax
+ shld edx,edx,5
+ vpsrld xmm7,xmm0,30
+ vpor xmm4,xmm4,xmm6
+ add ecx,esi
+ and ebp,edi
+ xor edi,eax
+ add ecx,edx
+ vpslld xmm0,xmm0,2
+ shrd edx,edx,7
+ xor ebp,eax
+ vpxor xmm4,xmm4,xmm7
+ mov esi,ecx
+ add ebx,DWORD [12+esp]
+ xor edx,edi
+ shld ecx,ecx,5
+ vpxor xmm4,xmm4,xmm0
+ add ebx,ebp
+ and esi,edx
+ vmovdqa xmm0,[96+esp]
+ xor edx,edi
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,edi
+ vpalignr xmm5,xmm2,xmm1,8
+ mov ebp,ebx
+ add eax,DWORD [16+esp]
+ vpaddd xmm0,xmm0,xmm4
+ vmovdqa [80+esp],xmm1
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrldq xmm7,xmm4,4
+ add eax,esi
+ and ebp,ecx
+ vpxor xmm5,xmm5,xmm1
+ xor ecx,edx
+ add eax,ebx
+ vpxor xmm7,xmm7,xmm3
+ shrd ebx,ebx,7
+ xor ebp,edx
+ vmovdqa [esp],xmm0
+ mov esi,eax
+ add edi,DWORD [20+esp]
+ vpxor xmm5,xmm5,xmm7
+ xor ebx,ecx
+ shld eax,eax,5
+ add edi,ebp
+ and esi,ebx
+ vpsrld xmm7,xmm5,31
+ xor ebx,ecx
+ add edi,eax
+ shrd eax,eax,7
+ xor esi,ecx
+ vpslldq xmm1,xmm5,12
+ vpaddd xmm5,xmm5,xmm5
+ mov ebp,edi
+ add edx,DWORD [24+esp]
+ xor eax,ebx
+ shld edi,edi,5
+ vpsrld xmm0,xmm1,30
+ vpor xmm5,xmm5,xmm7
+ add edx,esi
+ and ebp,eax
+ xor eax,ebx
+ add edx,edi
+ vpslld xmm1,xmm1,2
+ shrd edi,edi,7
+ xor ebp,ebx
+ vpxor xmm5,xmm5,xmm0
+ mov esi,edx
+ add ecx,DWORD [28+esp]
+ xor edi,eax
+ shld edx,edx,5
+ vpxor xmm5,xmm5,xmm1
+ add ecx,ebp
+ and esi,edi
+ vmovdqa xmm1,[112+esp]
+ xor edi,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ vpalignr xmm6,xmm3,xmm2,8
+ mov ebp,ecx
+ add ebx,DWORD [32+esp]
+ vpaddd xmm1,xmm1,xmm5
+ vmovdqa [96+esp],xmm2
+ xor edx,edi
+ shld ecx,ecx,5
+ vpsrldq xmm0,xmm5,4
+ add ebx,esi
+ and ebp,edx
+ vpxor xmm6,xmm6,xmm2
+ xor edx,edi
+ add ebx,ecx
+ vpxor xmm0,xmm0,xmm4
+ shrd ecx,ecx,7
+ xor ebp,edi
+ vmovdqa [16+esp],xmm1
+ mov esi,ebx
+ add eax,DWORD [36+esp]
+ vpxor xmm6,xmm6,xmm0
+ xor ecx,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ and esi,ecx
+ vpsrld xmm0,xmm6,31
+ xor ecx,edx
+ add eax,ebx
+ shrd ebx,ebx,7
+ xor esi,edx
+ vpslldq xmm2,xmm6,12
+ vpaddd xmm6,xmm6,xmm6
+ mov ebp,eax
+ add edi,DWORD [40+esp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrld xmm1,xmm2,30
+ vpor xmm6,xmm6,xmm0
+ add edi,esi
+ and ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ vpslld xmm2,xmm2,2
+ vmovdqa xmm0,[64+esp]
+ shrd eax,eax,7
+ xor ebp,ecx
+ vpxor xmm6,xmm6,xmm1
+ mov esi,edi
+ add edx,DWORD [44+esp]
+ xor eax,ebx
+ shld edi,edi,5
+ vpxor xmm6,xmm6,xmm2
+ add edx,ebp
+ and esi,eax
+ vmovdqa xmm2,[112+esp]
+ xor eax,ebx
+ add edx,edi
+ shrd edi,edi,7
+ xor esi,ebx
+ vpalignr xmm7,xmm4,xmm3,8
+ mov ebp,edx
+ add ecx,DWORD [48+esp]
+ vpaddd xmm2,xmm2,xmm6
+ vmovdqa [64+esp],xmm3
+ xor edi,eax
+ shld edx,edx,5
+ vpsrldq xmm1,xmm6,4
+ add ecx,esi
+ and ebp,edi
+ vpxor xmm7,xmm7,xmm3
+ xor edi,eax
+ add ecx,edx
+ vpxor xmm1,xmm1,xmm5
+ shrd edx,edx,7
+ xor ebp,eax
+ vmovdqa [32+esp],xmm2
+ mov esi,ecx
+ add ebx,DWORD [52+esp]
+ vpxor xmm7,xmm7,xmm1
+ xor edx,edi
+ shld ecx,ecx,5
+ add ebx,ebp
+ and esi,edx
+ vpsrld xmm1,xmm7,31
+ xor edx,edi
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,edi
+ vpslldq xmm3,xmm7,12
+ vpaddd xmm7,xmm7,xmm7
+ mov ebp,ebx
+ add eax,DWORD [56+esp]
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrld xmm2,xmm3,30
+ vpor xmm7,xmm7,xmm1
+ add eax,esi
+ and ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ vmovdqa xmm1,[80+esp]
+ shrd ebx,ebx,7
+ xor ebp,edx
+ vpxor xmm7,xmm7,xmm2
+ mov esi,eax
+ add edi,DWORD [60+esp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpxor xmm7,xmm7,xmm3
+ add edi,ebp
+ and esi,ebx
+ vmovdqa xmm3,[112+esp]
+ xor ebx,ecx
+ add edi,eax
+ vpalignr xmm2,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ xor esi,ecx
+ mov ebp,edi
+ add edx,DWORD [esp]
+ vpxor xmm0,xmm0,xmm1
+ vmovdqa [80+esp],xmm4
+ xor eax,ebx
+ shld edi,edi,5
+ vmovdqa xmm4,xmm3
+ vpaddd xmm3,xmm3,xmm7
+ add edx,esi
+ and ebp,eax
+ vpxor xmm0,xmm0,xmm2
+ xor eax,ebx
+ add edx,edi
+ shrd edi,edi,7
+ xor ebp,ebx
+ vpsrld xmm2,xmm0,30
+ vmovdqa [48+esp],xmm3
+ mov esi,edx
+ add ecx,DWORD [4+esp]
+ xor edi,eax
+ shld edx,edx,5
+ vpslld xmm0,xmm0,2
+ add ecx,ebp
+ and esi,edi
+ xor edi,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ mov ebp,ecx
+ add ebx,DWORD [8+esp]
+ vpor xmm0,xmm0,xmm2
+ xor edx,edi
+ shld ecx,ecx,5
+ vmovdqa xmm2,[96+esp]
+ add ebx,esi
+ and ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [12+esp]
+ xor ebp,edi
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpalignr xmm3,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add edi,DWORD [16+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ vpxor xmm1,xmm1,xmm2
+ vmovdqa [96+esp],xmm5
+ add edi,esi
+ xor ebp,ecx
+ vmovdqa xmm5,xmm4
+ vpaddd xmm4,xmm4,xmm0
+ shrd ebx,ebx,7
+ add edi,eax
+ vpxor xmm1,xmm1,xmm3
+ add edx,DWORD [20+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ vpsrld xmm3,xmm1,30
+ vmovdqa [esp],xmm4
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpslld xmm1,xmm1,2
+ add ecx,DWORD [24+esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpor xmm1,xmm1,xmm3
+ add ebx,DWORD [28+esp]
+ xor ebp,edi
+ vmovdqa xmm3,[64+esp]
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vpalignr xmm4,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add eax,DWORD [32+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ vpxor xmm2,xmm2,xmm3
+ vmovdqa [64+esp],xmm6
+ add eax,esi
+ xor ebp,edx
+ vmovdqa xmm6,[128+esp]
+ vpaddd xmm5,xmm5,xmm1
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpxor xmm2,xmm2,xmm4
+ add edi,DWORD [36+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ vpsrld xmm4,xmm2,30
+ vmovdqa [16+esp],xmm5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ vpslld xmm2,xmm2,2
+ add edx,DWORD [40+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpor xmm2,xmm2,xmm4
+ add ecx,DWORD [44+esp]
+ xor ebp,eax
+ vmovdqa xmm4,[80+esp]
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpalignr xmm5,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebx,DWORD [48+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ vpxor xmm3,xmm3,xmm4
+ vmovdqa [80+esp],xmm7
+ add ebx,esi
+ xor ebp,edi
+ vmovdqa xmm7,xmm6
+ vpaddd xmm6,xmm6,xmm2
+ shrd edx,edx,7
+ add ebx,ecx
+ vpxor xmm3,xmm3,xmm5
+ add eax,DWORD [52+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ vpsrld xmm5,xmm3,30
+ vmovdqa [32+esp],xmm6
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ add edi,DWORD [56+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ vpor xmm3,xmm3,xmm5
+ add edx,DWORD [60+esp]
+ xor ebp,ebx
+ vmovdqa xmm5,[96+esp]
+ mov esi,edi
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpalignr xmm6,xmm3,xmm2,8
+ vpxor xmm4,xmm4,xmm0
+ add ecx,DWORD [esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ vpxor xmm4,xmm4,xmm5
+ vmovdqa [96+esp],xmm0
+ add ecx,esi
+ xor ebp,eax
+ vmovdqa xmm0,xmm7
+ vpaddd xmm7,xmm7,xmm3
+ shrd edi,edi,7
+ add ecx,edx
+ vpxor xmm4,xmm4,xmm6
+ add ebx,DWORD [4+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ vpsrld xmm6,xmm4,30
+ vmovdqa [48+esp],xmm7
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vpslld xmm4,xmm4,2
+ add eax,DWORD [8+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpor xmm4,xmm4,xmm6
+ add edi,DWORD [12+esp]
+ xor ebp,ecx
+ vmovdqa xmm6,[64+esp]
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ vpalignr xmm7,xmm4,xmm3,8
+ vpxor xmm5,xmm5,xmm1
+ add edx,DWORD [16+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ vpxor xmm5,xmm5,xmm6
+ vmovdqa [64+esp],xmm1
+ add edx,esi
+ xor ebp,ebx
+ vmovdqa xmm1,xmm0
+ vpaddd xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ add edx,edi
+ vpxor xmm5,xmm5,xmm7
+ add ecx,DWORD [20+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ vpsrld xmm7,xmm5,30
+ vmovdqa [esp],xmm0
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpslld xmm5,xmm5,2
+ add ebx,DWORD [24+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vpor xmm5,xmm5,xmm7
+ add eax,DWORD [28+esp]
+ vmovdqa xmm7,[80+esp]
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpalignr xmm0,xmm5,xmm4,8
+ vpxor xmm6,xmm6,xmm2
+ add edi,DWORD [32+esp]
+ and esi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vpxor xmm6,xmm6,xmm7
+ vmovdqa [80+esp],xmm2
+ mov ebp,eax
+ xor esi,ecx
+ vmovdqa xmm2,xmm1
+ vpaddd xmm1,xmm1,xmm5
+ shld eax,eax,5
+ add edi,esi
+ vpxor xmm6,xmm6,xmm0
+ xor ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [36+esp]
+ vpsrld xmm0,xmm6,30
+ vmovdqa [16+esp],xmm1
+ and ebp,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,edi
+ vpslld xmm6,xmm6,2
+ xor ebp,ebx
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [40+esp]
+ and esi,eax
+ vpor xmm6,xmm6,xmm0
+ xor eax,ebx
+ shrd edi,edi,7
+ vmovdqa xmm0,[96+esp]
+ mov ebp,edx
+ xor esi,eax
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [44+esp]
+ and ebp,edi
+ xor edi,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ xor ebp,edi
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edx
+ xor edx,edi
+ add ebx,ecx
+ vpalignr xmm1,xmm6,xmm5,8
+ vpxor xmm7,xmm7,xmm3
+ add eax,DWORD [48+esp]
+ and esi,edx
+ xor edx,edi
+ shrd ecx,ecx,7
+ vpxor xmm7,xmm7,xmm0
+ vmovdqa [96+esp],xmm3
+ mov ebp,ebx
+ xor esi,edx
+ vmovdqa xmm3,[144+esp]
+ vpaddd xmm2,xmm2,xmm6
+ shld ebx,ebx,5
+ add eax,esi
+ vpxor xmm7,xmm7,xmm1
+ xor ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [52+esp]
+ vpsrld xmm1,xmm7,30
+ vmovdqa [32+esp],xmm2
+ and ebp,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ vpslld xmm7,xmm7,2
+ xor ebp,ecx
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [56+esp]
+ and esi,ebx
+ vpor xmm7,xmm7,xmm1
+ xor ebx,ecx
+ shrd eax,eax,7
+ vmovdqa xmm1,[64+esp]
+ mov ebp,edi
+ xor esi,ebx
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [60+esp]
+ and ebp,eax
+ xor eax,ebx
+ shrd edi,edi,7
+ mov esi,edx
+ xor ebp,eax
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,edi
+ xor edi,eax
+ add ecx,edx
+ vpalignr xmm2,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ add ebx,DWORD [esp]
+ and esi,edi
+ xor edi,eax
+ shrd edx,edx,7
+ vpxor xmm0,xmm0,xmm1
+ vmovdqa [64+esp],xmm4
+ mov ebp,ecx
+ xor esi,edi
+ vmovdqa xmm4,xmm3
+ vpaddd xmm3,xmm3,xmm7
+ shld ecx,ecx,5
+ add ebx,esi
+ vpxor xmm0,xmm0,xmm2
+ xor ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [4+esp]
+ vpsrld xmm2,xmm0,30
+ vmovdqa [48+esp],xmm3
+ and ebp,edx
+ xor edx,edi
+ shrd ecx,ecx,7
+ mov esi,ebx
+ vpslld xmm0,xmm0,2
+ xor ebp,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [8+esp]
+ and esi,ecx
+ vpor xmm0,xmm0,xmm2
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vmovdqa xmm2,[80+esp]
+ mov ebp,eax
+ xor esi,ecx
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [12+esp]
+ and ebp,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,edi
+ xor ebp,ebx
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,eax
+ xor eax,ebx
+ add edx,edi
+ vpalignr xmm3,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ecx,DWORD [16+esp]
+ and esi,eax
+ xor eax,ebx
+ shrd edi,edi,7
+ vpxor xmm1,xmm1,xmm2
+ vmovdqa [80+esp],xmm5
+ mov ebp,edx
+ xor esi,eax
+ vmovdqa xmm5,xmm4
+ vpaddd xmm4,xmm4,xmm0
+ shld edx,edx,5
+ add ecx,esi
+ vpxor xmm1,xmm1,xmm3
+ xor ebp,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [20+esp]
+ vpsrld xmm3,xmm1,30
+ vmovdqa [esp],xmm4
+ and ebp,edi
+ xor edi,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ vpslld xmm1,xmm1,2
+ xor ebp,edi
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [24+esp]
+ and esi,edx
+ vpor xmm1,xmm1,xmm3
+ xor edx,edi
+ shrd ecx,ecx,7
+ vmovdqa xmm3,[96+esp]
+ mov ebp,ebx
+ xor esi,edx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [28+esp]
+ and ebp,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ xor ebp,ecx
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ebx
+ xor ebx,ecx
+ add edi,eax
+ vpalignr xmm4,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add edx,DWORD [32+esp]
+ and esi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ vpxor xmm2,xmm2,xmm3
+ vmovdqa [96+esp],xmm6
+ mov ebp,edi
+ xor esi,ebx
+ vmovdqa xmm6,xmm5
+ vpaddd xmm5,xmm5,xmm1
+ shld edi,edi,5
+ add edx,esi
+ vpxor xmm2,xmm2,xmm4
+ xor ebp,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [36+esp]
+ vpsrld xmm4,xmm2,30
+ vmovdqa [16+esp],xmm5
+ and ebp,eax
+ xor eax,ebx
+ shrd edi,edi,7
+ mov esi,edx
+ vpslld xmm2,xmm2,2
+ xor ebp,eax
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [40+esp]
+ and esi,edi
+ vpor xmm2,xmm2,xmm4
+ xor edi,eax
+ shrd edx,edx,7
+ vmovdqa xmm4,[64+esp]
+ mov ebp,ecx
+ xor esi,edi
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [44+esp]
+ and ebp,edx
+ xor edx,edi
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ add eax,ebx
+ vpalignr xmm5,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add edi,DWORD [48+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ vpxor xmm3,xmm3,xmm4
+ vmovdqa [64+esp],xmm7
+ add edi,esi
+ xor ebp,ecx
+ vmovdqa xmm7,xmm6
+ vpaddd xmm6,xmm6,xmm2
+ shrd ebx,ebx,7
+ add edi,eax
+ vpxor xmm3,xmm3,xmm5
+ add edx,DWORD [52+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ vpsrld xmm5,xmm3,30
+ vmovdqa [32+esp],xmm6
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpslld xmm3,xmm3,2
+ add ecx,DWORD [56+esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpor xmm3,xmm3,xmm5
+ add ebx,DWORD [60+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [esp]
+ vpaddd xmm7,xmm7,xmm3
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ vmovdqa [48+esp],xmm7
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [4+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [8+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [12+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ mov ebp,DWORD [196+esp]
+ cmp ebp,DWORD [200+esp]
+ je NEAR L$010done
+ vmovdqa xmm7,[160+esp]
+ vmovdqa xmm6,[176+esp]
+ vmovdqu xmm0,[ebp]
+ vmovdqu xmm1,[16+ebp]
+ vmovdqu xmm2,[32+ebp]
+ vmovdqu xmm3,[48+ebp]
+ add ebp,64
+ vpshufb xmm0,xmm0,xmm6
+ mov DWORD [196+esp],ebp
+ vmovdqa [96+esp],xmm7
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ vpshufb xmm1,xmm1,xmm6
+ mov ebp,ecx
+ shld ecx,ecx,5
+ vpaddd xmm4,xmm0,xmm7
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vmovdqa [esp],xmm4
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ vpshufb xmm2,xmm2,xmm6
+ mov ebp,edx
+ shld edx,edx,5
+ vpaddd xmm5,xmm1,xmm7
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vmovdqa [16+esp],xmm5
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ vpshufb xmm3,xmm3,xmm6
+ mov ebp,edi
+ shld edi,edi,5
+ vpaddd xmm6,xmm2,xmm7
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vmovdqa [32+esp],xmm6
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ shrd ecx,ecx,7
+ add eax,ebx
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov ebx,ecx
+ mov DWORD [8+ebp],ecx
+ xor ebx,edx
+ mov DWORD [12+ebp],edx
+ mov DWORD [16+ebp],edi
+ mov ebp,esi
+ and esi,ebx
+ mov ebx,ebp
+ jmp NEAR L$009loop
+align 16
+L$010done:
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ shrd ecx,ecx,7
+ add eax,ebx
+ vzeroall
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ mov esp,DWORD [204+esp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov DWORD [8+ebp],ecx
+ mov DWORD [12+ebp],edx
+ mov DWORD [16+ebp],edi
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$K_XX_XX:
+dd 1518500249,1518500249,1518500249,1518500249
+dd 1859775393,1859775393,1859775393,1859775393
+dd 2400959708,2400959708,2400959708,2400959708
+dd 3395469782,3395469782,3395469782,3395469782
+dd 66051,67438087,134810123,202182159
+db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+db 102,111,114,109,32,102,111,114,32,120,56,54,44,32,67,82
+db 89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+db 114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
new file mode 100644
index 0000000000..0540b0eac7
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
@@ -0,0 +1,6796 @@
+; Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _sha256_block_data_order
+align 16
+_sha256_block_data_order:
+L$_sha256_block_data_order_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov ebx,esp
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea ebp,[(L$001K256-L$000pic_point)+ebp]
+ sub esp,16
+ and esp,-64
+ shl eax,6
+ add eax,edi
+ mov DWORD [esp],esi
+ mov DWORD [4+esp],edi
+ mov DWORD [8+esp],eax
+ mov DWORD [12+esp],ebx
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov ecx,DWORD [edx]
+ mov ebx,DWORD [4+edx]
+ test ecx,1048576
+ jnz NEAR L$002loop
+ mov edx,DWORD [8+edx]
+ test ecx,16777216
+ jz NEAR L$003no_xmm
+ and ecx,1073741824
+ and ebx,268435968
+ test edx,536870912
+ jnz NEAR L$004shaext
+ or ecx,ebx
+ and ecx,1342177280
+ cmp ecx,1342177280
+ je NEAR L$005AVX
+ test ebx,512
+ jnz NEAR L$006SSSE3
+L$003no_xmm:
+ sub eax,edi
+ cmp eax,256
+ jae NEAR L$007unrolled
+ jmp NEAR L$002loop
+align 16
+L$002loop:
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ bswap eax
+ mov edx,DWORD [12+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ mov eax,DWORD [16+edi]
+ mov ebx,DWORD [20+edi]
+ mov ecx,DWORD [24+edi]
+ bswap eax
+ mov edx,DWORD [28+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ mov eax,DWORD [32+edi]
+ mov ebx,DWORD [36+edi]
+ mov ecx,DWORD [40+edi]
+ bswap eax
+ mov edx,DWORD [44+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ mov eax,DWORD [48+edi]
+ mov ebx,DWORD [52+edi]
+ mov ecx,DWORD [56+edi]
+ bswap eax
+ mov edx,DWORD [60+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ add edi,64
+ lea esp,[esp-36]
+ mov DWORD [104+esp],edi
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [8+esp],ebx
+ xor ebx,ecx
+ mov DWORD [12+esp],ecx
+ mov DWORD [16+esp],edi
+ mov DWORD [esp],ebx
+ mov edx,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov edi,DWORD [28+esi]
+ mov DWORD [24+esp],ebx
+ mov DWORD [28+esp],ecx
+ mov DWORD [32+esp],edi
+align 16
+L$00800_15:
+ mov ecx,edx
+ mov esi,DWORD [24+esp]
+ ror ecx,14
+ mov edi,DWORD [28+esp]
+ xor ecx,edx
+ xor esi,edi
+ mov ebx,DWORD [96+esp]
+ ror ecx,5
+ and esi,edx
+ mov DWORD [20+esp],edx
+ xor edx,ecx
+ add ebx,DWORD [32+esp]
+ xor esi,edi
+ ror edx,6
+ mov ecx,eax
+ add ebx,esi
+ ror ecx,9
+ add ebx,edx
+ mov edi,DWORD [8+esp]
+ xor ecx,eax
+ mov DWORD [4+esp],eax
+ lea esp,[esp-4]
+ ror ecx,11
+ mov esi,DWORD [ebp]
+ xor ecx,eax
+ mov edx,DWORD [20+esp]
+ xor eax,edi
+ ror ecx,2
+ add ebx,esi
+ mov DWORD [esp],eax
+ add edx,ebx
+ and eax,DWORD [4+esp]
+ add ebx,ecx
+ xor eax,edi
+ add ebp,4
+ add eax,ebx
+ cmp esi,3248222580
+ jne NEAR L$00800_15
+ mov ecx,DWORD [156+esp]
+ jmp NEAR L$00916_63
+align 16
+L$00916_63:
+ mov ebx,ecx
+ mov esi,DWORD [104+esp]
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [160+esp]
+ shr edi,10
+ add ebx,DWORD [124+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [24+esp]
+ ror ecx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor ecx,edx
+ xor esi,edi
+ mov DWORD [96+esp],ebx
+ ror ecx,5
+ and esi,edx
+ mov DWORD [20+esp],edx
+ xor edx,ecx
+ add ebx,DWORD [32+esp]
+ xor esi,edi
+ ror edx,6
+ mov ecx,eax
+ add ebx,esi
+ ror ecx,9
+ add ebx,edx
+ mov edi,DWORD [8+esp]
+ xor ecx,eax
+ mov DWORD [4+esp],eax
+ lea esp,[esp-4]
+ ror ecx,11
+ mov esi,DWORD [ebp]
+ xor ecx,eax
+ mov edx,DWORD [20+esp]
+ xor eax,edi
+ ror ecx,2
+ add ebx,esi
+ mov DWORD [esp],eax
+ add edx,ebx
+ and eax,DWORD [4+esp]
+ add ebx,ecx
+ xor eax,edi
+ mov ecx,DWORD [156+esp]
+ add ebp,4
+ add eax,ebx
+ cmp esi,3329325298
+ jne NEAR L$00916_63
+ mov esi,DWORD [356+esp]
+ mov ebx,DWORD [8+esp]
+ mov ecx,DWORD [16+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [28+esp]
+ mov ecx,DWORD [32+esp]
+ mov edi,DWORD [360+esp]
+ add edx,DWORD [16+esi]
+ add eax,DWORD [20+esi]
+ add ebx,DWORD [24+esi]
+ add ecx,DWORD [28+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],eax
+ mov DWORD [24+esi],ebx
+ mov DWORD [28+esi],ecx
+ lea esp,[356+esp]
+ sub ebp,256
+ cmp edi,DWORD [8+esp]
+ jb NEAR L$002loop
+ mov esp,DWORD [12+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$001K256:
+dd 1116352408,1899447441,3049323471,3921009573,961987163,1508970993,2453635748,2870763221,3624381080,310598401,607225278,1426881987,1925078388,2162078206,2614888103,3248222580,3835390401,4022224774,264347078,604807628,770255983,1249150122,1555081692,1996064986,2554220882,2821834349,2952996808,3210313671,3336571891,3584528711,113926993,338241895,666307205,773529912,1294757372,1396182291,1695183700,1986661051,2177026350,2456956037,2730485921,2820302411,3259730800,3345764771,3516065817,3600352804,4094571909,275423344,430227734,506948616,659060556,883997877,958139571,1322822218,1537002063,1747873779,1955562222,2024104815,2227730452,2361852424,2428436474,2756734187,3204031479,3329325298
+dd 66051,67438087,134810123,202182159
+db 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+db 110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db 62,0
+align 16
+L$007unrolled:
+ lea esp,[esp-96]
+ mov eax,DWORD [esi]
+ mov ebp,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov ebx,DWORD [12+esi]
+ mov DWORD [4+esp],ebp
+ xor ebp,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],ebx
+ mov edx,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],ebx
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ jmp NEAR L$010grand_loop
+align 16
+L$010grand_loop:
+ mov ebx,DWORD [edi]
+ mov ecx,DWORD [4+edi]
+ bswap ebx
+ mov esi,DWORD [8+edi]
+ bswap ecx
+ mov DWORD [32+esp],ebx
+ bswap esi
+ mov DWORD [36+esp],ecx
+ mov DWORD [40+esp],esi
+ mov ebx,DWORD [12+edi]
+ mov ecx,DWORD [16+edi]
+ bswap ebx
+ mov esi,DWORD [20+edi]
+ bswap ecx
+ mov DWORD [44+esp],ebx
+ bswap esi
+ mov DWORD [48+esp],ecx
+ mov DWORD [52+esp],esi
+ mov ebx,DWORD [24+edi]
+ mov ecx,DWORD [28+edi]
+ bswap ebx
+ mov esi,DWORD [32+edi]
+ bswap ecx
+ mov DWORD [56+esp],ebx
+ bswap esi
+ mov DWORD [60+esp],ecx
+ mov DWORD [64+esp],esi
+ mov ebx,DWORD [36+edi]
+ mov ecx,DWORD [40+edi]
+ bswap ebx
+ mov esi,DWORD [44+edi]
+ bswap ecx
+ mov DWORD [68+esp],ebx
+ bswap esi
+ mov DWORD [72+esp],ecx
+ mov DWORD [76+esp],esi
+ mov ebx,DWORD [48+edi]
+ mov ecx,DWORD [52+edi]
+ bswap ebx
+ mov esi,DWORD [56+edi]
+ bswap ecx
+ mov DWORD [80+esp],ebx
+ bswap esi
+ mov DWORD [84+esp],ecx
+ mov DWORD [88+esp],esi
+ mov ebx,DWORD [60+edi]
+ add edi,64
+ bswap ebx
+ mov DWORD [100+esp],edi
+ mov DWORD [92+esp],ebx
+ mov ecx,edx
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov ebx,DWORD [32+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1116352408+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov ebx,DWORD [36+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1899447441+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov ebx,DWORD [40+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3049323471+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov ebx,DWORD [44+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3921009573+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov ebx,DWORD [48+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[961987163+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov ebx,DWORD [52+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1508970993+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov ebx,DWORD [56+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2453635748+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov ebx,DWORD [60+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2870763221+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov ebx,DWORD [64+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3624381080+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov ebx,DWORD [68+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[310598401+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov ebx,DWORD [72+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[607225278+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov ebx,DWORD [76+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1426881987+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov ebx,DWORD [80+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1925078388+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov ebx,DWORD [84+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2162078206+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov ebx,DWORD [88+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2614888103+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov ebx,DWORD [92+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3248222580+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [36+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [88+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [32+esp]
+ shr edi,10
+ add ebx,DWORD [68+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [32+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3835390401+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [40+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [92+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [36+esp]
+ shr edi,10
+ add ebx,DWORD [72+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [36+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[4022224774+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [44+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [32+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [40+esp]
+ shr edi,10
+ add ebx,DWORD [76+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [40+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[264347078+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [48+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [36+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [44+esp]
+ shr edi,10
+ add ebx,DWORD [80+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [44+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[604807628+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [52+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [40+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [48+esp]
+ shr edi,10
+ add ebx,DWORD [84+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [48+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[770255983+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [56+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [44+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [52+esp]
+ shr edi,10
+ add ebx,DWORD [88+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [52+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1249150122+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [60+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [48+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [56+esp]
+ shr edi,10
+ add ebx,DWORD [92+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [56+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1555081692+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [64+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [52+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [60+esp]
+ shr edi,10
+ add ebx,DWORD [32+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [60+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1996064986+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [68+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [56+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [64+esp]
+ shr edi,10
+ add ebx,DWORD [36+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [64+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2554220882+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [72+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [60+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [68+esp]
+ shr edi,10
+ add ebx,DWORD [40+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [68+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2821834349+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [76+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [64+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [72+esp]
+ shr edi,10
+ add ebx,DWORD [44+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [72+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2952996808+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [80+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [68+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [76+esp]
+ shr edi,10
+ add ebx,DWORD [48+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [76+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3210313671+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [84+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [72+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [80+esp]
+ shr edi,10
+ add ebx,DWORD [52+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [80+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3336571891+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [88+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [76+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [84+esp]
+ shr edi,10
+ add ebx,DWORD [56+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [84+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3584528711+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [92+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [80+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [88+esp]
+ shr edi,10
+ add ebx,DWORD [60+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [88+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[113926993+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [32+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [84+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [92+esp]
+ shr edi,10
+ add ebx,DWORD [64+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [92+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[338241895+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [36+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [88+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [32+esp]
+ shr edi,10
+ add ebx,DWORD [68+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [32+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[666307205+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [40+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [92+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [36+esp]
+ shr edi,10
+ add ebx,DWORD [72+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [36+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[773529912+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [44+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [32+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [40+esp]
+ shr edi,10
+ add ebx,DWORD [76+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [40+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1294757372+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [48+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [36+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [44+esp]
+ shr edi,10
+ add ebx,DWORD [80+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [44+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1396182291+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [52+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [40+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [48+esp]
+ shr edi,10
+ add ebx,DWORD [84+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [48+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1695183700+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [56+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [44+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [52+esp]
+ shr edi,10
+ add ebx,DWORD [88+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [52+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1986661051+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [60+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [48+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [56+esp]
+ shr edi,10
+ add ebx,DWORD [92+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [56+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2177026350+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [64+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [52+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [60+esp]
+ shr edi,10
+ add ebx,DWORD [32+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [60+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2456956037+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [68+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [56+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [64+esp]
+ shr edi,10
+ add ebx,DWORD [36+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [64+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2730485921+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [72+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [60+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [68+esp]
+ shr edi,10
+ add ebx,DWORD [40+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [68+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2820302411+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [76+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [64+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [72+esp]
+ shr edi,10
+ add ebx,DWORD [44+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [72+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3259730800+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [80+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [68+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [76+esp]
+ shr edi,10
+ add ebx,DWORD [48+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [76+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3345764771+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [84+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [72+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [80+esp]
+ shr edi,10
+ add ebx,DWORD [52+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [80+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3516065817+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [88+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [76+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [84+esp]
+ shr edi,10
+ add ebx,DWORD [56+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [84+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3600352804+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [92+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [80+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [88+esp]
+ shr edi,10
+ add ebx,DWORD [60+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [88+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[4094571909+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [32+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [84+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [92+esp]
+ shr edi,10
+ add ebx,DWORD [64+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [92+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[275423344+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [36+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [88+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [32+esp]
+ shr edi,10
+ add ebx,DWORD [68+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [32+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[430227734+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [40+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [92+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [36+esp]
+ shr edi,10
+ add ebx,DWORD [72+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [36+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[506948616+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [44+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [32+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [40+esp]
+ shr edi,10
+ add ebx,DWORD [76+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [40+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[659060556+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [48+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [36+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [44+esp]
+ shr edi,10
+ add ebx,DWORD [80+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [44+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[883997877+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [52+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [40+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [48+esp]
+ shr edi,10
+ add ebx,DWORD [84+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [48+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[958139571+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [56+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [44+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [52+esp]
+ shr edi,10
+ add ebx,DWORD [88+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [52+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1322822218+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [60+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [48+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [56+esp]
+ shr edi,10
+ add ebx,DWORD [92+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [56+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1537002063+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [64+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [52+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [60+esp]
+ shr edi,10
+ add ebx,DWORD [32+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [60+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1747873779+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [68+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [56+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [64+esp]
+ shr edi,10
+ add ebx,DWORD [36+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [64+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1955562222+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [72+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [60+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [68+esp]
+ shr edi,10
+ add ebx,DWORD [40+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [68+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2024104815+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [76+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [64+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [72+esp]
+ shr edi,10
+ add ebx,DWORD [44+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [72+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2227730452+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [80+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [68+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [76+esp]
+ shr edi,10
+ add ebx,DWORD [48+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [76+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2361852424+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [84+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [72+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [80+esp]
+ shr edi,10
+ add ebx,DWORD [52+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [80+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2428436474+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [88+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [76+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [84+esp]
+ shr edi,10
+ add ebx,DWORD [56+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [84+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2756734187+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [92+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [80+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [88+esp]
+ shr edi,10
+ add ebx,DWORD [60+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3204031479+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [32+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [84+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [92+esp]
+ shr edi,10
+ add ebx,DWORD [64+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3329325298+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [96+esp]
+ xor ebp,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebp,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebp
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ mov ecx,DWORD [28+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ebx,DWORD [24+esi]
+ add ecx,DWORD [28+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [24+esi],ebx
+ mov DWORD [28+esi],ecx
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ebx
+ mov DWORD [28+esp],ecx
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$010grand_loop
+ mov esp,DWORD [108+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$004shaext:
+ sub esp,32
+ movdqu xmm1,[esi]
+ lea ebp,[128+ebp]
+ movdqu xmm2,[16+esi]
+ movdqa xmm7,[128+ebp]
+ pshufd xmm0,xmm1,27
+ pshufd xmm1,xmm1,177
+ pshufd xmm2,xmm2,27
+db 102,15,58,15,202,8
+ punpcklqdq xmm2,xmm0
+ jmp NEAR L$011loop_shaext
+align 16
+L$011loop_shaext:
+ movdqu xmm3,[edi]
+ movdqu xmm4,[16+edi]
+ movdqu xmm5,[32+edi]
+db 102,15,56,0,223
+ movdqu xmm6,[48+edi]
+ movdqa [16+esp],xmm2
+ movdqa xmm0,[ebp-128]
+ paddd xmm0,xmm3
+db 102,15,56,0,231
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ nop
+ movdqa [esp],xmm1
+db 15,56,203,202
+ movdqa xmm0,[ebp-112]
+ paddd xmm0,xmm4
+db 102,15,56,0,239
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ lea edi,[64+edi]
+db 15,56,204,220
+db 15,56,203,202
+ movdqa xmm0,[ebp-96]
+ paddd xmm0,xmm5
+db 102,15,56,0,247
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm6
+db 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+db 15,56,204,229
+db 15,56,203,202
+ movdqa xmm0,[ebp-80]
+ paddd xmm0,xmm6
+db 15,56,205,222
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm3
+db 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+db 15,56,204,238
+db 15,56,203,202
+ movdqa xmm0,[ebp-64]
+ paddd xmm0,xmm3
+db 15,56,205,227
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm4
+db 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+db 15,56,204,243
+db 15,56,203,202
+ movdqa xmm0,[ebp-48]
+ paddd xmm0,xmm4
+db 15,56,205,236
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm5
+db 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+db 15,56,204,220
+db 15,56,203,202
+ movdqa xmm0,[ebp-32]
+ paddd xmm0,xmm5
+db 15,56,205,245
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm6
+db 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+db 15,56,204,229
+db 15,56,203,202
+ movdqa xmm0,[ebp-16]
+ paddd xmm0,xmm6
+db 15,56,205,222
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm3
+db 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+db 15,56,204,238
+db 15,56,203,202
+ movdqa xmm0,[ebp]
+ paddd xmm0,xmm3
+db 15,56,205,227
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm4
+db 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+db 15,56,204,243
+db 15,56,203,202
+ movdqa xmm0,[16+ebp]
+ paddd xmm0,xmm4
+db 15,56,205,236
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm5
+db 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+db 15,56,204,220
+db 15,56,203,202
+ movdqa xmm0,[32+ebp]
+ paddd xmm0,xmm5
+db 15,56,205,245
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm6
+db 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+db 15,56,204,229
+db 15,56,203,202
+ movdqa xmm0,[48+ebp]
+ paddd xmm0,xmm6
+db 15,56,205,222
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm3
+db 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+db 15,56,204,238
+db 15,56,203,202
+ movdqa xmm0,[64+ebp]
+ paddd xmm0,xmm3
+db 15,56,205,227
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm4
+db 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+db 15,56,204,243
+db 15,56,203,202
+ movdqa xmm0,[80+ebp]
+ paddd xmm0,xmm4
+db 15,56,205,236
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm5
+db 102,15,58,15,252,4
+db 15,56,203,202
+ paddd xmm6,xmm7
+ movdqa xmm0,[96+ebp]
+ paddd xmm0,xmm5
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+db 15,56,205,245
+ movdqa xmm7,[128+ebp]
+db 15,56,203,202
+ movdqa xmm0,[112+ebp]
+ paddd xmm0,xmm6
+ nop
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ cmp eax,edi
+ nop
+db 15,56,203,202
+ paddd xmm2,[16+esp]
+ paddd xmm1,[esp]
+ jnz NEAR L$011loop_shaext
+ pshufd xmm2,xmm2,177
+ pshufd xmm7,xmm1,27
+ pshufd xmm1,xmm1,177
+ punpckhqdq xmm1,xmm2
+db 102,15,58,15,215,8
+ mov esp,DWORD [44+esp]
+ movdqu [esi],xmm1
+ movdqu [16+esi],xmm2
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$006SSSE3:
+ lea esp,[esp-96]
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [4+esp],ebx
+ xor ebx,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edi
+ mov edx,DWORD [16+esi]
+ mov edi,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ movdqa xmm7,[256+ebp]
+ jmp NEAR L$012grand_ssse3
+align 16
+L$012grand_ssse3:
+ movdqu xmm0,[edi]
+ movdqu xmm1,[16+edi]
+ movdqu xmm2,[32+edi]
+ movdqu xmm3,[48+edi]
+ add edi,64
+db 102,15,56,0,199
+ mov DWORD [100+esp],edi
+db 102,15,56,0,207
+ movdqa xmm4,[ebp]
+db 102,15,56,0,215
+ movdqa xmm5,[16+ebp]
+ paddd xmm4,xmm0
+db 102,15,56,0,223
+ movdqa xmm6,[32+ebp]
+ paddd xmm5,xmm1
+ movdqa xmm7,[48+ebp]
+ movdqa [32+esp],xmm4
+ paddd xmm6,xmm2
+ movdqa [48+esp],xmm5
+ paddd xmm7,xmm3
+ movdqa [64+esp],xmm6
+ movdqa [80+esp],xmm7
+ jmp NEAR L$013ssse3_00_47
+align 16
+L$013ssse3_00_47:
+ add ebp,64
+ mov ecx,edx
+ movdqa xmm4,xmm1
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ movdqa xmm7,xmm3
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+db 102,15,58,15,224,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,250,4
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm0,xmm7
+ mov DWORD [esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm3,250
+ xor ecx,esi
+ add edx,DWORD [32+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm0,xmm4
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [8+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm0,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ pshufd xmm7,xmm0,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[ebp]
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm0,xmm7
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ paddd xmm6,xmm0
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ movdqa [32+esp],xmm6
+ mov ecx,edx
+ movdqa xmm4,xmm2
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ movdqa xmm7,xmm0
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+db 102,15,58,15,225,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,251,4
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm1,xmm7
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm0,250
+ xor ecx,esi
+ add edx,DWORD [48+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm1,xmm4
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [24+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm1,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ pshufd xmm7,xmm1,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[16+ebp]
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm1,xmm7
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ paddd xmm6,xmm1
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ movdqa [48+esp],xmm6
+ mov ecx,edx
+ movdqa xmm4,xmm3
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ movdqa xmm7,xmm1
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+db 102,15,58,15,226,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,248,4
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm2,xmm7
+ mov DWORD [esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm1,250
+ xor ecx,esi
+ add edx,DWORD [64+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm2,xmm4
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [8+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm2,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ pshufd xmm7,xmm2,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[32+ebp]
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm2,xmm7
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ paddd xmm6,xmm2
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ movdqa [64+esp],xmm6
+ mov ecx,edx
+ movdqa xmm4,xmm0
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ movdqa xmm7,xmm2
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+db 102,15,58,15,227,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,249,4
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm3,xmm7
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm2,250
+ xor ecx,esi
+ add edx,DWORD [80+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm3,xmm4
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [24+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm3,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ pshufd xmm7,xmm3,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[48+ebp]
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm3,xmm7
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ paddd xmm6,xmm3
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ movdqa [80+esp],xmm6
+ cmp DWORD [64+ebp],66051
+ jne NEAR L$013ssse3_00_47
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov esi,DWORD [96+esp]
+ xor ebx,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebx
+ xor ebx,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ecx,DWORD [24+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [28+esp]
+ mov DWORD [24+esi],ecx
+ add edi,DWORD [28+esi]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esi],edi
+ mov DWORD [28+esp],edi
+ mov edi,DWORD [100+esp]
+ movdqa xmm7,[64+ebp]
+ sub ebp,192
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$012grand_ssse3
+ mov esp,DWORD [108+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$005AVX:
+ and edx,264
+ cmp edx,264
+ je NEAR L$014AVX_BMI
+ lea esp,[esp-96]
+ vzeroall
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [4+esp],ebx
+ xor ebx,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edi
+ mov edx,DWORD [16+esi]
+ mov edi,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ vmovdqa xmm7,[256+ebp]
+ jmp NEAR L$015grand_avx
+align 32
+L$015grand_avx:
+ vmovdqu xmm0,[edi]
+ vmovdqu xmm1,[16+edi]
+ vmovdqu xmm2,[32+edi]
+ vmovdqu xmm3,[48+edi]
+ add edi,64
+ vpshufb xmm0,xmm0,xmm7
+ mov DWORD [100+esp],edi
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,[ebp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,[16+ebp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ vpaddd xmm7,xmm3,[48+ebp]
+ vmovdqa [32+esp],xmm4
+ vmovdqa [48+esp],xmm5
+ vmovdqa [64+esp],xmm6
+ vmovdqa [80+esp],xmm7
+ jmp NEAR L$016avx_00_47
+align 16
+L$016avx_00_47:
+ add ebp,64
+ vpalignr xmm4,xmm1,xmm0,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ vpalignr xmm7,xmm3,xmm2,4
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ vpaddd xmm0,xmm0,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ vpshufd xmm7,xmm3,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ vpaddd xmm0,xmm0,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ vpaddd xmm0,xmm0,xmm7
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ vpshufd xmm7,xmm0,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ vpaddd xmm0,xmm0,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ vpaddd xmm6,xmm0,[ebp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ vmovdqa [32+esp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ vpalignr xmm7,xmm0,xmm3,4
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ vpaddd xmm1,xmm1,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ vpshufd xmm7,xmm0,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ vpaddd xmm1,xmm1,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ vpaddd xmm1,xmm1,xmm7
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ vpshufd xmm7,xmm1,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ vpaddd xmm1,xmm1,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ vpaddd xmm6,xmm1,[16+ebp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ vmovdqa [48+esp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ vpalignr xmm7,xmm1,xmm0,4
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ vpaddd xmm2,xmm2,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ vpshufd xmm7,xmm1,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ vpaddd xmm2,xmm2,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ vpaddd xmm2,xmm2,xmm7
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ vpshufd xmm7,xmm2,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ vpaddd xmm2,xmm2,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ vmovdqa [64+esp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ vpalignr xmm7,xmm2,xmm1,4
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ vpaddd xmm3,xmm3,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ vpshufd xmm7,xmm2,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ vpaddd xmm3,xmm3,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ vpaddd xmm3,xmm3,xmm7
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ vpshufd xmm7,xmm3,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ vpaddd xmm3,xmm3,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ vpaddd xmm6,xmm3,[48+ebp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ vmovdqa [80+esp],xmm6
+ cmp DWORD [64+ebp],66051
+ jne NEAR L$016avx_00_47
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov esi,DWORD [96+esp]
+ xor ebx,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebx
+ xor ebx,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ecx,DWORD [24+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [28+esp]
+ mov DWORD [24+esi],ecx
+ add edi,DWORD [28+esi]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esi],edi
+ mov DWORD [28+esp],edi
+ mov edi,DWORD [100+esp]
+ vmovdqa xmm7,[64+ebp]
+ sub ebp,192
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$015grand_avx
+ mov esp,DWORD [108+esp]
+ vzeroall
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$014AVX_BMI:
+ lea esp,[esp-96]
+ vzeroall
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [4+esp],ebx
+ xor ebx,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edi
+ mov edx,DWORD [16+esi]
+ mov edi,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ vmovdqa xmm7,[256+ebp]
+ jmp NEAR L$017grand_avx_bmi
+align 32
+L$017grand_avx_bmi:
+ vmovdqu xmm0,[edi]
+ vmovdqu xmm1,[16+edi]
+ vmovdqu xmm2,[32+edi]
+ vmovdqu xmm3,[48+edi]
+ add edi,64
+ vpshufb xmm0,xmm0,xmm7
+ mov DWORD [100+esp],edi
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,[ebp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,[16+ebp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ vpaddd xmm7,xmm3,[48+ebp]
+ vmovdqa [32+esp],xmm4
+ vmovdqa [48+esp],xmm5
+ vmovdqa [64+esp],xmm6
+ vmovdqa [80+esp],xmm7
+ jmp NEAR L$018avx_bmi_00_47
+align 16
+L$018avx_bmi_00_47:
+ add ebp,64
+ vpalignr xmm4,xmm1,xmm0,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ vpalignr xmm7,xmm3,xmm2,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ vpaddd xmm0,xmm0,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [32+esp]
+ vpshufd xmm7,xmm3,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [36+esp]
+ vpaddd xmm0,xmm0,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm0,xmm0,xmm7
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm0,80
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [40+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm0,xmm0,xmm7
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [44+esp]
+ vpaddd xmm6,xmm0,[ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [32+esp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ vpalignr xmm7,xmm0,xmm3,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ vpaddd xmm1,xmm1,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [48+esp]
+ vpshufd xmm7,xmm0,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [52+esp]
+ vpaddd xmm1,xmm1,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm1,xmm1,xmm7
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm1,80
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [56+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm1,xmm1,xmm7
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [60+esp]
+ vpaddd xmm6,xmm1,[16+ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [48+esp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ vpalignr xmm7,xmm1,xmm0,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ vpaddd xmm2,xmm2,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [64+esp]
+ vpshufd xmm7,xmm1,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [68+esp]
+ vpaddd xmm2,xmm2,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm2,xmm2,xmm7
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm2,80
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [72+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm2,xmm2,xmm7
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [76+esp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [64+esp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ vpalignr xmm7,xmm2,xmm1,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ vpaddd xmm3,xmm3,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [80+esp]
+ vpshufd xmm7,xmm2,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [84+esp]
+ vpaddd xmm3,xmm3,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm3,xmm3,xmm7
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm3,80
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [88+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm3,xmm3,xmm7
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [92+esp]
+ vpaddd xmm6,xmm3,[48+ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [80+esp],xmm6
+ cmp DWORD [64+ebp],66051
+ jne NEAR L$018avx_bmi_00_47
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ mov esi,DWORD [96+esp]
+ xor ebx,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebx
+ xor ebx,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ecx,DWORD [24+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [28+esp]
+ mov DWORD [24+esi],ecx
+ add edi,DWORD [28+esi]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esi],edi
+ mov DWORD [28+esp],edi
+ mov edi,DWORD [100+esp]
+ vmovdqa xmm7,[64+ebp]
+ sub ebp,192
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$017grand_avx_bmi
+ mov esp,DWORD [108+esp]
+ vzeroall
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
new file mode 100644
index 0000000000..f80f1cca53
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
@@ -0,0 +1,2842 @@
+; Copyright 2007-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _sha512_block_data_order
+align 16
+_sha512_block_data_order:
+L$_sha512_block_data_order_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov ebx,esp
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea ebp,[(L$001K512-L$000pic_point)+ebp]
+ sub esp,16
+ and esp,-64
+ shl eax,7
+ add eax,edi
+ mov DWORD [esp],esi
+ mov DWORD [4+esp],edi
+ mov DWORD [8+esp],eax
+ mov DWORD [12+esp],ebx
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov ecx,DWORD [edx]
+ test ecx,67108864
+ jz NEAR L$002loop_x86
+ mov edx,DWORD [4+edx]
+ movq mm0,[esi]
+ and ecx,16777216
+ movq mm1,[8+esi]
+ and edx,512
+ movq mm2,[16+esi]
+ or ecx,edx
+ movq mm3,[24+esi]
+ movq mm4,[32+esi]
+ movq mm5,[40+esi]
+ movq mm6,[48+esi]
+ movq mm7,[56+esi]
+ cmp ecx,16777728
+ je NEAR L$003SSSE3
+ sub esp,80
+ jmp NEAR L$004loop_sse2
+align 16
+L$004loop_sse2:
+ movq [8+esp],mm1
+ movq [16+esp],mm2
+ movq [24+esp],mm3
+ movq [40+esp],mm5
+ movq [48+esp],mm6
+ pxor mm2,mm1
+ movq [56+esp],mm7
+ movq mm3,mm0
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ add edi,8
+ mov edx,15
+ bswap eax
+ bswap ebx
+ jmp NEAR L$00500_14_sse2
+align 16
+L$00500_14_sse2:
+ movd mm1,eax
+ mov eax,DWORD [edi]
+ movd mm7,ebx
+ mov ebx,DWORD [4+edi]
+ add edi,8
+ bswap eax
+ bswap ebx
+ punpckldq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq mm0,mm3
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm3,mm2
+ movq mm2,mm0
+ add ebp,8
+ paddq mm3,mm6
+ movq mm6,[48+esp]
+ dec edx
+ jnz NEAR L$00500_14_sse2
+ movd mm1,eax
+ movd mm7,ebx
+ punpckldq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq mm0,mm3
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm7,[192+esp]
+ paddq mm3,mm2
+ movq mm2,mm0
+ add ebp,8
+ paddq mm3,mm6
+ pxor mm0,mm0
+ mov edx,32
+ jmp NEAR L$00616_79_sse2
+align 16
+L$00616_79_sse2:
+ movq mm5,[88+esp]
+ movq mm1,mm7
+ psrlq mm7,1
+ movq mm6,mm5
+ psrlq mm5,6
+ psllq mm1,56
+ paddq mm0,mm3
+ movq mm3,mm7
+ psrlq mm7,6
+ pxor mm3,mm1
+ psllq mm1,7
+ pxor mm3,mm7
+ psrlq mm7,1
+ pxor mm3,mm1
+ movq mm1,mm5
+ psrlq mm5,13
+ pxor mm7,mm3
+ psllq mm6,3
+ pxor mm1,mm5
+ paddq mm7,[200+esp]
+ pxor mm1,mm6
+ psrlq mm5,42
+ paddq mm7,[128+esp]
+ pxor mm1,mm5
+ psllq mm6,42
+ movq mm5,[40+esp]
+ pxor mm1,mm6
+ movq mm6,[48+esp]
+ paddq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm7,[192+esp]
+ paddq mm2,mm6
+ add ebp,8
+ movq mm5,[88+esp]
+ movq mm1,mm7
+ psrlq mm7,1
+ movq mm6,mm5
+ psrlq mm5,6
+ psllq mm1,56
+ paddq mm2,mm3
+ movq mm3,mm7
+ psrlq mm7,6
+ pxor mm3,mm1
+ psllq mm1,7
+ pxor mm3,mm7
+ psrlq mm7,1
+ pxor mm3,mm1
+ movq mm1,mm5
+ psrlq mm5,13
+ pxor mm7,mm3
+ psllq mm6,3
+ pxor mm1,mm5
+ paddq mm7,[200+esp]
+ pxor mm1,mm6
+ psrlq mm5,42
+ paddq mm7,[128+esp]
+ pxor mm1,mm5
+ psllq mm6,42
+ movq mm5,[40+esp]
+ pxor mm1,mm6
+ movq mm6,[48+esp]
+ paddq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm7,[192+esp]
+ paddq mm0,mm6
+ add ebp,8
+ dec edx
+ jnz NEAR L$00616_79_sse2
+ paddq mm0,mm3
+ movq mm1,[8+esp]
+ movq mm3,[24+esp]
+ movq mm5,[40+esp]
+ movq mm6,[48+esp]
+ movq mm7,[56+esp]
+ pxor mm2,mm1
+ paddq mm0,[esi]
+ paddq mm1,[8+esi]
+ paddq mm2,[16+esi]
+ paddq mm3,[24+esi]
+ paddq mm4,[32+esi]
+ paddq mm5,[40+esi]
+ paddq mm6,[48+esi]
+ paddq mm7,[56+esi]
+ mov eax,640
+ movq [esi],mm0
+ movq [8+esi],mm1
+ movq [16+esi],mm2
+ movq [24+esi],mm3
+ movq [32+esi],mm4
+ movq [40+esi],mm5
+ movq [48+esi],mm6
+ movq [56+esi],mm7
+ lea esp,[eax*1+esp]
+ sub ebp,eax
+ cmp edi,DWORD [88+esp]
+ jb NEAR L$004loop_sse2
+ mov esp,DWORD [92+esp]
+ emms
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$003SSSE3:
+ lea edx,[esp-64]
+ sub esp,256
+ movdqa xmm1,[640+ebp]
+ movdqu xmm0,[edi]
+db 102,15,56,0,193
+ movdqa xmm3,[ebp]
+ movdqa xmm2,xmm1
+ movdqu xmm1,[16+edi]
+ paddq xmm3,xmm0
+db 102,15,56,0,202
+ movdqa [edx-128],xmm3
+ movdqa xmm4,[16+ebp]
+ movdqa xmm3,xmm2
+ movdqu xmm2,[32+edi]
+ paddq xmm4,xmm1
+db 102,15,56,0,211
+ movdqa [edx-112],xmm4
+ movdqa xmm5,[32+ebp]
+ movdqa xmm4,xmm3
+ movdqu xmm3,[48+edi]
+ paddq xmm5,xmm2
+db 102,15,56,0,220
+ movdqa [edx-96],xmm5
+ movdqa xmm6,[48+ebp]
+ movdqa xmm5,xmm4
+ movdqu xmm4,[64+edi]
+ paddq xmm6,xmm3
+db 102,15,56,0,229
+ movdqa [edx-80],xmm6
+ movdqa xmm7,[64+ebp]
+ movdqa xmm6,xmm5
+ movdqu xmm5,[80+edi]
+ paddq xmm7,xmm4
+db 102,15,56,0,238
+ movdqa [edx-64],xmm7
+ movdqa [edx],xmm0
+ movdqa xmm0,[80+ebp]
+ movdqa xmm7,xmm6
+ movdqu xmm6,[96+edi]
+ paddq xmm0,xmm5
+db 102,15,56,0,247
+ movdqa [edx-48],xmm0
+ movdqa [16+edx],xmm1
+ movdqa xmm1,[96+ebp]
+ movdqa xmm0,xmm7
+ movdqu xmm7,[112+edi]
+ paddq xmm1,xmm6
+db 102,15,56,0,248
+ movdqa [edx-32],xmm1
+ movdqa [32+edx],xmm2
+ movdqa xmm2,[112+ebp]
+ movdqa xmm0,[edx]
+ paddq xmm2,xmm7
+ movdqa [edx-16],xmm2
+ nop
+align 32
+L$007loop_ssse3:
+ movdqa xmm2,[16+edx]
+ movdqa [48+edx],xmm3
+ lea ebp,[128+ebp]
+ movq [8+esp],mm1
+ mov ebx,edi
+ movq [16+esp],mm2
+ lea edi,[128+edi]
+ movq [24+esp],mm3
+ cmp edi,eax
+ movq [40+esp],mm5
+ cmovb ebx,edi
+ movq [48+esp],mm6
+ mov ecx,4
+ pxor mm2,mm1
+ movq [56+esp],mm7
+ pxor mm3,mm3
+ jmp NEAR L$00800_47_ssse3
+align 32
+L$00800_47_ssse3:
+ movdqa xmm3,xmm5
+ movdqa xmm1,xmm2
+db 102,15,58,15,208,8
+ movdqa [edx],xmm4
+db 102,15,58,15,220,8
+ movdqa xmm4,xmm2
+ psrlq xmm2,7
+ paddq xmm0,xmm3
+ movdqa xmm3,xmm4
+ psrlq xmm4,1
+ psllq xmm3,56
+ pxor xmm2,xmm4
+ psrlq xmm4,7
+ pxor xmm2,xmm3
+ psllq xmm3,7
+ pxor xmm2,xmm4
+ movdqa xmm4,xmm7
+ pxor xmm2,xmm3
+ movdqa xmm3,xmm7
+ psrlq xmm4,6
+ paddq xmm0,xmm2
+ movdqa xmm2,xmm7
+ psrlq xmm3,19
+ psllq xmm2,3
+ pxor xmm4,xmm3
+ psrlq xmm3,42
+ pxor xmm4,xmm2
+ psllq xmm2,42
+ pxor xmm4,xmm3
+ movdqa xmm3,[32+edx]
+ pxor xmm4,xmm2
+ movdqa xmm2,[ebp]
+ movq mm1,mm4
+ paddq xmm0,xmm4
+ movq mm7,[edx-128]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ paddq xmm2,xmm0
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-120]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-128],xmm2
+ movdqa xmm4,xmm6
+ movdqa xmm2,xmm3
+db 102,15,58,15,217,8
+ movdqa [16+edx],xmm5
+db 102,15,58,15,229,8
+ movdqa xmm5,xmm3
+ psrlq xmm3,7
+ paddq xmm1,xmm4
+ movdqa xmm4,xmm5
+ psrlq xmm5,1
+ psllq xmm4,56
+ pxor xmm3,xmm5
+ psrlq xmm5,7
+ pxor xmm3,xmm4
+ psllq xmm4,7
+ pxor xmm3,xmm5
+ movdqa xmm5,xmm0
+ pxor xmm3,xmm4
+ movdqa xmm4,xmm0
+ psrlq xmm5,6
+ paddq xmm1,xmm3
+ movdqa xmm3,xmm0
+ psrlq xmm4,19
+ psllq xmm3,3
+ pxor xmm5,xmm4
+ psrlq xmm4,42
+ pxor xmm5,xmm3
+ psllq xmm3,42
+ pxor xmm5,xmm4
+ movdqa xmm4,[48+edx]
+ pxor xmm5,xmm3
+ movdqa xmm3,[16+ebp]
+ movq mm1,mm4
+ paddq xmm1,xmm5
+ movq mm7,[edx-112]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ paddq xmm3,xmm1
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-104]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-112],xmm3
+ movdqa xmm5,xmm7
+ movdqa xmm3,xmm4
+db 102,15,58,15,226,8
+ movdqa [32+edx],xmm6
+db 102,15,58,15,238,8
+ movdqa xmm6,xmm4
+ psrlq xmm4,7
+ paddq xmm2,xmm5
+ movdqa xmm5,xmm6
+ psrlq xmm6,1
+ psllq xmm5,56
+ pxor xmm4,xmm6
+ psrlq xmm6,7
+ pxor xmm4,xmm5
+ psllq xmm5,7
+ pxor xmm4,xmm6
+ movdqa xmm6,xmm1
+ pxor xmm4,xmm5
+ movdqa xmm5,xmm1
+ psrlq xmm6,6
+ paddq xmm2,xmm4
+ movdqa xmm4,xmm1
+ psrlq xmm5,19
+ psllq xmm4,3
+ pxor xmm6,xmm5
+ psrlq xmm5,42
+ pxor xmm6,xmm4
+ psllq xmm4,42
+ pxor xmm6,xmm5
+ movdqa xmm5,[edx]
+ pxor xmm6,xmm4
+ movdqa xmm4,[32+ebp]
+ movq mm1,mm4
+ paddq xmm2,xmm6
+ movq mm7,[edx-96]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ paddq xmm4,xmm2
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-88]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-96],xmm4
+ movdqa xmm6,xmm0
+ movdqa xmm4,xmm5
+db 102,15,58,15,235,8
+ movdqa [48+edx],xmm7
+db 102,15,58,15,247,8
+ movdqa xmm7,xmm5
+ psrlq xmm5,7
+ paddq xmm3,xmm6
+ movdqa xmm6,xmm7
+ psrlq xmm7,1
+ psllq xmm6,56
+ pxor xmm5,xmm7
+ psrlq xmm7,7
+ pxor xmm5,xmm6
+ psllq xmm6,7
+ pxor xmm5,xmm7
+ movdqa xmm7,xmm2
+ pxor xmm5,xmm6
+ movdqa xmm6,xmm2
+ psrlq xmm7,6
+ paddq xmm3,xmm5
+ movdqa xmm5,xmm2
+ psrlq xmm6,19
+ psllq xmm5,3
+ pxor xmm7,xmm6
+ psrlq xmm6,42
+ pxor xmm7,xmm5
+ psllq xmm5,42
+ pxor xmm7,xmm6
+ movdqa xmm6,[16+edx]
+ pxor xmm7,xmm5
+ movdqa xmm5,[48+ebp]
+ movq mm1,mm4
+ paddq xmm3,xmm7
+ movq mm7,[edx-80]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ paddq xmm5,xmm3
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-72]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-80],xmm5
+ movdqa xmm7,xmm1
+ movdqa xmm5,xmm6
+db 102,15,58,15,244,8
+ movdqa [edx],xmm0
+db 102,15,58,15,248,8
+ movdqa xmm0,xmm6
+ psrlq xmm6,7
+ paddq xmm4,xmm7
+ movdqa xmm7,xmm0
+ psrlq xmm0,1
+ psllq xmm7,56
+ pxor xmm6,xmm0
+ psrlq xmm0,7
+ pxor xmm6,xmm7
+ psllq xmm7,7
+ pxor xmm6,xmm0
+ movdqa xmm0,xmm3
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm3
+ psrlq xmm0,6
+ paddq xmm4,xmm6
+ movdqa xmm6,xmm3
+ psrlq xmm7,19
+ psllq xmm6,3
+ pxor xmm0,xmm7
+ psrlq xmm7,42
+ pxor xmm0,xmm6
+ psllq xmm6,42
+ pxor xmm0,xmm7
+ movdqa xmm7,[32+edx]
+ pxor xmm0,xmm6
+ movdqa xmm6,[64+ebp]
+ movq mm1,mm4
+ paddq xmm4,xmm0
+ movq mm7,[edx-64]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ paddq xmm6,xmm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-56]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-64],xmm6
+ movdqa xmm0,xmm2
+ movdqa xmm6,xmm7
+db 102,15,58,15,253,8
+ movdqa [16+edx],xmm1
+db 102,15,58,15,193,8
+ movdqa xmm1,xmm7
+ psrlq xmm7,7
+ paddq xmm5,xmm0
+ movdqa xmm0,xmm1
+ psrlq xmm1,1
+ psllq xmm0,56
+ pxor xmm7,xmm1
+ psrlq xmm1,7
+ pxor xmm7,xmm0
+ psllq xmm0,7
+ pxor xmm7,xmm1
+ movdqa xmm1,xmm4
+ pxor xmm7,xmm0
+ movdqa xmm0,xmm4
+ psrlq xmm1,6
+ paddq xmm5,xmm7
+ movdqa xmm7,xmm4
+ psrlq xmm0,19
+ psllq xmm7,3
+ pxor xmm1,xmm0
+ psrlq xmm0,42
+ pxor xmm1,xmm7
+ psllq xmm7,42
+ pxor xmm1,xmm0
+ movdqa xmm0,[48+edx]
+ pxor xmm1,xmm7
+ movdqa xmm7,[80+ebp]
+ movq mm1,mm4
+ paddq xmm5,xmm1
+ movq mm7,[edx-48]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ paddq xmm7,xmm5
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-40]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-48],xmm7
+ movdqa xmm1,xmm3
+ movdqa xmm7,xmm0
+db 102,15,58,15,198,8
+ movdqa [32+edx],xmm2
+db 102,15,58,15,202,8
+ movdqa xmm2,xmm0
+ psrlq xmm0,7
+ paddq xmm6,xmm1
+ movdqa xmm1,xmm2
+ psrlq xmm2,1
+ psllq xmm1,56
+ pxor xmm0,xmm2
+ psrlq xmm2,7
+ pxor xmm0,xmm1
+ psllq xmm1,7
+ pxor xmm0,xmm2
+ movdqa xmm2,xmm5
+ pxor xmm0,xmm1
+ movdqa xmm1,xmm5
+ psrlq xmm2,6
+ paddq xmm6,xmm0
+ movdqa xmm0,xmm5
+ psrlq xmm1,19
+ psllq xmm0,3
+ pxor xmm2,xmm1
+ psrlq xmm1,42
+ pxor xmm2,xmm0
+ psllq xmm0,42
+ pxor xmm2,xmm1
+ movdqa xmm1,[edx]
+ pxor xmm2,xmm0
+ movdqa xmm0,[96+ebp]
+ movq mm1,mm4
+ paddq xmm6,xmm2
+ movq mm7,[edx-32]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ paddq xmm0,xmm6
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-24]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-32],xmm0
+ movdqa xmm2,xmm4
+ movdqa xmm0,xmm1
+db 102,15,58,15,207,8
+ movdqa [48+edx],xmm3
+db 102,15,58,15,211,8
+ movdqa xmm3,xmm1
+ psrlq xmm1,7
+ paddq xmm7,xmm2
+ movdqa xmm2,xmm3
+ psrlq xmm3,1
+ psllq xmm2,56
+ pxor xmm1,xmm3
+ psrlq xmm3,7
+ pxor xmm1,xmm2
+ psllq xmm2,7
+ pxor xmm1,xmm3
+ movdqa xmm3,xmm6
+ pxor xmm1,xmm2
+ movdqa xmm2,xmm6
+ psrlq xmm3,6
+ paddq xmm7,xmm1
+ movdqa xmm1,xmm6
+ psrlq xmm2,19
+ psllq xmm1,3
+ pxor xmm3,xmm2
+ psrlq xmm2,42
+ pxor xmm3,xmm1
+ psllq xmm1,42
+ pxor xmm3,xmm2
+ movdqa xmm2,[16+edx]
+ pxor xmm3,xmm1
+ movdqa xmm1,[112+ebp]
+ movq mm1,mm4
+ paddq xmm7,xmm3
+ movq mm7,[edx-16]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ paddq xmm1,xmm7
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-8]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-16],xmm1
+ lea ebp,[128+ebp]
+ dec ecx
+ jnz NEAR L$00800_47_ssse3
+ movdqa xmm1,[ebp]
+ lea ebp,[ebp-640]
+ movdqu xmm0,[ebx]
+db 102,15,56,0,193
+ movdqa xmm3,[ebp]
+ movdqa xmm2,xmm1
+ movdqu xmm1,[16+ebx]
+ paddq xmm3,xmm0
+db 102,15,56,0,202
+ movq mm1,mm4
+ movq mm7,[edx-128]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-120]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-128],xmm3
+ movdqa xmm4,[16+ebp]
+ movdqa xmm3,xmm2
+ movdqu xmm2,[32+ebx]
+ paddq xmm4,xmm1
+db 102,15,56,0,211
+ movq mm1,mm4
+ movq mm7,[edx-112]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-104]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-112],xmm4
+ movdqa xmm5,[32+ebp]
+ movdqa xmm4,xmm3
+ movdqu xmm3,[48+ebx]
+ paddq xmm5,xmm2
+db 102,15,56,0,220
+ movq mm1,mm4
+ movq mm7,[edx-96]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-88]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-96],xmm5
+ movdqa xmm6,[48+ebp]
+ movdqa xmm5,xmm4
+ movdqu xmm4,[64+ebx]
+ paddq xmm6,xmm3
+db 102,15,56,0,229
+ movq mm1,mm4
+ movq mm7,[edx-80]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-72]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-80],xmm6
+ movdqa xmm7,[64+ebp]
+ movdqa xmm6,xmm5
+ movdqu xmm5,[80+ebx]
+ paddq xmm7,xmm4
+db 102,15,56,0,238
+ movq mm1,mm4
+ movq mm7,[edx-64]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-56]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-64],xmm7
+ movdqa [edx],xmm0
+ movdqa xmm0,[80+ebp]
+ movdqa xmm7,xmm6
+ movdqu xmm6,[96+ebx]
+ paddq xmm0,xmm5
+db 102,15,56,0,247
+ movq mm1,mm4
+ movq mm7,[edx-48]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-40]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-48],xmm0
+ movdqa [16+edx],xmm1
+ movdqa xmm1,[96+ebp]
+ movdqa xmm0,xmm7
+ movdqu xmm7,[112+ebx]
+ paddq xmm1,xmm6
+db 102,15,56,0,248
+ movq mm1,mm4
+ movq mm7,[edx-32]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-24]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-32],xmm1
+ movdqa [32+edx],xmm2
+ movdqa xmm2,[112+ebp]
+ movdqa xmm0,[edx]
+ paddq xmm2,xmm7
+ movq mm1,mm4
+ movq mm7,[edx-16]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-8]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-16],xmm2
+ movq mm1,[8+esp]
+ paddq mm0,mm3
+ movq mm3,[24+esp]
+ movq mm7,[56+esp]
+ pxor mm2,mm1
+ paddq mm0,[esi]
+ paddq mm1,[8+esi]
+ paddq mm2,[16+esi]
+ paddq mm3,[24+esi]
+ paddq mm4,[32+esi]
+ paddq mm5,[40+esi]
+ paddq mm6,[48+esi]
+ paddq mm7,[56+esi]
+ movq [esi],mm0
+ movq [8+esi],mm1
+ movq [16+esi],mm2
+ movq [24+esi],mm3
+ movq [32+esi],mm4
+ movq [40+esi],mm5
+ movq [48+esi],mm6
+ movq [56+esi],mm7
+ cmp edi,eax
+ jb NEAR L$007loop_ssse3
+ mov esp,DWORD [76+edx]
+ emms
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+L$002loop_x86:
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [16+edi]
+ mov ebx,DWORD [20+edi]
+ mov ecx,DWORD [24+edi]
+ mov edx,DWORD [28+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [32+edi]
+ mov ebx,DWORD [36+edi]
+ mov ecx,DWORD [40+edi]
+ mov edx,DWORD [44+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [48+edi]
+ mov ebx,DWORD [52+edi]
+ mov ecx,DWORD [56+edi]
+ mov edx,DWORD [60+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [64+edi]
+ mov ebx,DWORD [68+edi]
+ mov ecx,DWORD [72+edi]
+ mov edx,DWORD [76+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [80+edi]
+ mov ebx,DWORD [84+edi]
+ mov ecx,DWORD [88+edi]
+ mov edx,DWORD [92+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [96+edi]
+ mov ebx,DWORD [100+edi]
+ mov ecx,DWORD [104+edi]
+ mov edx,DWORD [108+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [112+edi]
+ mov ebx,DWORD [116+edi]
+ mov ecx,DWORD [120+edi]
+ mov edx,DWORD [124+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ add edi,128
+ sub esp,72
+ mov DWORD [204+esp],edi
+ lea edi,[8+esp]
+ mov ecx,16
+dd 2784229001
+align 16
+L$00900_15_x86:
+ mov ecx,DWORD [40+esp]
+ mov edx,DWORD [44+esp]
+ mov esi,ecx
+ shr ecx,9
+ mov edi,edx
+ shr edx,9
+ mov ebx,ecx
+ shl esi,14
+ mov eax,edx
+ shl edi,14
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor eax,ecx
+ shl esi,4
+ xor ebx,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,4
+ xor eax,edi
+ shr edx,4
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [48+esp]
+ mov edx,DWORD [52+esp]
+ mov esi,DWORD [56+esp]
+ mov edi,DWORD [60+esp]
+ add eax,DWORD [64+esp]
+ adc ebx,DWORD [68+esp]
+ xor ecx,esi
+ xor edx,edi
+ and ecx,DWORD [40+esp]
+ and edx,DWORD [44+esp]
+ add eax,DWORD [192+esp]
+ adc ebx,DWORD [196+esp]
+ xor ecx,esi
+ xor edx,edi
+ mov esi,DWORD [ebp]
+ mov edi,DWORD [4+ebp]
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [32+esp]
+ mov edx,DWORD [36+esp]
+ add eax,esi
+ adc ebx,edi
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov DWORD [32+esp],eax
+ mov DWORD [36+esp],ebx
+ mov esi,ecx
+ shr ecx,2
+ mov edi,edx
+ shr edx,2
+ mov ebx,ecx
+ shl esi,4
+ mov eax,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor ebx,ecx
+ shl esi,21
+ xor eax,edx
+ shl edi,21
+ xor eax,esi
+ shr ecx,21
+ xor ebx,edi
+ shr edx,21
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov esi,DWORD [16+esp]
+ mov edi,DWORD [20+esp]
+ add eax,DWORD [esp]
+ adc ebx,DWORD [4+esp]
+ or ecx,esi
+ or edx,edi
+ and ecx,DWORD [24+esp]
+ and edx,DWORD [28+esp]
+ and esi,DWORD [8+esp]
+ and edi,DWORD [12+esp]
+ or ecx,esi
+ or edx,edi
+ add eax,ecx
+ adc ebx,edx
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov dl,BYTE [ebp]
+ sub esp,8
+ lea ebp,[8+ebp]
+ cmp dl,148
+ jne NEAR L$00900_15_x86
+align 16
+L$01016_79_x86:
+ mov ecx,DWORD [312+esp]
+ mov edx,DWORD [316+esp]
+ mov esi,ecx
+ shr ecx,1
+ mov edi,edx
+ shr edx,1
+ mov eax,ecx
+ shl esi,24
+ mov ebx,edx
+ shl edi,24
+ xor ebx,esi
+ shr ecx,6
+ xor eax,edi
+ shr edx,6
+ xor eax,ecx
+ shl esi,7
+ xor ebx,edx
+ shl edi,1
+ xor ebx,esi
+ shr ecx,1
+ xor eax,edi
+ shr edx,1
+ xor eax,ecx
+ shl edi,6
+ xor ebx,edx
+ xor eax,edi
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov ecx,DWORD [208+esp]
+ mov edx,DWORD [212+esp]
+ mov esi,ecx
+ shr ecx,6
+ mov edi,edx
+ shr edx,6
+ mov eax,ecx
+ shl esi,3
+ mov ebx,edx
+ shl edi,3
+ xor eax,esi
+ shr ecx,13
+ xor ebx,edi
+ shr edx,13
+ xor eax,ecx
+ shl esi,10
+ xor ebx,edx
+ shl edi,10
+ xor ebx,esi
+ shr ecx,10
+ xor eax,edi
+ shr edx,10
+ xor ebx,ecx
+ shl edi,13
+ xor eax,edx
+ xor eax,edi
+ mov ecx,DWORD [320+esp]
+ mov edx,DWORD [324+esp]
+ add eax,DWORD [esp]
+ adc ebx,DWORD [4+esp]
+ mov esi,DWORD [248+esp]
+ mov edi,DWORD [252+esp]
+ add eax,ecx
+ adc ebx,edx
+ add eax,esi
+ adc ebx,edi
+ mov DWORD [192+esp],eax
+ mov DWORD [196+esp],ebx
+ mov ecx,DWORD [40+esp]
+ mov edx,DWORD [44+esp]
+ mov esi,ecx
+ shr ecx,9
+ mov edi,edx
+ shr edx,9
+ mov ebx,ecx
+ shl esi,14
+ mov eax,edx
+ shl edi,14
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor eax,ecx
+ shl esi,4
+ xor ebx,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,4
+ xor eax,edi
+ shr edx,4
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [48+esp]
+ mov edx,DWORD [52+esp]
+ mov esi,DWORD [56+esp]
+ mov edi,DWORD [60+esp]
+ add eax,DWORD [64+esp]
+ adc ebx,DWORD [68+esp]
+ xor ecx,esi
+ xor edx,edi
+ and ecx,DWORD [40+esp]
+ and edx,DWORD [44+esp]
+ add eax,DWORD [192+esp]
+ adc ebx,DWORD [196+esp]
+ xor ecx,esi
+ xor edx,edi
+ mov esi,DWORD [ebp]
+ mov edi,DWORD [4+ebp]
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [32+esp]
+ mov edx,DWORD [36+esp]
+ add eax,esi
+ adc ebx,edi
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov DWORD [32+esp],eax
+ mov DWORD [36+esp],ebx
+ mov esi,ecx
+ shr ecx,2
+ mov edi,edx
+ shr edx,2
+ mov ebx,ecx
+ shl esi,4
+ mov eax,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor ebx,ecx
+ shl esi,21
+ xor eax,edx
+ shl edi,21
+ xor eax,esi
+ shr ecx,21
+ xor ebx,edi
+ shr edx,21
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov esi,DWORD [16+esp]
+ mov edi,DWORD [20+esp]
+ add eax,DWORD [esp]
+ adc ebx,DWORD [4+esp]
+ or ecx,esi
+ or edx,edi
+ and ecx,DWORD [24+esp]
+ and edx,DWORD [28+esp]
+ and esi,DWORD [8+esp]
+ and edi,DWORD [12+esp]
+ or ecx,esi
+ or edx,edi
+ add eax,ecx
+ adc ebx,edx
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov dl,BYTE [ebp]
+ sub esp,8
+ lea ebp,[8+ebp]
+ cmp dl,23
+ jne NEAR L$01016_79_x86
+ mov esi,DWORD [840+esp]
+ mov edi,DWORD [844+esp]
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [12+esi]
+ add eax,DWORD [8+esp]
+ adc ebx,DWORD [12+esp]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ add ecx,DWORD [16+esp]
+ adc edx,DWORD [20+esp]
+ mov DWORD [8+esi],ecx
+ mov DWORD [12+esi],edx
+ mov eax,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [28+esi]
+ add eax,DWORD [24+esp]
+ adc ebx,DWORD [28+esp]
+ mov DWORD [16+esi],eax
+ mov DWORD [20+esi],ebx
+ add ecx,DWORD [32+esp]
+ adc edx,DWORD [36+esp]
+ mov DWORD [24+esi],ecx
+ mov DWORD [28+esi],edx
+ mov eax,DWORD [32+esi]
+ mov ebx,DWORD [36+esi]
+ mov ecx,DWORD [40+esi]
+ mov edx,DWORD [44+esi]
+ add eax,DWORD [40+esp]
+ adc ebx,DWORD [44+esp]
+ mov DWORD [32+esi],eax
+ mov DWORD [36+esi],ebx
+ add ecx,DWORD [48+esp]
+ adc edx,DWORD [52+esp]
+ mov DWORD [40+esi],ecx
+ mov DWORD [44+esi],edx
+ mov eax,DWORD [48+esi]
+ mov ebx,DWORD [52+esi]
+ mov ecx,DWORD [56+esi]
+ mov edx,DWORD [60+esi]
+ add eax,DWORD [56+esp]
+ adc ebx,DWORD [60+esp]
+ mov DWORD [48+esi],eax
+ mov DWORD [52+esi],ebx
+ add ecx,DWORD [64+esp]
+ adc edx,DWORD [68+esp]
+ mov DWORD [56+esi],ecx
+ mov DWORD [60+esi],edx
+ add esp,840
+ sub ebp,640
+ cmp edi,DWORD [8+esp]
+ jb NEAR L$002loop_x86
+ mov esp,DWORD [12+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$001K512:
+dd 3609767458,1116352408
+dd 602891725,1899447441
+dd 3964484399,3049323471
+dd 2173295548,3921009573
+dd 4081628472,961987163
+dd 3053834265,1508970993
+dd 2937671579,2453635748
+dd 3664609560,2870763221
+dd 2734883394,3624381080
+dd 1164996542,310598401
+dd 1323610764,607225278
+dd 3590304994,1426881987
+dd 4068182383,1925078388
+dd 991336113,2162078206
+dd 633803317,2614888103
+dd 3479774868,3248222580
+dd 2666613458,3835390401
+dd 944711139,4022224774
+dd 2341262773,264347078
+dd 2007800933,604807628
+dd 1495990901,770255983
+dd 1856431235,1249150122
+dd 3175218132,1555081692
+dd 2198950837,1996064986
+dd 3999719339,2554220882
+dd 766784016,2821834349
+dd 2566594879,2952996808
+dd 3203337956,3210313671
+dd 1034457026,3336571891
+dd 2466948901,3584528711
+dd 3758326383,113926993
+dd 168717936,338241895
+dd 1188179964,666307205
+dd 1546045734,773529912
+dd 1522805485,1294757372
+dd 2643833823,1396182291
+dd 2343527390,1695183700
+dd 1014477480,1986661051
+dd 1206759142,2177026350
+dd 344077627,2456956037
+dd 1290863460,2730485921
+dd 3158454273,2820302411
+dd 3505952657,3259730800
+dd 106217008,3345764771
+dd 3606008344,3516065817
+dd 1432725776,3600352804
+dd 1467031594,4094571909
+dd 851169720,275423344
+dd 3100823752,430227734
+dd 1363258195,506948616
+dd 3750685593,659060556
+dd 3785050280,883997877
+dd 3318307427,958139571
+dd 3812723403,1322822218
+dd 2003034995,1537002063
+dd 3602036899,1747873779
+dd 1575990012,1955562222
+dd 1125592928,2024104815
+dd 2716904306,2227730452
+dd 442776044,2361852424
+dd 593698344,2428436474
+dd 3733110249,2756734187
+dd 2999351573,3204031479
+dd 3815920427,3329325298
+dd 3928383900,3391569614
+dd 566280711,3515267271
+dd 3454069534,3940187606
+dd 4000239992,4118630271
+dd 1914138554,116418474
+dd 2731055270,174292421
+dd 3203993006,289380356
+dd 320620315,460393269
+dd 587496836,685471733
+dd 1086792851,852142971
+dd 365543100,1017036298
+dd 2618297676,1126000580
+dd 3409855158,1288033470
+dd 4234509866,1501505948
+dd 987167468,1607167915
+dd 1246189591,1816402316
+dd 67438087,66051
+dd 202182159,134810123
+db 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+db 110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db 62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
new file mode 100644
index 0000000000..9d61eedd34
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
@@ -0,0 +1,513 @@
+; Copyright 2004-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _OPENSSL_ia32_cpuid
+align 16
+_OPENSSL_ia32_cpuid:
+L$_OPENSSL_ia32_cpuid_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ xor edx,edx
+ pushfd
+ pop eax
+ mov ecx,eax
+ xor eax,2097152
+ push eax
+ popfd
+ pushfd
+ pop eax
+ xor ecx,eax
+ xor eax,eax
+ mov esi,DWORD [20+esp]
+ mov DWORD [8+esi],eax
+ bt ecx,21
+ jnc NEAR L$000nocpuid
+ cpuid
+ mov edi,eax
+ xor eax,eax
+ cmp ebx,1970169159
+ setne al
+ mov ebp,eax
+ cmp edx,1231384169
+ setne al
+ or ebp,eax
+ cmp ecx,1818588270
+ setne al
+ or ebp,eax
+ jz NEAR L$001intel
+ cmp ebx,1752462657
+ setne al
+ mov esi,eax
+ cmp edx,1769238117
+ setne al
+ or esi,eax
+ cmp ecx,1145913699
+ setne al
+ or esi,eax
+ jnz NEAR L$001intel
+ mov eax,2147483648
+ cpuid
+ cmp eax,2147483649
+ jb NEAR L$001intel
+ mov esi,eax
+ mov eax,2147483649
+ cpuid
+ or ebp,ecx
+ and ebp,2049
+ cmp esi,2147483656
+ jb NEAR L$001intel
+ mov eax,2147483656
+ cpuid
+ movzx esi,cl
+ inc esi
+ mov eax,1
+ xor ecx,ecx
+ cpuid
+ bt edx,28
+ jnc NEAR L$002generic
+ shr ebx,16
+ and ebx,255
+ cmp ebx,esi
+ ja NEAR L$002generic
+ and edx,4026531839
+ jmp NEAR L$002generic
+L$001intel:
+ cmp edi,4
+ mov esi,-1
+ jb NEAR L$003nocacheinfo
+ mov eax,4
+ mov ecx,0
+ cpuid
+ mov esi,eax
+ shr esi,14
+ and esi,4095
+L$003nocacheinfo:
+ mov eax,1
+ xor ecx,ecx
+ cpuid
+ and edx,3220176895
+ cmp ebp,0
+ jne NEAR L$004notintel
+ or edx,1073741824
+ and ah,15
+ cmp ah,15
+ jne NEAR L$004notintel
+ or edx,1048576
+L$004notintel:
+ bt edx,28
+ jnc NEAR L$002generic
+ and edx,4026531839
+ cmp esi,0
+ je NEAR L$002generic
+ or edx,268435456
+ shr ebx,16
+ cmp bl,1
+ ja NEAR L$002generic
+ and edx,4026531839
+L$002generic:
+ and ebp,2048
+ and ecx,4294965247
+ mov esi,edx
+ or ebp,ecx
+ cmp edi,7
+ mov edi,DWORD [20+esp]
+ jb NEAR L$005no_extended_info
+ mov eax,7
+ xor ecx,ecx
+ cpuid
+ mov DWORD [8+edi],ebx
+L$005no_extended_info:
+ bt ebp,27
+ jnc NEAR L$006clear_avx
+ xor ecx,ecx
+db 15,1,208
+ and eax,6
+ cmp eax,6
+ je NEAR L$007done
+ cmp eax,2
+ je NEAR L$006clear_avx
+L$008clear_xmm:
+ and ebp,4261412861
+ and esi,4278190079
+L$006clear_avx:
+ and ebp,4026525695
+ and DWORD [8+edi],4294967263
+L$007done:
+ mov eax,esi
+ mov edx,ebp
+L$000nocpuid:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+;extern _OPENSSL_ia32cap_P
+global _OPENSSL_rdtsc
+align 16
+_OPENSSL_rdtsc:
+L$_OPENSSL_rdtsc_begin:
+ xor eax,eax
+ xor edx,edx
+ lea ecx,[_OPENSSL_ia32cap_P]
+ bt DWORD [ecx],4
+ jnc NEAR L$009notsc
+ rdtsc
+L$009notsc:
+ ret
+global _OPENSSL_instrument_halt
+align 16
+_OPENSSL_instrument_halt:
+L$_OPENSSL_instrument_halt_begin:
+ lea ecx,[_OPENSSL_ia32cap_P]
+ bt DWORD [ecx],4
+ jnc NEAR L$010nohalt
+dd 2421723150
+ and eax,3
+ jnz NEAR L$010nohalt
+ pushfd
+ pop eax
+ bt eax,9
+ jnc NEAR L$010nohalt
+ rdtsc
+ push edx
+ push eax
+ hlt
+ rdtsc
+ sub eax,DWORD [esp]
+ sbb edx,DWORD [4+esp]
+ add esp,8
+ ret
+L$010nohalt:
+ xor eax,eax
+ xor edx,edx
+ ret
+global _OPENSSL_far_spin
+align 16
+_OPENSSL_far_spin:
+L$_OPENSSL_far_spin_begin:
+ pushfd
+ pop eax
+ bt eax,9
+ jnc NEAR L$011nospin
+ mov eax,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+dd 2430111262
+ xor eax,eax
+ mov edx,DWORD [ecx]
+ jmp NEAR L$012spin
+align 16
+L$012spin:
+ inc eax
+ cmp edx,DWORD [ecx]
+ je NEAR L$012spin
+dd 529567888
+ ret
+L$011nospin:
+ xor eax,eax
+ xor edx,edx
+ ret
+global _OPENSSL_wipe_cpu
+align 16
+_OPENSSL_wipe_cpu:
+L$_OPENSSL_wipe_cpu_begin:
+ xor eax,eax
+ xor edx,edx
+ lea ecx,[_OPENSSL_ia32cap_P]
+ mov ecx,DWORD [ecx]
+ bt DWORD [ecx],1
+ jnc NEAR L$013no_x87
+ and ecx,83886080
+ cmp ecx,83886080
+ jne NEAR L$014no_sse2
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+L$014no_sse2:
+dd 4007259865,4007259865,4007259865,4007259865,2430851995
+L$013no_x87:
+ lea eax,[4+esp]
+ ret
+global _OPENSSL_atomic_add
+align 16
+_OPENSSL_atomic_add:
+L$_OPENSSL_atomic_add_begin:
+ mov edx,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ push ebx
+ nop
+ mov eax,DWORD [edx]
+L$015spin:
+ lea ebx,[ecx*1+eax]
+ nop
+dd 447811568
+ jne NEAR L$015spin
+ mov eax,ebx
+ pop ebx
+ ret
+global _OPENSSL_cleanse
+align 16
+_OPENSSL_cleanse:
+L$_OPENSSL_cleanse_begin:
+ mov edx,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ xor eax,eax
+ cmp ecx,7
+ jae NEAR L$016lot
+ cmp ecx,0
+ je NEAR L$017ret
+L$018little:
+ mov BYTE [edx],al
+ sub ecx,1
+ lea edx,[1+edx]
+ jnz NEAR L$018little
+L$017ret:
+ ret
+align 16
+L$016lot:
+ test edx,3
+ jz NEAR L$019aligned
+ mov BYTE [edx],al
+ lea ecx,[ecx-1]
+ lea edx,[1+edx]
+ jmp NEAR L$016lot
+L$019aligned:
+ mov DWORD [edx],eax
+ lea ecx,[ecx-4]
+ test ecx,-4
+ lea edx,[4+edx]
+ jnz NEAR L$019aligned
+ cmp ecx,0
+ jne NEAR L$018little
+ ret
+global _CRYPTO_memcmp
+align 16
+_CRYPTO_memcmp:
+L$_CRYPTO_memcmp_begin:
+ push esi
+ push edi
+ mov esi,DWORD [12+esp]
+ mov edi,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ xor eax,eax
+ xor edx,edx
+ cmp ecx,0
+ je NEAR L$020no_data
+L$021loop:
+ mov dl,BYTE [esi]
+ lea esi,[1+esi]
+ xor dl,BYTE [edi]
+ lea edi,[1+edi]
+ or al,dl
+ dec ecx
+ jnz NEAR L$021loop
+ neg eax
+ shr eax,31
+L$020no_data:
+ pop edi
+ pop esi
+ ret
+global _OPENSSL_instrument_bus
+align 16
+_OPENSSL_instrument_bus:
+L$_OPENSSL_instrument_bus_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,0
+ lea edx,[_OPENSSL_ia32cap_P]
+ bt DWORD [edx],4
+ jnc NEAR L$022nogo
+ bt DWORD [edx],19
+ jnc NEAR L$022nogo
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ rdtsc
+ mov esi,eax
+ mov ebx,0
+ clflush [edi]
+db 240
+ add DWORD [edi],ebx
+ jmp NEAR L$023loop
+align 16
+L$023loop:
+ rdtsc
+ mov edx,eax
+ sub eax,esi
+ mov esi,edx
+ mov ebx,eax
+ clflush [edi]
+db 240
+ add DWORD [edi],eax
+ lea edi,[4+edi]
+ sub ecx,1
+ jnz NEAR L$023loop
+ mov eax,DWORD [24+esp]
+L$022nogo:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _OPENSSL_instrument_bus2
+align 16
+_OPENSSL_instrument_bus2:
+L$_OPENSSL_instrument_bus2_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,0
+ lea edx,[_OPENSSL_ia32cap_P]
+ bt DWORD [edx],4
+ jnc NEAR L$024nogo
+ bt DWORD [edx],19
+ jnc NEAR L$024nogo
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ mov ebp,DWORD [28+esp]
+ rdtsc
+ mov esi,eax
+ mov ebx,0
+ clflush [edi]
+db 240
+ add DWORD [edi],ebx
+ rdtsc
+ mov edx,eax
+ sub eax,esi
+ mov esi,edx
+ mov ebx,eax
+ jmp NEAR L$025loop2
+align 16
+L$025loop2:
+ clflush [edi]
+db 240
+ add DWORD [edi],eax
+ sub ebp,1
+ jz NEAR L$026done2
+ rdtsc
+ mov edx,eax
+ sub eax,esi
+ mov esi,edx
+ cmp eax,ebx
+ mov ebx,eax
+ mov edx,0
+ setne dl
+ sub ecx,edx
+ lea edi,[edx*4+edi]
+ jnz NEAR L$025loop2
+L$026done2:
+ mov eax,DWORD [24+esp]
+ sub eax,ecx
+L$024nogo:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _OPENSSL_ia32_rdrand_bytes
+align 16
+_OPENSSL_ia32_rdrand_bytes:
+L$_OPENSSL_ia32_rdrand_bytes_begin:
+ push edi
+ push ebx
+ xor eax,eax
+ mov edi,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ cmp ebx,0
+ je NEAR L$027done
+ mov ecx,8
+L$028loop:
+db 15,199,242
+ jc NEAR L$029break
+ loop L$028loop
+ jmp NEAR L$027done
+align 16
+L$029break:
+ cmp ebx,4
+ jb NEAR L$030tail
+ mov DWORD [edi],edx
+ lea edi,[4+edi]
+ add eax,4
+ sub ebx,4
+ jz NEAR L$027done
+ mov ecx,8
+ jmp NEAR L$028loop
+align 16
+L$030tail:
+ mov BYTE [edi],dl
+ lea edi,[1+edi]
+ inc eax
+ shr edx,8
+ dec ebx
+ jnz NEAR L$030tail
+L$027done:
+ xor edx,edx
+ pop ebx
+ pop edi
+ ret
+global _OPENSSL_ia32_rdseed_bytes
+align 16
+_OPENSSL_ia32_rdseed_bytes:
+L$_OPENSSL_ia32_rdseed_bytes_begin:
+ push edi
+ push ebx
+ xor eax,eax
+ mov edi,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ cmp ebx,0
+ je NEAR L$031done
+ mov ecx,8
+L$032loop:
+db 15,199,250
+ jc NEAR L$033break
+ loop L$032loop
+ jmp NEAR L$031done
+align 16
+L$033break:
+ cmp ebx,4
+ jb NEAR L$034tail
+ mov DWORD [edi],edx
+ lea edi,[4+edi]
+ add eax,4
+ sub ebx,4
+ jz NEAR L$031done
+ mov ecx,8
+ jmp NEAR L$032loop
+align 16
+L$034tail:
+ mov BYTE [edi],dl
+ lea edi,[1+edi]
+ inc eax
+ shr edx,8
+ dec ebx
+ jnz NEAR L$034tail
+L$031done:
+ xor edx,edx
+ pop ebx
+ pop edi
+ ret
+segment .bss
+common _OPENSSL_ia32cap_P 16
+segment .CRT$XCU data align=4
+extern _OPENSSL_cpuid_setup
+dd _OPENSSL_cpuid_setup
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
new file mode 100644
index 0000000000..a90434b21f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
@@ -0,0 +1,1772 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global aesni_multi_cbc_encrypt
+
+ALIGN 32
+aesni_multi_cbc_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ cmp edx,2
+ jb NEAR $L$enc_non_avx
+ mov ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+ test ecx,268435456
+ jnz NEAR _avx_cbc_enc_shortcut
+ jmp NEAR $L$enc_non_avx
+ALIGN 16
+$L$enc_non_avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+ sub rsp,48
+ and rsp,-64
+ mov QWORD[16+rsp],rax
+
+
+$L$enc4x_body:
+ movdqu xmm12,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[80+rdi]
+
+$L$enc4x_loop_grande:
+ mov DWORD[24+rsp],edx
+ xor edx,edx
+ mov ecx,DWORD[((-64))+rdi]
+ mov r8,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov r12,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm2,XMMWORD[((-56))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ mov ecx,DWORD[((-24))+rdi]
+ mov r9,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov r13,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm3,XMMWORD[((-16))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ mov ecx,DWORD[16+rdi]
+ mov r10,QWORD[rdi]
+ cmp ecx,edx
+ mov r14,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm4,XMMWORD[24+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ mov ecx,DWORD[56+rdi]
+ mov r11,QWORD[40+rdi]
+ cmp ecx,edx
+ mov r15,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm5,XMMWORD[64+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ test edx,edx
+ jz NEAR $L$enc4x_done
+
+ movups xmm1,XMMWORD[((16-120))+rsi]
+ pxor xmm2,xmm12
+ movups xmm0,XMMWORD[((32-120))+rsi]
+ pxor xmm3,xmm12
+ mov eax,DWORD[((240-120))+rsi]
+ pxor xmm4,xmm12
+ movdqu xmm6,XMMWORD[r8]
+ pxor xmm5,xmm12
+ movdqu xmm7,XMMWORD[r9]
+ pxor xmm2,xmm6
+ movdqu xmm8,XMMWORD[r10]
+ pxor xmm3,xmm7
+ movdqu xmm9,XMMWORD[r11]
+ pxor xmm4,xmm8
+ pxor xmm5,xmm9
+ movdqa xmm10,XMMWORD[32+rsp]
+ xor rbx,rbx
+ jmp NEAR $L$oop_enc4x
+
+ALIGN 32
+$L$oop_enc4x:
+ add rbx,16
+ lea rbp,[16+rsp]
+ mov ecx,1
+ sub rbp,rbx
+
+DB 102,15,56,220,209
+ prefetcht0 [31+rbx*1+r8]
+ prefetcht0 [31+rbx*1+r9]
+DB 102,15,56,220,217
+ prefetcht0 [31+rbx*1+r10]
+ prefetcht0 [31+rbx*1+r11]
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((48-120))+rsi]
+ cmp ecx,DWORD[32+rsp]
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ cmovge r8,rbp
+ cmovg r12,rbp
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-56))+rsi]
+ cmp ecx,DWORD[36+rsp]
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ cmovge r9,rbp
+ cmovg r13,rbp
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((-40))+rsi]
+ cmp ecx,DWORD[40+rsp]
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ cmovge r10,rbp
+ cmovg r14,rbp
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-24))+rsi]
+ cmp ecx,DWORD[44+rsp]
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ cmovge r11,rbp
+ cmovg r15,rbp
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((-8))+rsi]
+ movdqa xmm11,xmm10
+DB 102,15,56,220,208
+ prefetcht0 [15+rbx*1+r12]
+ prefetcht0 [15+rbx*1+r13]
+DB 102,15,56,220,216
+ prefetcht0 [15+rbx*1+r14]
+ prefetcht0 [15+rbx*1+r15]
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((128-120))+rsi]
+ pxor xmm12,xmm12
+
+DB 102,15,56,220,209
+ pcmpgtd xmm11,xmm12
+ movdqu xmm12,XMMWORD[((-120))+rsi]
+DB 102,15,56,220,217
+ paddd xmm10,xmm11
+ movdqa XMMWORD[32+rsp],xmm10
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((144-120))+rsi]
+
+ cmp eax,11
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((160-120))+rsi]
+
+ jb NEAR $L$enc4x_tail
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((176-120))+rsi]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((192-120))+rsi]
+
+ je NEAR $L$enc4x_tail
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((208-120))+rsi]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((224-120))+rsi]
+ jmp NEAR $L$enc4x_tail
+
+ALIGN 32
+$L$enc4x_tail:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movdqu xmm6,XMMWORD[rbx*1+r8]
+ movdqu xmm1,XMMWORD[((16-120))+rsi]
+
+DB 102,15,56,221,208
+ movdqu xmm7,XMMWORD[rbx*1+r9]
+ pxor xmm6,xmm12
+DB 102,15,56,221,216
+ movdqu xmm8,XMMWORD[rbx*1+r10]
+ pxor xmm7,xmm12
+DB 102,15,56,221,224
+ movdqu xmm9,XMMWORD[rbx*1+r11]
+ pxor xmm8,xmm12
+DB 102,15,56,221,232
+ movdqu xmm0,XMMWORD[((32-120))+rsi]
+ pxor xmm9,xmm12
+
+ movups XMMWORD[(-16)+rbx*1+r12],xmm2
+ pxor xmm2,xmm6
+ movups XMMWORD[(-16)+rbx*1+r13],xmm3
+ pxor xmm3,xmm7
+ movups XMMWORD[(-16)+rbx*1+r14],xmm4
+ pxor xmm4,xmm8
+ movups XMMWORD[(-16)+rbx*1+r15],xmm5
+ pxor xmm5,xmm9
+
+ dec edx
+ jnz NEAR $L$oop_enc4x
+
+ mov rax,QWORD[16+rsp]
+
+ mov edx,DWORD[24+rsp]
+
+
+
+
+
+
+
+
+
+
+ lea rdi,[160+rdi]
+ dec edx
+ jnz NEAR $L$enc4x_loop_grande
+
+$L$enc4x_done:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+
+
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$enc4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt:
+
+global aesni_multi_cbc_decrypt
+
+ALIGN 32
+aesni_multi_cbc_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ cmp edx,2
+ jb NEAR $L$dec_non_avx
+ mov ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+ test ecx,268435456
+ jnz NEAR _avx_cbc_dec_shortcut
+ jmp NEAR $L$dec_non_avx
+ALIGN 16
+$L$dec_non_avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+ sub rsp,48
+ and rsp,-64
+ mov QWORD[16+rsp],rax
+
+
+$L$dec4x_body:
+ movdqu xmm12,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[80+rdi]
+
+$L$dec4x_loop_grande:
+ mov DWORD[24+rsp],edx
+ xor edx,edx
+ mov ecx,DWORD[((-64))+rdi]
+ mov r8,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov r12,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm6,XMMWORD[((-56))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ mov ecx,DWORD[((-24))+rdi]
+ mov r9,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov r13,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm7,XMMWORD[((-16))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ mov ecx,DWORD[16+rdi]
+ mov r10,QWORD[rdi]
+ cmp ecx,edx
+ mov r14,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm8,XMMWORD[24+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ mov ecx,DWORD[56+rdi]
+ mov r11,QWORD[40+rdi]
+ cmp ecx,edx
+ mov r15,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm9,XMMWORD[64+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ test edx,edx
+ jz NEAR $L$dec4x_done
+
+ movups xmm1,XMMWORD[((16-120))+rsi]
+ movups xmm0,XMMWORD[((32-120))+rsi]
+ mov eax,DWORD[((240-120))+rsi]
+ movdqu xmm2,XMMWORD[r8]
+ movdqu xmm3,XMMWORD[r9]
+ pxor xmm2,xmm12
+ movdqu xmm4,XMMWORD[r10]
+ pxor xmm3,xmm12
+ movdqu xmm5,XMMWORD[r11]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm12
+ movdqa xmm10,XMMWORD[32+rsp]
+ xor rbx,rbx
+ jmp NEAR $L$oop_dec4x
+
+ALIGN 32
+$L$oop_dec4x:
+ add rbx,16
+ lea rbp,[16+rsp]
+ mov ecx,1
+ sub rbp,rbx
+
+DB 102,15,56,222,209
+ prefetcht0 [31+rbx*1+r8]
+ prefetcht0 [31+rbx*1+r9]
+DB 102,15,56,222,217
+ prefetcht0 [31+rbx*1+r10]
+ prefetcht0 [31+rbx*1+r11]
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((48-120))+rsi]
+ cmp ecx,DWORD[32+rsp]
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+ cmovge r8,rbp
+ cmovg r12,rbp
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-56))+rsi]
+ cmp ecx,DWORD[36+rsp]
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ cmovge r9,rbp
+ cmovg r13,rbp
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((-40))+rsi]
+ cmp ecx,DWORD[40+rsp]
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+ cmovge r10,rbp
+ cmovg r14,rbp
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-24))+rsi]
+ cmp ecx,DWORD[44+rsp]
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ cmovge r11,rbp
+ cmovg r15,rbp
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((-8))+rsi]
+ movdqa xmm11,xmm10
+DB 102,15,56,222,208
+ prefetcht0 [15+rbx*1+r12]
+ prefetcht0 [15+rbx*1+r13]
+DB 102,15,56,222,216
+ prefetcht0 [15+rbx*1+r14]
+ prefetcht0 [15+rbx*1+r15]
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((128-120))+rsi]
+ pxor xmm12,xmm12
+
+DB 102,15,56,222,209
+ pcmpgtd xmm11,xmm12
+ movdqu xmm12,XMMWORD[((-120))+rsi]
+DB 102,15,56,222,217
+ paddd xmm10,xmm11
+ movdqa XMMWORD[32+rsp],xmm10
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((144-120))+rsi]
+
+ cmp eax,11
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((160-120))+rsi]
+
+ jb NEAR $L$dec4x_tail
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((176-120))+rsi]
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((192-120))+rsi]
+
+ je NEAR $L$dec4x_tail
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((208-120))+rsi]
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((224-120))+rsi]
+ jmp NEAR $L$dec4x_tail
+
+ALIGN 32
+$L$dec4x_tail:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+DB 102,15,56,222,233
+ movdqu xmm1,XMMWORD[((16-120))+rsi]
+ pxor xmm8,xmm0
+ pxor xmm9,xmm0
+ movdqu xmm0,XMMWORD[((32-120))+rsi]
+
+DB 102,15,56,223,214
+DB 102,15,56,223,223
+ movdqu xmm6,XMMWORD[((-16))+rbx*1+r8]
+ movdqu xmm7,XMMWORD[((-16))+rbx*1+r9]
+DB 102,65,15,56,223,224
+DB 102,65,15,56,223,233
+ movdqu xmm8,XMMWORD[((-16))+rbx*1+r10]
+ movdqu xmm9,XMMWORD[((-16))+rbx*1+r11]
+
+ movups XMMWORD[(-16)+rbx*1+r12],xmm2
+ movdqu xmm2,XMMWORD[rbx*1+r8]
+ movups XMMWORD[(-16)+rbx*1+r13],xmm3
+ movdqu xmm3,XMMWORD[rbx*1+r9]
+ pxor xmm2,xmm12
+ movups XMMWORD[(-16)+rbx*1+r14],xmm4
+ movdqu xmm4,XMMWORD[rbx*1+r10]
+ pxor xmm3,xmm12
+ movups XMMWORD[(-16)+rbx*1+r15],xmm5
+ movdqu xmm5,XMMWORD[rbx*1+r11]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm12
+
+ dec edx
+ jnz NEAR $L$oop_dec4x
+
+ mov rax,QWORD[16+rsp]
+
+ mov edx,DWORD[24+rsp]
+
+ lea rdi,[160+rdi]
+ dec edx
+ jnz NEAR $L$dec4x_loop_grande
+
+$L$dec4x_done:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+
+
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$dec4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt:
+
+ALIGN 32
+aesni_multi_cbc_encrypt_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_cbc_enc_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+ sub rsp,192
+ and rsp,-128
+ mov QWORD[16+rsp],rax
+
+
+$L$enc8x_body:
+ vzeroupper
+ vmovdqu xmm15,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[160+rdi]
+ shr edx,1
+
+$L$enc8x_loop_grande:
+
+ xor edx,edx
+ mov ecx,DWORD[((-144))+rdi]
+ mov r8,QWORD[((-160))+rdi]
+ cmp ecx,edx
+ mov rbx,QWORD[((-152))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm2,XMMWORD[((-136))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ sub rbx,r8
+ mov QWORD[64+rsp],rbx
+ mov ecx,DWORD[((-104))+rdi]
+ mov r9,QWORD[((-120))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-112))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm3,XMMWORD[((-96))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ sub rbp,r9
+ mov QWORD[72+rsp],rbp
+ mov ecx,DWORD[((-64))+rdi]
+ mov r10,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm4,XMMWORD[((-56))+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ sub rbp,r10
+ mov QWORD[80+rsp],rbp
+ mov ecx,DWORD[((-24))+rdi]
+ mov r11,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm5,XMMWORD[((-16))+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ sub rbp,r11
+ mov QWORD[88+rsp],rbp
+ mov ecx,DWORD[16+rdi]
+ mov r12,QWORD[rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm6,XMMWORD[24+rdi]
+ mov DWORD[48+rsp],ecx
+ cmovle r12,rsp
+ sub rbp,r12
+ mov QWORD[96+rsp],rbp
+ mov ecx,DWORD[56+rdi]
+ mov r13,QWORD[40+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm7,XMMWORD[64+rdi]
+ mov DWORD[52+rsp],ecx
+ cmovle r13,rsp
+ sub rbp,r13
+ mov QWORD[104+rsp],rbp
+ mov ecx,DWORD[96+rdi]
+ mov r14,QWORD[80+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[88+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm8,XMMWORD[104+rdi]
+ mov DWORD[56+rsp],ecx
+ cmovle r14,rsp
+ sub rbp,r14
+ mov QWORD[112+rsp],rbp
+ mov ecx,DWORD[136+rdi]
+ mov r15,QWORD[120+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[128+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm9,XMMWORD[144+rdi]
+ mov DWORD[60+rsp],ecx
+ cmovle r15,rsp
+ sub rbp,r15
+ mov QWORD[120+rsp],rbp
+ test edx,edx
+ jz NEAR $L$enc8x_done
+
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+ mov eax,DWORD[((240-120))+rsi]
+
+ vpxor xmm10,xmm15,XMMWORD[r8]
+ lea rbp,[128+rsp]
+ vpxor xmm11,xmm15,XMMWORD[r9]
+ vpxor xmm12,xmm15,XMMWORD[r10]
+ vpxor xmm13,xmm15,XMMWORD[r11]
+ vpxor xmm2,xmm2,xmm10
+ vpxor xmm10,xmm15,XMMWORD[r12]
+ vpxor xmm3,xmm3,xmm11
+ vpxor xmm11,xmm15,XMMWORD[r13]
+ vpxor xmm4,xmm4,xmm12
+ vpxor xmm12,xmm15,XMMWORD[r14]
+ vpxor xmm5,xmm5,xmm13
+ vpxor xmm13,xmm15,XMMWORD[r15]
+ vpxor xmm6,xmm6,xmm10
+ mov ecx,1
+ vpxor xmm7,xmm7,xmm11
+ vpxor xmm8,xmm8,xmm12
+ vpxor xmm9,xmm9,xmm13
+ jmp NEAR $L$oop_enc8x
+
+ALIGN 32
+$L$oop_enc8x:
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+0))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r8]
+ vaesenc xmm4,xmm4,xmm1
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r8]
+ cmovge r8,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r8
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm10,xmm15,XMMWORD[16+r8]
+ mov QWORD[((64+0))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-72))+rsi]
+ lea r8,[16+rbx*1+r8]
+ vmovdqu XMMWORD[rbp],xmm10
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+4))+rsp]
+ mov rbx,QWORD[((64+8))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r9]
+ vaesenc xmm4,xmm4,xmm0
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r9]
+ cmovge r9,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r9
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm11,xmm15,XMMWORD[16+r9]
+ mov QWORD[((64+8))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-56))+rsi]
+ lea r9,[16+rbx*1+r9]
+ vmovdqu XMMWORD[16+rbp],xmm11
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+8))+rsp]
+ mov rbx,QWORD[((64+16))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r10]
+ vaesenc xmm4,xmm4,xmm1
+ prefetcht0 [15+r8]
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r10]
+ cmovge r10,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r10
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm12,xmm15,XMMWORD[16+r10]
+ mov QWORD[((64+16))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-40))+rsi]
+ lea r10,[16+rbx*1+r10]
+ vmovdqu XMMWORD[32+rbp],xmm12
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+12))+rsp]
+ mov rbx,QWORD[((64+24))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r11]
+ vaesenc xmm4,xmm4,xmm0
+ prefetcht0 [15+r9]
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r11]
+ cmovge r11,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r11
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm13,xmm15,XMMWORD[16+r11]
+ mov QWORD[((64+24))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-24))+rsi]
+ lea r11,[16+rbx*1+r11]
+ vmovdqu XMMWORD[48+rbp],xmm13
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+16))+rsp]
+ mov rbx,QWORD[((64+32))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r12]
+ vaesenc xmm4,xmm4,xmm1
+ prefetcht0 [15+r10]
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r12]
+ cmovge r12,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r12
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm10,xmm15,XMMWORD[16+r12]
+ mov QWORD[((64+32))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-8))+rsi]
+ lea r12,[16+rbx*1+r12]
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+20))+rsp]
+ mov rbx,QWORD[((64+40))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r13]
+ vaesenc xmm4,xmm4,xmm0
+ prefetcht0 [15+r11]
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[r13*1+rbx]
+ cmovge r13,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r13
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm11,xmm15,XMMWORD[16+r13]
+ mov QWORD[((64+40))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[8+rsi]
+ lea r13,[16+rbx*1+r13]
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+24))+rsp]
+ mov rbx,QWORD[((64+48))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r14]
+ vaesenc xmm4,xmm4,xmm1
+ prefetcht0 [15+r12]
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r14]
+ cmovge r14,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r14
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm12,xmm15,XMMWORD[16+r14]
+ mov QWORD[((64+48))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[24+rsi]
+ lea r14,[16+rbx*1+r14]
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+28))+rsp]
+ mov rbx,QWORD[((64+56))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r15]
+ vaesenc xmm4,xmm4,xmm0
+ prefetcht0 [15+r13]
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r15]
+ cmovge r15,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r15
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm13,xmm15,XMMWORD[16+r15]
+ mov QWORD[((64+56))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[40+rsi]
+ lea r15,[16+rbx*1+r15]
+ vmovdqu xmm14,XMMWORD[32+rsp]
+ prefetcht0 [15+r14]
+ prefetcht0 [15+r15]
+ cmp eax,11
+ jb NEAR $L$enc8x_tail
+
+ vaesenc xmm2,xmm2,xmm1
+ vaesenc xmm3,xmm3,xmm1
+ vaesenc xmm4,xmm4,xmm1
+ vaesenc xmm5,xmm5,xmm1
+ vaesenc xmm6,xmm6,xmm1
+ vaesenc xmm7,xmm7,xmm1
+ vaesenc xmm8,xmm8,xmm1
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+ vaesenc xmm2,xmm2,xmm0
+ vaesenc xmm3,xmm3,xmm0
+ vaesenc xmm4,xmm4,xmm0
+ vaesenc xmm5,xmm5,xmm0
+ vaesenc xmm6,xmm6,xmm0
+ vaesenc xmm7,xmm7,xmm0
+ vaesenc xmm8,xmm8,xmm0
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((192-120))+rsi]
+ je NEAR $L$enc8x_tail
+
+ vaesenc xmm2,xmm2,xmm1
+ vaesenc xmm3,xmm3,xmm1
+ vaesenc xmm4,xmm4,xmm1
+ vaesenc xmm5,xmm5,xmm1
+ vaesenc xmm6,xmm6,xmm1
+ vaesenc xmm7,xmm7,xmm1
+ vaesenc xmm8,xmm8,xmm1
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+ vaesenc xmm2,xmm2,xmm0
+ vaesenc xmm3,xmm3,xmm0
+ vaesenc xmm4,xmm4,xmm0
+ vaesenc xmm5,xmm5,xmm0
+ vaesenc xmm6,xmm6,xmm0
+ vaesenc xmm7,xmm7,xmm0
+ vaesenc xmm8,xmm8,xmm0
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$enc8x_tail:
+ vaesenc xmm2,xmm2,xmm1
+ vpxor xmm15,xmm15,xmm15
+ vaesenc xmm3,xmm3,xmm1
+ vaesenc xmm4,xmm4,xmm1
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesenc xmm5,xmm5,xmm1
+ vaesenc xmm6,xmm6,xmm1
+ vpaddd xmm15,xmm15,xmm14
+ vmovdqu xmm14,XMMWORD[48+rsp]
+ vaesenc xmm7,xmm7,xmm1
+ mov rbx,QWORD[64+rsp]
+ vaesenc xmm8,xmm8,xmm1
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+ vaesenclast xmm2,xmm2,xmm0
+ vmovdqa XMMWORD[32+rsp],xmm15
+ vpxor xmm15,xmm15,xmm15
+ vaesenclast xmm3,xmm3,xmm0
+ vaesenclast xmm4,xmm4,xmm0
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesenclast xmm5,xmm5,xmm0
+ vaesenclast xmm6,xmm6,xmm0
+ vpaddd xmm14,xmm14,xmm15
+ vmovdqu xmm15,XMMWORD[((-120))+rsi]
+ vaesenclast xmm7,xmm7,xmm0
+ vaesenclast xmm8,xmm8,xmm0
+ vmovdqa XMMWORD[48+rsp],xmm14
+ vaesenclast xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+ vmovups XMMWORD[(-16)+r8],xmm2
+ sub r8,rbx
+ vpxor xmm2,xmm2,XMMWORD[rbp]
+ vmovups XMMWORD[(-16)+r9],xmm3
+ sub r9,QWORD[72+rsp]
+ vpxor xmm3,xmm3,XMMWORD[16+rbp]
+ vmovups XMMWORD[(-16)+r10],xmm4
+ sub r10,QWORD[80+rsp]
+ vpxor xmm4,xmm4,XMMWORD[32+rbp]
+ vmovups XMMWORD[(-16)+r11],xmm5
+ sub r11,QWORD[88+rsp]
+ vpxor xmm5,xmm5,XMMWORD[48+rbp]
+ vmovups XMMWORD[(-16)+r12],xmm6
+ sub r12,QWORD[96+rsp]
+ vpxor xmm6,xmm6,xmm10
+ vmovups XMMWORD[(-16)+r13],xmm7
+ sub r13,QWORD[104+rsp]
+ vpxor xmm7,xmm7,xmm11
+ vmovups XMMWORD[(-16)+r14],xmm8
+ sub r14,QWORD[112+rsp]
+ vpxor xmm8,xmm8,xmm12
+ vmovups XMMWORD[(-16)+r15],xmm9
+ sub r15,QWORD[120+rsp]
+ vpxor xmm9,xmm9,xmm13
+
+ dec edx
+ jnz NEAR $L$oop_enc8x
+
+ mov rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$enc8x_done:
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$enc8x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt_avx:
+
+
+ALIGN 32
+aesni_multi_cbc_decrypt_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_cbc_dec_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+
+ sub rsp,256
+ and rsp,-256
+ sub rsp,192
+ mov QWORD[16+rsp],rax
+
+
+$L$dec8x_body:
+ vzeroupper
+ vmovdqu xmm15,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[160+rdi]
+ shr edx,1
+
+$L$dec8x_loop_grande:
+
+ xor edx,edx
+ mov ecx,DWORD[((-144))+rdi]
+ mov r8,QWORD[((-160))+rdi]
+ cmp ecx,edx
+ mov rbx,QWORD[((-152))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm2,XMMWORD[((-136))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ sub rbx,r8
+ mov QWORD[64+rsp],rbx
+ vmovdqu XMMWORD[192+rsp],xmm2
+ mov ecx,DWORD[((-104))+rdi]
+ mov r9,QWORD[((-120))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-112))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm3,XMMWORD[((-96))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ sub rbp,r9
+ mov QWORD[72+rsp],rbp
+ vmovdqu XMMWORD[208+rsp],xmm3
+ mov ecx,DWORD[((-64))+rdi]
+ mov r10,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm4,XMMWORD[((-56))+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ sub rbp,r10
+ mov QWORD[80+rsp],rbp
+ vmovdqu XMMWORD[224+rsp],xmm4
+ mov ecx,DWORD[((-24))+rdi]
+ mov r11,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm5,XMMWORD[((-16))+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ sub rbp,r11
+ mov QWORD[88+rsp],rbp
+ vmovdqu XMMWORD[240+rsp],xmm5
+ mov ecx,DWORD[16+rdi]
+ mov r12,QWORD[rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm6,XMMWORD[24+rdi]
+ mov DWORD[48+rsp],ecx
+ cmovle r12,rsp
+ sub rbp,r12
+ mov QWORD[96+rsp],rbp
+ vmovdqu XMMWORD[256+rsp],xmm6
+ mov ecx,DWORD[56+rdi]
+ mov r13,QWORD[40+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm7,XMMWORD[64+rdi]
+ mov DWORD[52+rsp],ecx
+ cmovle r13,rsp
+ sub rbp,r13
+ mov QWORD[104+rsp],rbp
+ vmovdqu XMMWORD[272+rsp],xmm7
+ mov ecx,DWORD[96+rdi]
+ mov r14,QWORD[80+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[88+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm8,XMMWORD[104+rdi]
+ mov DWORD[56+rsp],ecx
+ cmovle r14,rsp
+ sub rbp,r14
+ mov QWORD[112+rsp],rbp
+ vmovdqu XMMWORD[288+rsp],xmm8
+ mov ecx,DWORD[136+rdi]
+ mov r15,QWORD[120+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[128+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm9,XMMWORD[144+rdi]
+ mov DWORD[60+rsp],ecx
+ cmovle r15,rsp
+ sub rbp,r15
+ mov QWORD[120+rsp],rbp
+ vmovdqu XMMWORD[304+rsp],xmm9
+ test edx,edx
+ jz NEAR $L$dec8x_done
+
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+ mov eax,DWORD[((240-120))+rsi]
+ lea rbp,[((192+128))+rsp]
+
+ vmovdqu xmm2,XMMWORD[r8]
+ vmovdqu xmm3,XMMWORD[r9]
+ vmovdqu xmm4,XMMWORD[r10]
+ vmovdqu xmm5,XMMWORD[r11]
+ vmovdqu xmm6,XMMWORD[r12]
+ vmovdqu xmm7,XMMWORD[r13]
+ vmovdqu xmm8,XMMWORD[r14]
+ vmovdqu xmm9,XMMWORD[r15]
+ vmovdqu XMMWORD[rbp],xmm2
+ vpxor xmm2,xmm2,xmm15
+ vmovdqu XMMWORD[16+rbp],xmm3
+ vpxor xmm3,xmm3,xmm15
+ vmovdqu XMMWORD[32+rbp],xmm4
+ vpxor xmm4,xmm4,xmm15
+ vmovdqu XMMWORD[48+rbp],xmm5
+ vpxor xmm5,xmm5,xmm15
+ vmovdqu XMMWORD[64+rbp],xmm6
+ vpxor xmm6,xmm6,xmm15
+ vmovdqu XMMWORD[80+rbp],xmm7
+ vpxor xmm7,xmm7,xmm15
+ vmovdqu XMMWORD[96+rbp],xmm8
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu XMMWORD[112+rbp],xmm9
+ vpxor xmm9,xmm9,xmm15
+ xor rbp,0x80
+ mov ecx,1
+ jmp NEAR $L$oop_dec8x
+
+ALIGN 32
+$L$oop_dec8x:
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+0))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r8]
+ vaesdec xmm4,xmm4,xmm1
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r8]
+ cmovge r8,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r8
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm10,XMMWORD[16+r8]
+ mov QWORD[((64+0))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-72))+rsi]
+ lea r8,[16+rbx*1+r8]
+ vmovdqu XMMWORD[128+rsp],xmm10
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+4))+rsp]
+ mov rbx,QWORD[((64+8))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r9]
+ vaesdec xmm4,xmm4,xmm0
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r9]
+ cmovge r9,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r9
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm11,XMMWORD[16+r9]
+ mov QWORD[((64+8))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-56))+rsi]
+ lea r9,[16+rbx*1+r9]
+ vmovdqu XMMWORD[144+rsp],xmm11
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+8))+rsp]
+ mov rbx,QWORD[((64+16))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r10]
+ vaesdec xmm4,xmm4,xmm1
+ prefetcht0 [15+r8]
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r10]
+ cmovge r10,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r10
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm12,XMMWORD[16+r10]
+ mov QWORD[((64+16))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-40))+rsi]
+ lea r10,[16+rbx*1+r10]
+ vmovdqu XMMWORD[160+rsp],xmm12
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+12))+rsp]
+ mov rbx,QWORD[((64+24))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r11]
+ vaesdec xmm4,xmm4,xmm0
+ prefetcht0 [15+r9]
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r11]
+ cmovge r11,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r11
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm13,XMMWORD[16+r11]
+ mov QWORD[((64+24))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-24))+rsi]
+ lea r11,[16+rbx*1+r11]
+ vmovdqu XMMWORD[176+rsp],xmm13
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+16))+rsp]
+ mov rbx,QWORD[((64+32))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r12]
+ vaesdec xmm4,xmm4,xmm1
+ prefetcht0 [15+r10]
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r12]
+ cmovge r12,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r12
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm10,XMMWORD[16+r12]
+ mov QWORD[((64+32))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-8))+rsi]
+ lea r12,[16+rbx*1+r12]
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+20))+rsp]
+ mov rbx,QWORD[((64+40))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r13]
+ vaesdec xmm4,xmm4,xmm0
+ prefetcht0 [15+r11]
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[r13*1+rbx]
+ cmovge r13,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r13
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm11,XMMWORD[16+r13]
+ mov QWORD[((64+40))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[8+rsi]
+ lea r13,[16+rbx*1+r13]
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+24))+rsp]
+ mov rbx,QWORD[((64+48))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r14]
+ vaesdec xmm4,xmm4,xmm1
+ prefetcht0 [15+r12]
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r14]
+ cmovge r14,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r14
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm12,XMMWORD[16+r14]
+ mov QWORD[((64+48))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[24+rsi]
+ lea r14,[16+rbx*1+r14]
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+28))+rsp]
+ mov rbx,QWORD[((64+56))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r15]
+ vaesdec xmm4,xmm4,xmm0
+ prefetcht0 [15+r13]
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r15]
+ cmovge r15,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r15
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm13,XMMWORD[16+r15]
+ mov QWORD[((64+56))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[40+rsi]
+ lea r15,[16+rbx*1+r15]
+ vmovdqu xmm14,XMMWORD[32+rsp]
+ prefetcht0 [15+r14]
+ prefetcht0 [15+r15]
+ cmp eax,11
+ jb NEAR $L$dec8x_tail
+
+ vaesdec xmm2,xmm2,xmm1
+ vaesdec xmm3,xmm3,xmm1
+ vaesdec xmm4,xmm4,xmm1
+ vaesdec xmm5,xmm5,xmm1
+ vaesdec xmm6,xmm6,xmm1
+ vaesdec xmm7,xmm7,xmm1
+ vaesdec xmm8,xmm8,xmm1
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+ vaesdec xmm2,xmm2,xmm0
+ vaesdec xmm3,xmm3,xmm0
+ vaesdec xmm4,xmm4,xmm0
+ vaesdec xmm5,xmm5,xmm0
+ vaesdec xmm6,xmm6,xmm0
+ vaesdec xmm7,xmm7,xmm0
+ vaesdec xmm8,xmm8,xmm0
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((192-120))+rsi]
+ je NEAR $L$dec8x_tail
+
+ vaesdec xmm2,xmm2,xmm1
+ vaesdec xmm3,xmm3,xmm1
+ vaesdec xmm4,xmm4,xmm1
+ vaesdec xmm5,xmm5,xmm1
+ vaesdec xmm6,xmm6,xmm1
+ vaesdec xmm7,xmm7,xmm1
+ vaesdec xmm8,xmm8,xmm1
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+ vaesdec xmm2,xmm2,xmm0
+ vaesdec xmm3,xmm3,xmm0
+ vaesdec xmm4,xmm4,xmm0
+ vaesdec xmm5,xmm5,xmm0
+ vaesdec xmm6,xmm6,xmm0
+ vaesdec xmm7,xmm7,xmm0
+ vaesdec xmm8,xmm8,xmm0
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$dec8x_tail:
+ vaesdec xmm2,xmm2,xmm1
+ vpxor xmm15,xmm15,xmm15
+ vaesdec xmm3,xmm3,xmm1
+ vaesdec xmm4,xmm4,xmm1
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesdec xmm5,xmm5,xmm1
+ vaesdec xmm6,xmm6,xmm1
+ vpaddd xmm15,xmm15,xmm14
+ vmovdqu xmm14,XMMWORD[48+rsp]
+ vaesdec xmm7,xmm7,xmm1
+ mov rbx,QWORD[64+rsp]
+ vaesdec xmm8,xmm8,xmm1
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+ vaesdeclast xmm2,xmm2,xmm0
+ vmovdqa XMMWORD[32+rsp],xmm15
+ vpxor xmm15,xmm15,xmm15
+ vaesdeclast xmm3,xmm3,xmm0
+ vpxor xmm2,xmm2,XMMWORD[rbp]
+ vaesdeclast xmm4,xmm4,xmm0
+ vpxor xmm3,xmm3,XMMWORD[16+rbp]
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesdeclast xmm5,xmm5,xmm0
+ vpxor xmm4,xmm4,XMMWORD[32+rbp]
+ vaesdeclast xmm6,xmm6,xmm0
+ vpxor xmm5,xmm5,XMMWORD[48+rbp]
+ vpaddd xmm14,xmm14,xmm15
+ vmovdqu xmm15,XMMWORD[((-120))+rsi]
+ vaesdeclast xmm7,xmm7,xmm0
+ vpxor xmm6,xmm6,XMMWORD[64+rbp]
+ vaesdeclast xmm8,xmm8,xmm0
+ vpxor xmm7,xmm7,XMMWORD[80+rbp]
+ vmovdqa XMMWORD[48+rsp],xmm14
+ vaesdeclast xmm9,xmm9,xmm0
+ vpxor xmm8,xmm8,XMMWORD[96+rbp]
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+ vmovups XMMWORD[(-16)+r8],xmm2
+ sub r8,rbx
+ vmovdqu xmm2,XMMWORD[((128+0))+rsp]
+ vpxor xmm9,xmm9,XMMWORD[112+rbp]
+ vmovups XMMWORD[(-16)+r9],xmm3
+ sub r9,QWORD[72+rsp]
+ vmovdqu XMMWORD[rbp],xmm2
+ vpxor xmm2,xmm2,xmm15
+ vmovdqu xmm3,XMMWORD[((128+16))+rsp]
+ vmovups XMMWORD[(-16)+r10],xmm4
+ sub r10,QWORD[80+rsp]
+ vmovdqu XMMWORD[16+rbp],xmm3
+ vpxor xmm3,xmm3,xmm15
+ vmovdqu xmm4,XMMWORD[((128+32))+rsp]
+ vmovups XMMWORD[(-16)+r11],xmm5
+ sub r11,QWORD[88+rsp]
+ vmovdqu XMMWORD[32+rbp],xmm4
+ vpxor xmm4,xmm4,xmm15
+ vmovdqu xmm5,XMMWORD[((128+48))+rsp]
+ vmovups XMMWORD[(-16)+r12],xmm6
+ sub r12,QWORD[96+rsp]
+ vmovdqu XMMWORD[48+rbp],xmm5
+ vpxor xmm5,xmm5,xmm15
+ vmovdqu XMMWORD[64+rbp],xmm10
+ vpxor xmm6,xmm15,xmm10
+ vmovups XMMWORD[(-16)+r13],xmm7
+ sub r13,QWORD[104+rsp]
+ vmovdqu XMMWORD[80+rbp],xmm11
+ vpxor xmm7,xmm15,xmm11
+ vmovups XMMWORD[(-16)+r14],xmm8
+ sub r14,QWORD[112+rsp]
+ vmovdqu XMMWORD[96+rbp],xmm12
+ vpxor xmm8,xmm15,xmm12
+ vmovups XMMWORD[(-16)+r15],xmm9
+ sub r15,QWORD[120+rsp]
+ vmovdqu XMMWORD[112+rbp],xmm13
+ vpxor xmm9,xmm15,xmm13
+
+ xor rbp,128
+ dec edx
+ jnz NEAR $L$oop_dec8x
+
+ mov rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$dec8x_done:
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$dec8x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt_avx:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[16+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((-56-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_multi_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_begin_aesni_multi_cbc_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_decrypt wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_decrypt wrt ..imagebase
+ DD $L$SEH_begin_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+ DD $L$SEH_begin_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_aesni_multi_cbc_encrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc4x_body wrt ..imagebase,$L$enc4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec4x_body wrt ..imagebase,$L$dec4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_encrypt_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc8x_body wrt ..imagebase,$L$enc8x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec8x_body wrt ..imagebase,$L$dec8x_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
new file mode 100644
index 0000000000..0b706c4e77
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
@@ -0,0 +1,3271 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global aesni_cbc_sha1_enc
+
+ALIGN 32
+aesni_cbc_sha1_enc:
+
+ mov r10d,DWORD[((OPENSSL_ia32cap_P+0))]
+ mov r11,QWORD[((OPENSSL_ia32cap_P+4))]
+ bt r11,61
+ jc NEAR aesni_cbc_sha1_enc_shaext
+ and r11d,268435456
+ and r10d,1073741824
+ or r10d,r11d
+ cmp r10d,1342177280
+ je NEAR aesni_cbc_sha1_enc_avx
+ jmp NEAR aesni_cbc_sha1_enc_ssse3
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+aesni_cbc_sha1_enc_ssse3:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_ssse3:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r10,QWORD[56+rsp]
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-264))+rsp]
+
+
+
+ movaps XMMWORD[(96+0)+rsp],xmm6
+ movaps XMMWORD[(96+16)+rsp],xmm7
+ movaps XMMWORD[(96+32)+rsp],xmm8
+ movaps XMMWORD[(96+48)+rsp],xmm9
+ movaps XMMWORD[(96+64)+rsp],xmm10
+ movaps XMMWORD[(96+80)+rsp],xmm11
+ movaps XMMWORD[(96+96)+rsp],xmm12
+ movaps XMMWORD[(96+112)+rsp],xmm13
+ movaps XMMWORD[(96+128)+rsp],xmm14
+ movaps XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_ssse3:
+ mov r12,rdi
+ mov r13,rsi
+ mov r14,rdx
+ lea r15,[112+rcx]
+ movdqu xmm2,XMMWORD[r8]
+ mov QWORD[88+rsp],r8
+ shl r14,6
+ sub r13,r12
+ mov r8d,DWORD[((240-112))+r15]
+ add r14,r10
+
+ lea r11,[K_XX_XX]
+ mov eax,DWORD[r9]
+ mov ebx,DWORD[4+r9]
+ mov ecx,DWORD[8+r9]
+ mov edx,DWORD[12+r9]
+ mov esi,ebx
+ mov ebp,DWORD[16+r9]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ movdqa xmm3,XMMWORD[64+r11]
+ movdqa xmm13,XMMWORD[r11]
+ movdqu xmm4,XMMWORD[r10]
+ movdqu xmm5,XMMWORD[16+r10]
+ movdqu xmm6,XMMWORD[32+r10]
+ movdqu xmm7,XMMWORD[48+r10]
+DB 102,15,56,0,227
+DB 102,15,56,0,235
+DB 102,15,56,0,243
+ add r10,64
+ paddd xmm4,xmm13
+DB 102,15,56,0,251
+ paddd xmm5,xmm13
+ paddd xmm6,xmm13
+ movdqa XMMWORD[rsp],xmm4
+ psubd xmm4,xmm13
+ movdqa XMMWORD[16+rsp],xmm5
+ psubd xmm5,xmm13
+ movdqa XMMWORD[32+rsp],xmm6
+ psubd xmm6,xmm13
+ movups xmm15,XMMWORD[((-112))+r15]
+ movups xmm0,XMMWORD[((16-112))+r15]
+ jmp NEAR $L$oop_ssse3
+ALIGN 32
+$L$oop_ssse3:
+ ror ebx,2
+ movups xmm14,XMMWORD[r12]
+ xorps xmm14,xmm15
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ pshufd xmm8,xmm4,238
+ xor esi,edx
+ movdqa xmm12,xmm7
+ paddd xmm13,xmm7
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ punpcklqdq xmm8,xmm5
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ psrldq xmm12,4
+ and edi,ebx
+ xor ebx,ecx
+ pxor xmm8,xmm4
+ add ebp,eax
+ ror eax,7
+ pxor xmm12,xmm6
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ pxor xmm8,xmm12
+ xor eax,ebx
+ rol ebp,5
+ movdqa XMMWORD[48+rsp],xmm13
+ add edx,edi
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ and esi,eax
+ movdqa xmm3,xmm8
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ movdqa xmm12,xmm8
+ xor esi,ebx
+ pslldq xmm3,12
+ paddd xmm8,xmm8
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ psrld xmm12,31
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ movdqa xmm13,xmm3
+ and edi,ebp
+ xor ebp,eax
+ psrld xmm3,30
+ add ecx,edx
+ ror edx,7
+ por xmm8,xmm12
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ pslld xmm13,2
+ pxor xmm8,xmm3
+ xor edx,ebp
+ movdqa xmm3,XMMWORD[r11]
+ rol ecx,5
+ add ebx,edi
+ and esi,edx
+ pxor xmm8,xmm13
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ pshufd xmm9,xmm5,238
+ xor esi,ebp
+ movdqa xmm13,xmm8
+ paddd xmm3,xmm8
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ punpcklqdq xmm9,xmm6
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ psrldq xmm13,4
+ and edi,ecx
+ xor ecx,edx
+ pxor xmm9,xmm5
+ add eax,ebx
+ ror ebx,7
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ pxor xmm13,xmm7
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ pxor xmm9,xmm13
+ xor ebx,ecx
+ rol eax,5
+ movdqa XMMWORD[rsp],xmm3
+ add ebp,edi
+ and esi,ebx
+ movdqa xmm12,xmm9
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ movdqa xmm13,xmm9
+ xor esi,ecx
+ pslldq xmm12,12
+ paddd xmm9,xmm9
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ psrld xmm13,31
+ xor eax,ebx
+ rol ebp,5
+ add edx,esi
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ movdqa xmm3,xmm12
+ and edi,eax
+ xor eax,ebx
+ psrld xmm12,30
+ add edx,ebp
+ ror ebp,7
+ por xmm9,xmm13
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ pslld xmm3,2
+ pxor xmm9,xmm12
+ xor ebp,eax
+ movdqa xmm12,XMMWORD[16+r11]
+ rol edx,5
+ add ecx,edi
+ and esi,ebp
+ pxor xmm9,xmm3
+ xor ebp,eax
+ add ecx,edx
+ ror edx,7
+ pshufd xmm10,xmm6,238
+ xor esi,eax
+ movdqa xmm3,xmm9
+ paddd xmm12,xmm9
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ punpcklqdq xmm10,xmm7
+ xor edx,ebp
+ rol ecx,5
+ add ebx,esi
+ psrldq xmm3,4
+ and edi,edx
+ xor edx,ebp
+ pxor xmm10,xmm6
+ add ebx,ecx
+ ror ecx,7
+ pxor xmm3,xmm8
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ pxor xmm10,xmm3
+ xor ecx,edx
+ rol ebx,5
+ movdqa XMMWORD[16+rsp],xmm12
+ add eax,edi
+ and esi,ecx
+ movdqa xmm13,xmm10
+ xor ecx,edx
+ add eax,ebx
+ ror ebx,7
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ movdqa xmm3,xmm10
+ xor esi,edx
+ pslldq xmm13,12
+ paddd xmm10,xmm10
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ psrld xmm3,31
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ movdqa xmm12,xmm13
+ and edi,ebx
+ xor ebx,ecx
+ psrld xmm13,30
+ add ebp,eax
+ ror eax,7
+ por xmm10,xmm3
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ pslld xmm12,2
+ pxor xmm10,xmm13
+ xor eax,ebx
+ movdqa xmm13,XMMWORD[16+r11]
+ rol ebp,5
+ add edx,edi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ and esi,eax
+ pxor xmm10,xmm12
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ pshufd xmm11,xmm7,238
+ xor esi,ebx
+ movdqa xmm12,xmm10
+ paddd xmm13,xmm10
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ punpcklqdq xmm11,xmm8
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ psrldq xmm12,4
+ and edi,ebp
+ xor ebp,eax
+ pxor xmm11,xmm7
+ add ecx,edx
+ ror edx,7
+ pxor xmm12,xmm9
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ pxor xmm11,xmm12
+ xor edx,ebp
+ rol ecx,5
+ movdqa XMMWORD[32+rsp],xmm13
+ add ebx,edi
+ and esi,edx
+ movdqa xmm3,xmm11
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ movdqa xmm12,xmm11
+ xor esi,ebp
+ pslldq xmm3,12
+ paddd xmm11,xmm11
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ psrld xmm12,31
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ movdqa xmm13,xmm3
+ and edi,ecx
+ xor ecx,edx
+ psrld xmm3,30
+ add eax,ebx
+ ror ebx,7
+ cmp r8d,11
+ jb NEAR $L$aesenclast1
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast1
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast1:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ por xmm11,xmm12
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ pslld xmm13,2
+ pxor xmm11,xmm3
+ xor ebx,ecx
+ movdqa xmm3,XMMWORD[16+r11]
+ rol eax,5
+ add ebp,edi
+ and esi,ebx
+ pxor xmm11,xmm13
+ pshufd xmm13,xmm10,238
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ pxor xmm4,xmm8
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ punpcklqdq xmm13,xmm11
+ xor eax,ebx
+ rol ebp,5
+ pxor xmm4,xmm5
+ add edx,esi
+ movups xmm14,XMMWORD[16+r12]
+ xorps xmm14,xmm15
+ movups XMMWORD[r13*1+r12],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ and edi,eax
+ movdqa xmm12,xmm3
+ xor eax,ebx
+ paddd xmm3,xmm11
+ add edx,ebp
+ pxor xmm4,xmm13
+ ror ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ movdqa xmm13,xmm4
+ xor ebp,eax
+ rol edx,5
+ movdqa XMMWORD[48+rsp],xmm3
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ pslld xmm4,2
+ add ecx,edx
+ ror edx,7
+ psrld xmm13,30
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ por xmm4,xmm13
+ xor edx,ebp
+ rol ecx,5
+ pshufd xmm3,xmm11,238
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ pxor xmm5,xmm9
+ add ebp,DWORD[16+rsp]
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ punpcklqdq xmm3,xmm4
+ mov edi,eax
+ rol eax,5
+ pxor xmm5,xmm6
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm13,xmm12
+ ror ebx,7
+ paddd xmm12,xmm4
+ add ebp,eax
+ pxor xmm5,xmm3
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm3,xmm5
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[rsp],xmm12
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[24+rsp]
+ pslld xmm5,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm3,30
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ por xmm5,xmm3
+ add ecx,edx
+ add ebx,DWORD[28+rsp]
+ pshufd xmm12,xmm4,238
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ pxor xmm6,xmm10
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ punpcklqdq xmm12,xmm5
+ mov edi,ebx
+ rol ebx,5
+ pxor xmm6,xmm7
+ add eax,esi
+ xor edi,edx
+ movdqa xmm3,XMMWORD[32+r11]
+ ror ecx,7
+ paddd xmm13,xmm5
+ add eax,ebx
+ pxor xmm6,xmm12
+ add ebp,DWORD[36+rsp]
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ movdqa xmm12,xmm6
+ add ebp,edi
+ xor esi,ecx
+ movdqa XMMWORD[16+rsp],xmm13
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[40+rsp]
+ pslld xmm6,2
+ xor esi,ebx
+ mov edi,ebp
+ psrld xmm12,30
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ por xmm6,xmm12
+ add edx,ebp
+ add ecx,DWORD[44+rsp]
+ pshufd xmm13,xmm5,238
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ pxor xmm7,xmm11
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ punpcklqdq xmm13,xmm6
+ mov edi,ecx
+ rol ecx,5
+ pxor xmm7,xmm8
+ add ebx,esi
+ xor edi,ebp
+ movdqa xmm12,xmm3
+ ror edx,7
+ paddd xmm3,xmm6
+ add ebx,ecx
+ pxor xmm7,xmm13
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ movdqa xmm13,xmm7
+ add eax,edi
+ xor esi,edx
+ movdqa XMMWORD[32+rsp],xmm3
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[56+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ pslld xmm7,2
+ xor esi,ecx
+ mov edi,eax
+ psrld xmm13,30
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ por xmm7,xmm13
+ add ebp,eax
+ add edx,DWORD[60+rsp]
+ pshufd xmm3,xmm6,238
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ pxor xmm8,xmm4
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ punpcklqdq xmm3,xmm7
+ mov edi,edx
+ rol edx,5
+ pxor xmm8,xmm9
+ add ecx,esi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ movdqa xmm13,xmm12
+ ror ebp,7
+ paddd xmm12,xmm7
+ add ecx,edx
+ pxor xmm8,xmm3
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ movdqa xmm3,xmm8
+ add ebx,edi
+ xor esi,ebp
+ movdqa XMMWORD[48+rsp],xmm12
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[8+rsp]
+ pslld xmm8,2
+ xor esi,edx
+ mov edi,ebx
+ psrld xmm3,30
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ por xmm8,xmm3
+ add eax,ebx
+ add ebp,DWORD[12+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ pshufd xmm12,xmm7,238
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ pxor xmm9,xmm5
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ punpcklqdq xmm12,xmm8
+ mov edi,ebp
+ rol ebp,5
+ pxor xmm9,xmm10
+ add edx,esi
+ xor edi,ebx
+ movdqa xmm3,xmm13
+ ror eax,7
+ paddd xmm13,xmm8
+ add edx,ebp
+ pxor xmm9,xmm12
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ movdqa xmm12,xmm9
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$aesenclast2
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast2
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast2:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ movdqa XMMWORD[rsp],xmm13
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[24+rsp]
+ pslld xmm9,2
+ xor esi,ebp
+ mov edi,ecx
+ psrld xmm12,30
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ por xmm9,xmm12
+ add ebx,ecx
+ add eax,DWORD[28+rsp]
+ pshufd xmm13,xmm8,238
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ pxor xmm10,xmm6
+ add ebp,DWORD[32+rsp]
+ movups xmm14,XMMWORD[32+r12]
+ xorps xmm14,xmm15
+ movups XMMWORD[16+r12*1+r13],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ punpcklqdq xmm13,xmm9
+ mov edi,eax
+ xor esi,ecx
+ pxor xmm10,xmm11
+ rol eax,5
+ add ebp,esi
+ movdqa xmm12,xmm3
+ xor edi,ebx
+ paddd xmm3,xmm9
+ xor ebx,ecx
+ pxor xmm10,xmm13
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ movdqa xmm13,xmm10
+ mov esi,ebp
+ xor edi,ebx
+ movdqa XMMWORD[16+rsp],xmm3
+ rol ebp,5
+ add edx,edi
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ pslld xmm10,2
+ xor eax,ebx
+ add edx,ebp
+ psrld xmm13,30
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ xor eax,ebx
+ por xmm10,xmm13
+ ror ebp,7
+ mov edi,edx
+ xor esi,eax
+ rol edx,5
+ pshufd xmm3,xmm9,238
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ mov esi,ecx
+ xor edi,ebp
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ pxor xmm11,xmm7
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ ror ecx,7
+ punpcklqdq xmm3,xmm10
+ mov edi,ebx
+ xor esi,edx
+ pxor xmm11,xmm4
+ rol ebx,5
+ add eax,esi
+ movdqa xmm13,XMMWORD[48+r11]
+ xor edi,ecx
+ paddd xmm12,xmm10
+ xor ecx,edx
+ pxor xmm11,xmm3
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ movdqa xmm3,xmm11
+ mov esi,eax
+ xor edi,ecx
+ movdqa XMMWORD[32+rsp],xmm12
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ pslld xmm11,2
+ xor ebx,ecx
+ add ebp,eax
+ psrld xmm3,30
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ por xmm11,xmm3
+ ror eax,7
+ mov edi,ebp
+ xor esi,ebx
+ rol ebp,5
+ pshufd xmm12,xmm10,238
+ add edx,esi
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ mov esi,edx
+ xor edi,eax
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ pxor xmm4,xmm8
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ ror edx,7
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ punpcklqdq xmm12,xmm11
+ mov edi,ecx
+ xor esi,ebp
+ pxor xmm4,xmm5
+ rol ecx,5
+ add ebx,esi
+ movdqa xmm3,xmm13
+ xor edi,edx
+ paddd xmm13,xmm11
+ xor edx,ebp
+ pxor xmm4,xmm12
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ movdqa xmm12,xmm4
+ mov esi,ebx
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm13
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ pslld xmm4,2
+ xor ecx,edx
+ add eax,ebx
+ psrld xmm12,30
+ add ebp,DWORD[8+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ and esi,ecx
+ xor ecx,edx
+ por xmm4,xmm12
+ ror ebx,7
+ mov edi,eax
+ xor esi,ecx
+ rol eax,5
+ pshufd xmm13,xmm11,238
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ mov esi,ebp
+ xor edi,ebx
+ rol ebp,5
+ add edx,edi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ pxor xmm5,xmm9
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ ror ebp,7
+ punpcklqdq xmm13,xmm4
+ mov edi,edx
+ xor esi,eax
+ pxor xmm5,xmm6
+ rol edx,5
+ add ecx,esi
+ movdqa xmm12,xmm3
+ xor edi,ebp
+ paddd xmm3,xmm4
+ xor ebp,eax
+ pxor xmm5,xmm13
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ movdqa xmm13,xmm5
+ mov esi,ecx
+ xor edi,ebp
+ movdqa XMMWORD[rsp],xmm3
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ pslld xmm5,2
+ xor edx,ebp
+ add ebx,ecx
+ psrld xmm13,30
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ xor edx,ebp
+ por xmm5,xmm13
+ ror ecx,7
+ mov edi,ebx
+ xor esi,edx
+ rol ebx,5
+ pshufd xmm3,xmm4,238
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ cmp r8d,11
+ jb NEAR $L$aesenclast3
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast3
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast3:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor edi,ecx
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ pxor xmm6,xmm10
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ punpcklqdq xmm3,xmm5
+ mov edi,ebp
+ xor esi,ebx
+ pxor xmm6,xmm7
+ rol ebp,5
+ add edx,esi
+ movups xmm14,XMMWORD[48+r12]
+ xorps xmm14,xmm15
+ movups XMMWORD[32+r12*1+r13],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ movdqa xmm13,xmm12
+ xor edi,eax
+ paddd xmm12,xmm5
+ xor eax,ebx
+ pxor xmm6,xmm3
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ movdqa xmm3,xmm6
+ mov esi,edx
+ xor edi,eax
+ movdqa XMMWORD[16+rsp],xmm12
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ pslld xmm6,2
+ xor ebp,eax
+ add ecx,edx
+ psrld xmm3,30
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ xor ebp,eax
+ por xmm6,xmm3
+ ror edx,7
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ mov edi,ecx
+ xor esi,ebp
+ rol ecx,5
+ pshufd xmm12,xmm5,238
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ pxor xmm7,xmm11
+ add ebp,DWORD[48+rsp]
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ punpcklqdq xmm12,xmm6
+ mov edi,eax
+ rol eax,5
+ pxor xmm7,xmm8
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm3,xmm13
+ ror ebx,7
+ paddd xmm13,xmm6
+ add ebp,eax
+ pxor xmm7,xmm12
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm12,xmm7
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[32+rsp],xmm13
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[56+rsp]
+ pslld xmm7,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm12,30
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ por xmm7,xmm12
+ add ecx,edx
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ paddd xmm3,xmm7
+ add eax,esi
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm3
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ cmp r10,r14
+ je NEAR $L$done_ssse3
+ movdqa xmm3,XMMWORD[64+r11]
+ movdqa xmm13,XMMWORD[r11]
+ movdqu xmm4,XMMWORD[r10]
+ movdqu xmm5,XMMWORD[16+r10]
+ movdqu xmm6,XMMWORD[32+r10]
+ movdqu xmm7,XMMWORD[48+r10]
+DB 102,15,56,0,227
+ add r10,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+DB 102,15,56,0,235
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ paddd xmm4,xmm13
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ movdqa XMMWORD[rsp],xmm4
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ psubd xmm4,xmm13
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+DB 102,15,56,0,243
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ paddd xmm5,xmm13
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ movdqa XMMWORD[16+rsp],xmm5
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ psubd xmm5,xmm13
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+DB 102,15,56,0,251
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ paddd xmm6,xmm13
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ movdqa XMMWORD[32+rsp],xmm6
+ rol edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$aesenclast4
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast4
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast4:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ ror ebp,7
+ psubd xmm6,xmm13
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ movups XMMWORD[48+r12*1+r13],xmm2
+ lea r12,[64+r12]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ add edx,DWORD[12+r9]
+ mov DWORD[r9],eax
+ add ebp,DWORD[16+r9]
+ mov DWORD[4+r9],esi
+ mov ebx,esi
+ mov DWORD[8+r9],ecx
+ mov edi,ecx
+ mov DWORD[12+r9],edx
+ xor edi,edx
+ mov DWORD[16+r9],ebp
+ and esi,edi
+ jmp NEAR $L$oop_ssse3
+
+$L$done_ssse3:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$aesenclast5
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast5
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast5:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ movups XMMWORD[48+r12*1+r13],xmm2
+ mov r8,QWORD[88+rsp]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ mov DWORD[r9],eax
+ add edx,DWORD[12+r9]
+ mov DWORD[4+r9],esi
+ add ebp,DWORD[16+r9]
+ mov DWORD[8+r9],ecx
+ mov DWORD[12+r9],edx
+ mov DWORD[16+r9],ebp
+ movups XMMWORD[r8],xmm2
+ movaps xmm6,XMMWORD[((96+0))+rsp]
+ movaps xmm7,XMMWORD[((96+16))+rsp]
+ movaps xmm8,XMMWORD[((96+32))+rsp]
+ movaps xmm9,XMMWORD[((96+48))+rsp]
+ movaps xmm10,XMMWORD[((96+64))+rsp]
+ movaps xmm11,XMMWORD[((96+80))+rsp]
+ movaps xmm12,XMMWORD[((96+96))+rsp]
+ movaps xmm13,XMMWORD[((96+112))+rsp]
+ movaps xmm14,XMMWORD[((96+128))+rsp]
+ movaps xmm15,XMMWORD[((96+144))+rsp]
+ lea rsi,[264+rsp]
+
+ mov r15,QWORD[rsi]
+
+ mov r14,QWORD[8+rsi]
+
+ mov r13,QWORD[16+rsi]
+
+ mov r12,QWORD[24+rsi]
+
+ mov rbp,QWORD[32+rsi]
+
+ mov rbx,QWORD[40+rsi]
+
+ lea rsp,[48+rsi]
+
+$L$epilogue_ssse3:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_ssse3:
+
+ALIGN 32
+aesni_cbc_sha1_enc_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r10,QWORD[56+rsp]
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-264))+rsp]
+
+
+
+ movaps XMMWORD[(96+0)+rsp],xmm6
+ movaps XMMWORD[(96+16)+rsp],xmm7
+ movaps XMMWORD[(96+32)+rsp],xmm8
+ movaps XMMWORD[(96+48)+rsp],xmm9
+ movaps XMMWORD[(96+64)+rsp],xmm10
+ movaps XMMWORD[(96+80)+rsp],xmm11
+ movaps XMMWORD[(96+96)+rsp],xmm12
+ movaps XMMWORD[(96+112)+rsp],xmm13
+ movaps XMMWORD[(96+128)+rsp],xmm14
+ movaps XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_avx:
+ vzeroall
+ mov r12,rdi
+ mov r13,rsi
+ mov r14,rdx
+ lea r15,[112+rcx]
+ vmovdqu xmm12,XMMWORD[r8]
+ mov QWORD[88+rsp],r8
+ shl r14,6
+ sub r13,r12
+ mov r8d,DWORD[((240-112))+r15]
+ add r14,r10
+
+ lea r11,[K_XX_XX]
+ mov eax,DWORD[r9]
+ mov ebx,DWORD[4+r9]
+ mov ecx,DWORD[8+r9]
+ mov edx,DWORD[12+r9]
+ mov esi,ebx
+ mov ebp,DWORD[16+r9]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ vmovdqa xmm6,XMMWORD[64+r11]
+ vmovdqa xmm10,XMMWORD[r11]
+ vmovdqu xmm0,XMMWORD[r10]
+ vmovdqu xmm1,XMMWORD[16+r10]
+ vmovdqu xmm2,XMMWORD[32+r10]
+ vmovdqu xmm3,XMMWORD[48+r10]
+ vpshufb xmm0,xmm0,xmm6
+ add r10,64
+ vpshufb xmm1,xmm1,xmm6
+ vpshufb xmm2,xmm2,xmm6
+ vpshufb xmm3,xmm3,xmm6
+ vpaddd xmm4,xmm0,xmm10
+ vpaddd xmm5,xmm1,xmm10
+ vpaddd xmm6,xmm2,xmm10
+ vmovdqa XMMWORD[rsp],xmm4
+ vmovdqa XMMWORD[16+rsp],xmm5
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ jmp NEAR $L$oop_avx
+ALIGN 32
+$L$oop_avx:
+ shrd ebx,ebx,2
+ vmovdqu xmm13,XMMWORD[r12]
+ vpxor xmm13,xmm13,xmm15
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ xor esi,edx
+ vpalignr xmm4,xmm1,xmm0,8
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ vpaddd xmm9,xmm10,xmm3
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrldq xmm8,xmm3,4
+ add ebp,esi
+ and edi,ebx
+ vpxor xmm4,xmm4,xmm0
+ xor ebx,ecx
+ add ebp,eax
+ vpxor xmm8,xmm8,xmm2
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ vpxor xmm4,xmm4,xmm8
+ xor eax,ebx
+ shld ebp,ebp,5
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ and esi,eax
+ vpsrld xmm8,xmm4,31
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpslldq xmm9,xmm4,12
+ vpaddd xmm4,xmm4,xmm4
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpor xmm4,xmm4,xmm8
+ vpsrld xmm8,xmm9,30
+ add ecx,esi
+ and edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpslld xmm9,xmm9,2
+ vpxor xmm4,xmm4,xmm8
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ vpxor xmm4,xmm4,xmm9
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ and esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpalignr xmm5,xmm2,xmm1,8
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ vpaddd xmm9,xmm10,xmm4
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrldq xmm8,xmm4,4
+ add eax,esi
+ and edi,ecx
+ vpxor xmm5,xmm5,xmm1
+ xor ecx,edx
+ add eax,ebx
+ vpxor xmm8,xmm8,xmm3
+ shrd ebx,ebx,7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ vpxor xmm5,xmm5,xmm8
+ xor ebx,ecx
+ shld eax,eax,5
+ vmovdqa XMMWORD[rsp],xmm9
+ add ebp,edi
+ and esi,ebx
+ vpsrld xmm8,xmm5,31
+ xor ebx,ecx
+ add ebp,eax
+ shrd eax,eax,7
+ xor esi,ecx
+ vpslldq xmm9,xmm5,12
+ vpaddd xmm5,xmm5,xmm5
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpor xmm5,xmm5,xmm8
+ vpsrld xmm8,xmm9,30
+ add edx,esi
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ and edi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpslld xmm9,xmm9,2
+ vpxor xmm5,xmm5,xmm8
+ shrd ebp,ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ vpxor xmm5,xmm5,xmm9
+ xor ebp,eax
+ shld edx,edx,5
+ vmovdqa xmm10,XMMWORD[16+r11]
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ vpalignr xmm6,xmm3,xmm2,8
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ vpaddd xmm9,xmm10,xmm5
+ xor edx,ebp
+ shld ecx,ecx,5
+ vpsrldq xmm8,xmm5,4
+ add ebx,esi
+ and edi,edx
+ vpxor xmm6,xmm6,xmm2
+ xor edx,ebp
+ add ebx,ecx
+ vpxor xmm8,xmm8,xmm4
+ shrd ecx,ecx,7
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ vpxor xmm6,xmm6,xmm8
+ xor ecx,edx
+ shld ebx,ebx,5
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add eax,edi
+ and esi,ecx
+ vpsrld xmm8,xmm6,31
+ xor ecx,edx
+ add eax,ebx
+ shrd ebx,ebx,7
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,edx
+ vpslldq xmm9,xmm6,12
+ vpaddd xmm6,xmm6,xmm6
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpor xmm6,xmm6,xmm8
+ vpsrld xmm8,xmm9,30
+ add ebp,esi
+ and edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpslld xmm9,xmm9,2
+ vpxor xmm6,xmm6,xmm8
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ vpxor xmm6,xmm6,xmm9
+ xor eax,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ and esi,eax
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpalignr xmm7,xmm4,xmm3,8
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ vpaddd xmm9,xmm10,xmm6
+ xor ebp,eax
+ shld edx,edx,5
+ vpsrldq xmm8,xmm6,4
+ add ecx,esi
+ and edi,ebp
+ vpxor xmm7,xmm7,xmm3
+ xor ebp,eax
+ add ecx,edx
+ vpxor xmm8,xmm8,xmm5
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ vpxor xmm7,xmm7,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add ebx,edi
+ and esi,edx
+ vpsrld xmm8,xmm7,31
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpslldq xmm9,xmm7,12
+ vpaddd xmm7,xmm7,xmm7
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpor xmm7,xmm7,xmm8
+ vpsrld xmm8,xmm9,30
+ add eax,esi
+ and edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpslld xmm9,xmm9,2
+ vpxor xmm7,xmm7,xmm8
+ shrd ebx,ebx,7
+ cmp r8d,11
+ jb NEAR $L$vaesenclast6
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast6
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast6:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ vpxor xmm7,xmm7,xmm9
+ xor ebx,ecx
+ shld eax,eax,5
+ add ebp,edi
+ and esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ vpxor xmm0,xmm0,xmm1
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpaddd xmm9,xmm10,xmm7
+ add edx,esi
+ vmovdqu xmm13,XMMWORD[16+r12]
+ vpxor xmm13,xmm13,xmm15
+ vmovups XMMWORD[r13*1+r12],xmm12
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ and edi,eax
+ vpxor xmm0,xmm0,xmm8
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor edi,ebx
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpslld xmm0,xmm0,2
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ vpor xmm0,xmm0,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ebp,DWORD[16+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm1,xmm1,xmm2
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm10,xmm0
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm1,xmm1,xmm8
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm1,xmm1,2
+ add ecx,DWORD[24+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm1,xmm1,xmm8
+ add ebx,DWORD[28+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ vpxor xmm2,xmm2,xmm3
+ add eax,esi
+ xor edi,edx
+ vpaddd xmm9,xmm10,xmm1
+ vmovdqa xmm10,XMMWORD[32+r11]
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpxor xmm2,xmm2,xmm8
+ add ebp,DWORD[36+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpslld xmm2,xmm2,2
+ add edx,DWORD[40+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpor xmm2,xmm2,xmm8
+ add ecx,DWORD[44+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpxor xmm3,xmm3,xmm4
+ add ebx,esi
+ xor edi,ebp
+ vpaddd xmm9,xmm10,xmm2
+ shrd edx,edx,7
+ add ebx,ecx
+ vpxor xmm3,xmm3,xmm8
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ add ebp,DWORD[56+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpor xmm3,xmm3,xmm8
+ add edx,DWORD[60+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpalignr xmm8,xmm3,xmm2,8
+ vpxor xmm4,xmm4,xmm0
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ vpxor xmm4,xmm4,xmm5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor edi,eax
+ vpaddd xmm9,xmm10,xmm3
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpxor xmm4,xmm4,xmm8
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ vpsrld xmm8,xmm4,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpslld xmm4,xmm4,2
+ add eax,DWORD[8+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpor xmm4,xmm4,xmm8
+ add ebp,DWORD[12+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpalignr xmm8,xmm4,xmm3,8
+ vpxor xmm5,xmm5,xmm1
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpxor xmm5,xmm5,xmm6
+ add edx,esi
+ xor edi,ebx
+ vpaddd xmm9,xmm10,xmm4
+ shrd eax,eax,7
+ add edx,ebp
+ vpxor xmm5,xmm5,xmm8
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ vpsrld xmm8,xmm5,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$vaesenclast7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast7:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpslld xmm5,xmm5,2
+ add ebx,DWORD[24+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpor xmm5,xmm5,xmm8
+ add eax,DWORD[28+rsp]
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpalignr xmm8,xmm5,xmm4,8
+ vpxor xmm6,xmm6,xmm2
+ add ebp,DWORD[32+rsp]
+ vmovdqu xmm13,XMMWORD[32+r12]
+ vpxor xmm13,xmm13,xmm15
+ vmovups XMMWORD[16+r12*1+r13],xmm12
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ and esi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vpxor xmm6,xmm6,xmm7
+ mov edi,eax
+ xor esi,ecx
+ vpaddd xmm9,xmm10,xmm5
+ shld eax,eax,5
+ add ebp,esi
+ vpxor xmm6,xmm6,xmm8
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ vpsrld xmm8,xmm6,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ vpslld xmm6,xmm6,2
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ vpor xmm6,xmm6,xmm8
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov edi,edx
+ xor esi,eax
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ mov esi,ecx
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ vpalignr xmm8,xmm6,xmm5,8
+ vpxor xmm7,xmm7,xmm3
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ vpxor xmm7,xmm7,xmm0
+ mov edi,ebx
+ xor esi,edx
+ vpaddd xmm9,xmm10,xmm6
+ vmovdqa xmm10,XMMWORD[48+r11]
+ shld ebx,ebx,5
+ add eax,esi
+ vpxor xmm7,xmm7,xmm8
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ vpsrld xmm8,xmm7,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ vpslld xmm7,xmm7,2
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ vpor xmm7,xmm7,xmm8
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov edi,ebp
+ xor esi,ebx
+ shld ebp,ebp,5
+ add edx,esi
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ vpxor xmm0,xmm0,xmm1
+ mov edi,ecx
+ xor esi,ebp
+ vpaddd xmm9,xmm10,xmm7
+ shld ecx,ecx,5
+ add ebx,esi
+ vpxor xmm0,xmm0,xmm8
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ vpslld xmm0,xmm0,2
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[8+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ and esi,ecx
+ vpor xmm0,xmm0,xmm8
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov edi,eax
+ xor esi,ecx
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ vpxor xmm1,xmm1,xmm2
+ mov edi,edx
+ xor esi,eax
+ vpaddd xmm9,xmm10,xmm0
+ shld edx,edx,5
+ add ecx,esi
+ vpxor xmm1,xmm1,xmm8
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ mov esi,ecx
+ vpslld xmm1,xmm1,2
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ vpor xmm1,xmm1,xmm8
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov edi,ebx
+ xor esi,edx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ cmp r8d,11
+ jb NEAR $L$vaesenclast8
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast8
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast8:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ vpxor xmm2,xmm2,xmm3
+ mov edi,ebp
+ xor esi,ebx
+ vpaddd xmm9,xmm10,xmm1
+ shld ebp,ebp,5
+ add edx,esi
+ vmovdqu xmm13,XMMWORD[48+r12]
+ vpxor xmm13,xmm13,xmm15
+ vmovups XMMWORD[32+r12*1+r13],xmm12
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ vpxor xmm2,xmm2,xmm8
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ vpslld xmm2,xmm2,2
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ vpor xmm2,xmm2,xmm8
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ mov edi,ecx
+ xor esi,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebp,DWORD[48+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm3,xmm3,xmm4
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm10,xmm2
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm3,xmm3,xmm8
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm3,xmm3,2
+ add ecx,DWORD[56+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm3,xmm3,xmm8
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ vpaddd xmm9,xmm10,xmm3
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ vmovdqa XMMWORD[48+rsp],xmm9
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ cmp r10,r14
+ je NEAR $L$done_avx
+ vmovdqa xmm9,XMMWORD[64+r11]
+ vmovdqa xmm10,XMMWORD[r11]
+ vmovdqu xmm0,XMMWORD[r10]
+ vmovdqu xmm1,XMMWORD[16+r10]
+ vmovdqu xmm2,XMMWORD[32+r10]
+ vmovdqu xmm3,XMMWORD[48+r10]
+ vpshufb xmm0,xmm0,xmm9
+ add r10,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ vpshufb xmm1,xmm1,xmm9
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpaddd xmm8,xmm0,xmm10
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vmovdqa XMMWORD[rsp],xmm8
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ vpshufb xmm2,xmm2,xmm9
+ mov edi,edx
+ shld edx,edx,5
+ vpaddd xmm8,xmm1,xmm10
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vmovdqa XMMWORD[16+rsp],xmm8
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ vpshufb xmm3,xmm3,xmm9
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpaddd xmm8,xmm2,xmm10
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vmovdqa XMMWORD[32+rsp],xmm8
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$vaesenclast9
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast9
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast9:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ vmovups XMMWORD[48+r12*1+r13],xmm12
+ lea r12,[64+r12]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ add edx,DWORD[12+r9]
+ mov DWORD[r9],eax
+ add ebp,DWORD[16+r9]
+ mov DWORD[4+r9],esi
+ mov ebx,esi
+ mov DWORD[8+r9],ecx
+ mov edi,ecx
+ mov DWORD[12+r9],edx
+ xor edi,edx
+ mov DWORD[16+r9],ebp
+ and esi,edi
+ jmp NEAR $L$oop_avx
+
+$L$done_avx:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$vaesenclast10
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast10
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast10:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ vmovups XMMWORD[48+r12*1+r13],xmm12
+ mov r8,QWORD[88+rsp]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ mov DWORD[r9],eax
+ add edx,DWORD[12+r9]
+ mov DWORD[4+r9],esi
+ add ebp,DWORD[16+r9]
+ mov DWORD[8+r9],ecx
+ mov DWORD[12+r9],edx
+ mov DWORD[16+r9],ebp
+ vmovups XMMWORD[r8],xmm12
+ vzeroall
+ movaps xmm6,XMMWORD[((96+0))+rsp]
+ movaps xmm7,XMMWORD[((96+16))+rsp]
+ movaps xmm8,XMMWORD[((96+32))+rsp]
+ movaps xmm9,XMMWORD[((96+48))+rsp]
+ movaps xmm10,XMMWORD[((96+64))+rsp]
+ movaps xmm11,XMMWORD[((96+80))+rsp]
+ movaps xmm12,XMMWORD[((96+96))+rsp]
+ movaps xmm13,XMMWORD[((96+112))+rsp]
+ movaps xmm14,XMMWORD[((96+128))+rsp]
+ movaps xmm15,XMMWORD[((96+144))+rsp]
+ lea rsi,[264+rsp]
+
+ mov r15,QWORD[rsi]
+
+ mov r14,QWORD[8+rsi]
+
+ mov r13,QWORD[16+rsi]
+
+ mov r12,QWORD[24+rsi]
+
+ mov rbp,QWORD[32+rsi]
+
+ mov rbx,QWORD[40+rsi]
+
+ lea rsp,[48+rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_avx:
+ALIGN 64
+K_XX_XX:
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+
+DB 65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115
+DB 116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52
+DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB 114,103,62,0
+ALIGN 64
+
+ALIGN 32
+aesni_cbc_sha1_enc_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ mov r10,QWORD[56+rsp]
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-8-160)+rax],xmm6
+ movaps XMMWORD[(-8-144)+rax],xmm7
+ movaps XMMWORD[(-8-128)+rax],xmm8
+ movaps XMMWORD[(-8-112)+rax],xmm9
+ movaps XMMWORD[(-8-96)+rax],xmm10
+ movaps XMMWORD[(-8-80)+rax],xmm11
+ movaps XMMWORD[(-8-64)+rax],xmm12
+ movaps XMMWORD[(-8-48)+rax],xmm13
+ movaps XMMWORD[(-8-32)+rax],xmm14
+ movaps XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+ movdqu xmm8,XMMWORD[r9]
+ movd xmm9,DWORD[16+r9]
+ movdqa xmm7,XMMWORD[((K_XX_XX+80))]
+
+ mov r11d,DWORD[240+rcx]
+ sub rsi,rdi
+ movups xmm15,XMMWORD[rcx]
+ movups xmm2,XMMWORD[r8]
+ movups xmm0,XMMWORD[16+rcx]
+ lea rcx,[112+rcx]
+
+ pshufd xmm8,xmm8,27
+ pshufd xmm9,xmm9,27
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ movups xmm14,XMMWORD[rdi]
+ xorps xmm14,xmm15
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+ movdqu xmm3,XMMWORD[r10]
+ movdqa xmm12,xmm9
+DB 102,15,56,0,223
+ movdqu xmm4,XMMWORD[16+r10]
+ movdqa xmm11,xmm8
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+DB 102,15,56,0,231
+
+ paddd xmm9,xmm3
+ movdqu xmm5,XMMWORD[32+r10]
+ lea r10,[64+r10]
+ pxor xmm3,xmm12
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+ pxor xmm3,xmm12
+ movdqa xmm10,xmm8
+DB 102,15,56,0,239
+DB 69,15,58,204,193,0
+DB 68,15,56,200,212
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+DB 15,56,201,220
+ movdqu xmm6,XMMWORD[((-16))+r10]
+ movdqa xmm9,xmm8
+DB 102,15,56,0,247
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+DB 69,15,58,204,194,0
+DB 68,15,56,200,205
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,0
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,0
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ cmp r11d,11
+ jb NEAR $L$aesenclast11
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast11
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast11:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,0
+DB 68,15,56,200,212
+ movups xmm14,XMMWORD[16+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[rdi*1+rsi],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,236
+ pxor xmm6,xmm4
+DB 15,56,201,220
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,1
+DB 68,15,56,200,205
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,245
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,1
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,1
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,1
+DB 68,15,56,200,212
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,236
+ pxor xmm6,xmm4
+DB 15,56,201,220
+ cmp r11d,11
+ jb NEAR $L$aesenclast12
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast12
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast12:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,1
+DB 68,15,56,200,205
+ movups xmm14,XMMWORD[32+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[16+rdi*1+rsi],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,245
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,2
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,2
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,2
+DB 68,15,56,200,212
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,236
+ pxor xmm6,xmm4
+DB 15,56,201,220
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,2
+DB 68,15,56,200,205
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,245
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ cmp r11d,11
+ jb NEAR $L$aesenclast13
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast13
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast13:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,2
+DB 68,15,56,200,214
+ movups xmm14,XMMWORD[48+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[32+rdi*1+rsi],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,3
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,3
+DB 68,15,56,200,212
+DB 15,56,202,236
+ pxor xmm6,xmm4
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,3
+DB 68,15,56,200,205
+DB 15,56,202,245
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm5,xmm12
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,3
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,3
+DB 68,15,56,200,205
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+ cmp r11d,11
+ jb NEAR $L$aesenclast14
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast14
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast14:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ dec rdx
+
+ paddd xmm8,xmm11
+ movups XMMWORD[48+rdi*1+rsi],xmm2
+ lea rdi,[64+rdi]
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm8,xmm8,27
+ pshufd xmm9,xmm9,27
+ movups XMMWORD[r8],xmm2
+ movdqu XMMWORD[r9],xmm8
+ movd DWORD[16+r9],xmm9
+ movaps xmm6,XMMWORD[((-8-160))+rax]
+ movaps xmm7,XMMWORD[((-8-144))+rax]
+ movaps xmm8,XMMWORD[((-8-128))+rax]
+ movaps xmm9,XMMWORD[((-8-112))+rax]
+ movaps xmm10,XMMWORD[((-8-96))+rax]
+ movaps xmm11,XMMWORD[((-8-80))+rax]
+ movaps xmm12,XMMWORD[((-8-64))+rax]
+ movaps xmm13,XMMWORD[((-8-48))+rax]
+ movaps xmm14,XMMWORD[((-8-32))+rax]
+ movaps xmm15,XMMWORD[((-8-16))+rax]
+ mov rsp,rax
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_cbc_sha1_enc_shaext:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+ssse3_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+ lea r10,[aesni_cbc_sha1_enc_shaext]
+ cmp rbx,r10
+ jb NEAR $L$seh_no_shaext
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[168+rax]
+ jmp NEAR $L$common_seh_tail
+$L$seh_no_shaext:
+ lea rsi,[96+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[264+rax]
+
+ mov r15,QWORD[rax]
+ mov r14,QWORD[8+rax]
+ mov r13,QWORD[16+rax]
+ mov r12,QWORD[24+rax]
+ mov rbp,QWORD[32+rax]
+ mov rbx,QWORD[40+rax]
+ lea rax,[48+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha1_enc_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha1_enc_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha1_enc_avx wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_aesni_cbc_sha1_enc_ssse3:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_avx:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_shaext:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
new file mode 100644
index 0000000000..0dba3d7f67
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
@@ -0,0 +1,4709 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+global aesni_cbc_sha256_enc
+
+ALIGN 16
+aesni_cbc_sha256_enc:
+ lea r11,[OPENSSL_ia32cap_P]
+ mov eax,1
+ cmp rcx,0
+ je NEAR $L$probe
+ mov eax,DWORD[r11]
+ mov r10,QWORD[4+r11]
+ bt r10,61
+ jc NEAR aesni_cbc_sha256_enc_shaext
+ mov r11,r10
+ shr r11,32
+
+ test r10d,2048
+ jnz NEAR aesni_cbc_sha256_enc_xop
+ and r11d,296
+ cmp r11d,296
+ je NEAR aesni_cbc_sha256_enc_avx2
+ and r10d,268435456
+ jnz NEAR aesni_cbc_sha256_enc_avx
+ ud2
+ xor eax,eax
+ cmp rcx,0
+ je NEAR $L$probe
+ ud2
+$L$probe:
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+
+K256:
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0,0,0,0,0,0,0,0,-1,-1,-1,-1
+ DD 0,0,0,0,0,0,0,0
+DB 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54
+DB 32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95
+DB 54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98
+DB 121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108
+DB 46,111,114,103,62,0
+ALIGN 64
+
+ALIGN 64
+aesni_cbc_sha256_enc_xop:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_xop:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+$L$xop_shortcut:
+ mov r10,QWORD[56+rsp]
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,288
+ and rsp,-64
+
+ shl rdx,6
+ sub rsi,rdi
+ sub r10,rdi
+ add rdx,rdi
+
+
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+
+ mov QWORD[((64+32))+rsp],r8
+ mov QWORD[((64+40))+rsp],r9
+ mov QWORD[((64+48))+rsp],r10
+ mov QWORD[120+rsp],rax
+
+ movaps XMMWORD[128+rsp],xmm6
+ movaps XMMWORD[144+rsp],xmm7
+ movaps XMMWORD[160+rsp],xmm8
+ movaps XMMWORD[176+rsp],xmm9
+ movaps XMMWORD[192+rsp],xmm10
+ movaps XMMWORD[208+rsp],xmm11
+ movaps XMMWORD[224+rsp],xmm12
+ movaps XMMWORD[240+rsp],xmm13
+ movaps XMMWORD[256+rsp],xmm14
+ movaps XMMWORD[272+rsp],xmm15
+$L$prologue_xop:
+ vzeroall
+
+ mov r12,rdi
+ lea rdi,[128+rcx]
+ lea r13,[((K256+544))]
+ mov r14d,DWORD[((240-128))+rdi]
+ mov r15,r9
+ mov rsi,r10
+ vmovdqu xmm8,XMMWORD[r8]
+ sub r14,9
+
+ mov eax,DWORD[r15]
+ mov ebx,DWORD[4+r15]
+ mov ecx,DWORD[8+r15]
+ mov edx,DWORD[12+r15]
+ mov r8d,DWORD[16+r15]
+ mov r9d,DWORD[20+r15]
+ mov r10d,DWORD[24+r15]
+ mov r11d,DWORD[28+r15]
+
+ vmovdqa xmm14,XMMWORD[r14*8+r13]
+ vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+ vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ jmp NEAR $L$loop_xop
+ALIGN 16
+$L$loop_xop:
+ vmovdqa xmm7,XMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[r12*1+rsi]
+ vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+ vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+ vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+ vpshufb xmm0,xmm0,xmm7
+ lea rbp,[K256]
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,XMMWORD[rbp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,XMMWORD[32+rbp]
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ vpaddd xmm7,xmm3,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm5
+ mov esi,ebx
+ vmovdqa XMMWORD[32+rsp],xmm6
+ xor esi,ecx
+ vmovdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$xop_00_47
+
+ALIGN 16
+$L$xop_00_47:
+ sub rbp,-16*2*4
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ vpalignr xmm4,xmm1,xmm0,4
+ ror r13d,14
+ mov eax,r14d
+ vpalignr xmm7,xmm3,xmm2,4
+ mov r12d,r9d
+ xor r13d,r8d
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,r10d
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,eax
+ vpaddd xmm0,xmm0,xmm7
+ and r12d,r8d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,r10d
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+DB 143,232,120,194,251,13
+ xor r14d,eax
+ add r11d,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,ebx
+ add edx,r11d
+ vpsrld xmm6,xmm3,10
+ ror r14d,2
+ add r11d,esi
+ vpaddd xmm0,xmm0,xmm4
+ mov r13d,edx
+ add r14d,r11d
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov r11d,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ vpsrldq xmm7,xmm7,8
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ vpaddd xmm0,xmm0,xmm7
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+DB 143,232,120,194,248,13
+ xor r14d,r11d
+ add r10d,r13d
+ vpsrld xmm6,xmm0,10
+ xor r15d,eax
+ add ecx,r10d
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add r10d,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ vpaddd xmm0,xmm0,xmm7
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ vpaddd xmm6,xmm0,XMMWORD[rbp]
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[rsp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ ror r13d,14
+ mov r8d,r14d
+ vpalignr xmm7,xmm0,xmm3,4
+ mov r12d,ebx
+ xor r13d,eax
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,ecx
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,r8d
+ vpaddd xmm1,xmm1,xmm7
+ and r12d,eax
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,ecx
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+DB 143,232,120,194,248,13
+ xor r14d,r8d
+ add edx,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,r9d
+ add r11d,edx
+ vpsrld xmm6,xmm0,10
+ ror r14d,2
+ add edx,esi
+ vpaddd xmm1,xmm1,xmm4
+ mov r13d,r11d
+ add r14d,edx
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov edx,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ vpsrldq xmm7,xmm7,8
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ vpaddd xmm1,xmm1,xmm7
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+DB 143,232,120,194,249,13
+ xor r14d,edx
+ add ecx,r13d
+ vpsrld xmm6,xmm1,10
+ xor r15d,r8d
+ add r10d,ecx
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add ecx,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ vpaddd xmm1,xmm1,xmm7
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ vpaddd xmm6,xmm1,XMMWORD[32+rbp]
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ ror r13d,14
+ mov eax,r14d
+ vpalignr xmm7,xmm1,xmm0,4
+ mov r12d,r9d
+ xor r13d,r8d
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,r10d
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,eax
+ vpaddd xmm2,xmm2,xmm7
+ and r12d,r8d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,r10d
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+DB 143,232,120,194,249,13
+ xor r14d,eax
+ add r11d,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,ebx
+ add edx,r11d
+ vpsrld xmm6,xmm1,10
+ ror r14d,2
+ add r11d,esi
+ vpaddd xmm2,xmm2,xmm4
+ mov r13d,edx
+ add r14d,r11d
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov r11d,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ vpsrldq xmm7,xmm7,8
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ vpaddd xmm2,xmm2,xmm7
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+DB 143,232,120,194,250,13
+ xor r14d,r11d
+ add r10d,r13d
+ vpsrld xmm6,xmm2,10
+ xor r15d,eax
+ add ecx,r10d
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add r10d,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ vpaddd xmm2,xmm2,xmm7
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ ror r13d,14
+ mov r8d,r14d
+ vpalignr xmm7,xmm2,xmm1,4
+ mov r12d,ebx
+ xor r13d,eax
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,ecx
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,r8d
+ vpaddd xmm3,xmm3,xmm7
+ and r12d,eax
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,ecx
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+DB 143,232,120,194,250,13
+ xor r14d,r8d
+ add edx,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,r9d
+ add r11d,edx
+ vpsrld xmm6,xmm2,10
+ ror r14d,2
+ add edx,esi
+ vpaddd xmm3,xmm3,xmm4
+ mov r13d,r11d
+ add r14d,edx
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov edx,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ vpsrldq xmm7,xmm7,8
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ vpaddd xmm3,xmm3,xmm7
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+DB 143,232,120,194,251,13
+ xor r14d,edx
+ add ecx,r13d
+ vpsrld xmm6,xmm3,10
+ xor r15d,r8d
+ add r10d,ecx
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add ecx,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ vpaddd xmm3,xmm3,xmm7
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ vpaddd xmm6,xmm3,XMMWORD[96+rbp]
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[48+rsp],xmm6
+ mov r12,QWORD[((64+0))+rsp]
+ vpand xmm11,xmm11,xmm14
+ mov r15,QWORD[((64+8))+rsp]
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r12*1+r15],xmm8
+ lea r12,[16+r12]
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$xop_00_47
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ ror r14d,9
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ ror r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ ror r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ ror r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ ror r14d,9
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ ror r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ ror r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ ror r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ ror r14d,9
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ ror r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ ror r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ ror r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ ror r14d,9
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ ror r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ ror r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ ror r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov r12,QWORD[((64+0))+rsp]
+ mov r13,QWORD[((64+8))+rsp]
+ mov r15,QWORD[((64+40))+rsp]
+ mov rsi,QWORD[((64+48))+rsp]
+
+ vpand xmm11,xmm11,xmm14
+ mov eax,r14d
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r12],xmm8
+ lea r12,[16+r12]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ add r11d,DWORD[28+r15]
+
+ cmp r12,QWORD[((64+16))+rsp]
+
+ mov DWORD[r15],eax
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+
+ jb NEAR $L$loop_xop
+
+ mov r8,QWORD[((64+32))+rsp]
+ mov rsi,QWORD[120+rsp]
+
+ vmovdqu XMMWORD[r8],xmm8
+ vzeroall
+ movaps xmm6,XMMWORD[128+rsp]
+ movaps xmm7,XMMWORD[144+rsp]
+ movaps xmm8,XMMWORD[160+rsp]
+ movaps xmm9,XMMWORD[176+rsp]
+ movaps xmm10,XMMWORD[192+rsp]
+ movaps xmm11,XMMWORD[208+rsp]
+ movaps xmm12,XMMWORD[224+rsp]
+ movaps xmm13,XMMWORD[240+rsp]
+ movaps xmm14,XMMWORD[256+rsp]
+ movaps xmm15,XMMWORD[272+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_xop:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_xop:
+
+ALIGN 64
+aesni_cbc_sha256_enc_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+$L$avx_shortcut:
+ mov r10,QWORD[56+rsp]
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,288
+ and rsp,-64
+
+ shl rdx,6
+ sub rsi,rdi
+ sub r10,rdi
+ add rdx,rdi
+
+
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+
+ mov QWORD[((64+32))+rsp],r8
+ mov QWORD[((64+40))+rsp],r9
+ mov QWORD[((64+48))+rsp],r10
+ mov QWORD[120+rsp],rax
+
+ movaps XMMWORD[128+rsp],xmm6
+ movaps XMMWORD[144+rsp],xmm7
+ movaps XMMWORD[160+rsp],xmm8
+ movaps XMMWORD[176+rsp],xmm9
+ movaps XMMWORD[192+rsp],xmm10
+ movaps XMMWORD[208+rsp],xmm11
+ movaps XMMWORD[224+rsp],xmm12
+ movaps XMMWORD[240+rsp],xmm13
+ movaps XMMWORD[256+rsp],xmm14
+ movaps XMMWORD[272+rsp],xmm15
+$L$prologue_avx:
+ vzeroall
+
+ mov r12,rdi
+ lea rdi,[128+rcx]
+ lea r13,[((K256+544))]
+ mov r14d,DWORD[((240-128))+rdi]
+ mov r15,r9
+ mov rsi,r10
+ vmovdqu xmm8,XMMWORD[r8]
+ sub r14,9
+
+ mov eax,DWORD[r15]
+ mov ebx,DWORD[4+r15]
+ mov ecx,DWORD[8+r15]
+ mov edx,DWORD[12+r15]
+ mov r8d,DWORD[16+r15]
+ mov r9d,DWORD[20+r15]
+ mov r10d,DWORD[24+r15]
+ mov r11d,DWORD[28+r15]
+
+ vmovdqa xmm14,XMMWORD[r14*8+r13]
+ vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+ vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ jmp NEAR $L$loop_avx
+ALIGN 16
+$L$loop_avx:
+ vmovdqa xmm7,XMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[r12*1+rsi]
+ vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+ vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+ vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+ vpshufb xmm0,xmm0,xmm7
+ lea rbp,[K256]
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,XMMWORD[rbp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,XMMWORD[32+rbp]
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ vpaddd xmm7,xmm3,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm5
+ mov esi,ebx
+ vmovdqa XMMWORD[32+rsp],xmm6
+ xor esi,ecx
+ vmovdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$avx_00_47
+
+ALIGN 16
+$L$avx_00_47:
+ sub rbp,-16*2*4
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ vpalignr xmm4,xmm1,xmm0,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm3,xmm2,4
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm0,xmm0,xmm7
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ vpshufd xmm7,xmm3,250
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ vpaddd xmm0,xmm0,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpsrldq xmm6,xmm6,8
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ vpaddd xmm0,xmm0,xmm6
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ vpshufd xmm7,xmm0,80
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpslldq xmm6,xmm6,8
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ vpaddd xmm0,xmm0,xmm6
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ vpaddd xmm6,xmm0,XMMWORD[rbp]
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[rsp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm0,xmm3,4
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm1,xmm1,xmm7
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ vpshufd xmm7,xmm0,250
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ vpaddd xmm1,xmm1,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpsrldq xmm6,xmm6,8
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ vpaddd xmm1,xmm1,xmm6
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ vpshufd xmm7,xmm1,80
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpslldq xmm6,xmm6,8
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ vpaddd xmm1,xmm1,xmm6
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ vpaddd xmm6,xmm1,XMMWORD[32+rbp]
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm1,xmm0,4
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm2,xmm2,xmm7
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ vpshufd xmm7,xmm1,250
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ vpaddd xmm2,xmm2,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpsrldq xmm6,xmm6,8
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ vpaddd xmm2,xmm2,xmm6
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ vpshufd xmm7,xmm2,80
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpslldq xmm6,xmm6,8
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ vpaddd xmm2,xmm2,xmm6
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm2,xmm1,4
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm3,xmm3,xmm7
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ vpshufd xmm7,xmm2,250
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ vpaddd xmm3,xmm3,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpsrldq xmm6,xmm6,8
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ vpaddd xmm3,xmm3,xmm6
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ vpshufd xmm7,xmm3,80
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpslldq xmm6,xmm6,8
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ vpaddd xmm3,xmm3,xmm6
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ vpaddd xmm6,xmm3,XMMWORD[96+rbp]
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[48+rsp],xmm6
+ mov r12,QWORD[((64+0))+rsp]
+ vpand xmm11,xmm11,xmm14
+ mov r15,QWORD[((64+8))+rsp]
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r12*1+r15],xmm8
+ lea r12,[16+r12]
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$avx_00_47
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov r12,QWORD[((64+0))+rsp]
+ mov r13,QWORD[((64+8))+rsp]
+ mov r15,QWORD[((64+40))+rsp]
+ mov rsi,QWORD[((64+48))+rsp]
+
+ vpand xmm11,xmm11,xmm14
+ mov eax,r14d
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r12],xmm8
+ lea r12,[16+r12]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ add r11d,DWORD[28+r15]
+
+ cmp r12,QWORD[((64+16))+rsp]
+
+ mov DWORD[r15],eax
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+ jb NEAR $L$loop_avx
+
+ mov r8,QWORD[((64+32))+rsp]
+ mov rsi,QWORD[120+rsp]
+
+ vmovdqu XMMWORD[r8],xmm8
+ vzeroall
+ movaps xmm6,XMMWORD[128+rsp]
+ movaps xmm7,XMMWORD[144+rsp]
+ movaps xmm8,XMMWORD[160+rsp]
+ movaps xmm9,XMMWORD[176+rsp]
+ movaps xmm10,XMMWORD[192+rsp]
+ movaps xmm11,XMMWORD[208+rsp]
+ movaps xmm12,XMMWORD[224+rsp]
+ movaps xmm13,XMMWORD[240+rsp]
+ movaps xmm14,XMMWORD[256+rsp]
+ movaps xmm15,XMMWORD[272+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx:
+
+ALIGN 64
+aesni_cbc_sha256_enc_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+$L$avx2_shortcut:
+ mov r10,QWORD[56+rsp]
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,736
+ and rsp,-256*4
+ add rsp,448
+
+ shl rdx,6
+ sub rsi,rdi
+ sub r10,rdi
+ add rdx,rdi
+
+
+
+ mov QWORD[((64+16))+rsp],rdx
+
+ mov QWORD[((64+32))+rsp],r8
+ mov QWORD[((64+40))+rsp],r9
+ mov QWORD[((64+48))+rsp],r10
+ mov QWORD[120+rsp],rax
+
+ movaps XMMWORD[128+rsp],xmm6
+ movaps XMMWORD[144+rsp],xmm7
+ movaps XMMWORD[160+rsp],xmm8
+ movaps XMMWORD[176+rsp],xmm9
+ movaps XMMWORD[192+rsp],xmm10
+ movaps XMMWORD[208+rsp],xmm11
+ movaps XMMWORD[224+rsp],xmm12
+ movaps XMMWORD[240+rsp],xmm13
+ movaps XMMWORD[256+rsp],xmm14
+ movaps XMMWORD[272+rsp],xmm15
+$L$prologue_avx2:
+ vzeroall
+
+ mov r13,rdi
+ vpinsrq xmm15,xmm15,rsi,1
+ lea rdi,[128+rcx]
+ lea r12,[((K256+544))]
+ mov r14d,DWORD[((240-128))+rdi]
+ mov r15,r9
+ mov rsi,r10
+ vmovdqu xmm8,XMMWORD[r8]
+ lea r14,[((-9))+r14]
+
+ vmovdqa xmm14,XMMWORD[r14*8+r12]
+ vmovdqa xmm13,XMMWORD[16+r14*8+r12]
+ vmovdqa xmm12,XMMWORD[32+r14*8+r12]
+
+ sub r13,-16*4
+ mov eax,DWORD[r15]
+ lea r12,[r13*1+rsi]
+ mov ebx,DWORD[4+r15]
+ cmp r13,rdx
+ mov ecx,DWORD[8+r15]
+ cmove r12,rsp
+ mov edx,DWORD[12+r15]
+ mov r8d,DWORD[16+r15]
+ mov r9d,DWORD[20+r15]
+ mov r10d,DWORD[24+r15]
+ mov r11d,DWORD[28+r15]
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ jmp NEAR $L$oop_avx2
+ALIGN 16
+$L$oop_avx2:
+ vmovdqa ymm7,YMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[((-64+0))+r13*1+rsi]
+ vmovdqu xmm1,XMMWORD[((-64+16))+r13*1+rsi]
+ vmovdqu xmm2,XMMWORD[((-64+32))+r13*1+rsi]
+ vmovdqu xmm3,XMMWORD[((-64+48))+r13*1+rsi]
+
+ vinserti128 ymm0,ymm0,XMMWORD[r12],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r12],1
+ vpshufb ymm0,ymm0,ymm7
+ vinserti128 ymm2,ymm2,XMMWORD[32+r12],1
+ vpshufb ymm1,ymm1,ymm7
+ vinserti128 ymm3,ymm3,XMMWORD[48+r12],1
+
+ lea rbp,[K256]
+ vpshufb ymm2,ymm2,ymm7
+ lea r13,[((-64))+r13]
+ vpaddd ymm4,ymm0,YMMWORD[rbp]
+ vpshufb ymm3,ymm3,ymm7
+ vpaddd ymm5,ymm1,YMMWORD[32+rbp]
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ vpaddd ymm7,ymm3,YMMWORD[96+rbp]
+ vmovdqa YMMWORD[rsp],ymm4
+ xor r14d,r14d
+ vmovdqa YMMWORD[32+rsp],ymm5
+ lea rsp,[((-64))+rsp]
+ mov esi,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ xor esi,ecx
+ vmovdqa YMMWORD[32+rsp],ymm7
+ mov r12d,r9d
+ sub rbp,-16*2*4
+ jmp NEAR $L$avx2_00_47
+
+ALIGN 16
+$L$avx2_00_47:
+ vmovdqu xmm9,XMMWORD[r13]
+ vpinsrq xmm15,xmm15,r13,0
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm1,ymm0,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm3,ymm2,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm0,ymm0,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ vpshufd ymm7,ymm3,250
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vpxor xmm9,xmm9,xmm8
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm0,ymm0,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufd ymm6,ymm6,132
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpaddd ymm0,ymm0,ymm6
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpshufd ymm7,ymm0,80
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ vpsrlq ymm7,ymm7,17
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ vpxor ymm6,ymm6,ymm7
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ vpshufd ymm6,ymm6,232
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ vpslldq ymm6,ymm6,8
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ vpaddd ymm0,ymm0,ymm6
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ vpaddd ymm6,ymm0,YMMWORD[rbp]
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm2,ymm1,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm0,ymm3,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm1,ymm1,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ vpshufd ymm7,ymm0,250
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm1,ymm1,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufd ymm6,ymm6,132
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpaddd ymm1,ymm1,ymm6
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpshufd ymm7,ymm1,80
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ vpsrlq ymm7,ymm7,17
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ vpxor ymm6,ymm6,ymm7
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ vpshufd ymm6,ymm6,232
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ vpslldq ymm6,ymm6,8
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ vpaddd ymm1,ymm1,ymm6
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ vpaddd ymm6,ymm1,YMMWORD[32+rbp]
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm3,ymm2,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm1,ymm0,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm2,ymm2,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ vpshufd ymm7,ymm1,250
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm2,ymm2,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufd ymm6,ymm6,132
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpaddd ymm2,ymm2,ymm6
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpshufd ymm7,ymm2,80
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ vpsrlq ymm7,ymm7,17
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ vpxor ymm6,ymm6,ymm7
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ vpshufd ymm6,ymm6,232
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ vpslldq ymm6,ymm6,8
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ vpaddd ymm2,ymm2,ymm6
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm0,ymm3,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm2,ymm1,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm3,ymm3,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ vpshufd ymm7,ymm2,250
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm3,ymm3,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufd ymm6,ymm6,132
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpaddd ymm3,ymm3,ymm6
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpshufd ymm7,ymm3,80
+ and esi,r15d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ vpsrlq ymm7,ymm7,17
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ vpxor ymm6,ymm6,ymm7
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ vpshufd ymm6,ymm6,232
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ vpslldq ymm6,ymm6,8
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ vpaddd ymm3,ymm3,ymm6
+ and r15d,esi
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ vpaddd ymm6,ymm3,YMMWORD[96+rbp]
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ vmovq r13,xmm15
+ vpextrq r15,xmm15,1
+ vpand xmm11,xmm11,xmm14
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r15],xmm8
+ lea r13,[16+r13]
+ lea rbp,[128+rbp]
+ cmp BYTE[3+rbp],0
+ jne NEAR $L$avx2_00_47
+ vmovdqu xmm9,XMMWORD[r13]
+ vpinsrq xmm15,xmm15,r13,0
+ add r11d,DWORD[((0+64))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+64))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vpxor xmm9,xmm9,xmm8
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+64))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+64))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+64))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+64))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+64))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+64))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ add r11d,DWORD[rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[4+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[8+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[12+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[32+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[36+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[40+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[44+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vpextrq r12,xmm15,1
+ vmovq r13,xmm15
+ mov r15,QWORD[552+rsp]
+ add eax,r14d
+ lea rbp,[448+rsp]
+
+ vpand xmm11,xmm11,xmm14
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r12],xmm8
+ lea r13,[16+r13]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ add r11d,DWORD[28+r15]
+
+ mov DWORD[r15],eax
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+
+ cmp r13,QWORD[80+rbp]
+ je NEAR $L$done_avx2
+
+ xor r14d,r14d
+ mov esi,ebx
+ mov r12d,r9d
+ xor esi,ecx
+ jmp NEAR $L$ower_avx2
+ALIGN 16
+$L$ower_avx2:
+ vmovdqu xmm9,XMMWORD[r13]
+ vpinsrq xmm15,xmm15,r13,0
+ add r11d,DWORD[((0+16))+rbp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+16))+rbp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vpxor xmm9,xmm9,xmm8
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+16))+rbp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+16))+rbp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+16))+rbp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+16))+rbp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+16))+rbp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+16))+rbp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ lea rbp,[((-64))+rbp]
+ add r11d,DWORD[((0+16))+rbp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+16))+rbp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+16))+rbp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+16))+rbp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+16))+rbp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+16))+rbp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+16))+rbp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+16))+rbp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovq r13,xmm15
+ vpextrq r15,xmm15,1
+ vpand xmm11,xmm11,xmm14
+ vpor xmm8,xmm8,xmm11
+ lea rbp,[((-64))+rbp]
+ vmovdqu XMMWORD[r13*1+r15],xmm8
+ lea r13,[16+r13]
+ cmp rbp,rsp
+ jae NEAR $L$ower_avx2
+
+ mov r15,QWORD[552+rsp]
+ lea r13,[64+r13]
+ mov rsi,QWORD[560+rsp]
+ add eax,r14d
+ lea rsp,[448+rsp]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ lea r12,[r13*1+rsi]
+ add r11d,DWORD[28+r15]
+
+ cmp r13,QWORD[((64+16))+rsp]
+
+ mov DWORD[r15],eax
+ cmove r12,rsp
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+
+ jbe NEAR $L$oop_avx2
+ lea rbp,[rsp]
+
+$L$done_avx2:
+ lea rsp,[rbp]
+ mov r8,QWORD[((64+32))+rsp]
+ mov rsi,QWORD[120+rsp]
+
+ vmovdqu XMMWORD[r8],xmm8
+ vzeroall
+ movaps xmm6,XMMWORD[128+rsp]
+ movaps xmm7,XMMWORD[144+rsp]
+ movaps xmm8,XMMWORD[160+rsp]
+ movaps xmm9,XMMWORD[176+rsp]
+ movaps xmm10,XMMWORD[192+rsp]
+ movaps xmm11,XMMWORD[208+rsp]
+ movaps xmm12,XMMWORD[224+rsp]
+ movaps xmm13,XMMWORD[240+rsp]
+ movaps xmm14,XMMWORD[256+rsp]
+ movaps xmm15,XMMWORD[272+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx2:
+
+ALIGN 32
+aesni_cbc_sha256_enc_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ mov r10,QWORD[56+rsp]
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-8-160)+rax],xmm6
+ movaps XMMWORD[(-8-144)+rax],xmm7
+ movaps XMMWORD[(-8-128)+rax],xmm8
+ movaps XMMWORD[(-8-112)+rax],xmm9
+ movaps XMMWORD[(-8-96)+rax],xmm10
+ movaps XMMWORD[(-8-80)+rax],xmm11
+ movaps XMMWORD[(-8-64)+rax],xmm12
+ movaps XMMWORD[(-8-48)+rax],xmm13
+ movaps XMMWORD[(-8-32)+rax],xmm14
+ movaps XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+ lea rax,[((K256+128))]
+ movdqu xmm1,XMMWORD[r9]
+ movdqu xmm2,XMMWORD[16+r9]
+ movdqa xmm3,XMMWORD[((512-128))+rax]
+
+ mov r11d,DWORD[240+rcx]
+ sub rsi,rdi
+ movups xmm15,XMMWORD[rcx]
+ movups xmm6,XMMWORD[r8]
+ movups xmm4,XMMWORD[16+rcx]
+ lea rcx,[112+rcx]
+
+ pshufd xmm0,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ pshufd xmm2,xmm2,0x1b
+ movdqa xmm7,xmm3
+DB 102,15,58,15,202,8
+ punpcklqdq xmm2,xmm0
+
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ movdqu xmm10,XMMWORD[r10]
+ movdqu xmm11,XMMWORD[16+r10]
+ movdqu xmm12,XMMWORD[32+r10]
+DB 102,68,15,56,0,211
+ movdqu xmm13,XMMWORD[48+r10]
+
+ movdqa xmm0,XMMWORD[((0-128))+rax]
+ paddd xmm0,xmm10
+DB 102,68,15,56,0,219
+ movdqa xmm9,xmm2
+ movdqa xmm8,xmm1
+ movups xmm14,XMMWORD[rdi]
+ xorps xmm14,xmm15
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((32-128))+rax]
+ paddd xmm0,xmm11
+DB 102,68,15,56,0,227
+ lea r10,[64+r10]
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((64-128))+rax]
+ paddd xmm0,xmm12
+DB 102,68,15,56,0,235
+DB 69,15,56,204,211
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm13
+DB 102,65,15,58,15,220,4
+ paddd xmm10,xmm3
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((96-128))+rax]
+ paddd xmm0,xmm13
+DB 69,15,56,205,213
+DB 69,15,56,204,220
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,221,4
+ paddd xmm11,xmm3
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((128-128))+rax]
+ paddd xmm0,xmm10
+DB 69,15,56,205,218
+DB 69,15,56,204,229
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+ paddd xmm12,xmm3
+ cmp r11d,11
+ jb NEAR $L$aesenclast1
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast1
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast1:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+DB 15,56,203,202
+ movups xmm14,XMMWORD[16+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[rdi*1+rsi],xmm6
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+ movdqa xmm0,XMMWORD[((160-128))+rax]
+ paddd xmm0,xmm11
+DB 69,15,56,205,227
+DB 69,15,56,204,234
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm12
+DB 102,65,15,58,15,219,4
+ paddd xmm13,xmm3
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((192-128))+rax]
+ paddd xmm0,xmm12
+DB 69,15,56,205,236
+DB 69,15,56,204,211
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm13
+DB 102,65,15,58,15,220,4
+ paddd xmm10,xmm3
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((224-128))+rax]
+ paddd xmm0,xmm13
+DB 69,15,56,205,213
+DB 69,15,56,204,220
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,221,4
+ paddd xmm11,xmm3
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((256-128))+rax]
+ paddd xmm0,xmm10
+DB 69,15,56,205,218
+DB 69,15,56,204,229
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+ paddd xmm12,xmm3
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+ cmp r11d,11
+ jb NEAR $L$aesenclast2
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast2
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast2:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+DB 15,56,203,202
+ movups xmm14,XMMWORD[32+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[16+rdi*1+rsi],xmm6
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+ movdqa xmm0,XMMWORD[((288-128))+rax]
+ paddd xmm0,xmm11
+DB 69,15,56,205,227
+DB 69,15,56,204,234
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm12
+DB 102,65,15,58,15,219,4
+ paddd xmm13,xmm3
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((320-128))+rax]
+ paddd xmm0,xmm12
+DB 69,15,56,205,236
+DB 69,15,56,204,211
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm13
+DB 102,65,15,58,15,220,4
+ paddd xmm10,xmm3
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((352-128))+rax]
+ paddd xmm0,xmm13
+DB 69,15,56,205,213
+DB 69,15,56,204,220
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,221,4
+ paddd xmm11,xmm3
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((384-128))+rax]
+ paddd xmm0,xmm10
+DB 69,15,56,205,218
+DB 69,15,56,204,229
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+ paddd xmm12,xmm3
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((416-128))+rax]
+ paddd xmm0,xmm11
+DB 69,15,56,205,227
+DB 69,15,56,204,234
+ cmp r11d,11
+ jb NEAR $L$aesenclast3
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast3
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast3:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm12
+DB 102,65,15,58,15,219,4
+ paddd xmm13,xmm3
+ movups xmm14,XMMWORD[48+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[32+rdi*1+rsi],xmm6
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((448-128))+rax]
+ paddd xmm0,xmm12
+DB 69,15,56,205,236
+ movdqa xmm3,xmm7
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((480-128))+rax]
+ paddd xmm0,xmm13
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+ cmp r11d,11
+ jb NEAR $L$aesenclast4
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast4
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast4:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+
+ paddd xmm2,xmm9
+ paddd xmm1,xmm8
+
+ dec rdx
+ movups XMMWORD[48+rdi*1+rsi],xmm6
+ lea rdi,[64+rdi]
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm2,xmm2,0xb1
+ pshufd xmm3,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ punpckhqdq xmm1,xmm2
+DB 102,15,58,15,211,8
+
+ movups XMMWORD[r8],xmm6
+ movdqu XMMWORD[r9],xmm1
+ movdqu XMMWORD[16+r9],xmm2
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ lea rsp,[((8+160))+rsp]
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_cbc_sha256_enc_shaext:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+ lea r10,[aesni_cbc_sha256_enc_shaext]
+ cmp rbx,r10
+ jb NEAR $L$not_in_shaext
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[168+rax]
+ jmp NEAR $L$in_prologue
+$L$not_in_shaext:
+ lea r10,[$L$avx2_shortcut]
+ cmp rbx,r10
+ jb NEAR $L$not_in_avx2
+
+ and rax,-256*4
+ add rax,448
+$L$not_in_avx2:
+ mov rsi,rax
+ mov rax,QWORD[((64+56))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((64+64))+rsi]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_xop wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_xop wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_xop wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_avx wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_aesni_cbc_sha256_enc_xop:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+
+$L$SEH_info_aesni_cbc_sha256_enc_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_avx2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_shaext:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
new file mode 100644
index 0000000000..2705ece3e2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
@@ -0,0 +1,5084 @@
+; Copyright 2009-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+global aesni_encrypt
+
+ALIGN 16
+aesni_encrypt:
+
+ movups xmm2,XMMWORD[rcx]
+ mov eax,DWORD[240+r8]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_enc1_1:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_enc1_1
+DB 102,15,56,221,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups XMMWORD[rdx],xmm2
+ pxor xmm2,xmm2
+ DB 0F3h,0C3h ;repret
+
+
+
+global aesni_decrypt
+
+ALIGN 16
+aesni_decrypt:
+
+ movups xmm2,XMMWORD[rcx]
+ mov eax,DWORD[240+r8]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_dec1_2:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_dec1_2
+DB 102,15,56,223,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups XMMWORD[rdx],xmm2
+ pxor xmm2,xmm2
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt2:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$enc_loop2:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop2
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt2:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$dec_loop2:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop2
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt3:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$enc_loop3:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop3
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt3:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$dec_loop3:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop3
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt4:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ xorps xmm5,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 0x0f,0x1f,0x00
+ add rax,16
+
+$L$enc_loop4:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop4
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+DB 102,15,56,221,232
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt4:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ xorps xmm5,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 0x0f,0x1f,0x00
+ add rax,16
+
+$L$dec_loop4:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop4
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+DB 102,15,56,223,232
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt6:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+DB 102,15,56,220,209
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,220,217
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+DB 102,15,56,220,225
+ pxor xmm7,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$enc_loop6_enter
+ALIGN 16
+$L$enc_loop6:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+$L$enc_loop6_enter:
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop6
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+DB 102,15,56,221,232
+DB 102,15,56,221,240
+DB 102,15,56,221,248
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt6:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+DB 102,15,56,222,209
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,222,217
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+DB 102,15,56,222,225
+ pxor xmm7,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$dec_loop6_enter
+ALIGN 16
+$L$dec_loop6:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+$L$dec_loop6_enter:
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop6
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+DB 102,15,56,223,232
+DB 102,15,56,223,240
+DB 102,15,56,223,248
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt8:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,220,209
+ pxor xmm7,xmm0
+ pxor xmm8,xmm0
+DB 102,15,56,220,217
+ pxor xmm9,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$enc_loop8_inner
+ALIGN 16
+$L$enc_loop8:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+$L$enc_loop8_inner:
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+$L$enc_loop8_enter:
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop8
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+DB 102,15,56,221,232
+DB 102,15,56,221,240
+DB 102,15,56,221,248
+DB 102,68,15,56,221,192
+DB 102,68,15,56,221,200
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt8:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,222,209
+ pxor xmm7,xmm0
+ pxor xmm8,xmm0
+DB 102,15,56,222,217
+ pxor xmm9,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$dec_loop8_inner
+ALIGN 16
+$L$dec_loop8:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+$L$dec_loop8_inner:
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+$L$dec_loop8_enter:
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop8
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+DB 102,15,56,223,232
+DB 102,15,56,223,240
+DB 102,15,56,223,248
+DB 102,68,15,56,223,192
+DB 102,68,15,56,223,200
+ DB 0F3h,0C3h ;repret
+
+
+global aesni_ecb_encrypt
+
+ALIGN 16
+aesni_ecb_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ecb_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+$L$ecb_enc_body:
+ and rdx,-16
+ jz NEAR $L$ecb_ret
+
+ mov eax,DWORD[240+rcx]
+ movups xmm0,XMMWORD[rcx]
+ mov r11,rcx
+ mov r10d,eax
+ test r8d,r8d
+ jz NEAR $L$ecb_decrypt
+
+ cmp rdx,0x80
+ jb NEAR $L$ecb_enc_tail
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqu xmm8,XMMWORD[96+rdi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+ sub rdx,0x80
+ jmp NEAR $L$ecb_enc_loop8_enter
+ALIGN 16
+$L$ecb_enc_loop8:
+ movups XMMWORD[rsi],xmm2
+ mov rcx,r11
+ movdqu xmm2,XMMWORD[rdi]
+ mov eax,r10d
+ movups XMMWORD[16+rsi],xmm3
+ movdqu xmm3,XMMWORD[16+rdi]
+ movups XMMWORD[32+rsi],xmm4
+ movdqu xmm4,XMMWORD[32+rdi]
+ movups XMMWORD[48+rsi],xmm5
+ movdqu xmm5,XMMWORD[48+rdi]
+ movups XMMWORD[64+rsi],xmm6
+ movdqu xmm6,XMMWORD[64+rdi]
+ movups XMMWORD[80+rsi],xmm7
+ movdqu xmm7,XMMWORD[80+rdi]
+ movups XMMWORD[96+rsi],xmm8
+ movdqu xmm8,XMMWORD[96+rdi]
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+$L$ecb_enc_loop8_enter:
+
+ call _aesni_encrypt8
+
+ sub rdx,0x80
+ jnc NEAR $L$ecb_enc_loop8
+
+ movups XMMWORD[rsi],xmm2
+ mov rcx,r11
+ movups XMMWORD[16+rsi],xmm3
+ mov eax,r10d
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ movups XMMWORD[96+rsi],xmm8
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+ add rdx,0x80
+ jz NEAR $L$ecb_ret
+
+$L$ecb_enc_tail:
+ movups xmm2,XMMWORD[rdi]
+ cmp rdx,0x20
+ jb NEAR $L$ecb_enc_one
+ movups xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ecb_enc_two
+ movups xmm4,XMMWORD[32+rdi]
+ cmp rdx,0x40
+ jb NEAR $L$ecb_enc_three
+ movups xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ecb_enc_four
+ movups xmm6,XMMWORD[64+rdi]
+ cmp rdx,0x60
+ jb NEAR $L$ecb_enc_five
+ movups xmm7,XMMWORD[80+rdi]
+ je NEAR $L$ecb_enc_six
+ movdqu xmm8,XMMWORD[96+rdi]
+ xorps xmm9,xmm9
+ call _aesni_encrypt8
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ movups XMMWORD[96+rsi],xmm8
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_one:
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_3:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_3
+DB 102,15,56,221,209
+ movups XMMWORD[rsi],xmm2
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_two:
+ call _aesni_encrypt2
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_three:
+ call _aesni_encrypt3
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_four:
+ call _aesni_encrypt4
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_five:
+ xorps xmm7,xmm7
+ call _aesni_encrypt6
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_six:
+ call _aesni_encrypt6
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ jmp NEAR $L$ecb_ret
+
+ALIGN 16
+$L$ecb_decrypt:
+ cmp rdx,0x80
+ jb NEAR $L$ecb_dec_tail
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqu xmm8,XMMWORD[96+rdi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+ sub rdx,0x80
+ jmp NEAR $L$ecb_dec_loop8_enter
+ALIGN 16
+$L$ecb_dec_loop8:
+ movups XMMWORD[rsi],xmm2
+ mov rcx,r11
+ movdqu xmm2,XMMWORD[rdi]
+ mov eax,r10d
+ movups XMMWORD[16+rsi],xmm3
+ movdqu xmm3,XMMWORD[16+rdi]
+ movups XMMWORD[32+rsi],xmm4
+ movdqu xmm4,XMMWORD[32+rdi]
+ movups XMMWORD[48+rsi],xmm5
+ movdqu xmm5,XMMWORD[48+rdi]
+ movups XMMWORD[64+rsi],xmm6
+ movdqu xmm6,XMMWORD[64+rdi]
+ movups XMMWORD[80+rsi],xmm7
+ movdqu xmm7,XMMWORD[80+rdi]
+ movups XMMWORD[96+rsi],xmm8
+ movdqu xmm8,XMMWORD[96+rdi]
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+$L$ecb_dec_loop8_enter:
+
+ call _aesni_decrypt8
+
+ movups xmm0,XMMWORD[r11]
+ sub rdx,0x80
+ jnc NEAR $L$ecb_dec_loop8
+
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ mov rcx,r11
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ mov eax,r10d
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+ movups XMMWORD[96+rsi],xmm8
+ pxor xmm8,xmm8
+ movups XMMWORD[112+rsi],xmm9
+ pxor xmm9,xmm9
+ lea rsi,[128+rsi]
+ add rdx,0x80
+ jz NEAR $L$ecb_ret
+
+$L$ecb_dec_tail:
+ movups xmm2,XMMWORD[rdi]
+ cmp rdx,0x20
+ jb NEAR $L$ecb_dec_one
+ movups xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ecb_dec_two
+ movups xmm4,XMMWORD[32+rdi]
+ cmp rdx,0x40
+ jb NEAR $L$ecb_dec_three
+ movups xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ecb_dec_four
+ movups xmm6,XMMWORD[64+rdi]
+ cmp rdx,0x60
+ jb NEAR $L$ecb_dec_five
+ movups xmm7,XMMWORD[80+rdi]
+ je NEAR $L$ecb_dec_six
+ movups xmm8,XMMWORD[96+rdi]
+ movups xmm0,XMMWORD[rcx]
+ xorps xmm9,xmm9
+ call _aesni_decrypt8
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+ movups XMMWORD[96+rsi],xmm8
+ pxor xmm8,xmm8
+ pxor xmm9,xmm9
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_one:
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_4:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_4
+DB 102,15,56,223,209
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_two:
+ call _aesni_decrypt2
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_three:
+ call _aesni_decrypt3
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_four:
+ call _aesni_decrypt4
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_five:
+ xorps xmm7,xmm7
+ call _aesni_decrypt6
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_six:
+ call _aesni_decrypt6
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+
+$L$ecb_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ lea rsp,[88+rsp]
+$L$ecb_enc_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ecb_encrypt:
+global aesni_ccm64_encrypt_blocks
+
+ALIGN 16
+aesni_ccm64_encrypt_blocks:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ccm64_encrypt_blocks:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+$L$ccm64_enc_body:
+ mov eax,DWORD[240+rcx]
+ movdqu xmm6,XMMWORD[r8]
+ movdqa xmm9,XMMWORD[$L$increment64]
+ movdqa xmm7,XMMWORD[$L$bswap_mask]
+
+ shl eax,4
+ mov r10d,16
+ lea r11,[rcx]
+ movdqu xmm3,XMMWORD[r9]
+ movdqa xmm2,xmm6
+ lea rcx,[32+rax*1+rcx]
+DB 102,15,56,0,247
+ sub r10,rax
+ jmp NEAR $L$ccm64_enc_outer
+ALIGN 16
+$L$ccm64_enc_outer:
+ movups xmm0,XMMWORD[r11]
+ mov rax,r10
+ movups xmm8,XMMWORD[rdi]
+
+ xorps xmm2,xmm0
+ movups xmm1,XMMWORD[16+r11]
+ xorps xmm0,xmm8
+ xorps xmm3,xmm0
+ movups xmm0,XMMWORD[32+r11]
+
+$L$ccm64_enc2_loop:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ccm64_enc2_loop
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ paddq xmm6,xmm9
+ dec rdx
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+
+ lea rdi,[16+rdi]
+ xorps xmm8,xmm2
+ movdqa xmm2,xmm6
+ movups XMMWORD[rsi],xmm8
+DB 102,15,56,0,215
+ lea rsi,[16+rsi]
+ jnz NEAR $L$ccm64_enc_outer
+
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movups XMMWORD[r9],xmm3
+ pxor xmm3,xmm3
+ pxor xmm8,xmm8
+ pxor xmm6,xmm6
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ lea rsp,[88+rsp]
+$L$ccm64_enc_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_ccm64_encrypt_blocks:
+global aesni_ccm64_decrypt_blocks
+
+ALIGN 16
+aesni_ccm64_decrypt_blocks:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ccm64_decrypt_blocks:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+$L$ccm64_dec_body:
+ mov eax,DWORD[240+rcx]
+ movups xmm6,XMMWORD[r8]
+ movdqu xmm3,XMMWORD[r9]
+ movdqa xmm9,XMMWORD[$L$increment64]
+ movdqa xmm7,XMMWORD[$L$bswap_mask]
+
+ movaps xmm2,xmm6
+ mov r10d,eax
+ mov r11,rcx
+DB 102,15,56,0,247
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_5:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_5
+DB 102,15,56,221,209
+ shl r10d,4
+ mov eax,16
+ movups xmm8,XMMWORD[rdi]
+ paddq xmm6,xmm9
+ lea rdi,[16+rdi]
+ sub rax,r10
+ lea rcx,[32+r10*1+r11]
+ mov r10,rax
+ jmp NEAR $L$ccm64_dec_outer
+ALIGN 16
+$L$ccm64_dec_outer:
+ xorps xmm8,xmm2
+ movdqa xmm2,xmm6
+ movups XMMWORD[rsi],xmm8
+ lea rsi,[16+rsi]
+DB 102,15,56,0,215
+
+ sub rdx,1
+ jz NEAR $L$ccm64_dec_break
+
+ movups xmm0,XMMWORD[r11]
+ mov rax,r10
+ movups xmm1,XMMWORD[16+r11]
+ xorps xmm8,xmm0
+ xorps xmm2,xmm0
+ xorps xmm3,xmm8
+ movups xmm0,XMMWORD[32+r11]
+ jmp NEAR $L$ccm64_dec2_loop
+ALIGN 16
+$L$ccm64_dec2_loop:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ccm64_dec2_loop
+ movups xmm8,XMMWORD[rdi]
+ paddq xmm6,xmm9
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+ lea rdi,[16+rdi]
+ jmp NEAR $L$ccm64_dec_outer
+
+ALIGN 16
+$L$ccm64_dec_break:
+
+ mov eax,DWORD[240+r11]
+ movups xmm0,XMMWORD[r11]
+ movups xmm1,XMMWORD[16+r11]
+ xorps xmm8,xmm0
+ lea r11,[32+r11]
+ xorps xmm3,xmm8
+$L$oop_enc1_6:
+DB 102,15,56,220,217
+ dec eax
+ movups xmm1,XMMWORD[r11]
+ lea r11,[16+r11]
+ jnz NEAR $L$oop_enc1_6
+DB 102,15,56,221,217
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movups XMMWORD[r9],xmm3
+ pxor xmm3,xmm3
+ pxor xmm8,xmm8
+ pxor xmm6,xmm6
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ lea rsp,[88+rsp]
+$L$ccm64_dec_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_ccm64_decrypt_blocks:
+global aesni_ctr32_encrypt_blocks
+
+ALIGN 16
+aesni_ctr32_encrypt_blocks:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ctr32_encrypt_blocks:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ cmp rdx,1
+ jne NEAR $L$ctr32_bulk
+
+
+
+ movups xmm2,XMMWORD[r8]
+ movups xmm3,XMMWORD[rdi]
+ mov edx,DWORD[240+rcx]
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_7:
+DB 102,15,56,220,209
+ dec edx
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_7
+DB 102,15,56,221,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ xorps xmm2,xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[rsi],xmm2
+ xorps xmm2,xmm2
+ jmp NEAR $L$ctr32_epilogue
+
+ALIGN 16
+$L$ctr32_bulk:
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,288
+ and rsp,-16
+ movaps XMMWORD[(-168)+r11],xmm6
+ movaps XMMWORD[(-152)+r11],xmm7
+ movaps XMMWORD[(-136)+r11],xmm8
+ movaps XMMWORD[(-120)+r11],xmm9
+ movaps XMMWORD[(-104)+r11],xmm10
+ movaps XMMWORD[(-88)+r11],xmm11
+ movaps XMMWORD[(-72)+r11],xmm12
+ movaps XMMWORD[(-56)+r11],xmm13
+ movaps XMMWORD[(-40)+r11],xmm14
+ movaps XMMWORD[(-24)+r11],xmm15
+$L$ctr32_body:
+
+
+
+
+ movdqu xmm2,XMMWORD[r8]
+ movdqu xmm0,XMMWORD[rcx]
+ mov r8d,DWORD[12+r8]
+ pxor xmm2,xmm0
+ mov ebp,DWORD[12+rcx]
+ movdqa XMMWORD[rsp],xmm2
+ bswap r8d
+ movdqa xmm3,xmm2
+ movdqa xmm4,xmm2
+ movdqa xmm5,xmm2
+ movdqa XMMWORD[64+rsp],xmm2
+ movdqa XMMWORD[80+rsp],xmm2
+ movdqa XMMWORD[96+rsp],xmm2
+ mov r10,rdx
+ movdqa XMMWORD[112+rsp],xmm2
+
+ lea rax,[1+r8]
+ lea rdx,[2+r8]
+ bswap eax
+ bswap edx
+ xor eax,ebp
+ xor edx,ebp
+DB 102,15,58,34,216,3
+ lea rax,[3+r8]
+ movdqa XMMWORD[16+rsp],xmm3
+DB 102,15,58,34,226,3
+ bswap eax
+ mov rdx,r10
+ lea r10,[4+r8]
+ movdqa XMMWORD[32+rsp],xmm4
+ xor eax,ebp
+ bswap r10d
+DB 102,15,58,34,232,3
+ xor r10d,ebp
+ movdqa XMMWORD[48+rsp],xmm5
+ lea r9,[5+r8]
+ mov DWORD[((64+12))+rsp],r10d
+ bswap r9d
+ lea r10,[6+r8]
+ mov eax,DWORD[240+rcx]
+ xor r9d,ebp
+ bswap r10d
+ mov DWORD[((80+12))+rsp],r9d
+ xor r10d,ebp
+ lea r9,[7+r8]
+ mov DWORD[((96+12))+rsp],r10d
+ bswap r9d
+ mov r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+ xor r9d,ebp
+ and r10d,71303168
+ mov DWORD[((112+12))+rsp],r9d
+
+ movups xmm1,XMMWORD[16+rcx]
+
+ movdqa xmm6,XMMWORD[64+rsp]
+ movdqa xmm7,XMMWORD[80+rsp]
+
+ cmp rdx,8
+ jb NEAR $L$ctr32_tail
+
+ sub rdx,6
+ cmp r10d,4194304
+ je NEAR $L$ctr32_6x
+
+ lea rcx,[128+rcx]
+ sub rdx,2
+ jmp NEAR $L$ctr32_loop8
+
+ALIGN 16
+$L$ctr32_6x:
+ shl eax,4
+ mov r10d,48
+ bswap ebp
+ lea rcx,[32+rax*1+rcx]
+ sub r10,rax
+ jmp NEAR $L$ctr32_loop6
+
+ALIGN 16
+$L$ctr32_loop6:
+ add r8d,6
+ movups xmm0,XMMWORD[((-48))+r10*1+rcx]
+DB 102,15,56,220,209
+ mov eax,r8d
+ xor eax,ebp
+DB 102,15,56,220,217
+DB 0x0f,0x38,0xf1,0x44,0x24,12
+ lea eax,[1+r8]
+DB 102,15,56,220,225
+ xor eax,ebp
+DB 0x0f,0x38,0xf1,0x44,0x24,28
+DB 102,15,56,220,233
+ lea eax,[2+r8]
+ xor eax,ebp
+DB 102,15,56,220,241
+DB 0x0f,0x38,0xf1,0x44,0x24,44
+ lea eax,[3+r8]
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-32))+r10*1+rcx]
+ xor eax,ebp
+
+DB 102,15,56,220,208
+DB 0x0f,0x38,0xf1,0x44,0x24,60
+ lea eax,[4+r8]
+DB 102,15,56,220,216
+ xor eax,ebp
+DB 0x0f,0x38,0xf1,0x44,0x24,76
+DB 102,15,56,220,224
+ lea eax,[5+r8]
+ xor eax,ebp
+DB 102,15,56,220,232
+DB 0x0f,0x38,0xf1,0x44,0x24,92
+ mov rax,r10
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-16))+r10*1+rcx]
+
+ call $L$enc_loop6
+
+ movdqu xmm8,XMMWORD[rdi]
+ movdqu xmm9,XMMWORD[16+rdi]
+ movdqu xmm10,XMMWORD[32+rdi]
+ movdqu xmm11,XMMWORD[48+rdi]
+ movdqu xmm12,XMMWORD[64+rdi]
+ movdqu xmm13,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+ movups xmm1,XMMWORD[((-64))+r10*1+rcx]
+ pxor xmm8,xmm2
+ movaps xmm2,XMMWORD[rsp]
+ pxor xmm9,xmm3
+ movaps xmm3,XMMWORD[16+rsp]
+ pxor xmm10,xmm4
+ movaps xmm4,XMMWORD[32+rsp]
+ pxor xmm11,xmm5
+ movaps xmm5,XMMWORD[48+rsp]
+ pxor xmm12,xmm6
+ movaps xmm6,XMMWORD[64+rsp]
+ pxor xmm13,xmm7
+ movaps xmm7,XMMWORD[80+rsp]
+ movdqu XMMWORD[rsi],xmm8
+ movdqu XMMWORD[16+rsi],xmm9
+ movdqu XMMWORD[32+rsi],xmm10
+ movdqu XMMWORD[48+rsi],xmm11
+ movdqu XMMWORD[64+rsi],xmm12
+ movdqu XMMWORD[80+rsi],xmm13
+ lea rsi,[96+rsi]
+
+ sub rdx,6
+ jnc NEAR $L$ctr32_loop6
+
+ add rdx,6
+ jz NEAR $L$ctr32_done
+
+ lea eax,[((-48))+r10]
+ lea rcx,[((-80))+r10*1+rcx]
+ neg eax
+ shr eax,4
+ jmp NEAR $L$ctr32_tail
+
+ALIGN 32
+$L$ctr32_loop8:
+ add r8d,8
+ movdqa xmm8,XMMWORD[96+rsp]
+DB 102,15,56,220,209
+ mov r9d,r8d
+ movdqa xmm9,XMMWORD[112+rsp]
+DB 102,15,56,220,217
+ bswap r9d
+ movups xmm0,XMMWORD[((32-128))+rcx]
+DB 102,15,56,220,225
+ xor r9d,ebp
+ nop
+DB 102,15,56,220,233
+ mov DWORD[((0+12))+rsp],r9d
+ lea r9,[1+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((48-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ mov DWORD[((16+12))+rsp],r9d
+ lea r9,[2+r8]
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((64-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ mov DWORD[((32+12))+rsp],r9d
+ lea r9,[3+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((80-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ mov DWORD[((48+12))+rsp],r9d
+ lea r9,[4+r8]
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((96-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ mov DWORD[((64+12))+rsp],r9d
+ lea r9,[5+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((112-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ mov DWORD[((80+12))+rsp],r9d
+ lea r9,[6+r8]
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((128-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ mov DWORD[((96+12))+rsp],r9d
+ lea r9,[7+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((144-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ xor r9d,ebp
+ movdqu xmm10,XMMWORD[rdi]
+DB 102,15,56,220,232
+ mov DWORD[((112+12))+rsp],r9d
+ cmp eax,11
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((160-128))+rcx]
+
+ jb NEAR $L$ctr32_enc_done
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((176-128))+rcx]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((192-128))+rcx]
+ je NEAR $L$ctr32_enc_done
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((208-128))+rcx]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((224-128))+rcx]
+ jmp NEAR $L$ctr32_enc_done
+
+ALIGN 16
+$L$ctr32_enc_done:
+ movdqu xmm11,XMMWORD[16+rdi]
+ pxor xmm10,xmm0
+ movdqu xmm12,XMMWORD[32+rdi]
+ pxor xmm11,xmm0
+ movdqu xmm13,XMMWORD[48+rdi]
+ pxor xmm12,xmm0
+ movdqu xmm14,XMMWORD[64+rdi]
+ pxor xmm13,xmm0
+ movdqu xmm15,XMMWORD[80+rdi]
+ pxor xmm14,xmm0
+ pxor xmm15,xmm0
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movdqu xmm1,XMMWORD[96+rdi]
+ lea rdi,[128+rdi]
+
+DB 102,65,15,56,221,210
+ pxor xmm1,xmm0
+ movdqu xmm10,XMMWORD[((112-128))+rdi]
+DB 102,65,15,56,221,219
+ pxor xmm10,xmm0
+ movdqa xmm11,XMMWORD[rsp]
+DB 102,65,15,56,221,228
+DB 102,65,15,56,221,237
+ movdqa xmm12,XMMWORD[16+rsp]
+ movdqa xmm13,XMMWORD[32+rsp]
+DB 102,65,15,56,221,246
+DB 102,65,15,56,221,255
+ movdqa xmm14,XMMWORD[48+rsp]
+ movdqa xmm15,XMMWORD[64+rsp]
+DB 102,68,15,56,221,193
+ movdqa xmm0,XMMWORD[80+rsp]
+ movups xmm1,XMMWORD[((16-128))+rcx]
+DB 102,69,15,56,221,202
+
+ movups XMMWORD[rsi],xmm2
+ movdqa xmm2,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ movdqa xmm3,xmm12
+ movups XMMWORD[32+rsi],xmm4
+ movdqa xmm4,xmm13
+ movups XMMWORD[48+rsi],xmm5
+ movdqa xmm5,xmm14
+ movups XMMWORD[64+rsi],xmm6
+ movdqa xmm6,xmm15
+ movups XMMWORD[80+rsi],xmm7
+ movdqa xmm7,xmm0
+ movups XMMWORD[96+rsi],xmm8
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+
+ sub rdx,8
+ jnc NEAR $L$ctr32_loop8
+
+ add rdx,8
+ jz NEAR $L$ctr32_done
+ lea rcx,[((-128))+rcx]
+
+$L$ctr32_tail:
+
+
+ lea rcx,[16+rcx]
+ cmp rdx,4
+ jb NEAR $L$ctr32_loop3
+ je NEAR $L$ctr32_loop4
+
+
+ shl eax,4
+ movdqa xmm8,XMMWORD[96+rsp]
+ pxor xmm9,xmm9
+
+ movups xmm0,XMMWORD[16+rcx]
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ lea rcx,[((32-16))+rax*1+rcx]
+ neg rax
+DB 102,15,56,220,225
+ add rax,16
+ movups xmm10,XMMWORD[rdi]
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+ movups xmm11,XMMWORD[16+rdi]
+ movups xmm12,XMMWORD[32+rdi]
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+
+ call $L$enc_loop8_enter
+
+ movdqu xmm13,XMMWORD[48+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm10,XMMWORD[64+rdi]
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm6,xmm10
+ movdqu XMMWORD[48+rsi],xmm5
+ movdqu XMMWORD[64+rsi],xmm6
+ cmp rdx,6
+ jb NEAR $L$ctr32_done
+
+ movups xmm11,XMMWORD[80+rdi]
+ xorps xmm7,xmm11
+ movups XMMWORD[80+rsi],xmm7
+ je NEAR $L$ctr32_done
+
+ movups xmm12,XMMWORD[96+rdi]
+ xorps xmm8,xmm12
+ movups XMMWORD[96+rsi],xmm8
+ jmp NEAR $L$ctr32_done
+
+ALIGN 32
+$L$ctr32_loop4:
+DB 102,15,56,220,209
+ lea rcx,[16+rcx]
+ dec eax
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[rcx]
+ jnz NEAR $L$ctr32_loop4
+DB 102,15,56,221,209
+DB 102,15,56,221,217
+ movups xmm10,XMMWORD[rdi]
+ movups xmm11,XMMWORD[16+rdi]
+DB 102,15,56,221,225
+DB 102,15,56,221,233
+ movups xmm12,XMMWORD[32+rdi]
+ movups xmm13,XMMWORD[48+rdi]
+
+ xorps xmm2,xmm10
+ movups XMMWORD[rsi],xmm2
+ xorps xmm3,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm4,xmm12
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm5,xmm13
+ movdqu XMMWORD[48+rsi],xmm5
+ jmp NEAR $L$ctr32_done
+
+ALIGN 32
+$L$ctr32_loop3:
+DB 102,15,56,220,209
+ lea rcx,[16+rcx]
+ dec eax
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ movups xmm1,XMMWORD[rcx]
+ jnz NEAR $L$ctr32_loop3
+DB 102,15,56,221,209
+DB 102,15,56,221,217
+DB 102,15,56,221,225
+
+ movups xmm10,XMMWORD[rdi]
+ xorps xmm2,xmm10
+ movups XMMWORD[rsi],xmm2
+ cmp rdx,2
+ jb NEAR $L$ctr32_done
+
+ movups xmm11,XMMWORD[16+rdi]
+ xorps xmm3,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ je NEAR $L$ctr32_done
+
+ movups xmm12,XMMWORD[32+rdi]
+ xorps xmm4,xmm12
+ movups XMMWORD[32+rsi],xmm4
+
+$L$ctr32_done:
+ xorps xmm0,xmm0
+ xor ebp,ebp
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps XMMWORD[(-168)+r11],xmm0
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps XMMWORD[(-152)+r11],xmm0
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps XMMWORD[(-136)+r11],xmm0
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps XMMWORD[(-120)+r11],xmm0
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps XMMWORD[(-104)+r11],xmm0
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps XMMWORD[(-88)+r11],xmm0
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps XMMWORD[(-72)+r11],xmm0
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps XMMWORD[(-56)+r11],xmm0
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps XMMWORD[(-40)+r11],xmm0
+ movaps xmm15,XMMWORD[((-24))+r11]
+ movaps XMMWORD[(-24)+r11],xmm0
+ movaps XMMWORD[rsp],xmm0
+ movaps XMMWORD[16+rsp],xmm0
+ movaps XMMWORD[32+rsp],xmm0
+ movaps XMMWORD[48+rsp],xmm0
+ movaps XMMWORD[64+rsp],xmm0
+ movaps XMMWORD[80+rsp],xmm0
+ movaps XMMWORD[96+rsp],xmm0
+ movaps XMMWORD[112+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$ctr32_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ctr32_encrypt_blocks:
+global aesni_xts_encrypt
+
+ALIGN 16
+aesni_xts_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_xts_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,272
+ and rsp,-16
+ movaps XMMWORD[(-168)+r11],xmm6
+ movaps XMMWORD[(-152)+r11],xmm7
+ movaps XMMWORD[(-136)+r11],xmm8
+ movaps XMMWORD[(-120)+r11],xmm9
+ movaps XMMWORD[(-104)+r11],xmm10
+ movaps XMMWORD[(-88)+r11],xmm11
+ movaps XMMWORD[(-72)+r11],xmm12
+ movaps XMMWORD[(-56)+r11],xmm13
+ movaps XMMWORD[(-40)+r11],xmm14
+ movaps XMMWORD[(-24)+r11],xmm15
+$L$xts_enc_body:
+ movups xmm2,XMMWORD[r9]
+ mov eax,DWORD[240+r8]
+ mov r10d,DWORD[240+rcx]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_enc1_8:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_enc1_8
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[rcx]
+ mov rbp,rcx
+ mov eax,r10d
+ shl r10d,4
+ mov r9,rdx
+ and rdx,-16
+
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqa xmm8,XMMWORD[$L$xts_magic]
+ movdqa xmm15,xmm2
+ pshufd xmm9,xmm2,0x5f
+ pxor xmm1,xmm0
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm10,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm10,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm11,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm11,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm12,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm12,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm13,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm13,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm15
+ psrad xmm9,31
+ paddq xmm15,xmm15
+ pand xmm9,xmm8
+ pxor xmm14,xmm0
+ pxor xmm15,xmm9
+ movaps XMMWORD[96+rsp],xmm1
+
+ sub rdx,16*6
+ jc NEAR $L$xts_enc_short
+
+ mov eax,16+96
+ lea rcx,[32+r10*1+rbp]
+ sub rax,r10
+ movups xmm1,XMMWORD[16+rbp]
+ mov r10,rax
+ lea r8,[$L$xts_magic]
+ jmp NEAR $L$xts_enc_grandloop
+
+ALIGN 32
+$L$xts_enc_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqa xmm8,xmm0
+ movdqu xmm3,XMMWORD[16+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm3,xmm11
+DB 102,15,56,220,209
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm4,xmm12
+DB 102,15,56,220,217
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm5,xmm13
+DB 102,15,56,220,225
+ movdqu xmm7,XMMWORD[80+rdi]
+ pxor xmm8,xmm15
+ movdqa xmm9,XMMWORD[96+rsp]
+ pxor xmm6,xmm14
+DB 102,15,56,220,233
+ movups xmm0,XMMWORD[32+rbp]
+ lea rdi,[96+rdi]
+ pxor xmm7,xmm8
+
+ pxor xmm10,xmm9
+DB 102,15,56,220,241
+ pxor xmm11,xmm9
+ movdqa XMMWORD[rsp],xmm10
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[48+rbp]
+ pxor xmm12,xmm9
+
+DB 102,15,56,220,208
+ pxor xmm13,xmm9
+ movdqa XMMWORD[16+rsp],xmm11
+DB 102,15,56,220,216
+ pxor xmm14,xmm9
+ movdqa XMMWORD[32+rsp],xmm12
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ pxor xmm8,xmm9
+ movdqa XMMWORD[64+rsp],xmm14
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[64+rbp]
+ movdqa XMMWORD[80+rsp],xmm8
+ pshufd xmm9,xmm15,0x5f
+ jmp NEAR $L$xts_enc_loop6
+ALIGN 32
+$L$xts_enc_loop6:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-64))+rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-80))+rax*1+rcx]
+ jnz NEAR $L$xts_enc_loop6
+
+ movdqa xmm8,XMMWORD[r8]
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,220,209
+ paddq xmm15,xmm15
+ psrad xmm14,31
+DB 102,15,56,220,217
+ pand xmm14,xmm8
+ movups xmm10,XMMWORD[rbp]
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+ pxor xmm15,xmm14
+ movaps xmm11,xmm10
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-64))+rcx]
+
+ movdqa xmm14,xmm9
+DB 102,15,56,220,208
+ paddd xmm9,xmm9
+ pxor xmm10,xmm15
+DB 102,15,56,220,216
+ psrad xmm14,31
+ paddq xmm15,xmm15
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ pand xmm14,xmm8
+ movaps xmm12,xmm11
+DB 102,15,56,220,240
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-48))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,220,209
+ pxor xmm11,xmm15
+ psrad xmm14,31
+DB 102,15,56,220,217
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movdqa XMMWORD[48+rsp],xmm13
+ pxor xmm15,xmm14
+DB 102,15,56,220,241
+ movaps xmm13,xmm12
+ movdqa xmm14,xmm9
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-32))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,220,208
+ pxor xmm12,xmm15
+ psrad xmm14,31
+DB 102,15,56,220,216
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+ pxor xmm15,xmm14
+ movaps xmm14,xmm13
+DB 102,15,56,220,248
+
+ movdqa xmm0,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,220,209
+ pxor xmm13,xmm15
+ psrad xmm0,31
+DB 102,15,56,220,217
+ paddq xmm15,xmm15
+ pand xmm0,xmm8
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ pxor xmm15,xmm0
+ movups xmm0,XMMWORD[rbp]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[16+rbp]
+
+ pxor xmm14,xmm15
+DB 102,15,56,221,84,36,0
+ psrad xmm9,31
+ paddq xmm15,xmm15
+DB 102,15,56,221,92,36,16
+DB 102,15,56,221,100,36,32
+ pand xmm9,xmm8
+ mov rax,r10
+DB 102,15,56,221,108,36,48
+DB 102,15,56,221,116,36,64
+DB 102,15,56,221,124,36,80
+ pxor xmm15,xmm9
+
+ lea rsi,[96+rsi]
+ movups XMMWORD[(-96)+rsi],xmm2
+ movups XMMWORD[(-80)+rsi],xmm3
+ movups XMMWORD[(-64)+rsi],xmm4
+ movups XMMWORD[(-48)+rsi],xmm5
+ movups XMMWORD[(-32)+rsi],xmm6
+ movups XMMWORD[(-16)+rsi],xmm7
+ sub rdx,16*6
+ jnc NEAR $L$xts_enc_grandloop
+
+ mov eax,16+96
+ sub eax,r10d
+ mov rcx,rbp
+ shr eax,4
+
+$L$xts_enc_short:
+
+ mov r10d,eax
+ pxor xmm10,xmm0
+ add rdx,16*6
+ jz NEAR $L$xts_enc_done
+
+ pxor xmm11,xmm0
+ cmp rdx,0x20
+ jb NEAR $L$xts_enc_one
+ pxor xmm12,xmm0
+ je NEAR $L$xts_enc_two
+
+ pxor xmm13,xmm0
+ cmp rdx,0x40
+ jb NEAR $L$xts_enc_three
+ pxor xmm14,xmm0
+ je NEAR $L$xts_enc_four
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm3,xmm11
+ movdqu xmm6,XMMWORD[64+rdi]
+ lea rdi,[80+rdi]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm13
+ pxor xmm6,xmm14
+ pxor xmm7,xmm7
+
+ call _aesni_encrypt6
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm15
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ xorps xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ xorps xmm6,xmm14
+ movdqu XMMWORD[32+rsi],xmm4
+ movdqu XMMWORD[48+rsi],xmm5
+ movdqu XMMWORD[64+rsi],xmm6
+ lea rsi,[80+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_one:
+ movups xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_9:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_9
+DB 102,15,56,221,209
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm11
+ movups XMMWORD[rsi],xmm2
+ lea rsi,[16+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_two:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+
+ call _aesni_encrypt2
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm12
+ xorps xmm3,xmm11
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ lea rsi,[32+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_three:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ lea rdi,[48+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+
+ call _aesni_encrypt3
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm13
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ lea rsi,[48+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_four:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ xorps xmm2,xmm10
+ movups xmm5,XMMWORD[48+rdi]
+ lea rdi,[64+rdi]
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ xorps xmm5,xmm13
+
+ call _aesni_encrypt4
+
+ pxor xmm2,xmm10
+ movdqa xmm10,xmm14
+ pxor xmm3,xmm11
+ pxor xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ movdqu XMMWORD[32+rsi],xmm4
+ movdqu XMMWORD[48+rsi],xmm5
+ lea rsi,[64+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_done:
+ and r9,15
+ jz NEAR $L$xts_enc_ret
+ mov rdx,r9
+
+$L$xts_enc_steal:
+ movzx eax,BYTE[rdi]
+ movzx ecx,BYTE[((-16))+rsi]
+ lea rdi,[1+rdi]
+ mov BYTE[((-16))+rsi],al
+ mov BYTE[rsi],cl
+ lea rsi,[1+rsi]
+ sub rdx,1
+ jnz NEAR $L$xts_enc_steal
+
+ sub rsi,r9
+ mov rcx,rbp
+ mov eax,r10d
+
+ movups xmm2,XMMWORD[((-16))+rsi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_10:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_10
+DB 102,15,56,221,209
+ xorps xmm2,xmm10
+ movups XMMWORD[(-16)+rsi],xmm2
+
+$L$xts_enc_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps XMMWORD[(-168)+r11],xmm0
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps XMMWORD[(-152)+r11],xmm0
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps XMMWORD[(-136)+r11],xmm0
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps XMMWORD[(-120)+r11],xmm0
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps XMMWORD[(-104)+r11],xmm0
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps XMMWORD[(-88)+r11],xmm0
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps XMMWORD[(-72)+r11],xmm0
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps XMMWORD[(-56)+r11],xmm0
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps XMMWORD[(-40)+r11],xmm0
+ movaps xmm15,XMMWORD[((-24))+r11]
+ movaps XMMWORD[(-24)+r11],xmm0
+ movaps XMMWORD[rsp],xmm0
+ movaps XMMWORD[16+rsp],xmm0
+ movaps XMMWORD[32+rsp],xmm0
+ movaps XMMWORD[48+rsp],xmm0
+ movaps XMMWORD[64+rsp],xmm0
+ movaps XMMWORD[80+rsp],xmm0
+ movaps XMMWORD[96+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$xts_enc_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_xts_encrypt:
+global aesni_xts_decrypt
+
+ALIGN 16
+aesni_xts_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_xts_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,272
+ and rsp,-16
+ movaps XMMWORD[(-168)+r11],xmm6
+ movaps XMMWORD[(-152)+r11],xmm7
+ movaps XMMWORD[(-136)+r11],xmm8
+ movaps XMMWORD[(-120)+r11],xmm9
+ movaps XMMWORD[(-104)+r11],xmm10
+ movaps XMMWORD[(-88)+r11],xmm11
+ movaps XMMWORD[(-72)+r11],xmm12
+ movaps XMMWORD[(-56)+r11],xmm13
+ movaps XMMWORD[(-40)+r11],xmm14
+ movaps XMMWORD[(-24)+r11],xmm15
+$L$xts_dec_body:
+ movups xmm2,XMMWORD[r9]
+ mov eax,DWORD[240+r8]
+ mov r10d,DWORD[240+rcx]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_enc1_11:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_enc1_11
+DB 102,15,56,221,209
+ xor eax,eax
+ test rdx,15
+ setnz al
+ shl rax,4
+ sub rdx,rax
+
+ movups xmm0,XMMWORD[rcx]
+ mov rbp,rcx
+ mov eax,r10d
+ shl r10d,4
+ mov r9,rdx
+ and rdx,-16
+
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqa xmm8,XMMWORD[$L$xts_magic]
+ movdqa xmm15,xmm2
+ pshufd xmm9,xmm2,0x5f
+ pxor xmm1,xmm0
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm10,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm10,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm11,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm11,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm12,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm12,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm13,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm13,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm15
+ psrad xmm9,31
+ paddq xmm15,xmm15
+ pand xmm9,xmm8
+ pxor xmm14,xmm0
+ pxor xmm15,xmm9
+ movaps XMMWORD[96+rsp],xmm1
+
+ sub rdx,16*6
+ jc NEAR $L$xts_dec_short
+
+ mov eax,16+96
+ lea rcx,[32+r10*1+rbp]
+ sub rax,r10
+ movups xmm1,XMMWORD[16+rbp]
+ mov r10,rax
+ lea r8,[$L$xts_magic]
+ jmp NEAR $L$xts_dec_grandloop
+
+ALIGN 32
+$L$xts_dec_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqa xmm8,xmm0
+ movdqu xmm3,XMMWORD[16+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm3,xmm11
+DB 102,15,56,222,209
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm4,xmm12
+DB 102,15,56,222,217
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm5,xmm13
+DB 102,15,56,222,225
+ movdqu xmm7,XMMWORD[80+rdi]
+ pxor xmm8,xmm15
+ movdqa xmm9,XMMWORD[96+rsp]
+ pxor xmm6,xmm14
+DB 102,15,56,222,233
+ movups xmm0,XMMWORD[32+rbp]
+ lea rdi,[96+rdi]
+ pxor xmm7,xmm8
+
+ pxor xmm10,xmm9
+DB 102,15,56,222,241
+ pxor xmm11,xmm9
+ movdqa XMMWORD[rsp],xmm10
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[48+rbp]
+ pxor xmm12,xmm9
+
+DB 102,15,56,222,208
+ pxor xmm13,xmm9
+ movdqa XMMWORD[16+rsp],xmm11
+DB 102,15,56,222,216
+ pxor xmm14,xmm9
+ movdqa XMMWORD[32+rsp],xmm12
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ pxor xmm8,xmm9
+ movdqa XMMWORD[64+rsp],xmm14
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[64+rbp]
+ movdqa XMMWORD[80+rsp],xmm8
+ pshufd xmm9,xmm15,0x5f
+ jmp NEAR $L$xts_dec_loop6
+ALIGN 32
+$L$xts_dec_loop6:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[((-64))+rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-80))+rax*1+rcx]
+ jnz NEAR $L$xts_dec_loop6
+
+ movdqa xmm8,XMMWORD[r8]
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,222,209
+ paddq xmm15,xmm15
+ psrad xmm14,31
+DB 102,15,56,222,217
+ pand xmm14,xmm8
+ movups xmm10,XMMWORD[rbp]
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+ pxor xmm15,xmm14
+ movaps xmm11,xmm10
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[((-64))+rcx]
+
+ movdqa xmm14,xmm9
+DB 102,15,56,222,208
+ paddd xmm9,xmm9
+ pxor xmm10,xmm15
+DB 102,15,56,222,216
+ psrad xmm14,31
+ paddq xmm15,xmm15
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ pand xmm14,xmm8
+ movaps xmm12,xmm11
+DB 102,15,56,222,240
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-48))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,222,209
+ pxor xmm11,xmm15
+ psrad xmm14,31
+DB 102,15,56,222,217
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movdqa XMMWORD[48+rsp],xmm13
+ pxor xmm15,xmm14
+DB 102,15,56,222,241
+ movaps xmm13,xmm12
+ movdqa xmm14,xmm9
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[((-32))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,222,208
+ pxor xmm12,xmm15
+ psrad xmm14,31
+DB 102,15,56,222,216
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+ pxor xmm15,xmm14
+ movaps xmm14,xmm13
+DB 102,15,56,222,248
+
+ movdqa xmm0,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,222,209
+ pxor xmm13,xmm15
+ psrad xmm0,31
+DB 102,15,56,222,217
+ paddq xmm15,xmm15
+ pand xmm0,xmm8
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ pxor xmm15,xmm0
+ movups xmm0,XMMWORD[rbp]
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[16+rbp]
+
+ pxor xmm14,xmm15
+DB 102,15,56,223,84,36,0
+ psrad xmm9,31
+ paddq xmm15,xmm15
+DB 102,15,56,223,92,36,16
+DB 102,15,56,223,100,36,32
+ pand xmm9,xmm8
+ mov rax,r10
+DB 102,15,56,223,108,36,48
+DB 102,15,56,223,116,36,64
+DB 102,15,56,223,124,36,80
+ pxor xmm15,xmm9
+
+ lea rsi,[96+rsi]
+ movups XMMWORD[(-96)+rsi],xmm2
+ movups XMMWORD[(-80)+rsi],xmm3
+ movups XMMWORD[(-64)+rsi],xmm4
+ movups XMMWORD[(-48)+rsi],xmm5
+ movups XMMWORD[(-32)+rsi],xmm6
+ movups XMMWORD[(-16)+rsi],xmm7
+ sub rdx,16*6
+ jnc NEAR $L$xts_dec_grandloop
+
+ mov eax,16+96
+ sub eax,r10d
+ mov rcx,rbp
+ shr eax,4
+
+$L$xts_dec_short:
+
+ mov r10d,eax
+ pxor xmm10,xmm0
+ pxor xmm11,xmm0
+ add rdx,16*6
+ jz NEAR $L$xts_dec_done
+
+ pxor xmm12,xmm0
+ cmp rdx,0x20
+ jb NEAR $L$xts_dec_one
+ pxor xmm13,xmm0
+ je NEAR $L$xts_dec_two
+
+ pxor xmm14,xmm0
+ cmp rdx,0x40
+ jb NEAR $L$xts_dec_three
+ je NEAR $L$xts_dec_four
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm3,xmm11
+ movdqu xmm6,XMMWORD[64+rdi]
+ lea rdi,[80+rdi]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm13
+ pxor xmm6,xmm14
+
+ call _aesni_decrypt6
+
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ xorps xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ xorps xmm6,xmm14
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm14,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pcmpgtd xmm14,xmm15
+ movdqu XMMWORD[64+rsi],xmm6
+ lea rsi,[80+rsi]
+ pshufd xmm11,xmm14,0x13
+ and r9,15
+ jz NEAR $L$xts_dec_ret
+
+ movdqa xmm10,xmm15
+ paddq xmm15,xmm15
+ pand xmm11,xmm8
+ pxor xmm11,xmm15
+ jmp NEAR $L$xts_dec_done2
+
+ALIGN 16
+$L$xts_dec_one:
+ movups xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_12:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_12
+DB 102,15,56,223,209
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm11
+ movups XMMWORD[rsi],xmm2
+ movdqa xmm11,xmm12
+ lea rsi,[16+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_two:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+
+ call _aesni_decrypt2
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm12
+ xorps xmm3,xmm11
+ movdqa xmm11,xmm13
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ lea rsi,[32+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_three:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ lea rdi,[48+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+
+ call _aesni_decrypt3
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm13
+ xorps xmm3,xmm11
+ movdqa xmm11,xmm14
+ xorps xmm4,xmm12
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ lea rsi,[48+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_four:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ xorps xmm2,xmm10
+ movups xmm5,XMMWORD[48+rdi]
+ lea rdi,[64+rdi]
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ xorps xmm5,xmm13
+
+ call _aesni_decrypt4
+
+ pxor xmm2,xmm10
+ movdqa xmm10,xmm14
+ pxor xmm3,xmm11
+ movdqa xmm11,xmm15
+ pxor xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ movdqu XMMWORD[32+rsi],xmm4
+ movdqu XMMWORD[48+rsi],xmm5
+ lea rsi,[64+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_done:
+ and r9,15
+ jz NEAR $L$xts_dec_ret
+$L$xts_dec_done2:
+ mov rdx,r9
+ mov rcx,rbp
+ mov eax,r10d
+
+ movups xmm2,XMMWORD[rdi]
+ xorps xmm2,xmm11
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_13:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_13
+DB 102,15,56,223,209
+ xorps xmm2,xmm11
+ movups XMMWORD[rsi],xmm2
+
+$L$xts_dec_steal:
+ movzx eax,BYTE[16+rdi]
+ movzx ecx,BYTE[rsi]
+ lea rdi,[1+rdi]
+ mov BYTE[rsi],al
+ mov BYTE[16+rsi],cl
+ lea rsi,[1+rsi]
+ sub rdx,1
+ jnz NEAR $L$xts_dec_steal
+
+ sub rsi,r9
+ mov rcx,rbp
+ mov eax,r10d
+
+ movups xmm2,XMMWORD[rsi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_14:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_14
+DB 102,15,56,223,209
+ xorps xmm2,xmm10
+ movups XMMWORD[rsi],xmm2
+
+$L$xts_dec_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps XMMWORD[(-168)+r11],xmm0
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps XMMWORD[(-152)+r11],xmm0
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps XMMWORD[(-136)+r11],xmm0
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps XMMWORD[(-120)+r11],xmm0
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps XMMWORD[(-104)+r11],xmm0
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps XMMWORD[(-88)+r11],xmm0
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps XMMWORD[(-72)+r11],xmm0
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps XMMWORD[(-56)+r11],xmm0
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps XMMWORD[(-40)+r11],xmm0
+ movaps xmm15,XMMWORD[((-24))+r11]
+ movaps XMMWORD[(-24)+r11],xmm0
+ movaps XMMWORD[rsp],xmm0
+ movaps XMMWORD[16+rsp],xmm0
+ movaps XMMWORD[32+rsp],xmm0
+ movaps XMMWORD[48+rsp],xmm0
+ movaps XMMWORD[64+rsp],xmm0
+ movaps XMMWORD[80+rsp],xmm0
+ movaps XMMWORD[96+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$xts_dec_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_xts_decrypt:
+global aesni_ocb_encrypt
+
+ALIGN 32
+aesni_ocb_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ocb_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea rax,[rsp]
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[112+rsp],xmm13
+ movaps XMMWORD[128+rsp],xmm14
+ movaps XMMWORD[144+rsp],xmm15
+$L$ocb_enc_body:
+ mov rbx,QWORD[56+rax]
+ mov rbp,QWORD[((56+8))+rax]
+
+ mov r10d,DWORD[240+rcx]
+ mov r11,rcx
+ shl r10d,4
+ movups xmm9,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqu xmm15,XMMWORD[r9]
+ pxor xmm9,xmm1
+ pxor xmm15,xmm1
+
+ mov eax,16+32
+ lea rcx,[32+r10*1+r11]
+ movups xmm1,XMMWORD[16+r11]
+ sub rax,r10
+ mov r10,rax
+
+ movdqu xmm10,XMMWORD[rbx]
+ movdqu xmm8,XMMWORD[rbp]
+
+ test r8,1
+ jnz NEAR $L$ocb_enc_odd
+
+ bsf r12,r8
+ add r8,1
+ shl r12,4
+ movdqu xmm7,XMMWORD[r12*1+rbx]
+ movdqu xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+
+ call __ocb_encrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ lea rsi,[16+rsi]
+ sub rdx,1
+ jz NEAR $L$ocb_enc_done
+
+$L$ocb_enc_odd:
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ lea r8,[6+r8]
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+ shl r12,4
+ shl r13,4
+ shl r14,4
+
+ sub rdx,6
+ jc NEAR $L$ocb_enc_short
+ jmp NEAR $L$ocb_enc_grandloop
+
+ALIGN 32
+$L$ocb_enc_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+
+ call __ocb_encrypt6
+
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ lea rsi,[96+rsi]
+ sub rdx,6
+ jnc NEAR $L$ocb_enc_grandloop
+
+$L$ocb_enc_short:
+ add rdx,6
+ jz NEAR $L$ocb_enc_done
+
+ movdqu xmm2,XMMWORD[rdi]
+ cmp rdx,2
+ jb NEAR $L$ocb_enc_one
+ movdqu xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ocb_enc_two
+
+ movdqu xmm4,XMMWORD[32+rdi]
+ cmp rdx,4
+ jb NEAR $L$ocb_enc_three
+ movdqu xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ocb_enc_four
+
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm7,xmm7
+
+ call __ocb_encrypt6
+
+ movdqa xmm15,xmm14
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_one:
+ movdqa xmm7,xmm10
+
+ call __ocb_encrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_two:
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+
+ call __ocb_encrypt4
+
+ movdqa xmm15,xmm11
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_three:
+ pxor xmm5,xmm5
+
+ call __ocb_encrypt4
+
+ movdqa xmm15,xmm12
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_four:
+ call __ocb_encrypt4
+
+ movdqa xmm15,xmm13
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+
+$L$ocb_enc_done:
+ pxor xmm15,xmm0
+ movdqu XMMWORD[rbp],xmm8
+ movdqu XMMWORD[r9],xmm15
+
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps XMMWORD[64+rsp],xmm0
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps XMMWORD[80+rsp],xmm0
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps XMMWORD[96+rsp],xmm0
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps XMMWORD[112+rsp],xmm0
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps XMMWORD[128+rsp],xmm0
+ movaps xmm15,XMMWORD[144+rsp]
+ movaps XMMWORD[144+rsp],xmm0
+ lea rax,[((160+40))+rsp]
+$L$ocb_enc_pop:
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$ocb_enc_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ocb_encrypt:
+
+
+ALIGN 32
+__ocb_encrypt6:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ movdqa xmm14,xmm10
+ pxor xmm10,xmm15
+ movdqu xmm15,XMMWORD[r14*1+rbx]
+ pxor xmm11,xmm10
+ pxor xmm8,xmm2
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm8,xmm3
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm8,xmm4
+ pxor xmm4,xmm12
+ pxor xmm14,xmm13
+ pxor xmm8,xmm5
+ pxor xmm5,xmm13
+ pxor xmm15,xmm14
+ pxor xmm8,xmm6
+ pxor xmm6,xmm14
+ pxor xmm8,xmm7
+ pxor xmm7,xmm15
+ movups xmm0,XMMWORD[32+r11]
+
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ add r8,6
+ pxor xmm10,xmm9
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+DB 102,15,56,220,241
+ pxor xmm13,xmm9
+ pxor xmm14,xmm9
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm15,xmm9
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[64+r11]
+ shl r12,4
+ shl r13,4
+ jmp NEAR $L$ocb_enc_loop6
+
+ALIGN 32
+$L$ocb_enc_loop6:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_enc_loop6
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[16+r11]
+ shl r14,4
+
+DB 102,65,15,56,221,210
+ movdqu xmm10,XMMWORD[rbx]
+ mov rax,r10
+DB 102,65,15,56,221,219
+DB 102,65,15,56,221,228
+DB 102,65,15,56,221,237
+DB 102,65,15,56,221,246
+DB 102,65,15,56,221,255
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_encrypt4:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ pxor xmm10,xmm15
+ pxor xmm11,xmm10
+ pxor xmm8,xmm2
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm8,xmm3
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm8,xmm4
+ pxor xmm4,xmm12
+ pxor xmm8,xmm5
+ pxor xmm5,xmm13
+ movups xmm0,XMMWORD[32+r11]
+
+ pxor xmm10,xmm9
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+ pxor xmm13,xmm9
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[48+r11]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_enc_loop4
+
+ALIGN 32
+$L$ocb_enc_loop4:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_enc_loop4
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,65,15,56,221,210
+DB 102,65,15,56,221,219
+DB 102,65,15,56,221,228
+DB 102,65,15,56,221,237
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_encrypt1:
+ pxor xmm7,xmm15
+ pxor xmm7,xmm9
+ pxor xmm8,xmm2
+ pxor xmm2,xmm7
+ movups xmm0,XMMWORD[32+r11]
+
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm7,xmm9
+
+DB 102,15,56,220,208
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_enc_loop1
+
+ALIGN 32
+$L$ocb_enc_loop1:
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_enc_loop1
+
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,15,56,221,215
+ DB 0F3h,0C3h ;repret
+
+
+global aesni_ocb_decrypt
+
+ALIGN 32
+aesni_ocb_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ocb_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea rax,[rsp]
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[112+rsp],xmm13
+ movaps XMMWORD[128+rsp],xmm14
+ movaps XMMWORD[144+rsp],xmm15
+$L$ocb_dec_body:
+ mov rbx,QWORD[56+rax]
+ mov rbp,QWORD[((56+8))+rax]
+
+ mov r10d,DWORD[240+rcx]
+ mov r11,rcx
+ shl r10d,4
+ movups xmm9,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqu xmm15,XMMWORD[r9]
+ pxor xmm9,xmm1
+ pxor xmm15,xmm1
+
+ mov eax,16+32
+ lea rcx,[32+r10*1+r11]
+ movups xmm1,XMMWORD[16+r11]
+ sub rax,r10
+ mov r10,rax
+
+ movdqu xmm10,XMMWORD[rbx]
+ movdqu xmm8,XMMWORD[rbp]
+
+ test r8,1
+ jnz NEAR $L$ocb_dec_odd
+
+ bsf r12,r8
+ add r8,1
+ shl r12,4
+ movdqu xmm7,XMMWORD[r12*1+rbx]
+ movdqu xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+
+ call __ocb_decrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ lea rsi,[16+rsi]
+ sub rdx,1
+ jz NEAR $L$ocb_dec_done
+
+$L$ocb_dec_odd:
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ lea r8,[6+r8]
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+ shl r12,4
+ shl r13,4
+ shl r14,4
+
+ sub rdx,6
+ jc NEAR $L$ocb_dec_short
+ jmp NEAR $L$ocb_dec_grandloop
+
+ALIGN 32
+$L$ocb_dec_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+
+ call __ocb_decrypt6
+
+ movups XMMWORD[rsi],xmm2
+ pxor xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm8,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm8,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm8,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm8,xmm7
+ lea rsi,[96+rsi]
+ sub rdx,6
+ jnc NEAR $L$ocb_dec_grandloop
+
+$L$ocb_dec_short:
+ add rdx,6
+ jz NEAR $L$ocb_dec_done
+
+ movdqu xmm2,XMMWORD[rdi]
+ cmp rdx,2
+ jb NEAR $L$ocb_dec_one
+ movdqu xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ocb_dec_two
+
+ movdqu xmm4,XMMWORD[32+rdi]
+ cmp rdx,4
+ jb NEAR $L$ocb_dec_three
+ movdqu xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ocb_dec_four
+
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm7,xmm7
+
+ call __ocb_decrypt6
+
+ movdqa xmm15,xmm14
+ movups XMMWORD[rsi],xmm2
+ pxor xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm8,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm8,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm8,xmm6
+
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_one:
+ movdqa xmm7,xmm10
+
+ call __ocb_decrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_two:
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+
+ call __ocb_decrypt4
+
+ movdqa xmm15,xmm11
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ xorps xmm8,xmm3
+
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_three:
+ pxor xmm5,xmm5
+
+ call __ocb_decrypt4
+
+ movdqa xmm15,xmm12
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ xorps xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ xorps xmm8,xmm4
+
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_four:
+ call __ocb_decrypt4
+
+ movdqa xmm15,xmm13
+ movups XMMWORD[rsi],xmm2
+ pxor xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm8,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm8,xmm5
+
+$L$ocb_dec_done:
+ pxor xmm15,xmm0
+ movdqu XMMWORD[rbp],xmm8
+ movdqu XMMWORD[r9],xmm15
+
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps XMMWORD[64+rsp],xmm0
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps XMMWORD[80+rsp],xmm0
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps XMMWORD[96+rsp],xmm0
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps XMMWORD[112+rsp],xmm0
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps XMMWORD[128+rsp],xmm0
+ movaps xmm15,XMMWORD[144+rsp]
+ movaps XMMWORD[144+rsp],xmm0
+ lea rax,[((160+40))+rsp]
+$L$ocb_dec_pop:
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$ocb_dec_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ocb_decrypt:
+
+
+ALIGN 32
+__ocb_decrypt6:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ movdqa xmm14,xmm10
+ pxor xmm10,xmm15
+ movdqu xmm15,XMMWORD[r14*1+rbx]
+ pxor xmm11,xmm10
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm4,xmm12
+ pxor xmm14,xmm13
+ pxor xmm5,xmm13
+ pxor xmm15,xmm14
+ pxor xmm6,xmm14
+ pxor xmm7,xmm15
+ movups xmm0,XMMWORD[32+r11]
+
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ add r8,6
+ pxor xmm10,xmm9
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+DB 102,15,56,222,241
+ pxor xmm13,xmm9
+ pxor xmm14,xmm9
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm15,xmm9
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[64+r11]
+ shl r12,4
+ shl r13,4
+ jmp NEAR $L$ocb_dec_loop6
+
+ALIGN 32
+$L$ocb_dec_loop6:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_dec_loop6
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[16+r11]
+ shl r14,4
+
+DB 102,65,15,56,223,210
+ movdqu xmm10,XMMWORD[rbx]
+ mov rax,r10
+DB 102,65,15,56,223,219
+DB 102,65,15,56,223,228
+DB 102,65,15,56,223,237
+DB 102,65,15,56,223,246
+DB 102,65,15,56,223,255
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_decrypt4:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ pxor xmm10,xmm15
+ pxor xmm11,xmm10
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm4,xmm12
+ pxor xmm5,xmm13
+ movups xmm0,XMMWORD[32+r11]
+
+ pxor xmm10,xmm9
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+ pxor xmm13,xmm9
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[48+r11]
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_dec_loop4
+
+ALIGN 32
+$L$ocb_dec_loop4:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_dec_loop4
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,65,15,56,223,210
+DB 102,65,15,56,223,219
+DB 102,65,15,56,223,228
+DB 102,65,15,56,223,237
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_decrypt1:
+ pxor xmm7,xmm15
+ pxor xmm7,xmm9
+ pxor xmm2,xmm7
+ movups xmm0,XMMWORD[32+r11]
+
+DB 102,15,56,222,209
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm7,xmm9
+
+DB 102,15,56,222,208
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_dec_loop1
+
+ALIGN 32
+$L$ocb_dec_loop1:
+DB 102,15,56,222,209
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_dec_loop1
+
+DB 102,15,56,222,209
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,15,56,223,215
+ DB 0F3h,0C3h ;repret
+
+global aesni_cbc_encrypt
+
+ALIGN 16
+aesni_cbc_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ test rdx,rdx
+ jz NEAR $L$cbc_ret
+
+ mov r10d,DWORD[240+rcx]
+ mov r11,rcx
+ test r9d,r9d
+ jz NEAR $L$cbc_decrypt
+
+ movups xmm2,XMMWORD[r8]
+ mov eax,r10d
+ cmp rdx,16
+ jb NEAR $L$cbc_enc_tail
+ sub rdx,16
+ jmp NEAR $L$cbc_enc_loop
+ALIGN 16
+$L$cbc_enc_loop:
+ movups xmm3,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm3,xmm0
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm3
+$L$oop_enc1_15:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_15
+DB 102,15,56,221,209
+ mov eax,r10d
+ mov rcx,r11
+ movups XMMWORD[rsi],xmm2
+ lea rsi,[16+rsi]
+ sub rdx,16
+ jnc NEAR $L$cbc_enc_loop
+ add rdx,16
+ jnz NEAR $L$cbc_enc_tail
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups XMMWORD[r8],xmm2
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ jmp NEAR $L$cbc_ret
+
+$L$cbc_enc_tail:
+ mov rcx,rdx
+ xchg rsi,rdi
+ DD 0x9066A4F3
+ mov ecx,16
+ sub rcx,rdx
+ xor eax,eax
+ DD 0x9066AAF3
+ lea rdi,[((-16))+rdi]
+ mov eax,r10d
+ mov rsi,rdi
+ mov rcx,r11
+ xor rdx,rdx
+ jmp NEAR $L$cbc_enc_loop
+
+ALIGN 16
+$L$cbc_decrypt:
+ cmp rdx,16
+ jne NEAR $L$cbc_decrypt_bulk
+
+
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[r8]
+ movdqa xmm4,xmm2
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_16:
+DB 102,15,56,222,209
+ dec r10d
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_16
+DB 102,15,56,223,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movdqu XMMWORD[r8],xmm4
+ xorps xmm2,xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ jmp NEAR $L$cbc_ret
+ALIGN 16
+$L$cbc_decrypt_bulk:
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,176
+ and rsp,-16
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$cbc_decrypt_body:
+ mov rbp,rcx
+ movups xmm10,XMMWORD[r8]
+ mov eax,r10d
+ cmp rdx,0x50
+ jbe NEAR $L$cbc_dec_tail
+
+ movups xmm0,XMMWORD[rcx]
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqa xmm11,xmm2
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqa xmm12,xmm3
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqa xmm13,xmm4
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqa xmm14,xmm5
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqa xmm15,xmm6
+ mov r9d,DWORD[((OPENSSL_ia32cap_P+4))]
+ cmp rdx,0x70
+ jbe NEAR $L$cbc_dec_six_or_seven
+
+ and r9d,71303168
+ sub rdx,0x50
+ cmp r9d,4194304
+ je NEAR $L$cbc_dec_loop6_enter
+ sub rdx,0x20
+ lea rcx,[112+rcx]
+ jmp NEAR $L$cbc_dec_loop8_enter
+ALIGN 16
+$L$cbc_dec_loop8:
+ movups XMMWORD[rsi],xmm9
+ lea rsi,[16+rsi]
+$L$cbc_dec_loop8_enter:
+ movdqu xmm8,XMMWORD[96+rdi]
+ pxor xmm2,xmm0
+ movdqu xmm9,XMMWORD[112+rdi]
+ pxor xmm3,xmm0
+ movups xmm1,XMMWORD[((16-112))+rcx]
+ pxor xmm4,xmm0
+ mov rbp,-1
+ cmp rdx,0x70
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+ pxor xmm8,xmm0
+
+DB 102,15,56,222,209
+ pxor xmm9,xmm0
+ movups xmm0,XMMWORD[((32-112))+rcx]
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+ adc rbp,0
+ and rbp,128
+DB 102,68,15,56,222,201
+ add rbp,rdi
+ movups xmm1,XMMWORD[((48-112))+rcx]
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((64-112))+rcx]
+ nop
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((80-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((96-112))+rcx]
+ nop
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((112-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((128-112))+rcx]
+ nop
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((144-112))+rcx]
+ cmp eax,11
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((160-112))+rcx]
+ jb NEAR $L$cbc_dec_done
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((176-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((192-112))+rcx]
+ je NEAR $L$cbc_dec_done
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((208-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((224-112))+rcx]
+ jmp NEAR $L$cbc_dec_done
+ALIGN 16
+$L$cbc_dec_done:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+ pxor xmm10,xmm0
+ pxor xmm11,xmm0
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ pxor xmm12,xmm0
+ pxor xmm13,xmm0
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ pxor xmm14,xmm0
+ pxor xmm15,xmm0
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movdqu xmm1,XMMWORD[80+rdi]
+
+DB 102,65,15,56,223,210
+ movdqu xmm10,XMMWORD[96+rdi]
+ pxor xmm1,xmm0
+DB 102,65,15,56,223,219
+ pxor xmm10,xmm0
+ movdqu xmm0,XMMWORD[112+rdi]
+DB 102,65,15,56,223,228
+ lea rdi,[128+rdi]
+ movdqu xmm11,XMMWORD[rbp]
+DB 102,65,15,56,223,237
+DB 102,65,15,56,223,246
+ movdqu xmm12,XMMWORD[16+rbp]
+ movdqu xmm13,XMMWORD[32+rbp]
+DB 102,65,15,56,223,255
+DB 102,68,15,56,223,193
+ movdqu xmm14,XMMWORD[48+rbp]
+ movdqu xmm15,XMMWORD[64+rbp]
+DB 102,69,15,56,223,202
+ movdqa xmm10,xmm0
+ movdqu xmm1,XMMWORD[80+rbp]
+ movups xmm0,XMMWORD[((-112))+rcx]
+
+ movups XMMWORD[rsi],xmm2
+ movdqa xmm2,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ movdqa xmm3,xmm12
+ movups XMMWORD[32+rsi],xmm4
+ movdqa xmm4,xmm13
+ movups XMMWORD[48+rsi],xmm5
+ movdqa xmm5,xmm14
+ movups XMMWORD[64+rsi],xmm6
+ movdqa xmm6,xmm15
+ movups XMMWORD[80+rsi],xmm7
+ movdqa xmm7,xmm1
+ movups XMMWORD[96+rsi],xmm8
+ lea rsi,[112+rsi]
+
+ sub rdx,0x80
+ ja NEAR $L$cbc_dec_loop8
+
+ movaps xmm2,xmm9
+ lea rcx,[((-112))+rcx]
+ add rdx,0x70
+ jle NEAR $L$cbc_dec_clear_tail_collected
+ movups XMMWORD[rsi],xmm9
+ lea rsi,[16+rsi]
+ cmp rdx,0x50
+ jbe NEAR $L$cbc_dec_tail
+
+ movaps xmm2,xmm11
+$L$cbc_dec_six_or_seven:
+ cmp rdx,0x60
+ ja NEAR $L$cbc_dec_seven
+
+ movaps xmm8,xmm7
+ call _aesni_decrypt6
+ pxor xmm2,xmm10
+ movaps xmm10,xmm8
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ pxor xmm6,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ pxor xmm7,xmm15
+ movdqu XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ lea rsi,[80+rsi]
+ movdqa xmm2,xmm7
+ pxor xmm7,xmm7
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_seven:
+ movups xmm8,XMMWORD[96+rdi]
+ xorps xmm9,xmm9
+ call _aesni_decrypt8
+ movups xmm9,XMMWORD[80+rdi]
+ pxor xmm2,xmm10
+ movups xmm10,XMMWORD[96+rdi]
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ pxor xmm6,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ pxor xmm7,xmm15
+ movdqu XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ pxor xmm8,xmm9
+ movdqu XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+ lea rsi,[96+rsi]
+ movdqa xmm2,xmm8
+ pxor xmm8,xmm8
+ pxor xmm9,xmm9
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_loop6:
+ movups XMMWORD[rsi],xmm7
+ lea rsi,[16+rsi]
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqa xmm11,xmm2
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqa xmm12,xmm3
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqa xmm13,xmm4
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqa xmm14,xmm5
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqa xmm15,xmm6
+$L$cbc_dec_loop6_enter:
+ lea rdi,[96+rdi]
+ movdqa xmm8,xmm7
+
+ call _aesni_decrypt6
+
+ pxor xmm2,xmm10
+ movdqa xmm10,xmm8
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm6,xmm14
+ mov rcx,rbp
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm7,xmm15
+ mov eax,r10d
+ movdqu XMMWORD[64+rsi],xmm6
+ lea rsi,[80+rsi]
+ sub rdx,0x60
+ ja NEAR $L$cbc_dec_loop6
+
+ movdqa xmm2,xmm7
+ add rdx,0x50
+ jle NEAR $L$cbc_dec_clear_tail_collected
+ movups XMMWORD[rsi],xmm7
+ lea rsi,[16+rsi]
+
+$L$cbc_dec_tail:
+ movups xmm2,XMMWORD[rdi]
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_one
+
+ movups xmm3,XMMWORD[16+rdi]
+ movaps xmm11,xmm2
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_two
+
+ movups xmm4,XMMWORD[32+rdi]
+ movaps xmm12,xmm3
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_three
+
+ movups xmm5,XMMWORD[48+rdi]
+ movaps xmm13,xmm4
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_four
+
+ movups xmm6,XMMWORD[64+rdi]
+ movaps xmm14,xmm5
+ movaps xmm15,xmm6
+ xorps xmm7,xmm7
+ call _aesni_decrypt6
+ pxor xmm2,xmm10
+ movaps xmm10,xmm15
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ pxor xmm6,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ lea rsi,[64+rsi]
+ movdqa xmm2,xmm6
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ sub rdx,0x10
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_one:
+ movaps xmm11,xmm2
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_17:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_17
+DB 102,15,56,223,209
+ xorps xmm2,xmm10
+ movaps xmm10,xmm11
+ jmp NEAR $L$cbc_dec_tail_collected
+ALIGN 16
+$L$cbc_dec_two:
+ movaps xmm12,xmm3
+ call _aesni_decrypt2
+ pxor xmm2,xmm10
+ movaps xmm10,xmm12
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ movdqa xmm2,xmm3
+ pxor xmm3,xmm3
+ lea rsi,[16+rsi]
+ jmp NEAR $L$cbc_dec_tail_collected
+ALIGN 16
+$L$cbc_dec_three:
+ movaps xmm13,xmm4
+ call _aesni_decrypt3
+ pxor xmm2,xmm10
+ movaps xmm10,xmm13
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movdqa xmm2,xmm4
+ pxor xmm4,xmm4
+ lea rsi,[32+rsi]
+ jmp NEAR $L$cbc_dec_tail_collected
+ALIGN 16
+$L$cbc_dec_four:
+ movaps xmm14,xmm5
+ call _aesni_decrypt4
+ pxor xmm2,xmm10
+ movaps xmm10,xmm14
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movdqa xmm2,xmm5
+ pxor xmm5,xmm5
+ lea rsi,[48+rsi]
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_clear_tail_collected:
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+$L$cbc_dec_tail_collected:
+ movups XMMWORD[r8],xmm10
+ and rdx,15
+ jnz NEAR $L$cbc_dec_tail_partial
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ jmp NEAR $L$cbc_dec_ret
+ALIGN 16
+$L$cbc_dec_tail_partial:
+ movaps XMMWORD[rsp],xmm2
+ pxor xmm2,xmm2
+ mov rcx,16
+ mov rdi,rsi
+ sub rcx,rdx
+ lea rsi,[rsp]
+ DD 0x9066A4F3
+ movdqa XMMWORD[rsp],xmm2
+
+$L$cbc_dec_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps XMMWORD[64+rsp],xmm0
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps XMMWORD[80+rsp],xmm0
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps XMMWORD[96+rsp],xmm0
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps XMMWORD[112+rsp],xmm0
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps XMMWORD[128+rsp],xmm0
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps XMMWORD[144+rsp],xmm0
+ movaps xmm15,XMMWORD[160+rsp]
+ movaps XMMWORD[160+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$cbc_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_encrypt:
+global aesni_set_decrypt_key
+
+ALIGN 16
+aesni_set_decrypt_key:
+
+DB 0x48,0x83,0xEC,0x08
+
+ call __aesni_set_encrypt_key
+ shl edx,4
+ test eax,eax
+ jnz NEAR $L$dec_key_ret
+ lea rcx,[16+rdx*1+r8]
+
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[rcx]
+ movups XMMWORD[rcx],xmm0
+ movups XMMWORD[r8],xmm1
+ lea r8,[16+r8]
+ lea rcx,[((-16))+rcx]
+
+$L$dec_key_inverse:
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[rcx]
+DB 102,15,56,219,192
+DB 102,15,56,219,201
+ lea r8,[16+r8]
+ lea rcx,[((-16))+rcx]
+ movups XMMWORD[16+rcx],xmm0
+ movups XMMWORD[(-16)+r8],xmm1
+ cmp rcx,r8
+ ja NEAR $L$dec_key_inverse
+
+ movups xmm0,XMMWORD[r8]
+DB 102,15,56,219,192
+ pxor xmm1,xmm1
+ movups XMMWORD[rcx],xmm0
+ pxor xmm0,xmm0
+$L$dec_key_ret:
+ add rsp,8
+
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_set_decrypt_key:
+
+global aesni_set_encrypt_key
+
+ALIGN 16
+aesni_set_encrypt_key:
+__aesni_set_encrypt_key:
+
+DB 0x48,0x83,0xEC,0x08
+
+ mov rax,-1
+ test rcx,rcx
+ jz NEAR $L$enc_key_ret
+ test r8,r8
+ jz NEAR $L$enc_key_ret
+
+ mov r10d,268437504
+ movups xmm0,XMMWORD[rcx]
+ xorps xmm4,xmm4
+ and r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+ lea rax,[16+r8]
+ cmp edx,256
+ je NEAR $L$14rounds
+ cmp edx,192
+ je NEAR $L$12rounds
+ cmp edx,128
+ jne NEAR $L$bad_keybits
+
+$L$10rounds:
+ mov edx,9
+ cmp r10d,268435456
+ je NEAR $L$10rounds_alt
+
+ movups XMMWORD[r8],xmm0
+DB 102,15,58,223,200,1
+ call $L$key_expansion_128_cold
+DB 102,15,58,223,200,2
+ call $L$key_expansion_128
+DB 102,15,58,223,200,4
+ call $L$key_expansion_128
+DB 102,15,58,223,200,8
+ call $L$key_expansion_128
+DB 102,15,58,223,200,16
+ call $L$key_expansion_128
+DB 102,15,58,223,200,32
+ call $L$key_expansion_128
+DB 102,15,58,223,200,64
+ call $L$key_expansion_128
+DB 102,15,58,223,200,128
+ call $L$key_expansion_128
+DB 102,15,58,223,200,27
+ call $L$key_expansion_128
+DB 102,15,58,223,200,54
+ call $L$key_expansion_128
+ movups XMMWORD[rax],xmm0
+ mov DWORD[80+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$10rounds_alt:
+ movdqa xmm5,XMMWORD[$L$key_rotate]
+ mov r10d,8
+ movdqa xmm4,XMMWORD[$L$key_rcon1]
+ movdqa xmm2,xmm0
+ movdqu XMMWORD[r8],xmm0
+ jmp NEAR $L$oop_key128
+
+ALIGN 16
+$L$oop_key128:
+DB 102,15,56,0,197
+DB 102,15,56,221,196
+ pslld xmm4,1
+ lea rax,[16+rax]
+
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[(-16)+rax],xmm0
+ movdqa xmm2,xmm0
+
+ dec r10d
+ jnz NEAR $L$oop_key128
+
+ movdqa xmm4,XMMWORD[$L$key_rcon1b]
+
+DB 102,15,56,0,197
+DB 102,15,56,221,196
+ pslld xmm4,1
+
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[rax],xmm0
+
+ movdqa xmm2,xmm0
+DB 102,15,56,0,197
+DB 102,15,56,221,196
+
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[16+rax],xmm0
+
+ mov DWORD[96+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$12rounds:
+ movq xmm2,QWORD[16+rcx]
+ mov edx,11
+ cmp r10d,268435456
+ je NEAR $L$12rounds_alt
+
+ movups XMMWORD[r8],xmm0
+DB 102,15,58,223,202,1
+ call $L$key_expansion_192a_cold
+DB 102,15,58,223,202,2
+ call $L$key_expansion_192b
+DB 102,15,58,223,202,4
+ call $L$key_expansion_192a
+DB 102,15,58,223,202,8
+ call $L$key_expansion_192b
+DB 102,15,58,223,202,16
+ call $L$key_expansion_192a
+DB 102,15,58,223,202,32
+ call $L$key_expansion_192b
+DB 102,15,58,223,202,64
+ call $L$key_expansion_192a
+DB 102,15,58,223,202,128
+ call $L$key_expansion_192b
+ movups XMMWORD[rax],xmm0
+ mov DWORD[48+rax],edx
+ xor rax,rax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$12rounds_alt:
+ movdqa xmm5,XMMWORD[$L$key_rotate192]
+ movdqa xmm4,XMMWORD[$L$key_rcon1]
+ mov r10d,8
+ movdqu XMMWORD[r8],xmm0
+ jmp NEAR $L$oop_key192
+
+ALIGN 16
+$L$oop_key192:
+ movq QWORD[rax],xmm2
+ movdqa xmm1,xmm2
+DB 102,15,56,0,213
+DB 102,15,56,221,212
+ pslld xmm4,1
+ lea rax,[24+rax]
+
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+
+ pshufd xmm3,xmm0,0xff
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+
+ pxor xmm0,xmm2
+ pxor xmm2,xmm3
+ movdqu XMMWORD[(-16)+rax],xmm0
+
+ dec r10d
+ jnz NEAR $L$oop_key192
+
+ mov DWORD[32+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$14rounds:
+ movups xmm2,XMMWORD[16+rcx]
+ mov edx,13
+ lea rax,[16+rax]
+ cmp r10d,268435456
+ je NEAR $L$14rounds_alt
+
+ movups XMMWORD[r8],xmm0
+ movups XMMWORD[16+r8],xmm2
+DB 102,15,58,223,202,1
+ call $L$key_expansion_256a_cold
+DB 102,15,58,223,200,1
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,2
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,2
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,4
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,4
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,8
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,8
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,16
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,16
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,32
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,32
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,64
+ call $L$key_expansion_256a
+ movups XMMWORD[rax],xmm0
+ mov DWORD[16+rax],edx
+ xor rax,rax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$14rounds_alt:
+ movdqa xmm5,XMMWORD[$L$key_rotate]
+ movdqa xmm4,XMMWORD[$L$key_rcon1]
+ mov r10d,7
+ movdqu XMMWORD[r8],xmm0
+ movdqa xmm1,xmm2
+ movdqu XMMWORD[16+r8],xmm2
+ jmp NEAR $L$oop_key256
+
+ALIGN 16
+$L$oop_key256:
+DB 102,15,56,0,213
+DB 102,15,56,221,212
+
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+ pslld xmm4,1
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[rax],xmm0
+
+ dec r10d
+ jz NEAR $L$done_key256
+
+ pshufd xmm2,xmm0,0xff
+ pxor xmm3,xmm3
+DB 102,15,56,221,211
+
+ movdqa xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm1,xmm3
+
+ pxor xmm2,xmm1
+ movdqu XMMWORD[16+rax],xmm2
+ lea rax,[32+rax]
+ movdqa xmm1,xmm2
+
+ jmp NEAR $L$oop_key256
+
+$L$done_key256:
+ mov DWORD[16+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$bad_keybits:
+ mov rax,-2
+$L$enc_key_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ add rsp,8
+
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_set_encrypt_key:
+
+ALIGN 16
+$L$key_expansion_128:
+ movups XMMWORD[rax],xmm0
+ lea rax,[16+rax]
+$L$key_expansion_128_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$key_expansion_192a:
+ movups XMMWORD[rax],xmm0
+ lea rax,[16+rax]
+$L$key_expansion_192a_cold:
+ movaps xmm5,xmm2
+$L$key_expansion_192b_warm:
+ shufps xmm4,xmm0,16
+ movdqa xmm3,xmm2
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ pslldq xmm3,4
+ xorps xmm0,xmm4
+ pshufd xmm1,xmm1,85
+ pxor xmm2,xmm3
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm0,255
+ pxor xmm2,xmm3
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$key_expansion_192b:
+ movaps xmm3,xmm0
+ shufps xmm5,xmm0,68
+ movups XMMWORD[rax],xmm5
+ shufps xmm3,xmm2,78
+ movups XMMWORD[16+rax],xmm3
+ lea rax,[32+rax]
+ jmp NEAR $L$key_expansion_192b_warm
+
+ALIGN 16
+$L$key_expansion_256a:
+ movups XMMWORD[rax],xmm2
+ lea rax,[16+rax]
+$L$key_expansion_256a_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$key_expansion_256b:
+ movups XMMWORD[rax],xmm0
+ lea rax,[16+rax]
+
+ shufps xmm4,xmm2,16
+ xorps xmm2,xmm4
+ shufps xmm4,xmm2,140
+ xorps xmm2,xmm4
+ shufps xmm1,xmm1,170
+ xorps xmm2,xmm1
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+$L$bswap_mask:
+DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$increment32:
+ DD 6,6,6,0
+$L$increment64:
+ DD 1,0,0,0
+$L$xts_magic:
+ DD 0x87,0,1,0
+$L$increment1:
+DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$key_rotate:
+ DD 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d
+$L$key_rotate192:
+ DD 0x04070605,0x04070605,0x04070605,0x04070605
+$L$key_rcon1:
+ DD 1,1,1,1
+$L$key_rcon1b:
+ DD 0x1b,0x1b,0x1b,0x1b
+
+DB 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+DB 83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+DB 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB 115,108,46,111,114,103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+ecb_ccm64_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,8
+ DD 0xa548f3fc
+ lea rax,[88+rax]
+
+ jmp NEAR $L$common_seh_tail
+
+
+
+ALIGN 16
+ctr_xts_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[208+r8]
+
+ lea rsi,[((-168))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ mov rbp,QWORD[((-8))+rax]
+ mov QWORD[160+r8],rbp
+ jmp NEAR $L$common_seh_tail
+
+
+
+ALIGN 16
+ocb_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$ocb_no_xmm
+
+ mov rax,QWORD[152+r8]
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[((160+40))+rax]
+
+$L$ocb_no_xmm:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+ jmp NEAR $L$common_seh_tail
+
+
+ALIGN 16
+cbc_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[152+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$cbc_decrypt_bulk]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[120+r8]
+
+ lea r10,[$L$cbc_decrypt_body]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$cbc_ret]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[16+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ mov rax,QWORD[208+r8]
+
+ mov rbp,QWORD[((-8))+rax]
+ mov QWORD[160+r8],rbp
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_ecb_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_ecb_encrypt wrt ..imagebase
+ DD $L$SEH_info_ecb wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ccm64_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_end_aesni_ccm64_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_info_ccm64_enc wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ccm64_decrypt_blocks wrt ..imagebase
+ DD $L$SEH_end_aesni_ccm64_decrypt_blocks wrt ..imagebase
+ DD $L$SEH_info_ccm64_dec wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ctr32_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_end_aesni_ctr32_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_info_ctr32 wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_xts_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_xts_encrypt wrt ..imagebase
+ DD $L$SEH_info_xts_enc wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_xts_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_xts_decrypt wrt ..imagebase
+ DD $L$SEH_info_xts_dec wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ocb_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_ocb_encrypt wrt ..imagebase
+ DD $L$SEH_info_ocb_enc wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ocb_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_ocb_decrypt wrt ..imagebase
+ DD $L$SEH_info_ocb_dec wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_info_cbc wrt ..imagebase
+
+ DD aesni_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_end_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_info_key wrt ..imagebase
+
+ DD aesni_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_end_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_info_key wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_ecb:
+DB 9,0,0,0
+ DD ecb_ccm64_se_handler wrt ..imagebase
+ DD $L$ecb_enc_body wrt ..imagebase,$L$ecb_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_enc:
+DB 9,0,0,0
+ DD ecb_ccm64_se_handler wrt ..imagebase
+ DD $L$ccm64_enc_body wrt ..imagebase,$L$ccm64_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_dec:
+DB 9,0,0,0
+ DD ecb_ccm64_se_handler wrt ..imagebase
+ DD $L$ccm64_dec_body wrt ..imagebase,$L$ccm64_dec_ret wrt ..imagebase
+$L$SEH_info_ctr32:
+DB 9,0,0,0
+ DD ctr_xts_se_handler wrt ..imagebase
+ DD $L$ctr32_body wrt ..imagebase,$L$ctr32_epilogue wrt ..imagebase
+$L$SEH_info_xts_enc:
+DB 9,0,0,0
+ DD ctr_xts_se_handler wrt ..imagebase
+ DD $L$xts_enc_body wrt ..imagebase,$L$xts_enc_epilogue wrt ..imagebase
+$L$SEH_info_xts_dec:
+DB 9,0,0,0
+ DD ctr_xts_se_handler wrt ..imagebase
+ DD $L$xts_dec_body wrt ..imagebase,$L$xts_dec_epilogue wrt ..imagebase
+$L$SEH_info_ocb_enc:
+DB 9,0,0,0
+ DD ocb_se_handler wrt ..imagebase
+ DD $L$ocb_enc_body wrt ..imagebase,$L$ocb_enc_epilogue wrt ..imagebase
+ DD $L$ocb_enc_pop wrt ..imagebase
+ DD 0
+$L$SEH_info_ocb_dec:
+DB 9,0,0,0
+ DD ocb_se_handler wrt ..imagebase
+ DD $L$ocb_dec_body wrt ..imagebase,$L$ocb_dec_epilogue wrt ..imagebase
+ DD $L$ocb_dec_pop wrt ..imagebase
+ DD 0
+$L$SEH_info_cbc:
+DB 9,0,0,0
+ DD cbc_se_handler wrt ..imagebase
+$L$SEH_info_key:
+DB 0x01,0x04,0x01,0x00
+DB 0x04,0x02,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
new file mode 100644
index 0000000000..e6a5733924
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
@@ -0,0 +1,1170 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_encrypt_core:
+
+ mov r9,rdx
+ mov r11,16
+ mov eax,DWORD[240+rdx]
+ movdqa xmm1,xmm9
+ movdqa xmm2,XMMWORD[$L$k_ipt]
+ pandn xmm1,xmm0
+ movdqu xmm5,XMMWORD[r9]
+ psrld xmm1,4
+ pand xmm0,xmm9
+DB 102,15,56,0,208
+ movdqa xmm0,XMMWORD[(($L$k_ipt+16))]
+DB 102,15,56,0,193
+ pxor xmm2,xmm5
+ add r9,16
+ pxor xmm0,xmm2
+ lea r10,[$L$k_mc_backward]
+ jmp NEAR $L$enc_entry
+
+ALIGN 16
+$L$enc_loop:
+
+ movdqa xmm4,xmm13
+ movdqa xmm0,xmm12
+DB 102,15,56,0,226
+DB 102,15,56,0,195
+ pxor xmm4,xmm5
+ movdqa xmm5,xmm15
+ pxor xmm0,xmm4
+ movdqa xmm1,XMMWORD[((-64))+r10*1+r11]
+DB 102,15,56,0,234
+ movdqa xmm4,XMMWORD[r10*1+r11]
+ movdqa xmm2,xmm14
+DB 102,15,56,0,211
+ movdqa xmm3,xmm0
+ pxor xmm2,xmm5
+DB 102,15,56,0,193
+ add r9,16
+ pxor xmm0,xmm2
+DB 102,15,56,0,220
+ add r11,16
+ pxor xmm3,xmm0
+DB 102,15,56,0,193
+ and r11,0x30
+ sub rax,1
+ pxor xmm0,xmm3
+
+$L$enc_entry:
+
+ movdqa xmm1,xmm9
+ movdqa xmm5,xmm11
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm9
+DB 102,15,56,0,232
+ movdqa xmm3,xmm10
+ pxor xmm0,xmm1
+DB 102,15,56,0,217
+ movdqa xmm4,xmm10
+ pxor xmm3,xmm5
+DB 102,15,56,0,224
+ movdqa xmm2,xmm10
+ pxor xmm4,xmm5
+DB 102,15,56,0,211
+ movdqa xmm3,xmm10
+ pxor xmm2,xmm0
+DB 102,15,56,0,220
+ movdqu xmm5,XMMWORD[r9]
+ pxor xmm3,xmm1
+ jnz NEAR $L$enc_loop
+
+
+ movdqa xmm4,XMMWORD[((-96))+r10]
+ movdqa xmm0,XMMWORD[((-80))+r10]
+DB 102,15,56,0,226
+ pxor xmm4,xmm5
+DB 102,15,56,0,195
+ movdqa xmm1,XMMWORD[64+r10*1+r11]
+ pxor xmm0,xmm4
+DB 102,15,56,0,193
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_decrypt_core:
+
+ mov r9,rdx
+ mov eax,DWORD[240+rdx]
+ movdqa xmm1,xmm9
+ movdqa xmm2,XMMWORD[$L$k_dipt]
+ pandn xmm1,xmm0
+ mov r11,rax
+ psrld xmm1,4
+ movdqu xmm5,XMMWORD[r9]
+ shl r11,4
+ pand xmm0,xmm9
+DB 102,15,56,0,208
+ movdqa xmm0,XMMWORD[(($L$k_dipt+16))]
+ xor r11,0x30
+ lea r10,[$L$k_dsbd]
+DB 102,15,56,0,193
+ and r11,0x30
+ pxor xmm2,xmm5
+ movdqa xmm5,XMMWORD[(($L$k_mc_forward+48))]
+ pxor xmm0,xmm2
+ add r9,16
+ add r11,r10
+ jmp NEAR $L$dec_entry
+
+ALIGN 16
+$L$dec_loop:
+
+
+
+ movdqa xmm4,XMMWORD[((-32))+r10]
+ movdqa xmm1,XMMWORD[((-16))+r10]
+DB 102,15,56,0,226
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,XMMWORD[r10]
+ pxor xmm0,xmm1
+ movdqa xmm1,XMMWORD[16+r10]
+
+DB 102,15,56,0,226
+DB 102,15,56,0,197
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,XMMWORD[32+r10]
+ pxor xmm0,xmm1
+ movdqa xmm1,XMMWORD[48+r10]
+
+DB 102,15,56,0,226
+DB 102,15,56,0,197
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,XMMWORD[64+r10]
+ pxor xmm0,xmm1
+ movdqa xmm1,XMMWORD[80+r10]
+
+DB 102,15,56,0,226
+DB 102,15,56,0,197
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ add r9,16
+DB 102,15,58,15,237,12
+ pxor xmm0,xmm1
+ sub rax,1
+
+$L$dec_entry:
+
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm0
+ movdqa xmm2,xmm11
+ psrld xmm1,4
+ pand xmm0,xmm9
+DB 102,15,56,0,208
+ movdqa xmm3,xmm10
+ pxor xmm0,xmm1
+DB 102,15,56,0,217
+ movdqa xmm4,xmm10
+ pxor xmm3,xmm2
+DB 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm10
+DB 102,15,56,0,211
+ movdqa xmm3,xmm10
+ pxor xmm2,xmm0
+DB 102,15,56,0,220
+ movdqu xmm0,XMMWORD[r9]
+ pxor xmm3,xmm1
+ jnz NEAR $L$dec_loop
+
+
+ movdqa xmm4,XMMWORD[96+r10]
+DB 102,15,56,0,226
+ pxor xmm4,xmm0
+ movdqa xmm0,XMMWORD[112+r10]
+ movdqa xmm2,XMMWORD[((-352))+r11]
+DB 102,15,56,0,195
+ pxor xmm0,xmm4
+DB 102,15,56,0,194
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_core:
+
+
+
+
+
+
+ call _vpaes_preheat
+ movdqa xmm8,XMMWORD[$L$k_rcon]
+ movdqu xmm0,XMMWORD[rdi]
+
+
+ movdqa xmm3,xmm0
+ lea r11,[$L$k_ipt]
+ call _vpaes_schedule_transform
+ movdqa xmm7,xmm0
+
+ lea r10,[$L$k_sr]
+ test rcx,rcx
+ jnz NEAR $L$schedule_am_decrypting
+
+
+ movdqu XMMWORD[rdx],xmm0
+ jmp NEAR $L$schedule_go
+
+$L$schedule_am_decrypting:
+
+ movdqa xmm1,XMMWORD[r10*1+r8]
+DB 102,15,56,0,217
+ movdqu XMMWORD[rdx],xmm3
+ xor r8,0x30
+
+$L$schedule_go:
+ cmp esi,192
+ ja NEAR $L$schedule_256
+ je NEAR $L$schedule_192
+
+
+
+
+
+
+
+
+
+
+$L$schedule_128:
+ mov esi,10
+
+$L$oop_schedule_128:
+ call _vpaes_schedule_round
+ dec rsi
+ jz NEAR $L$schedule_mangle_last
+ call _vpaes_schedule_mangle
+ jmp NEAR $L$oop_schedule_128
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+$L$schedule_192:
+ movdqu xmm0,XMMWORD[8+rdi]
+ call _vpaes_schedule_transform
+ movdqa xmm6,xmm0
+ pxor xmm4,xmm4
+ movhlps xmm6,xmm4
+ mov esi,4
+
+$L$oop_schedule_192:
+ call _vpaes_schedule_round
+DB 102,15,58,15,198,8
+ call _vpaes_schedule_mangle
+ call _vpaes_schedule_192_smear
+ call _vpaes_schedule_mangle
+ call _vpaes_schedule_round
+ dec rsi
+ jz NEAR $L$schedule_mangle_last
+ call _vpaes_schedule_mangle
+ call _vpaes_schedule_192_smear
+ jmp NEAR $L$oop_schedule_192
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+$L$schedule_256:
+ movdqu xmm0,XMMWORD[16+rdi]
+ call _vpaes_schedule_transform
+ mov esi,7
+
+$L$oop_schedule_256:
+ call _vpaes_schedule_mangle
+ movdqa xmm6,xmm0
+
+
+ call _vpaes_schedule_round
+ dec rsi
+ jz NEAR $L$schedule_mangle_last
+ call _vpaes_schedule_mangle
+
+
+ pshufd xmm0,xmm0,0xFF
+ movdqa xmm5,xmm7
+ movdqa xmm7,xmm6
+ call _vpaes_schedule_low_round
+ movdqa xmm7,xmm5
+
+ jmp NEAR $L$oop_schedule_256
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+$L$schedule_mangle_last:
+
+ lea r11,[$L$k_deskew]
+ test rcx,rcx
+ jnz NEAR $L$schedule_mangle_last_dec
+
+
+ movdqa xmm1,XMMWORD[r10*1+r8]
+DB 102,15,56,0,193
+ lea r11,[$L$k_opt]
+ add rdx,32
+
+$L$schedule_mangle_last_dec:
+ add rdx,-16
+ pxor xmm0,XMMWORD[$L$k_s63]
+ call _vpaes_schedule_transform
+ movdqu XMMWORD[rdx],xmm0
+
+
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_192_smear:
+
+ pshufd xmm1,xmm6,0x80
+ pshufd xmm0,xmm7,0xFE
+ pxor xmm6,xmm1
+ pxor xmm1,xmm1
+ pxor xmm6,xmm0
+ movdqa xmm0,xmm6
+ movhlps xmm6,xmm1
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_round:
+
+
+ pxor xmm1,xmm1
+DB 102,65,15,58,15,200,15
+DB 102,69,15,58,15,192,15
+ pxor xmm7,xmm1
+
+
+ pshufd xmm0,xmm0,0xFF
+DB 102,15,58,15,192,1
+
+
+
+
+_vpaes_schedule_low_round:
+
+ movdqa xmm1,xmm7
+ pslldq xmm7,4
+ pxor xmm7,xmm1
+ movdqa xmm1,xmm7
+ pslldq xmm7,8
+ pxor xmm7,xmm1
+ pxor xmm7,XMMWORD[$L$k_s63]
+
+
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm9
+ movdqa xmm2,xmm11
+DB 102,15,56,0,208
+ pxor xmm0,xmm1
+ movdqa xmm3,xmm10
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+ movdqa xmm4,xmm10
+DB 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm10
+DB 102,15,56,0,211
+ pxor xmm2,xmm0
+ movdqa xmm3,xmm10
+DB 102,15,56,0,220
+ pxor xmm3,xmm1
+ movdqa xmm4,xmm13
+DB 102,15,56,0,226
+ movdqa xmm0,xmm12
+DB 102,15,56,0,195
+ pxor xmm0,xmm4
+
+
+ pxor xmm0,xmm7
+ movdqa xmm7,xmm0
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_transform:
+
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm9
+ movdqa xmm2,XMMWORD[r11]
+DB 102,15,56,0,208
+ movdqa xmm0,XMMWORD[16+r11]
+DB 102,15,56,0,193
+ pxor xmm0,xmm2
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_mangle:
+
+ movdqa xmm4,xmm0
+ movdqa xmm5,XMMWORD[$L$k_mc_forward]
+ test rcx,rcx
+ jnz NEAR $L$schedule_mangle_dec
+
+
+ add rdx,16
+ pxor xmm4,XMMWORD[$L$k_s63]
+DB 102,15,56,0,229
+ movdqa xmm3,xmm4
+DB 102,15,56,0,229
+ pxor xmm3,xmm4
+DB 102,15,56,0,229
+ pxor xmm3,xmm4
+
+ jmp NEAR $L$schedule_mangle_both
+ALIGN 16
+$L$schedule_mangle_dec:
+
+ lea r11,[$L$k_dksd]
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm4
+ psrld xmm1,4
+ pand xmm4,xmm9
+
+ movdqa xmm2,XMMWORD[r11]
+DB 102,15,56,0,212
+ movdqa xmm3,XMMWORD[16+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+DB 102,15,56,0,221
+
+ movdqa xmm2,XMMWORD[32+r11]
+DB 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,XMMWORD[48+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+DB 102,15,56,0,221
+
+ movdqa xmm2,XMMWORD[64+r11]
+DB 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,XMMWORD[80+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+DB 102,15,56,0,221
+
+ movdqa xmm2,XMMWORD[96+r11]
+DB 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,XMMWORD[112+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+
+ add rdx,-16
+
+$L$schedule_mangle_both:
+ movdqa xmm1,XMMWORD[r10*1+r8]
+DB 102,15,56,0,217
+ add r8,-16
+ and r8,0x30
+ movdqu XMMWORD[rdx],xmm3
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+global vpaes_set_encrypt_key
+
+ALIGN 16
+vpaes_set_encrypt_key:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_set_encrypt_key:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$enc_key_body:
+ mov eax,esi
+ shr eax,5
+ add eax,5
+ mov DWORD[240+rdx],eax
+
+ mov ecx,0
+ mov r8d,0x30
+ call _vpaes_schedule_core
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$enc_key_epilogue:
+ xor eax,eax
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_set_encrypt_key:
+
+global vpaes_set_decrypt_key
+
+ALIGN 16
+vpaes_set_decrypt_key:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_set_decrypt_key:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$dec_key_body:
+ mov eax,esi
+ shr eax,5
+ add eax,5
+ mov DWORD[240+rdx],eax
+ shl eax,4
+ lea rdx,[16+rax*1+rdx]
+
+ mov ecx,1
+ mov r8d,esi
+ shr r8d,1
+ and r8d,32
+ xor r8d,32
+ call _vpaes_schedule_core
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$dec_key_epilogue:
+ xor eax,eax
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_set_decrypt_key:
+
+global vpaes_encrypt
+
+ALIGN 16
+vpaes_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$enc_body:
+ movdqu xmm0,XMMWORD[rdi]
+ call _vpaes_preheat
+ call _vpaes_encrypt_core
+ movdqu XMMWORD[rsi],xmm0
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$enc_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_encrypt:
+
+global vpaes_decrypt
+
+ALIGN 16
+vpaes_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$dec_body:
+ movdqu xmm0,XMMWORD[rdi]
+ call _vpaes_preheat
+ call _vpaes_decrypt_core
+ movdqu XMMWORD[rsi],xmm0
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$dec_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_decrypt:
+global vpaes_cbc_encrypt
+
+ALIGN 16
+vpaes_cbc_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_cbc_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ xchg rdx,rcx
+ sub rcx,16
+ jc NEAR $L$cbc_abort
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$cbc_body:
+ movdqu xmm6,XMMWORD[r8]
+ sub rsi,rdi
+ call _vpaes_preheat
+ cmp r9d,0
+ je NEAR $L$cbc_dec_loop
+ jmp NEAR $L$cbc_enc_loop
+ALIGN 16
+$L$cbc_enc_loop:
+ movdqu xmm0,XMMWORD[rdi]
+ pxor xmm0,xmm6
+ call _vpaes_encrypt_core
+ movdqa xmm6,xmm0
+ movdqu XMMWORD[rdi*1+rsi],xmm0
+ lea rdi,[16+rdi]
+ sub rcx,16
+ jnc NEAR $L$cbc_enc_loop
+ jmp NEAR $L$cbc_done
+ALIGN 16
+$L$cbc_dec_loop:
+ movdqu xmm0,XMMWORD[rdi]
+ movdqa xmm7,xmm0
+ call _vpaes_decrypt_core
+ pxor xmm0,xmm6
+ movdqa xmm6,xmm7
+ movdqu XMMWORD[rdi*1+rsi],xmm0
+ lea rdi,[16+rdi]
+ sub rcx,16
+ jnc NEAR $L$cbc_dec_loop
+$L$cbc_done:
+ movdqu XMMWORD[r8],xmm6
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$cbc_epilogue:
+$L$cbc_abort:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_cbc_encrypt:
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_preheat:
+
+ lea r10,[$L$k_s0F]
+ movdqa xmm10,XMMWORD[((-32))+r10]
+ movdqa xmm11,XMMWORD[((-16))+r10]
+ movdqa xmm9,XMMWORD[r10]
+ movdqa xmm13,XMMWORD[48+r10]
+ movdqa xmm12,XMMWORD[64+r10]
+ movdqa xmm15,XMMWORD[80+r10]
+ movdqa xmm14,XMMWORD[96+r10]
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+ALIGN 64
+_vpaes_consts:
+$L$k_inv:
+ DQ 0x0E05060F0D080180,0x040703090A0B0C02
+ DQ 0x01040A060F0B0780,0x030D0E0C02050809
+
+$L$k_s0F:
+ DQ 0x0F0F0F0F0F0F0F0F,0x0F0F0F0F0F0F0F0F
+
+$L$k_ipt:
+ DQ 0xC2B2E8985A2A7000,0xCABAE09052227808
+ DQ 0x4C01307D317C4D00,0xCD80B1FCB0FDCC81
+
+$L$k_sb1:
+ DQ 0xB19BE18FCB503E00,0xA5DF7A6E142AF544
+ DQ 0x3618D415FAE22300,0x3BF7CCC10D2ED9EF
+$L$k_sb2:
+ DQ 0xE27A93C60B712400,0x5EB7E955BC982FCD
+ DQ 0x69EB88400AE12900,0xC2A163C8AB82234A
+$L$k_sbo:
+ DQ 0xD0D26D176FBDC700,0x15AABF7AC502A878
+ DQ 0xCFE474A55FBB6A00,0x8E1E90D1412B35FA
+
+$L$k_mc_forward:
+ DQ 0x0407060500030201,0x0C0F0E0D080B0A09
+ DQ 0x080B0A0904070605,0x000302010C0F0E0D
+ DQ 0x0C0F0E0D080B0A09,0x0407060500030201
+ DQ 0x000302010C0F0E0D,0x080B0A0904070605
+
+$L$k_mc_backward:
+ DQ 0x0605040702010003,0x0E0D0C0F0A09080B
+ DQ 0x020100030E0D0C0F,0x0A09080B06050407
+ DQ 0x0E0D0C0F0A09080B,0x0605040702010003
+ DQ 0x0A09080B06050407,0x020100030E0D0C0F
+
+$L$k_sr:
+ DQ 0x0706050403020100,0x0F0E0D0C0B0A0908
+ DQ 0x030E09040F0A0500,0x0B06010C07020D08
+ DQ 0x0F060D040B020900,0x070E050C030A0108
+ DQ 0x0B0E0104070A0D00,0x0306090C0F020508
+
+$L$k_rcon:
+ DQ 0x1F8391B9AF9DEEB6,0x702A98084D7C7D81
+
+$L$k_s63:
+ DQ 0x5B5B5B5B5B5B5B5B,0x5B5B5B5B5B5B5B5B
+
+$L$k_opt:
+ DQ 0xFF9F4929D6B66000,0xF7974121DEBE6808
+ DQ 0x01EDBD5150BCEC00,0xE10D5DB1B05C0CE0
+
+$L$k_deskew:
+ DQ 0x07E4A34047A4E300,0x1DFEB95A5DBEF91A
+ DQ 0x5F36B5DC83EA6900,0x2841C2ABF49D1E77
+
+
+
+
+
+$L$k_dksd:
+ DQ 0xFEB91A5DA3E44700,0x0740E3A45A1DBEF9
+ DQ 0x41C277F4B5368300,0x5FDC69EAAB289D1E
+$L$k_dksb:
+ DQ 0x9A4FCA1F8550D500,0x03D653861CC94C99
+ DQ 0x115BEDA7B6FC4A00,0xD993256F7E3482C8
+$L$k_dkse:
+ DQ 0xD5031CCA1FC9D600,0x53859A4C994F5086
+ DQ 0xA23196054FDC7BE8,0xCD5EF96A20B31487
+$L$k_dks9:
+ DQ 0xB6116FC87ED9A700,0x4AED933482255BFC
+ DQ 0x4576516227143300,0x8BB89FACE9DAFDCE
+
+
+
+
+
+$L$k_dipt:
+ DQ 0x0F505B040B545F00,0x154A411E114E451A
+ DQ 0x86E383E660056500,0x12771772F491F194
+
+$L$k_dsb9:
+ DQ 0x851C03539A86D600,0xCAD51F504F994CC9
+ DQ 0xC03B1789ECD74900,0x725E2C9EB2FBA565
+$L$k_dsbd:
+ DQ 0x7D57CCDFE6B1A200,0xF56E9B13882A4439
+ DQ 0x3CE2FAF724C6CB00,0x2931180D15DEEFD3
+$L$k_dsbb:
+ DQ 0xD022649296B44200,0x602646F6B0F2D404
+ DQ 0xC19498A6CD596700,0xF3FF0C3E3255AA6B
+$L$k_dsbe:
+ DQ 0x46F2929626D4D000,0x2242600464B4F6B0
+ DQ 0x0C55A6CDFFAAC100,0x9467F36B98593E32
+$L$k_dsbo:
+ DQ 0x1387EA537EF94000,0xC7AA6DB9D4943E2D
+ DQ 0x12D7560F93441D00,0xCA4B8159D8C58E9C
+DB 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+DB 111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54
+DB 52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97
+DB 109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32
+DB 85,110,105,118,101,114,115,105,116,121,41,0
+ALIGN 64
+
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rsi,[16+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[184+rax]
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_vpaes_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_end_vpaes_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_info_vpaes_set_encrypt_key wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_end_vpaes_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_info_vpaes_set_decrypt_key wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_encrypt wrt ..imagebase
+ DD $L$SEH_end_vpaes_encrypt wrt ..imagebase
+ DD $L$SEH_info_vpaes_encrypt wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_decrypt wrt ..imagebase
+ DD $L$SEH_end_vpaes_decrypt wrt ..imagebase
+ DD $L$SEH_info_vpaes_decrypt wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_end_vpaes_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_info_vpaes_cbc_encrypt wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_vpaes_set_encrypt_key:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc_key_body wrt ..imagebase,$L$enc_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_set_decrypt_key:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec_key_body wrt ..imagebase,$L$dec_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_encrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc_body wrt ..imagebase,$L$enc_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_decrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec_body wrt ..imagebase,$L$dec_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_cbc_encrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$cbc_body wrt ..imagebase,$L$cbc_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
new file mode 100644
index 0000000000..69443b7261
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
@@ -0,0 +1,1989 @@
+; Copyright 2013-2019 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+global rsaz_1024_sqr_avx2
+
+ALIGN 64
+rsaz_1024_sqr_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_1024_sqr_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ vzeroupper
+ lea rsp,[((-168))+rsp]
+ vmovaps XMMWORD[(-216)+rax],xmm6
+ vmovaps XMMWORD[(-200)+rax],xmm7
+ vmovaps XMMWORD[(-184)+rax],xmm8
+ vmovaps XMMWORD[(-168)+rax],xmm9
+ vmovaps XMMWORD[(-152)+rax],xmm10
+ vmovaps XMMWORD[(-136)+rax],xmm11
+ vmovaps XMMWORD[(-120)+rax],xmm12
+ vmovaps XMMWORD[(-104)+rax],xmm13
+ vmovaps XMMWORD[(-88)+rax],xmm14
+ vmovaps XMMWORD[(-72)+rax],xmm15
+$L$sqr_1024_body:
+ mov rbp,rax
+
+ mov r13,rdx
+ sub rsp,832
+ mov r15,r13
+ sub rdi,-128
+ sub rsi,-128
+ sub r13,-128
+
+ and r15,4095
+ add r15,32*10
+ shr r15,12
+ vpxor ymm9,ymm9,ymm9
+ jz NEAR $L$sqr_1024_no_n_copy
+
+
+
+
+
+ sub rsp,32*10
+ vmovdqu ymm0,YMMWORD[((0-128))+r13]
+ and rsp,-2048
+ vmovdqu ymm1,YMMWORD[((32-128))+r13]
+ vmovdqu ymm2,YMMWORD[((64-128))+r13]
+ vmovdqu ymm3,YMMWORD[((96-128))+r13]
+ vmovdqu ymm4,YMMWORD[((128-128))+r13]
+ vmovdqu ymm5,YMMWORD[((160-128))+r13]
+ vmovdqu ymm6,YMMWORD[((192-128))+r13]
+ vmovdqu ymm7,YMMWORD[((224-128))+r13]
+ vmovdqu ymm8,YMMWORD[((256-128))+r13]
+ lea r13,[((832+128))+rsp]
+ vmovdqu YMMWORD[(0-128)+r13],ymm0
+ vmovdqu YMMWORD[(32-128)+r13],ymm1
+ vmovdqu YMMWORD[(64-128)+r13],ymm2
+ vmovdqu YMMWORD[(96-128)+r13],ymm3
+ vmovdqu YMMWORD[(128-128)+r13],ymm4
+ vmovdqu YMMWORD[(160-128)+r13],ymm5
+ vmovdqu YMMWORD[(192-128)+r13],ymm6
+ vmovdqu YMMWORD[(224-128)+r13],ymm7
+ vmovdqu YMMWORD[(256-128)+r13],ymm8
+ vmovdqu YMMWORD[(288-128)+r13],ymm9
+
+$L$sqr_1024_no_n_copy:
+ and rsp,-1024
+
+ vmovdqu ymm1,YMMWORD[((32-128))+rsi]
+ vmovdqu ymm2,YMMWORD[((64-128))+rsi]
+ vmovdqu ymm3,YMMWORD[((96-128))+rsi]
+ vmovdqu ymm4,YMMWORD[((128-128))+rsi]
+ vmovdqu ymm5,YMMWORD[((160-128))+rsi]
+ vmovdqu ymm6,YMMWORD[((192-128))+rsi]
+ vmovdqu ymm7,YMMWORD[((224-128))+rsi]
+ vmovdqu ymm8,YMMWORD[((256-128))+rsi]
+
+ lea rbx,[192+rsp]
+ vmovdqu ymm15,YMMWORD[$L$and_mask]
+ jmp NEAR $L$OOP_GRANDE_SQR_1024
+
+ALIGN 32
+$L$OOP_GRANDE_SQR_1024:
+ lea r9,[((576+128))+rsp]
+ lea r12,[448+rsp]
+
+
+
+
+ vpaddq ymm1,ymm1,ymm1
+ vpbroadcastq ymm10,QWORD[((0-128))+rsi]
+ vpaddq ymm2,ymm2,ymm2
+ vmovdqa YMMWORD[(0-128)+r9],ymm1
+ vpaddq ymm3,ymm3,ymm3
+ vmovdqa YMMWORD[(32-128)+r9],ymm2
+ vpaddq ymm4,ymm4,ymm4
+ vmovdqa YMMWORD[(64-128)+r9],ymm3
+ vpaddq ymm5,ymm5,ymm5
+ vmovdqa YMMWORD[(96-128)+r9],ymm4
+ vpaddq ymm6,ymm6,ymm6
+ vmovdqa YMMWORD[(128-128)+r9],ymm5
+ vpaddq ymm7,ymm7,ymm7
+ vmovdqa YMMWORD[(160-128)+r9],ymm6
+ vpaddq ymm8,ymm8,ymm8
+ vmovdqa YMMWORD[(192-128)+r9],ymm7
+ vpxor ymm9,ymm9,ymm9
+ vmovdqa YMMWORD[(224-128)+r9],ymm8
+
+ vpmuludq ymm0,ymm10,YMMWORD[((0-128))+rsi]
+ vpbroadcastq ymm11,QWORD[((32-128))+rsi]
+ vmovdqu YMMWORD[(288-192)+rbx],ymm9
+ vpmuludq ymm1,ymm1,ymm10
+ vmovdqu YMMWORD[(320-448)+r12],ymm9
+ vpmuludq ymm2,ymm2,ymm10
+ vmovdqu YMMWORD[(352-448)+r12],ymm9
+ vpmuludq ymm3,ymm3,ymm10
+ vmovdqu YMMWORD[(384-448)+r12],ymm9
+ vpmuludq ymm4,ymm4,ymm10
+ vmovdqu YMMWORD[(416-448)+r12],ymm9
+ vpmuludq ymm5,ymm5,ymm10
+ vmovdqu YMMWORD[(448-448)+r12],ymm9
+ vpmuludq ymm6,ymm6,ymm10
+ vmovdqu YMMWORD[(480-448)+r12],ymm9
+ vpmuludq ymm7,ymm7,ymm10
+ vmovdqu YMMWORD[(512-448)+r12],ymm9
+ vpmuludq ymm8,ymm8,ymm10
+ vpbroadcastq ymm10,QWORD[((64-128))+rsi]
+ vmovdqu YMMWORD[(544-448)+r12],ymm9
+
+ mov r15,rsi
+ mov r14d,4
+ jmp NEAR $L$sqr_entry_1024
+ALIGN 32
+$L$OOP_SQR_1024:
+ vpbroadcastq ymm11,QWORD[((32-128))+r15]
+ vpmuludq ymm0,ymm10,YMMWORD[((0-128))+rsi]
+ vpaddq ymm0,ymm0,YMMWORD[((0-192))+rbx]
+ vpmuludq ymm1,ymm10,YMMWORD[((0-128))+r9]
+ vpaddq ymm1,ymm1,YMMWORD[((32-192))+rbx]
+ vpmuludq ymm2,ymm10,YMMWORD[((32-128))+r9]
+ vpaddq ymm2,ymm2,YMMWORD[((64-192))+rbx]
+ vpmuludq ymm3,ymm10,YMMWORD[((64-128))+r9]
+ vpaddq ymm3,ymm3,YMMWORD[((96-192))+rbx]
+ vpmuludq ymm4,ymm10,YMMWORD[((96-128))+r9]
+ vpaddq ymm4,ymm4,YMMWORD[((128-192))+rbx]
+ vpmuludq ymm5,ymm10,YMMWORD[((128-128))+r9]
+ vpaddq ymm5,ymm5,YMMWORD[((160-192))+rbx]
+ vpmuludq ymm6,ymm10,YMMWORD[((160-128))+r9]
+ vpaddq ymm6,ymm6,YMMWORD[((192-192))+rbx]
+ vpmuludq ymm7,ymm10,YMMWORD[((192-128))+r9]
+ vpaddq ymm7,ymm7,YMMWORD[((224-192))+rbx]
+ vpmuludq ymm8,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((64-128))+r15]
+ vpaddq ymm8,ymm8,YMMWORD[((256-192))+rbx]
+$L$sqr_entry_1024:
+ vmovdqu YMMWORD[(0-192)+rbx],ymm0
+ vmovdqu YMMWORD[(32-192)+rbx],ymm1
+
+ vpmuludq ymm12,ymm11,YMMWORD[((32-128))+rsi]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((32-128))+r9]
+ vpaddq ymm3,ymm3,ymm14
+ vpmuludq ymm13,ymm11,YMMWORD[((64-128))+r9]
+ vpaddq ymm4,ymm4,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((96-128))+r9]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((128-128))+r9]
+ vpaddq ymm6,ymm6,ymm14
+ vpmuludq ymm13,ymm11,YMMWORD[((160-128))+r9]
+ vpaddq ymm7,ymm7,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((192-128))+r9]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm0,ymm11,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm11,QWORD[((96-128))+r15]
+ vpaddq ymm0,ymm0,YMMWORD[((288-192))+rbx]
+
+ vmovdqu YMMWORD[(64-192)+rbx],ymm2
+ vmovdqu YMMWORD[(96-192)+rbx],ymm3
+
+ vpmuludq ymm13,ymm10,YMMWORD[((64-128))+rsi]
+ vpaddq ymm4,ymm4,ymm13
+ vpmuludq ymm12,ymm10,YMMWORD[((64-128))+r9]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((96-128))+r9]
+ vpaddq ymm6,ymm6,ymm14
+ vpmuludq ymm13,ymm10,YMMWORD[((128-128))+r9]
+ vpaddq ymm7,ymm7,ymm13
+ vpmuludq ymm12,ymm10,YMMWORD[((160-128))+r9]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((192-128))+r9]
+ vpaddq ymm0,ymm0,ymm14
+ vpmuludq ymm1,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((128-128))+r15]
+ vpaddq ymm1,ymm1,YMMWORD[((320-448))+r12]
+
+ vmovdqu YMMWORD[(128-192)+rbx],ymm4
+ vmovdqu YMMWORD[(160-192)+rbx],ymm5
+
+ vpmuludq ymm12,ymm11,YMMWORD[((96-128))+rsi]
+ vpaddq ymm6,ymm6,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((96-128))+r9]
+ vpaddq ymm7,ymm7,ymm14
+ vpmuludq ymm13,ymm11,YMMWORD[((128-128))+r9]
+ vpaddq ymm8,ymm8,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((160-128))+r9]
+ vpaddq ymm0,ymm0,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((192-128))+r9]
+ vpaddq ymm1,ymm1,ymm14
+ vpmuludq ymm2,ymm11,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm11,QWORD[((160-128))+r15]
+ vpaddq ymm2,ymm2,YMMWORD[((352-448))+r12]
+
+ vmovdqu YMMWORD[(192-192)+rbx],ymm6
+ vmovdqu YMMWORD[(224-192)+rbx],ymm7
+
+ vpmuludq ymm12,ymm10,YMMWORD[((128-128))+rsi]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((128-128))+r9]
+ vpaddq ymm0,ymm0,ymm14
+ vpmuludq ymm13,ymm10,YMMWORD[((160-128))+r9]
+ vpaddq ymm1,ymm1,ymm13
+ vpmuludq ymm12,ymm10,YMMWORD[((192-128))+r9]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm3,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((192-128))+r15]
+ vpaddq ymm3,ymm3,YMMWORD[((384-448))+r12]
+
+ vmovdqu YMMWORD[(256-192)+rbx],ymm8
+ vmovdqu YMMWORD[(288-192)+rbx],ymm0
+ lea rbx,[8+rbx]
+
+ vpmuludq ymm13,ymm11,YMMWORD[((160-128))+rsi]
+ vpaddq ymm1,ymm1,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((160-128))+r9]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((192-128))+r9]
+ vpaddq ymm3,ymm3,ymm14
+ vpmuludq ymm4,ymm11,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm11,QWORD[((224-128))+r15]
+ vpaddq ymm4,ymm4,YMMWORD[((416-448))+r12]
+
+ vmovdqu YMMWORD[(320-448)+r12],ymm1
+ vmovdqu YMMWORD[(352-448)+r12],ymm2
+
+ vpmuludq ymm12,ymm10,YMMWORD[((192-128))+rsi]
+ vpaddq ymm3,ymm3,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((192-128))+r9]
+ vpbroadcastq ymm0,QWORD[((256-128))+r15]
+ vpaddq ymm4,ymm4,ymm14
+ vpmuludq ymm5,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((0+8-128))+r15]
+ vpaddq ymm5,ymm5,YMMWORD[((448-448))+r12]
+
+ vmovdqu YMMWORD[(384-448)+r12],ymm3
+ vmovdqu YMMWORD[(416-448)+r12],ymm4
+ lea r15,[8+r15]
+
+ vpmuludq ymm12,ymm11,YMMWORD[((224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm6,ymm11,YMMWORD[((224-128))+r9]
+ vpaddq ymm6,ymm6,YMMWORD[((480-448))+r12]
+
+ vpmuludq ymm7,ymm0,YMMWORD[((256-128))+rsi]
+ vmovdqu YMMWORD[(448-448)+r12],ymm5
+ vpaddq ymm7,ymm7,YMMWORD[((512-448))+r12]
+ vmovdqu YMMWORD[(480-448)+r12],ymm6
+ vmovdqu YMMWORD[(512-448)+r12],ymm7
+ lea r12,[8+r12]
+
+ dec r14d
+ jnz NEAR $L$OOP_SQR_1024
+
+ vmovdqu ymm8,YMMWORD[256+rsp]
+ vmovdqu ymm1,YMMWORD[288+rsp]
+ vmovdqu ymm2,YMMWORD[320+rsp]
+ lea rbx,[192+rsp]
+
+ vpsrlq ymm14,ymm8,29
+ vpand ymm8,ymm8,ymm15
+ vpsrlq ymm11,ymm1,29
+ vpand ymm1,ymm1,ymm15
+
+ vpermq ymm14,ymm14,0x93
+ vpxor ymm9,ymm9,ymm9
+ vpermq ymm11,ymm11,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm8,ymm8,ymm10
+ vpblendd ymm11,ymm9,ymm11,3
+ vpaddq ymm1,ymm1,ymm14
+ vpaddq ymm2,ymm2,ymm11
+ vmovdqu YMMWORD[(288-192)+rbx],ymm1
+ vmovdqu YMMWORD[(320-192)+rbx],ymm2
+
+ mov rax,QWORD[rsp]
+ mov r10,QWORD[8+rsp]
+ mov r11,QWORD[16+rsp]
+ mov r12,QWORD[24+rsp]
+ vmovdqu ymm1,YMMWORD[32+rsp]
+ vmovdqu ymm2,YMMWORD[((64-192))+rbx]
+ vmovdqu ymm3,YMMWORD[((96-192))+rbx]
+ vmovdqu ymm4,YMMWORD[((128-192))+rbx]
+ vmovdqu ymm5,YMMWORD[((160-192))+rbx]
+ vmovdqu ymm6,YMMWORD[((192-192))+rbx]
+ vmovdqu ymm7,YMMWORD[((224-192))+rbx]
+
+ mov r9,rax
+ imul eax,ecx
+ and eax,0x1fffffff
+ vmovd xmm12,eax
+
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ vpbroadcastq ymm12,xmm12
+ add r9,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+ shr r9,29
+ add r10,rax
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+r13]
+ add r10,r9
+ add r11,rax
+ imul rdx,QWORD[((24-128))+r13]
+ add r12,rdx
+
+ mov rax,r10
+ imul eax,ecx
+ and eax,0x1fffffff
+
+ mov r14d,9
+ jmp NEAR $L$OOP_REDUCE_1024
+
+ALIGN 32
+$L$OOP_REDUCE_1024:
+ vmovd xmm13,eax
+ vpbroadcastq ymm13,xmm13
+
+ vpmuludq ymm10,ymm12,YMMWORD[((32-128))+r13]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ vpaddq ymm1,ymm1,ymm10
+ add r10,rax
+ vpmuludq ymm14,ymm12,YMMWORD[((64-128))+r13]
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+ vpaddq ymm2,ymm2,ymm14
+ vpmuludq ymm11,ymm12,YMMWORD[((96-128))+r13]
+DB 0x67
+ add r11,rax
+DB 0x67
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+r13]
+ shr r10,29
+ vpaddq ymm3,ymm3,ymm11
+ vpmuludq ymm10,ymm12,YMMWORD[((128-128))+r13]
+ add r12,rax
+ add r11,r10
+ vpaddq ymm4,ymm4,ymm10
+ vpmuludq ymm14,ymm12,YMMWORD[((160-128))+r13]
+ mov rax,r11
+ imul eax,ecx
+ vpaddq ymm5,ymm5,ymm14
+ vpmuludq ymm11,ymm12,YMMWORD[((192-128))+r13]
+ and eax,0x1fffffff
+ vpaddq ymm6,ymm6,ymm11
+ vpmuludq ymm10,ymm12,YMMWORD[((224-128))+r13]
+ vpaddq ymm7,ymm7,ymm10
+ vpmuludq ymm14,ymm12,YMMWORD[((256-128))+r13]
+ vmovd xmm12,eax
+
+ vpaddq ymm8,ymm8,ymm14
+
+ vpbroadcastq ymm12,xmm12
+
+ vpmuludq ymm11,ymm13,YMMWORD[((32-8-128))+r13]
+ vmovdqu ymm14,YMMWORD[((96-8-128))+r13]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ vpaddq ymm1,ymm1,ymm11
+ vpmuludq ymm10,ymm13,YMMWORD[((64-8-128))+r13]
+ vmovdqu ymm11,YMMWORD[((128-8-128))+r13]
+ add r11,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+ vpaddq ymm2,ymm2,ymm10
+ add rax,r12
+ shr r11,29
+ vpmuludq ymm14,ymm14,ymm13
+ vmovdqu ymm10,YMMWORD[((160-8-128))+r13]
+ add rax,r11
+ vpaddq ymm3,ymm3,ymm14
+ vpmuludq ymm11,ymm11,ymm13
+ vmovdqu ymm14,YMMWORD[((192-8-128))+r13]
+DB 0x67
+ mov r12,rax
+ imul eax,ecx
+ vpaddq ymm4,ymm4,ymm11
+ vpmuludq ymm10,ymm10,ymm13
+DB 0xc4,0x41,0x7e,0x6f,0x9d,0x58,0x00,0x00,0x00
+ and eax,0x1fffffff
+ vpaddq ymm5,ymm5,ymm10
+ vpmuludq ymm14,ymm14,ymm13
+ vmovdqu ymm10,YMMWORD[((256-8-128))+r13]
+ vpaddq ymm6,ymm6,ymm14
+ vpmuludq ymm11,ymm11,ymm13
+ vmovdqu ymm9,YMMWORD[((288-8-128))+r13]
+ vmovd xmm0,eax
+ imul rax,QWORD[((-128))+r13]
+ vpaddq ymm7,ymm7,ymm11
+ vpmuludq ymm10,ymm10,ymm13
+ vmovdqu ymm14,YMMWORD[((32-16-128))+r13]
+ vpbroadcastq ymm0,xmm0
+ vpaddq ymm8,ymm8,ymm10
+ vpmuludq ymm9,ymm9,ymm13
+ vmovdqu ymm11,YMMWORD[((64-16-128))+r13]
+ add r12,rax
+
+ vmovdqu ymm13,YMMWORD[((32-24-128))+r13]
+ vpmuludq ymm14,ymm14,ymm12
+ vmovdqu ymm10,YMMWORD[((96-16-128))+r13]
+ vpaddq ymm1,ymm1,ymm14
+ vpmuludq ymm13,ymm13,ymm0
+ vpmuludq ymm11,ymm11,ymm12
+DB 0xc4,0x41,0x7e,0x6f,0xb5,0xf0,0xff,0xff,0xff
+ vpaddq ymm13,ymm13,ymm1
+ vpaddq ymm2,ymm2,ymm11
+ vpmuludq ymm10,ymm10,ymm12
+ vmovdqu ymm11,YMMWORD[((160-16-128))+r13]
+DB 0x67
+ vmovq rax,xmm13
+ vmovdqu YMMWORD[rsp],ymm13
+ vpaddq ymm3,ymm3,ymm10
+ vpmuludq ymm14,ymm14,ymm12
+ vmovdqu ymm10,YMMWORD[((192-16-128))+r13]
+ vpaddq ymm4,ymm4,ymm14
+ vpmuludq ymm11,ymm11,ymm12
+ vmovdqu ymm14,YMMWORD[((224-16-128))+r13]
+ vpaddq ymm5,ymm5,ymm11
+ vpmuludq ymm10,ymm10,ymm12
+ vmovdqu ymm11,YMMWORD[((256-16-128))+r13]
+ vpaddq ymm6,ymm6,ymm10
+ vpmuludq ymm14,ymm14,ymm12
+ shr r12,29
+ vmovdqu ymm10,YMMWORD[((288-16-128))+r13]
+ add rax,r12
+ vpaddq ymm7,ymm7,ymm14
+ vpmuludq ymm11,ymm11,ymm12
+
+ mov r9,rax
+ imul eax,ecx
+ vpaddq ymm8,ymm8,ymm11
+ vpmuludq ymm10,ymm10,ymm12
+ and eax,0x1fffffff
+ vmovd xmm12,eax
+ vmovdqu ymm11,YMMWORD[((96-24-128))+r13]
+DB 0x67
+ vpaddq ymm9,ymm9,ymm10
+ vpbroadcastq ymm12,xmm12
+
+ vpmuludq ymm14,ymm0,YMMWORD[((64-24-128))+r13]
+ vmovdqu ymm10,YMMWORD[((128-24-128))+r13]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ mov r10,QWORD[8+rsp]
+ vpaddq ymm1,ymm2,ymm14
+ vpmuludq ymm11,ymm11,ymm0
+ vmovdqu ymm14,YMMWORD[((160-24-128))+r13]
+ add r9,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+DB 0x67
+ shr r9,29
+ mov r11,QWORD[16+rsp]
+ vpaddq ymm2,ymm3,ymm11
+ vpmuludq ymm10,ymm10,ymm0
+ vmovdqu ymm11,YMMWORD[((192-24-128))+r13]
+ add r10,rax
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+r13]
+ vpaddq ymm3,ymm4,ymm10
+ vpmuludq ymm14,ymm14,ymm0
+ vmovdqu ymm10,YMMWORD[((224-24-128))+r13]
+ imul rdx,QWORD[((24-128))+r13]
+ add r11,rax
+ lea rax,[r10*1+r9]
+ vpaddq ymm4,ymm5,ymm14
+ vpmuludq ymm11,ymm11,ymm0
+ vmovdqu ymm14,YMMWORD[((256-24-128))+r13]
+ mov r10,rax
+ imul eax,ecx
+ vpmuludq ymm10,ymm10,ymm0
+ vpaddq ymm5,ymm6,ymm11
+ vmovdqu ymm11,YMMWORD[((288-24-128))+r13]
+ and eax,0x1fffffff
+ vpaddq ymm6,ymm7,ymm10
+ vpmuludq ymm14,ymm14,ymm0
+ add rdx,QWORD[24+rsp]
+ vpaddq ymm7,ymm8,ymm14
+ vpmuludq ymm11,ymm11,ymm0
+ vpaddq ymm8,ymm9,ymm11
+ vmovq xmm9,r12
+ mov r12,rdx
+
+ dec r14d
+ jnz NEAR $L$OOP_REDUCE_1024
+ lea r12,[448+rsp]
+ vpaddq ymm0,ymm13,ymm9
+ vpxor ymm9,ymm9,ymm9
+
+ vpaddq ymm0,ymm0,YMMWORD[((288-192))+rbx]
+ vpaddq ymm1,ymm1,YMMWORD[((320-448))+r12]
+ vpaddq ymm2,ymm2,YMMWORD[((352-448))+r12]
+ vpaddq ymm3,ymm3,YMMWORD[((384-448))+r12]
+ vpaddq ymm4,ymm4,YMMWORD[((416-448))+r12]
+ vpaddq ymm5,ymm5,YMMWORD[((448-448))+r12]
+ vpaddq ymm6,ymm6,YMMWORD[((480-448))+r12]
+ vpaddq ymm7,ymm7,YMMWORD[((512-448))+r12]
+ vpaddq ymm8,ymm8,YMMWORD[((544-448))+r12]
+
+ vpsrlq ymm14,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm11,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm12,ymm2,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm13,ymm3,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm3,ymm3,ymm15
+ vpermq ymm12,ymm12,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm13,ymm13,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm0,ymm0,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm1,ymm1,ymm14
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm2,ymm2,ymm11
+ vpblendd ymm13,ymm9,ymm13,3
+ vpaddq ymm3,ymm3,ymm12
+ vpaddq ymm4,ymm4,ymm13
+
+ vpsrlq ymm14,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm11,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm12,ymm2,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm13,ymm3,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm3,ymm3,ymm15
+ vpermq ymm12,ymm12,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm13,ymm13,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm0,ymm0,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm1,ymm1,ymm14
+ vmovdqu YMMWORD[(0-128)+rdi],ymm0
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm2,ymm2,ymm11
+ vmovdqu YMMWORD[(32-128)+rdi],ymm1
+ vpblendd ymm13,ymm9,ymm13,3
+ vpaddq ymm3,ymm3,ymm12
+ vmovdqu YMMWORD[(64-128)+rdi],ymm2
+ vpaddq ymm4,ymm4,ymm13
+ vmovdqu YMMWORD[(96-128)+rdi],ymm3
+ vpsrlq ymm14,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm11,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm12,ymm6,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm13,ymm7,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm13,ymm13,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm4,ymm4,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm5,ymm5,ymm14
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm6,ymm6,ymm11
+ vpblendd ymm13,ymm0,ymm13,3
+ vpaddq ymm7,ymm7,ymm12
+ vpaddq ymm8,ymm8,ymm13
+
+ vpsrlq ymm14,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm11,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm12,ymm6,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm13,ymm7,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm13,ymm13,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm4,ymm4,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm5,ymm5,ymm14
+ vmovdqu YMMWORD[(128-128)+rdi],ymm4
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm6,ymm6,ymm11
+ vmovdqu YMMWORD[(160-128)+rdi],ymm5
+ vpblendd ymm13,ymm0,ymm13,3
+ vpaddq ymm7,ymm7,ymm12
+ vmovdqu YMMWORD[(192-128)+rdi],ymm6
+ vpaddq ymm8,ymm8,ymm13
+ vmovdqu YMMWORD[(224-128)+rdi],ymm7
+ vmovdqu YMMWORD[(256-128)+rdi],ymm8
+
+ mov rsi,rdi
+ dec r8d
+ jne NEAR $L$OOP_GRANDE_SQR_1024
+
+ vzeroall
+ mov rax,rbp
+
+$L$sqr_1024_in_tail:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$sqr_1024_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_1024_sqr_avx2:
+global rsaz_1024_mul_avx2
+
+ALIGN 64
+rsaz_1024_mul_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_1024_mul_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ vzeroupper
+ lea rsp,[((-168))+rsp]
+ vmovaps XMMWORD[(-216)+rax],xmm6
+ vmovaps XMMWORD[(-200)+rax],xmm7
+ vmovaps XMMWORD[(-184)+rax],xmm8
+ vmovaps XMMWORD[(-168)+rax],xmm9
+ vmovaps XMMWORD[(-152)+rax],xmm10
+ vmovaps XMMWORD[(-136)+rax],xmm11
+ vmovaps XMMWORD[(-120)+rax],xmm12
+ vmovaps XMMWORD[(-104)+rax],xmm13
+ vmovaps XMMWORD[(-88)+rax],xmm14
+ vmovaps XMMWORD[(-72)+rax],xmm15
+$L$mul_1024_body:
+ mov rbp,rax
+
+ vzeroall
+ mov r13,rdx
+ sub rsp,64
+
+
+
+
+
+
+DB 0x67,0x67
+ mov r15,rsi
+ and r15,4095
+ add r15,32*10
+ shr r15,12
+ mov r15,rsi
+ cmovnz rsi,r13
+ cmovnz r13,r15
+
+ mov r15,rcx
+ sub rsi,-128
+ sub rcx,-128
+ sub rdi,-128
+
+ and r15,4095
+ add r15,32*10
+DB 0x67,0x67
+ shr r15,12
+ jz NEAR $L$mul_1024_no_n_copy
+
+
+
+
+
+ sub rsp,32*10
+ vmovdqu ymm0,YMMWORD[((0-128))+rcx]
+ and rsp,-512
+ vmovdqu ymm1,YMMWORD[((32-128))+rcx]
+ vmovdqu ymm2,YMMWORD[((64-128))+rcx]
+ vmovdqu ymm3,YMMWORD[((96-128))+rcx]
+ vmovdqu ymm4,YMMWORD[((128-128))+rcx]
+ vmovdqu ymm5,YMMWORD[((160-128))+rcx]
+ vmovdqu ymm6,YMMWORD[((192-128))+rcx]
+ vmovdqu ymm7,YMMWORD[((224-128))+rcx]
+ vmovdqu ymm8,YMMWORD[((256-128))+rcx]
+ lea rcx,[((64+128))+rsp]
+ vmovdqu YMMWORD[(0-128)+rcx],ymm0
+ vpxor ymm0,ymm0,ymm0
+ vmovdqu YMMWORD[(32-128)+rcx],ymm1
+ vpxor ymm1,ymm1,ymm1
+ vmovdqu YMMWORD[(64-128)+rcx],ymm2
+ vpxor ymm2,ymm2,ymm2
+ vmovdqu YMMWORD[(96-128)+rcx],ymm3
+ vpxor ymm3,ymm3,ymm3
+ vmovdqu YMMWORD[(128-128)+rcx],ymm4
+ vpxor ymm4,ymm4,ymm4
+ vmovdqu YMMWORD[(160-128)+rcx],ymm5
+ vpxor ymm5,ymm5,ymm5
+ vmovdqu YMMWORD[(192-128)+rcx],ymm6
+ vpxor ymm6,ymm6,ymm6
+ vmovdqu YMMWORD[(224-128)+rcx],ymm7
+ vpxor ymm7,ymm7,ymm7
+ vmovdqu YMMWORD[(256-128)+rcx],ymm8
+ vmovdqa ymm8,ymm0
+ vmovdqu YMMWORD[(288-128)+rcx],ymm9
+$L$mul_1024_no_n_copy:
+ and rsp,-64
+
+ mov rbx,QWORD[r13]
+ vpbroadcastq ymm10,QWORD[r13]
+ vmovdqu YMMWORD[rsp],ymm0
+ xor r9,r9
+DB 0x67
+ xor r10,r10
+ xor r11,r11
+ xor r12,r12
+
+ vmovdqu ymm15,YMMWORD[$L$and_mask]
+ mov r14d,9
+ vmovdqu YMMWORD[(288-128)+rdi],ymm9
+ jmp NEAR $L$oop_mul_1024
+
+ALIGN 32
+$L$oop_mul_1024:
+ vpsrlq ymm9,ymm3,29
+ mov rax,rbx
+ imul rax,QWORD[((-128))+rsi]
+ add rax,r9
+ mov r10,rbx
+ imul r10,QWORD[((8-128))+rsi]
+ add r10,QWORD[8+rsp]
+
+ mov r9,rax
+ imul eax,r8d
+ and eax,0x1fffffff
+
+ mov r11,rbx
+ imul r11,QWORD[((16-128))+rsi]
+ add r11,QWORD[16+rsp]
+
+ mov r12,rbx
+ imul r12,QWORD[((24-128))+rsi]
+ add r12,QWORD[24+rsp]
+ vpmuludq ymm0,ymm10,YMMWORD[((32-128))+rsi]
+ vmovd xmm11,eax
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm10,YMMWORD[((64-128))+rsi]
+ vpbroadcastq ymm11,xmm11
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm10,YMMWORD[((96-128))+rsi]
+ vpand ymm3,ymm3,ymm15
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm10,YMMWORD[((128-128))+rsi]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm10,YMMWORD[((160-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm10,YMMWORD[((192-128))+rsi]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm10,YMMWORD[((224-128))+rsi]
+ vpermq ymm9,ymm9,0x93
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm10,YMMWORD[((256-128))+rsi]
+ vpbroadcastq ymm10,QWORD[8+r13]
+ vpaddq ymm8,ymm8,ymm12
+
+ mov rdx,rax
+ imul rax,QWORD[((-128))+rcx]
+ add r9,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+rcx]
+ add r10,rax
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+rcx]
+ add r11,rax
+ shr r9,29
+ imul rdx,QWORD[((24-128))+rcx]
+ add r12,rdx
+ add r10,r9
+
+ vpmuludq ymm13,ymm11,YMMWORD[((32-128))+rcx]
+ vmovq rbx,xmm10
+ vpaddq ymm1,ymm1,ymm13
+ vpmuludq ymm0,ymm11,YMMWORD[((64-128))+rcx]
+ vpaddq ymm2,ymm2,ymm0
+ vpmuludq ymm12,ymm11,YMMWORD[((96-128))+rcx]
+ vpaddq ymm3,ymm3,ymm12
+ vpmuludq ymm13,ymm11,YMMWORD[((128-128))+rcx]
+ vpaddq ymm4,ymm4,ymm13
+ vpmuludq ymm0,ymm11,YMMWORD[((160-128))+rcx]
+ vpaddq ymm5,ymm5,ymm0
+ vpmuludq ymm12,ymm11,YMMWORD[((192-128))+rcx]
+ vpaddq ymm6,ymm6,ymm12
+ vpmuludq ymm13,ymm11,YMMWORD[((224-128))+rcx]
+ vpblendd ymm12,ymm9,ymm14,3
+ vpaddq ymm7,ymm7,ymm13
+ vpmuludq ymm0,ymm11,YMMWORD[((256-128))+rcx]
+ vpaddq ymm3,ymm3,ymm12
+ vpaddq ymm8,ymm8,ymm0
+
+ mov rax,rbx
+ imul rax,QWORD[((-128))+rsi]
+ add r10,rax
+ vmovdqu ymm12,YMMWORD[((-8+32-128))+rsi]
+ mov rax,rbx
+ imul rax,QWORD[((8-128))+rsi]
+ add r11,rax
+ vmovdqu ymm13,YMMWORD[((-8+64-128))+rsi]
+
+ mov rax,r10
+ vpblendd ymm9,ymm9,ymm14,0xfc
+ imul eax,r8d
+ vpaddq ymm4,ymm4,ymm9
+ and eax,0x1fffffff
+
+ imul rbx,QWORD[((16-128))+rsi]
+ add r12,rbx
+ vpmuludq ymm12,ymm12,ymm10
+ vmovd xmm11,eax
+ vmovdqu ymm0,YMMWORD[((-8+96-128))+rsi]
+ vpaddq ymm1,ymm1,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpbroadcastq ymm11,xmm11
+ vmovdqu ymm12,YMMWORD[((-8+128-128))+rsi]
+ vpaddq ymm2,ymm2,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-8+160-128))+rsi]
+ vpaddq ymm3,ymm3,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm0,YMMWORD[((-8+192-128))+rsi]
+ vpaddq ymm4,ymm4,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-8+224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-8+256-128))+rsi]
+ vpaddq ymm6,ymm6,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm9,YMMWORD[((-8+288-128))+rsi]
+ vpaddq ymm7,ymm7,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpaddq ymm8,ymm8,ymm13
+ vpmuludq ymm9,ymm9,ymm10
+ vpbroadcastq ymm10,QWORD[16+r13]
+
+ mov rdx,rax
+ imul rax,QWORD[((-128))+rcx]
+ add r10,rax
+ vmovdqu ymm0,YMMWORD[((-8+32-128))+rcx]
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+rcx]
+ add r11,rax
+ vmovdqu ymm12,YMMWORD[((-8+64-128))+rcx]
+ shr r10,29
+ imul rdx,QWORD[((16-128))+rcx]
+ add r12,rdx
+ add r11,r10
+
+ vpmuludq ymm0,ymm0,ymm11
+ vmovq rbx,xmm10
+ vmovdqu ymm13,YMMWORD[((-8+96-128))+rcx]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-8+128-128))+rcx]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-8+160-128))+rcx]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-8+192-128))+rcx]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-8+224-128))+rcx]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-8+256-128))+rcx]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-8+288-128))+rcx]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vpaddq ymm9,ymm9,ymm13
+
+ vmovdqu ymm0,YMMWORD[((-16+32-128))+rsi]
+ mov rax,rbx
+ imul rax,QWORD[((-128))+rsi]
+ add rax,r11
+
+ vmovdqu ymm12,YMMWORD[((-16+64-128))+rsi]
+ mov r11,rax
+ imul eax,r8d
+ and eax,0x1fffffff
+
+ imul rbx,QWORD[((8-128))+rsi]
+ add r12,rbx
+ vpmuludq ymm0,ymm0,ymm10
+ vmovd xmm11,eax
+ vmovdqu ymm13,YMMWORD[((-16+96-128))+rsi]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpbroadcastq ymm11,xmm11
+ vmovdqu ymm0,YMMWORD[((-16+128-128))+rsi]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-16+160-128))+rsi]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-16+192-128))+rsi]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm0,YMMWORD[((-16+224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-16+256-128))+rsi]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-16+288-128))+rsi]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpbroadcastq ymm10,QWORD[24+r13]
+ vpaddq ymm9,ymm9,ymm13
+
+ vmovdqu ymm0,YMMWORD[((-16+32-128))+rcx]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+rcx]
+ add r11,rax
+ vmovdqu ymm12,YMMWORD[((-16+64-128))+rcx]
+ imul rdx,QWORD[((8-128))+rcx]
+ add r12,rdx
+ shr r11,29
+
+ vpmuludq ymm0,ymm0,ymm11
+ vmovq rbx,xmm10
+ vmovdqu ymm13,YMMWORD[((-16+96-128))+rcx]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-16+128-128))+rcx]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-16+160-128))+rcx]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-16+192-128))+rcx]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-16+224-128))+rcx]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-16+256-128))+rcx]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-16+288-128))+rcx]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-24+32-128))+rsi]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-24+64-128))+rsi]
+ vpaddq ymm9,ymm9,ymm13
+
+ add r12,r11
+ imul rbx,QWORD[((-128))+rsi]
+ add r12,rbx
+
+ mov rax,r12
+ imul eax,r8d
+ and eax,0x1fffffff
+
+ vpmuludq ymm0,ymm0,ymm10
+ vmovd xmm11,eax
+ vmovdqu ymm13,YMMWORD[((-24+96-128))+rsi]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpbroadcastq ymm11,xmm11
+ vmovdqu ymm0,YMMWORD[((-24+128-128))+rsi]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-24+160-128))+rsi]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-24+192-128))+rsi]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm0,YMMWORD[((-24+224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-24+256-128))+rsi]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-24+288-128))+rsi]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpbroadcastq ymm10,QWORD[32+r13]
+ vpaddq ymm9,ymm9,ymm13
+ add r13,32
+
+ vmovdqu ymm0,YMMWORD[((-24+32-128))+rcx]
+ imul rax,QWORD[((-128))+rcx]
+ add r12,rax
+ shr r12,29
+
+ vmovdqu ymm12,YMMWORD[((-24+64-128))+rcx]
+ vpmuludq ymm0,ymm0,ymm11
+ vmovq rbx,xmm10
+ vmovdqu ymm13,YMMWORD[((-24+96-128))+rcx]
+ vpaddq ymm0,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu YMMWORD[rsp],ymm0
+ vpaddq ymm1,ymm2,ymm12
+ vmovdqu ymm0,YMMWORD[((-24+128-128))+rcx]
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-24+160-128))+rcx]
+ vpaddq ymm2,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-24+192-128))+rcx]
+ vpaddq ymm3,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-24+224-128))+rcx]
+ vpaddq ymm4,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-24+256-128))+rcx]
+ vpaddq ymm5,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-24+288-128))+rcx]
+ mov r9,r12
+ vpaddq ymm6,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ add r9,QWORD[rsp]
+ vpaddq ymm7,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovq xmm12,r12
+ vpaddq ymm8,ymm9,ymm13
+
+ dec r14d
+ jnz NEAR $L$oop_mul_1024
+ vpaddq ymm0,ymm12,YMMWORD[rsp]
+
+ vpsrlq ymm12,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm13,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm10,ymm2,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm11,ymm3,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm3,ymm3,ymm15
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm10,ymm10,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpermq ymm11,ymm11,0x93
+ vpaddq ymm0,ymm0,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm1,ymm1,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm2,ymm2,ymm13
+ vpblendd ymm11,ymm14,ymm11,3
+ vpaddq ymm3,ymm3,ymm10
+ vpaddq ymm4,ymm4,ymm11
+
+ vpsrlq ymm12,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm13,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm10,ymm2,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm11,ymm3,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm3,ymm3,ymm15
+ vpermq ymm10,ymm10,0x93
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm11,ymm11,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm0,ymm0,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm1,ymm1,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm2,ymm2,ymm13
+ vpblendd ymm11,ymm14,ymm11,3
+ vpaddq ymm3,ymm3,ymm10
+ vpaddq ymm4,ymm4,ymm11
+
+ vmovdqu YMMWORD[(0-128)+rdi],ymm0
+ vmovdqu YMMWORD[(32-128)+rdi],ymm1
+ vmovdqu YMMWORD[(64-128)+rdi],ymm2
+ vmovdqu YMMWORD[(96-128)+rdi],ymm3
+ vpsrlq ymm12,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm13,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm10,ymm6,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm11,ymm7,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm10,ymm10,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm11,ymm11,0x93
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm4,ymm4,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm5,ymm5,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm6,ymm6,ymm13
+ vpblendd ymm11,ymm0,ymm11,3
+ vpaddq ymm7,ymm7,ymm10
+ vpaddq ymm8,ymm8,ymm11
+
+ vpsrlq ymm12,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm13,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm10,ymm6,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm11,ymm7,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm10,ymm10,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm11,ymm11,0x93
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm4,ymm4,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm5,ymm5,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm6,ymm6,ymm13
+ vpblendd ymm11,ymm0,ymm11,3
+ vpaddq ymm7,ymm7,ymm10
+ vpaddq ymm8,ymm8,ymm11
+
+ vmovdqu YMMWORD[(128-128)+rdi],ymm4
+ vmovdqu YMMWORD[(160-128)+rdi],ymm5
+ vmovdqu YMMWORD[(192-128)+rdi],ymm6
+ vmovdqu YMMWORD[(224-128)+rdi],ymm7
+ vmovdqu YMMWORD[(256-128)+rdi],ymm8
+ vzeroupper
+
+ mov rax,rbp
+
+$L$mul_1024_in_tail:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_1024_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_1024_mul_avx2:
+global rsaz_1024_red2norm_avx2
+
+ALIGN 32
+rsaz_1024_red2norm_avx2:
+
+ sub rdx,-128
+ xor rax,rax
+ mov r8,QWORD[((-128))+rdx]
+ mov r9,QWORD[((-120))+rdx]
+ mov r10,QWORD[((-112))+rdx]
+ shl r8,0
+ shl r9,29
+ mov r11,r10
+ shl r10,58
+ shr r11,6
+ add rax,r8
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[rcx],rax
+ mov rax,r11
+ mov r8,QWORD[((-104))+rdx]
+ mov r9,QWORD[((-96))+rdx]
+ shl r8,23
+ mov r10,r9
+ shl r9,52
+ shr r10,12
+ add rax,r8
+ add rax,r9
+ adc r10,0
+ mov QWORD[8+rcx],rax
+ mov rax,r10
+ mov r11,QWORD[((-88))+rdx]
+ mov r8,QWORD[((-80))+rdx]
+ shl r11,17
+ mov r9,r8
+ shl r8,46
+ shr r9,18
+ add rax,r11
+ add rax,r8
+ adc r9,0
+ mov QWORD[16+rcx],rax
+ mov rax,r9
+ mov r10,QWORD[((-72))+rdx]
+ mov r11,QWORD[((-64))+rdx]
+ shl r10,11
+ mov r8,r11
+ shl r11,40
+ shr r8,24
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[24+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[((-56))+rdx]
+ mov r10,QWORD[((-48))+rdx]
+ mov r11,QWORD[((-40))+rdx]
+ shl r9,5
+ shl r10,34
+ mov r8,r11
+ shl r11,63
+ shr r8,1
+ add rax,r9
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[32+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[((-32))+rdx]
+ mov r10,QWORD[((-24))+rdx]
+ shl r9,28
+ mov r11,r10
+ shl r10,57
+ shr r11,7
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[40+rcx],rax
+ mov rax,r11
+ mov r8,QWORD[((-16))+rdx]
+ mov r9,QWORD[((-8))+rdx]
+ shl r8,22
+ mov r10,r9
+ shl r9,51
+ shr r10,13
+ add rax,r8
+ add rax,r9
+ adc r10,0
+ mov QWORD[48+rcx],rax
+ mov rax,r10
+ mov r11,QWORD[rdx]
+ mov r8,QWORD[8+rdx]
+ shl r11,16
+ mov r9,r8
+ shl r8,45
+ shr r9,19
+ add rax,r11
+ add rax,r8
+ adc r9,0
+ mov QWORD[56+rcx],rax
+ mov rax,r9
+ mov r10,QWORD[16+rdx]
+ mov r11,QWORD[24+rdx]
+ shl r10,10
+ mov r8,r11
+ shl r11,39
+ shr r8,25
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[64+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[32+rdx]
+ mov r10,QWORD[40+rdx]
+ mov r11,QWORD[48+rdx]
+ shl r9,4
+ shl r10,33
+ mov r8,r11
+ shl r11,62
+ shr r8,2
+ add rax,r9
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[72+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[56+rdx]
+ mov r10,QWORD[64+rdx]
+ shl r9,27
+ mov r11,r10
+ shl r10,56
+ shr r11,8
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[80+rcx],rax
+ mov rax,r11
+ mov r8,QWORD[72+rdx]
+ mov r9,QWORD[80+rdx]
+ shl r8,21
+ mov r10,r9
+ shl r9,50
+ shr r10,14
+ add rax,r8
+ add rax,r9
+ adc r10,0
+ mov QWORD[88+rcx],rax
+ mov rax,r10
+ mov r11,QWORD[88+rdx]
+ mov r8,QWORD[96+rdx]
+ shl r11,15
+ mov r9,r8
+ shl r8,44
+ shr r9,20
+ add rax,r11
+ add rax,r8
+ adc r9,0
+ mov QWORD[96+rcx],rax
+ mov rax,r9
+ mov r10,QWORD[104+rdx]
+ mov r11,QWORD[112+rdx]
+ shl r10,9
+ mov r8,r11
+ shl r11,38
+ shr r8,26
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[104+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[120+rdx]
+ mov r10,QWORD[128+rdx]
+ mov r11,QWORD[136+rdx]
+ shl r9,3
+ shl r10,32
+ mov r8,r11
+ shl r11,61
+ shr r8,3
+ add rax,r9
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[112+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[144+rdx]
+ mov r10,QWORD[152+rdx]
+ shl r9,26
+ mov r11,r10
+ shl r10,55
+ shr r11,9
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[120+rcx],rax
+ mov rax,r11
+ DB 0F3h,0C3h ;repret
+
+
+
+global rsaz_1024_norm2red_avx2
+
+ALIGN 32
+rsaz_1024_norm2red_avx2:
+
+ sub rcx,-128
+ mov r8,QWORD[rdx]
+ mov eax,0x1fffffff
+ mov r9,QWORD[8+rdx]
+ mov r11,r8
+ shr r11,0
+ and r11,rax
+ mov QWORD[((-128))+rcx],r11
+ mov r10,r8
+ shr r10,29
+ and r10,rax
+ mov QWORD[((-120))+rcx],r10
+ shrd r8,r9,58
+ and r8,rax
+ mov QWORD[((-112))+rcx],r8
+ mov r10,QWORD[16+rdx]
+ mov r8,r9
+ shr r8,23
+ and r8,rax
+ mov QWORD[((-104))+rcx],r8
+ shrd r9,r10,52
+ and r9,rax
+ mov QWORD[((-96))+rcx],r9
+ mov r11,QWORD[24+rdx]
+ mov r9,r10
+ shr r9,17
+ and r9,rax
+ mov QWORD[((-88))+rcx],r9
+ shrd r10,r11,46
+ and r10,rax
+ mov QWORD[((-80))+rcx],r10
+ mov r8,QWORD[32+rdx]
+ mov r10,r11
+ shr r10,11
+ and r10,rax
+ mov QWORD[((-72))+rcx],r10
+ shrd r11,r8,40
+ and r11,rax
+ mov QWORD[((-64))+rcx],r11
+ mov r9,QWORD[40+rdx]
+ mov r11,r8
+ shr r11,5
+ and r11,rax
+ mov QWORD[((-56))+rcx],r11
+ mov r10,r8
+ shr r10,34
+ and r10,rax
+ mov QWORD[((-48))+rcx],r10
+ shrd r8,r9,63
+ and r8,rax
+ mov QWORD[((-40))+rcx],r8
+ mov r10,QWORD[48+rdx]
+ mov r8,r9
+ shr r8,28
+ and r8,rax
+ mov QWORD[((-32))+rcx],r8
+ shrd r9,r10,57
+ and r9,rax
+ mov QWORD[((-24))+rcx],r9
+ mov r11,QWORD[56+rdx]
+ mov r9,r10
+ shr r9,22
+ and r9,rax
+ mov QWORD[((-16))+rcx],r9
+ shrd r10,r11,51
+ and r10,rax
+ mov QWORD[((-8))+rcx],r10
+ mov r8,QWORD[64+rdx]
+ mov r10,r11
+ shr r10,16
+ and r10,rax
+ mov QWORD[rcx],r10
+ shrd r11,r8,45
+ and r11,rax
+ mov QWORD[8+rcx],r11
+ mov r9,QWORD[72+rdx]
+ mov r11,r8
+ shr r11,10
+ and r11,rax
+ mov QWORD[16+rcx],r11
+ shrd r8,r9,39
+ and r8,rax
+ mov QWORD[24+rcx],r8
+ mov r10,QWORD[80+rdx]
+ mov r8,r9
+ shr r8,4
+ and r8,rax
+ mov QWORD[32+rcx],r8
+ mov r11,r9
+ shr r11,33
+ and r11,rax
+ mov QWORD[40+rcx],r11
+ shrd r9,r10,62
+ and r9,rax
+ mov QWORD[48+rcx],r9
+ mov r11,QWORD[88+rdx]
+ mov r9,r10
+ shr r9,27
+ and r9,rax
+ mov QWORD[56+rcx],r9
+ shrd r10,r11,56
+ and r10,rax
+ mov QWORD[64+rcx],r10
+ mov r8,QWORD[96+rdx]
+ mov r10,r11
+ shr r10,21
+ and r10,rax
+ mov QWORD[72+rcx],r10
+ shrd r11,r8,50
+ and r11,rax
+ mov QWORD[80+rcx],r11
+ mov r9,QWORD[104+rdx]
+ mov r11,r8
+ shr r11,15
+ and r11,rax
+ mov QWORD[88+rcx],r11
+ shrd r8,r9,44
+ and r8,rax
+ mov QWORD[96+rcx],r8
+ mov r10,QWORD[112+rdx]
+ mov r8,r9
+ shr r8,9
+ and r8,rax
+ mov QWORD[104+rcx],r8
+ shrd r9,r10,38
+ and r9,rax
+ mov QWORD[112+rcx],r9
+ mov r11,QWORD[120+rdx]
+ mov r9,r10
+ shr r9,3
+ and r9,rax
+ mov QWORD[120+rcx],r9
+ mov r8,r10
+ shr r8,32
+ and r8,rax
+ mov QWORD[128+rcx],r8
+ shrd r10,r11,61
+ and r10,rax
+ mov QWORD[136+rcx],r10
+ xor r8,r8
+ mov r10,r11
+ shr r10,26
+ and r10,rax
+ mov QWORD[144+rcx],r10
+ shrd r11,r8,55
+ and r11,rax
+ mov QWORD[152+rcx],r11
+ mov QWORD[160+rcx],r8
+ mov QWORD[168+rcx],r8
+ mov QWORD[176+rcx],r8
+ mov QWORD[184+rcx],r8
+ DB 0F3h,0C3h ;repret
+
+
+global rsaz_1024_scatter5_avx2
+
+ALIGN 32
+rsaz_1024_scatter5_avx2:
+
+ vzeroupper
+ vmovdqu ymm5,YMMWORD[$L$scatter_permd]
+ shl r8d,4
+ lea rcx,[r8*1+rcx]
+ mov eax,9
+ jmp NEAR $L$oop_scatter_1024
+
+ALIGN 32
+$L$oop_scatter_1024:
+ vmovdqu ymm0,YMMWORD[rdx]
+ lea rdx,[32+rdx]
+ vpermd ymm0,ymm5,ymm0
+ vmovdqu XMMWORD[rcx],xmm0
+ lea rcx,[512+rcx]
+ dec eax
+ jnz NEAR $L$oop_scatter_1024
+
+ vzeroupper
+ DB 0F3h,0C3h ;repret
+
+
+
+global rsaz_1024_gather5_avx2
+
+ALIGN 32
+rsaz_1024_gather5_avx2:
+
+ vzeroupper
+ mov r11,rsp
+
+ lea rax,[((-136))+rsp]
+$L$SEH_begin_rsaz_1024_gather5:
+
+DB 0x48,0x8d,0x60,0xe0
+DB 0xc5,0xf8,0x29,0x70,0xe0
+DB 0xc5,0xf8,0x29,0x78,0xf0
+DB 0xc5,0x78,0x29,0x40,0x00
+DB 0xc5,0x78,0x29,0x48,0x10
+DB 0xc5,0x78,0x29,0x50,0x20
+DB 0xc5,0x78,0x29,0x58,0x30
+DB 0xc5,0x78,0x29,0x60,0x40
+DB 0xc5,0x78,0x29,0x68,0x50
+DB 0xc5,0x78,0x29,0x70,0x60
+DB 0xc5,0x78,0x29,0x78,0x70
+ lea rsp,[((-256))+rsp]
+ and rsp,-32
+ lea r10,[$L$inc]
+ lea rax,[((-128))+rsp]
+
+ vmovd xmm4,r8d
+ vmovdqa ymm0,YMMWORD[r10]
+ vmovdqa ymm1,YMMWORD[32+r10]
+ vmovdqa ymm5,YMMWORD[64+r10]
+ vpbroadcastd ymm4,xmm4
+
+ vpaddd ymm2,ymm0,ymm5
+ vpcmpeqd ymm0,ymm0,ymm4
+ vpaddd ymm3,ymm1,ymm5
+ vpcmpeqd ymm1,ymm1,ymm4
+ vmovdqa YMMWORD[(0+128)+rax],ymm0
+ vpaddd ymm0,ymm2,ymm5
+ vpcmpeqd ymm2,ymm2,ymm4
+ vmovdqa YMMWORD[(32+128)+rax],ymm1
+ vpaddd ymm1,ymm3,ymm5
+ vpcmpeqd ymm3,ymm3,ymm4
+ vmovdqa YMMWORD[(64+128)+rax],ymm2
+ vpaddd ymm2,ymm0,ymm5
+ vpcmpeqd ymm0,ymm0,ymm4
+ vmovdqa YMMWORD[(96+128)+rax],ymm3
+ vpaddd ymm3,ymm1,ymm5
+ vpcmpeqd ymm1,ymm1,ymm4
+ vmovdqa YMMWORD[(128+128)+rax],ymm0
+ vpaddd ymm8,ymm2,ymm5
+ vpcmpeqd ymm2,ymm2,ymm4
+ vmovdqa YMMWORD[(160+128)+rax],ymm1
+ vpaddd ymm9,ymm3,ymm5
+ vpcmpeqd ymm3,ymm3,ymm4
+ vmovdqa YMMWORD[(192+128)+rax],ymm2
+ vpaddd ymm10,ymm8,ymm5
+ vpcmpeqd ymm8,ymm8,ymm4
+ vmovdqa YMMWORD[(224+128)+rax],ymm3
+ vpaddd ymm11,ymm9,ymm5
+ vpcmpeqd ymm9,ymm9,ymm4
+ vpaddd ymm12,ymm10,ymm5
+ vpcmpeqd ymm10,ymm10,ymm4
+ vpaddd ymm13,ymm11,ymm5
+ vpcmpeqd ymm11,ymm11,ymm4
+ vpaddd ymm14,ymm12,ymm5
+ vpcmpeqd ymm12,ymm12,ymm4
+ vpaddd ymm15,ymm13,ymm5
+ vpcmpeqd ymm13,ymm13,ymm4
+ vpcmpeqd ymm14,ymm14,ymm4
+ vpcmpeqd ymm15,ymm15,ymm4
+
+ vmovdqa ymm7,YMMWORD[((-32))+r10]
+ lea rdx,[128+rdx]
+ mov r8d,9
+
+$L$oop_gather_1024:
+ vmovdqa ymm0,YMMWORD[((0-128))+rdx]
+ vmovdqa ymm1,YMMWORD[((32-128))+rdx]
+ vmovdqa ymm2,YMMWORD[((64-128))+rdx]
+ vmovdqa ymm3,YMMWORD[((96-128))+rdx]
+ vpand ymm0,ymm0,YMMWORD[((0+128))+rax]
+ vpand ymm1,ymm1,YMMWORD[((32+128))+rax]
+ vpand ymm2,ymm2,YMMWORD[((64+128))+rax]
+ vpor ymm4,ymm1,ymm0
+ vpand ymm3,ymm3,YMMWORD[((96+128))+rax]
+ vmovdqa ymm0,YMMWORD[((128-128))+rdx]
+ vmovdqa ymm1,YMMWORD[((160-128))+rdx]
+ vpor ymm5,ymm3,ymm2
+ vmovdqa ymm2,YMMWORD[((192-128))+rdx]
+ vmovdqa ymm3,YMMWORD[((224-128))+rdx]
+ vpand ymm0,ymm0,YMMWORD[((128+128))+rax]
+ vpand ymm1,ymm1,YMMWORD[((160+128))+rax]
+ vpand ymm2,ymm2,YMMWORD[((192+128))+rax]
+ vpor ymm4,ymm4,ymm0
+ vpand ymm3,ymm3,YMMWORD[((224+128))+rax]
+ vpand ymm0,ymm8,YMMWORD[((256-128))+rdx]
+ vpor ymm5,ymm5,ymm1
+ vpand ymm1,ymm9,YMMWORD[((288-128))+rdx]
+ vpor ymm4,ymm4,ymm2
+ vpand ymm2,ymm10,YMMWORD[((320-128))+rdx]
+ vpor ymm5,ymm5,ymm3
+ vpand ymm3,ymm11,YMMWORD[((352-128))+rdx]
+ vpor ymm4,ymm4,ymm0
+ vpand ymm0,ymm12,YMMWORD[((384-128))+rdx]
+ vpor ymm5,ymm5,ymm1
+ vpand ymm1,ymm13,YMMWORD[((416-128))+rdx]
+ vpor ymm4,ymm4,ymm2
+ vpand ymm2,ymm14,YMMWORD[((448-128))+rdx]
+ vpor ymm5,ymm5,ymm3
+ vpand ymm3,ymm15,YMMWORD[((480-128))+rdx]
+ lea rdx,[512+rdx]
+ vpor ymm4,ymm4,ymm0
+ vpor ymm5,ymm5,ymm1
+ vpor ymm4,ymm4,ymm2
+ vpor ymm5,ymm5,ymm3
+
+ vpor ymm4,ymm4,ymm5
+ vextracti128 xmm5,ymm4,1
+ vpor xmm5,xmm5,xmm4
+ vpermd ymm5,ymm7,ymm5
+ vmovdqu YMMWORD[rcx],ymm5
+ lea rcx,[32+rcx]
+ dec r8d
+ jnz NEAR $L$oop_gather_1024
+
+ vpxor ymm0,ymm0,ymm0
+ vmovdqu YMMWORD[rcx],ymm0
+ vzeroupper
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps xmm15,XMMWORD[((-24))+r11]
+ lea rsp,[r11]
+
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_1024_gather5:
+
+EXTERN OPENSSL_ia32cap_P
+global rsaz_avx2_eligible
+
+ALIGN 32
+rsaz_avx2_eligible:
+ mov eax,DWORD[((OPENSSL_ia32cap_P+8))]
+ mov ecx,524544
+ mov edx,0
+ and ecx,eax
+ cmp ecx,524544
+ cmove eax,edx
+ and eax,32
+ shr eax,5
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+$L$and_mask:
+ DQ 0x1fffffff,0x1fffffff,0x1fffffff,0x1fffffff
+$L$scatter_permd:
+ DD 0,2,4,6,7,7,7,7
+$L$gather_permd:
+ DD 0,7,1,7,2,7,3,7
+$L$inc:
+ DD 0,0,0,0,1,1,1,1
+ DD 2,2,2,2,3,3,3,3
+ DD 4,4,4,4,4,4,4,4
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+rsaz_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rbp,QWORD[160+r8]
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ cmovc rax,rbp
+
+ mov r15,QWORD[((-48))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov rbx,QWORD[((-8))+rax]
+ mov QWORD[240+r8],r15
+ mov QWORD[232+r8],r14
+ mov QWORD[224+r8],r13
+ mov QWORD[216+r8],r12
+ mov QWORD[160+r8],rbp
+ mov QWORD[144+r8],rbx
+
+ lea rsi,[((-216))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_rsaz_1024_sqr_avx2 wrt ..imagebase
+ DD $L$SEH_end_rsaz_1024_sqr_avx2 wrt ..imagebase
+ DD $L$SEH_info_rsaz_1024_sqr_avx2 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_1024_mul_avx2 wrt ..imagebase
+ DD $L$SEH_end_rsaz_1024_mul_avx2 wrt ..imagebase
+ DD $L$SEH_info_rsaz_1024_mul_avx2 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_1024_gather5 wrt ..imagebase
+ DD $L$SEH_end_rsaz_1024_gather5 wrt ..imagebase
+ DD $L$SEH_info_rsaz_1024_gather5 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_rsaz_1024_sqr_avx2:
+DB 9,0,0,0
+ DD rsaz_se_handler wrt ..imagebase
+ DD $L$sqr_1024_body wrt ..imagebase,$L$sqr_1024_epilogue wrt ..imagebase,$L$sqr_1024_in_tail wrt ..imagebase
+ DD 0
+$L$SEH_info_rsaz_1024_mul_avx2:
+DB 9,0,0,0
+ DD rsaz_se_handler wrt ..imagebase
+ DD $L$mul_1024_body wrt ..imagebase,$L$mul_1024_epilogue wrt ..imagebase,$L$mul_1024_in_tail wrt ..imagebase
+ DD 0
+$L$SEH_info_rsaz_1024_gather5:
+DB 0x01,0x36,0x17,0x0b
+DB 0x36,0xf8,0x09,0x00
+DB 0x31,0xe8,0x08,0x00
+DB 0x2c,0xd8,0x07,0x00
+DB 0x27,0xc8,0x06,0x00
+DB 0x22,0xb8,0x05,0x00
+DB 0x1d,0xa8,0x04,0x00
+DB 0x18,0x98,0x03,0x00
+DB 0x13,0x88,0x02,0x00
+DB 0x0e,0x78,0x01,0x00
+DB 0x09,0x68,0x00,0x00
+DB 0x04,0x01,0x15,0x00
+DB 0x00,0xb3,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
new file mode 100644
index 0000000000..eb4958e903
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
@@ -0,0 +1,2242 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global rsaz_512_sqr
+
+ALIGN 32
+rsaz_512_sqr:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_sqr:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,128+24
+
+$L$sqr_body:
+ mov rbp,rdx
+ mov rdx,QWORD[rsi]
+ mov rax,QWORD[8+rsi]
+ mov QWORD[128+rsp],rcx
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$oop_sqrx
+ jmp NEAR $L$oop_sqr
+
+ALIGN 32
+$L$oop_sqr:
+ mov DWORD[((128+8))+rsp],r8d
+
+ mov rbx,rdx
+ mul rdx
+ mov r8,rax
+ mov rax,QWORD[16+rsi]
+ mov r9,rdx
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[24+rsi]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[32+rsi]
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[40+rsi]
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[48+rsi]
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[56+rsi]
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r14,rax
+ mov rax,rbx
+ mov r15,rdx
+ adc r15,0
+
+ add r8,r8
+ mov rcx,r9
+ adc r9,r9
+
+ mul rax
+ mov QWORD[rsp],rax
+ add r8,rdx
+ adc r9,0
+
+ mov QWORD[8+rsp],r8
+ shr rcx,63
+
+
+ mov r8,QWORD[8+rsi]
+ mov rax,QWORD[16+rsi]
+ mul r8
+ add r10,rax
+ mov rax,QWORD[24+rsi]
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r11,rax
+ mov rax,QWORD[32+rsi]
+ adc rdx,0
+ add r11,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r12,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r12,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r13,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r13,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r14,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r14,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r15,rax
+ mov rax,r8
+ adc rdx,0
+ add r15,rbx
+ mov r8,rdx
+ mov rdx,r10
+ adc r8,0
+
+ add rdx,rdx
+ lea r10,[r10*2+rcx]
+ mov rbx,r11
+ adc r11,r11
+
+ mul rax
+ add r9,rax
+ adc r10,rdx
+ adc r11,0
+
+ mov QWORD[16+rsp],r9
+ mov QWORD[24+rsp],r10
+ shr rbx,63
+
+
+ mov r9,QWORD[16+rsi]
+ mov rax,QWORD[24+rsi]
+ mul r9
+ add r12,rax
+ mov rax,QWORD[32+rsi]
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ add r13,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r13,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ add r14,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r14,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ mov r10,r12
+ lea r12,[r12*2+rbx]
+ add r15,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r15,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ shr r10,63
+ add r8,rax
+ mov rax,r9
+ adc rdx,0
+ add r8,rcx
+ mov r9,rdx
+ adc r9,0
+
+ mov rcx,r13
+ lea r13,[r13*2+r10]
+
+ mul rax
+ add r11,rax
+ adc r12,rdx
+ adc r13,0
+
+ mov QWORD[32+rsp],r11
+ mov QWORD[40+rsp],r12
+ shr rcx,63
+
+
+ mov r10,QWORD[24+rsi]
+ mov rax,QWORD[32+rsi]
+ mul r10
+ add r14,rax
+ mov rax,QWORD[40+rsi]
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r10
+ add r15,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r15,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r10
+ mov r12,r14
+ lea r14,[r14*2+rcx]
+ add r8,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r8,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r10
+ shr r12,63
+ add r9,rax
+ mov rax,r10
+ adc rdx,0
+ add r9,rbx
+ mov r10,rdx
+ adc r10,0
+
+ mov rbx,r15
+ lea r15,[r15*2+r12]
+
+ mul rax
+ add r13,rax
+ adc r14,rdx
+ adc r15,0
+
+ mov QWORD[48+rsp],r13
+ mov QWORD[56+rsp],r14
+ shr rbx,63
+
+
+ mov r11,QWORD[32+rsi]
+ mov rax,QWORD[40+rsi]
+ mul r11
+ add r8,rax
+ mov rax,QWORD[48+rsi]
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r11
+ add r9,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ mov r12,r8
+ lea r8,[r8*2+rbx]
+ add r9,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r11
+ shr r12,63
+ add r10,rax
+ mov rax,r11
+ adc rdx,0
+ add r10,rcx
+ mov r11,rdx
+ adc r11,0
+
+ mov rcx,r9
+ lea r9,[r9*2+r12]
+
+ mul rax
+ add r15,rax
+ adc r8,rdx
+ adc r9,0
+
+ mov QWORD[64+rsp],r15
+ mov QWORD[72+rsp],r8
+ shr rcx,63
+
+
+ mov r12,QWORD[40+rsi]
+ mov rax,QWORD[48+rsi]
+ mul r12
+ add r10,rax
+ mov rax,QWORD[56+rsi]
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r12
+ add r11,rax
+ mov rax,r12
+ mov r15,r10
+ lea r10,[r10*2+rcx]
+ adc rdx,0
+ shr r15,63
+ add r11,rbx
+ mov r12,rdx
+ adc r12,0
+
+ mov rbx,r11
+ lea r11,[r11*2+r15]
+
+ mul rax
+ add r9,rax
+ adc r10,rdx
+ adc r11,0
+
+ mov QWORD[80+rsp],r9
+ mov QWORD[88+rsp],r10
+
+
+ mov r13,QWORD[48+rsi]
+ mov rax,QWORD[56+rsi]
+ mul r13
+ add r12,rax
+ mov rax,r13
+ mov r13,rdx
+ adc r13,0
+
+ xor r14,r14
+ shl rbx,1
+ adc r12,r12
+ adc r13,r13
+ adc r14,r14
+
+ mul rax
+ add r11,rax
+ adc r12,rdx
+ adc r13,0
+
+ mov QWORD[96+rsp],r11
+ mov QWORD[104+rsp],r12
+
+
+ mov rax,QWORD[56+rsi]
+ mul rax
+ add r13,rax
+ adc rdx,0
+
+ add r14,rdx
+
+ mov QWORD[112+rsp],r13
+ mov QWORD[120+rsp],r14
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ mov rdx,r8
+ mov rax,r9
+ mov r8d,DWORD[((128+8))+rsp]
+ mov rsi,rdi
+
+ dec r8d
+ jnz NEAR $L$oop_sqr
+ jmp NEAR $L$sqr_tail
+
+ALIGN 32
+$L$oop_sqrx:
+ mov DWORD[((128+8))+rsp],r8d
+DB 102,72,15,110,199
+DB 102,72,15,110,205
+
+ mulx r9,r8,rax
+
+ mulx r10,rcx,QWORD[16+rsi]
+ xor rbp,rbp
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r9,rcx
+
+ mulx r12,rcx,QWORD[32+rsi]
+ adcx r10,rax
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r11,rcx
+
+DB 0xc4,0x62,0xf3,0xf6,0xb6,0x30,0x00,0x00,0x00
+ adcx r12,rax
+ adcx r13,rcx
+
+DB 0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+ adcx r14,rax
+ adcx r15,rbp
+
+ mov rcx,r9
+ shld r9,r8,1
+ shl r8,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r8,rdx
+ mov rdx,QWORD[8+rsi]
+ adcx r9,rbp
+
+ mov QWORD[rsp],rax
+ mov QWORD[8+rsp],r8
+
+
+ mulx rbx,rax,QWORD[16+rsi]
+ adox r10,rax
+ adcx r11,rbx
+
+DB 0xc4,0x62,0xc3,0xf6,0x86,0x18,0x00,0x00,0x00
+ adox r11,rdi
+ adcx r12,r8
+
+ mulx rbx,rax,QWORD[32+rsi]
+ adox r12,rax
+ adcx r13,rbx
+
+ mulx r8,rdi,QWORD[40+rsi]
+ adox r13,rdi
+ adcx r14,r8
+
+DB 0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+ adox r14,rax
+ adcx r15,rbx
+
+DB 0xc4,0x62,0xc3,0xf6,0x86,0x38,0x00,0x00,0x00
+ adox r15,rdi
+ adcx r8,rbp
+ adox r8,rbp
+
+ mov rbx,r11
+ shld r11,r10,1
+ shld r10,rcx,1
+
+ xor ebp,ebp
+ mulx rcx,rax,rdx
+ mov rdx,QWORD[16+rsi]
+ adcx r9,rax
+ adcx r10,rcx
+ adcx r11,rbp
+
+ mov QWORD[16+rsp],r9
+DB 0x4c,0x89,0x94,0x24,0x18,0x00,0x00,0x00
+
+
+DB 0xc4,0x62,0xc3,0xf6,0x8e,0x18,0x00,0x00,0x00
+ adox r12,rdi
+ adcx r13,r9
+
+ mulx rcx,rax,QWORD[32+rsi]
+ adox r13,rax
+ adcx r14,rcx
+
+ mulx r9,rdi,QWORD[40+rsi]
+ adox r14,rdi
+ adcx r15,r9
+
+DB 0xc4,0xe2,0xfb,0xf6,0x8e,0x30,0x00,0x00,0x00
+ adox r15,rax
+ adcx r8,rcx
+
+DB 0xc4,0x62,0xc3,0xf6,0x8e,0x38,0x00,0x00,0x00
+ adox r8,rdi
+ adcx r9,rbp
+ adox r9,rbp
+
+ mov rcx,r13
+ shld r13,r12,1
+ shld r12,rbx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r11,rax
+ adcx r12,rdx
+ mov rdx,QWORD[24+rsi]
+ adcx r13,rbp
+
+ mov QWORD[32+rsp],r11
+DB 0x4c,0x89,0xa4,0x24,0x28,0x00,0x00,0x00
+
+
+DB 0xc4,0xe2,0xfb,0xf6,0x9e,0x20,0x00,0x00,0x00
+ adox r14,rax
+ adcx r15,rbx
+
+ mulx r10,rdi,QWORD[40+rsi]
+ adox r15,rdi
+ adcx r8,r10
+
+ mulx rbx,rax,QWORD[48+rsi]
+ adox r8,rax
+ adcx r9,rbx
+
+ mulx r10,rdi,QWORD[56+rsi]
+ adox r9,rdi
+ adcx r10,rbp
+ adox r10,rbp
+
+DB 0x66
+ mov rbx,r15
+ shld r15,r14,1
+ shld r14,rcx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r13,rax
+ adcx r14,rdx
+ mov rdx,QWORD[32+rsi]
+ adcx r15,rbp
+
+ mov QWORD[48+rsp],r13
+ mov QWORD[56+rsp],r14
+
+
+DB 0xc4,0x62,0xc3,0xf6,0x9e,0x28,0x00,0x00,0x00
+ adox r8,rdi
+ adcx r9,r11
+
+ mulx rcx,rax,QWORD[48+rsi]
+ adox r9,rax
+ adcx r10,rcx
+
+ mulx r11,rdi,QWORD[56+rsi]
+ adox r10,rdi
+ adcx r11,rbp
+ adox r11,rbp
+
+ mov rcx,r9
+ shld r9,r8,1
+ shld r8,rbx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r15,rax
+ adcx r8,rdx
+ mov rdx,QWORD[40+rsi]
+ adcx r9,rbp
+
+ mov QWORD[64+rsp],r15
+ mov QWORD[72+rsp],r8
+
+
+DB 0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+ adox r10,rax
+ adcx r11,rbx
+
+DB 0xc4,0x62,0xc3,0xf6,0xa6,0x38,0x00,0x00,0x00
+ adox r11,rdi
+ adcx r12,rbp
+ adox r12,rbp
+
+ mov rbx,r11
+ shld r11,r10,1
+ shld r10,rcx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r9,rax
+ adcx r10,rdx
+ mov rdx,QWORD[48+rsi]
+ adcx r11,rbp
+
+ mov QWORD[80+rsp],r9
+ mov QWORD[88+rsp],r10
+
+
+DB 0xc4,0x62,0xfb,0xf6,0xae,0x38,0x00,0x00,0x00
+ adox r12,rax
+ adox r13,rbp
+
+ xor r14,r14
+ shld r14,r13,1
+ shld r13,r12,1
+ shld r12,rbx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r11,rax
+ adcx r12,rdx
+ mov rdx,QWORD[56+rsi]
+ adcx r13,rbp
+
+DB 0x4c,0x89,0x9c,0x24,0x60,0x00,0x00,0x00
+DB 0x4c,0x89,0xa4,0x24,0x68,0x00,0x00,0x00
+
+
+ mulx rdx,rax,rdx
+ adox r13,rax
+ adox rdx,rbp
+
+DB 0x66
+ add r14,rdx
+
+ mov QWORD[112+rsp],r13
+ mov QWORD[120+rsp],r14
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov rdx,QWORD[128+rsp]
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ mov rdx,r8
+ mov rax,r9
+ mov r8d,DWORD[((128+8))+rsp]
+ mov rsi,rdi
+
+ dec r8d
+ jnz NEAR $L$oop_sqrx
+
+$L$sqr_tail:
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$sqr_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_sqr:
+global rsaz_512_mul
+
+ALIGN 32
+rsaz_512_mul:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,128+24
+
+$L$mul_body:
+DB 102,72,15,110,199
+DB 102,72,15,110,201
+ mov QWORD[128+rsp],r8
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$mulx
+ mov rbx,QWORD[rdx]
+ mov rbp,rdx
+ call __rsaz_512_mul
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+ jmp NEAR $L$mul_tail
+
+ALIGN 32
+$L$mulx:
+ mov rbp,rdx
+ mov rdx,QWORD[rdx]
+ call __rsaz_512_mulx
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov rdx,QWORD[128+rsp]
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+$L$mul_tail:
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul:
+global rsaz_512_mul_gather4
+
+ALIGN 32
+rsaz_512_mul_gather4:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul_gather4:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,328
+
+ movaps XMMWORD[160+rsp],xmm6
+ movaps XMMWORD[176+rsp],xmm7
+ movaps XMMWORD[192+rsp],xmm8
+ movaps XMMWORD[208+rsp],xmm9
+ movaps XMMWORD[224+rsp],xmm10
+ movaps XMMWORD[240+rsp],xmm11
+ movaps XMMWORD[256+rsp],xmm12
+ movaps XMMWORD[272+rsp],xmm13
+ movaps XMMWORD[288+rsp],xmm14
+ movaps XMMWORD[304+rsp],xmm15
+$L$mul_gather4_body:
+ movd xmm8,r9d
+ movdqa xmm1,XMMWORD[(($L$inc+16))]
+ movdqa xmm0,XMMWORD[$L$inc]
+
+ pshufd xmm8,xmm8,0
+ movdqa xmm7,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm8
+ movdqa xmm3,xmm7
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm8
+ movdqa xmm4,xmm7
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm8
+ movdqa xmm5,xmm7
+ paddd xmm4,xmm3
+ pcmpeqd xmm3,xmm8
+ movdqa xmm6,xmm7
+ paddd xmm5,xmm4
+ pcmpeqd xmm4,xmm8
+ paddd xmm6,xmm5
+ pcmpeqd xmm5,xmm8
+ paddd xmm7,xmm6
+ pcmpeqd xmm6,xmm8
+ pcmpeqd xmm7,xmm8
+
+ movdqa xmm8,XMMWORD[rdx]
+ movdqa xmm9,XMMWORD[16+rdx]
+ movdqa xmm10,XMMWORD[32+rdx]
+ movdqa xmm11,XMMWORD[48+rdx]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rdx]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rdx]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rdx]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rdx]
+ lea rbp,[128+rdx]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$mulx_gather
+DB 102,76,15,126,195
+
+ mov QWORD[128+rsp],r8
+ mov QWORD[((128+8))+rsp],rdi
+ mov QWORD[((128+16))+rsp],rcx
+
+ mov rax,QWORD[rsi]
+ mov rcx,QWORD[8+rsi]
+ mul rbx
+ mov QWORD[rsp],rax
+ mov rax,rcx
+ mov r8,rdx
+
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[16+rsi]
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[24+rsi]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[32+rsi]
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[40+rsi]
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[48+rsi]
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[56+rsi]
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[rsi]
+ mov r15,rdx
+ adc r15,0
+
+ lea rdi,[8+rsp]
+ mov ecx,7
+ jmp NEAR $L$oop_mul_gather
+
+ALIGN 32
+$L$oop_mul_gather:
+ movdqa xmm8,XMMWORD[rbp]
+ movdqa xmm9,XMMWORD[16+rbp]
+ movdqa xmm10,XMMWORD[32+rbp]
+ movdqa xmm11,XMMWORD[48+rbp]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rbp]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rbp]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rbp]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rbp]
+ lea rbp,[128+rbp]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+DB 102,76,15,126,195
+
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[8+rsi]
+ mov QWORD[rdi],r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add r8,r9
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rsi]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rsi]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r15,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ lea rdi,[8+rdi]
+
+ dec ecx
+ jnz NEAR $L$oop_mul_gather
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ mov rdi,QWORD[((128+8))+rsp]
+ mov rbp,QWORD[((128+16))+rsp]
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+ jmp NEAR $L$mul_gather_tail
+
+ALIGN 32
+$L$mulx_gather:
+DB 102,76,15,126,194
+
+ mov QWORD[128+rsp],r8
+ mov QWORD[((128+8))+rsp],rdi
+ mov QWORD[((128+16))+rsp],rcx
+
+ mulx r8,rbx,QWORD[rsi]
+ mov QWORD[rsp],rbx
+ xor edi,edi
+
+ mulx r9,rax,QWORD[8+rsi]
+
+ mulx r10,rbx,QWORD[16+rsi]
+ adcx r8,rax
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r9,rbx
+
+ mulx r12,rbx,QWORD[32+rsi]
+ adcx r10,rax
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r11,rbx
+
+ mulx r14,rbx,QWORD[48+rsi]
+ adcx r12,rax
+
+ mulx r15,rax,QWORD[56+rsi]
+ adcx r13,rbx
+ adcx r14,rax
+DB 0x67
+ mov rbx,r8
+ adcx r15,rdi
+
+ mov rcx,-7
+ jmp NEAR $L$oop_mulx_gather
+
+ALIGN 32
+$L$oop_mulx_gather:
+ movdqa xmm8,XMMWORD[rbp]
+ movdqa xmm9,XMMWORD[16+rbp]
+ movdqa xmm10,XMMWORD[32+rbp]
+ movdqa xmm11,XMMWORD[48+rbp]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rbp]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rbp]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rbp]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rbp]
+ lea rbp,[128+rbp]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+DB 102,76,15,126,194
+
+DB 0xc4,0x62,0xfb,0xf6,0x86,0x00,0x00,0x00,0x00
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rsi]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rsi]
+ adcx r9,rax
+ adox r10,r11
+
+DB 0xc4,0x62,0xfb,0xf6,0x9e,0x18,0x00,0x00,0x00
+ adcx r10,rax
+ adox r11,r12
+
+ mulx r12,rax,QWORD[32+rsi]
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r12,rax
+ adox r13,r14
+
+DB 0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+ adcx r13,rax
+DB 0x67
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rsi]
+ mov QWORD[64+rcx*8+rsp],rbx
+ adcx r14,rax
+ adox r15,rdi
+ mov rbx,r8
+ adcx r15,rdi
+
+ inc rcx
+ jnz NEAR $L$oop_mulx_gather
+
+ mov QWORD[64+rsp],r8
+ mov QWORD[((64+8))+rsp],r9
+ mov QWORD[((64+16))+rsp],r10
+ mov QWORD[((64+24))+rsp],r11
+ mov QWORD[((64+32))+rsp],r12
+ mov QWORD[((64+40))+rsp],r13
+ mov QWORD[((64+48))+rsp],r14
+ mov QWORD[((64+56))+rsp],r15
+
+ mov rdx,QWORD[128+rsp]
+ mov rdi,QWORD[((128+8))+rsp]
+ mov rbp,QWORD[((128+16))+rsp]
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+
+$L$mul_gather_tail:
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ lea rax,[((128+24+48))+rsp]
+ movaps xmm6,XMMWORD[((160-200))+rax]
+ movaps xmm7,XMMWORD[((176-200))+rax]
+ movaps xmm8,XMMWORD[((192-200))+rax]
+ movaps xmm9,XMMWORD[((208-200))+rax]
+ movaps xmm10,XMMWORD[((224-200))+rax]
+ movaps xmm11,XMMWORD[((240-200))+rax]
+ movaps xmm12,XMMWORD[((256-200))+rax]
+ movaps xmm13,XMMWORD[((272-200))+rax]
+ movaps xmm14,XMMWORD[((288-200))+rax]
+ movaps xmm15,XMMWORD[((304-200))+rax]
+ lea rax,[176+rax]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_gather4_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul_gather4:
+global rsaz_512_mul_scatter4
+
+ALIGN 32
+rsaz_512_mul_scatter4:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul_scatter4:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ mov r9d,r9d
+ sub rsp,128+24
+
+$L$mul_scatter4_body:
+ lea r8,[r9*8+r8]
+DB 102,72,15,110,199
+DB 102,72,15,110,202
+DB 102,73,15,110,208
+ mov QWORD[128+rsp],rcx
+
+ mov rbp,rdi
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$mulx_scatter
+ mov rbx,QWORD[rdi]
+ call __rsaz_512_mul
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+ jmp NEAR $L$mul_scatter_tail
+
+ALIGN 32
+$L$mulx_scatter:
+ mov rdx,QWORD[rdi]
+ call __rsaz_512_mulx
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov rdx,QWORD[128+rsp]
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+
+$L$mul_scatter_tail:
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+DB 102,72,15,126,214
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ mov QWORD[rsi],r8
+ mov QWORD[128+rsi],r9
+ mov QWORD[256+rsi],r10
+ mov QWORD[384+rsi],r11
+ mov QWORD[512+rsi],r12
+ mov QWORD[640+rsi],r13
+ mov QWORD[768+rsi],r14
+ mov QWORD[896+rsi],r15
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_scatter4_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul_scatter4:
+global rsaz_512_mul_by_one
+
+ALIGN 32
+rsaz_512_mul_by_one:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul_by_one:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,128+24
+
+$L$mul_by_one_body:
+ mov eax,DWORD[((OPENSSL_ia32cap_P+8))]
+ mov rbp,rdx
+ mov QWORD[128+rsp],rcx
+
+ mov r8,QWORD[rsi]
+ pxor xmm0,xmm0
+ mov r9,QWORD[8+rsi]
+ mov r10,QWORD[16+rsi]
+ mov r11,QWORD[24+rsi]
+ mov r12,QWORD[32+rsi]
+ mov r13,QWORD[40+rsi]
+ mov r14,QWORD[48+rsi]
+ mov r15,QWORD[56+rsi]
+
+ movdqa XMMWORD[rsp],xmm0
+ movdqa XMMWORD[16+rsp],xmm0
+ movdqa XMMWORD[32+rsp],xmm0
+ movdqa XMMWORD[48+rsp],xmm0
+ movdqa XMMWORD[64+rsp],xmm0
+ movdqa XMMWORD[80+rsp],xmm0
+ movdqa XMMWORD[96+rsp],xmm0
+ and eax,0x80100
+ cmp eax,0x80100
+ je NEAR $L$by_one_callx
+ call __rsaz_512_reduce
+ jmp NEAR $L$by_one_tail
+ALIGN 32
+$L$by_one_callx:
+ mov rdx,QWORD[128+rsp]
+ call __rsaz_512_reducex
+$L$by_one_tail:
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_by_one_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul_by_one:
+
+ALIGN 32
+__rsaz_512_reduce:
+ mov rbx,r8
+ imul rbx,QWORD[((128+8))+rsp]
+ mov rax,QWORD[rbp]
+ mov ecx,8
+ jmp NEAR $L$reduction_loop
+
+ALIGN 32
+$L$reduction_loop:
+ mul rbx
+ mov rax,QWORD[8+rbp]
+ neg r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rbp]
+ adc rdx,0
+ add r8,r9
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rbp]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rbp]
+ adc rdx,0
+ add r10,r11
+ mov rsi,QWORD[((128+8))+rsp]
+
+
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rbp]
+ adc rdx,0
+ imul rsi,r8
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rbp]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rbp]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ mov rbx,rsi
+ add r15,rax
+ mov rax,QWORD[rbp]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ dec ecx
+ jne NEAR $L$reduction_loop
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_reducex:
+
+ imul rdx,r8
+ xor rsi,rsi
+ mov ecx,8
+ jmp NEAR $L$reduction_loopx
+
+ALIGN 32
+$L$reduction_loopx:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rax,rbx
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rbp]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rbx,QWORD[16+rbp]
+ adcx r9,rbx
+ adox r10,r11
+
+ mulx r11,rbx,QWORD[24+rbp]
+ adcx r10,rbx
+ adox r11,r12
+
+DB 0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+ mov rax,rdx
+ mov rdx,r8
+ adcx r11,rbx
+ adox r12,r13
+
+ mulx rdx,rbx,QWORD[((128+8))+rsp]
+ mov rdx,rax
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+DB 0xc4,0x62,0xfb,0xf6,0xb5,0x30,0x00,0x00,0x00
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rbp]
+ mov rdx,rbx
+ adcx r14,rax
+ adox r15,rsi
+ adcx r15,rsi
+
+ dec ecx
+ jne NEAR $L$reduction_loopx
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_subtract:
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ mov r8,QWORD[rbp]
+ mov r9,QWORD[8+rbp]
+ neg r8
+ not r9
+ and r8,rcx
+ mov r10,QWORD[16+rbp]
+ and r9,rcx
+ not r10
+ mov r11,QWORD[24+rbp]
+ and r10,rcx
+ not r11
+ mov r12,QWORD[32+rbp]
+ and r11,rcx
+ not r12
+ mov r13,QWORD[40+rbp]
+ and r12,rcx
+ not r13
+ mov r14,QWORD[48+rbp]
+ and r13,rcx
+ not r14
+ mov r15,QWORD[56+rbp]
+ and r14,rcx
+ not r15
+ and r15,rcx
+
+ add r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_mul:
+ lea rdi,[8+rsp]
+
+ mov rax,QWORD[rsi]
+ mul rbx
+ mov QWORD[rdi],rax
+ mov rax,QWORD[8+rsi]
+ mov r8,rdx
+
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[16+rsi]
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[24+rsi]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[32+rsi]
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[40+rsi]
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[48+rsi]
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[56+rsi]
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[rsi]
+ mov r15,rdx
+ adc r15,0
+
+ lea rbp,[8+rbp]
+ lea rdi,[8+rdi]
+
+ mov ecx,7
+ jmp NEAR $L$oop_mul
+
+ALIGN 32
+$L$oop_mul:
+ mov rbx,QWORD[rbp]
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[8+rsi]
+ mov QWORD[rdi],r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add r8,r9
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rsi]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rsi]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ lea rbp,[8+rbp]
+ adc r14,0
+
+ mul rbx
+ add r15,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ lea rdi,[8+rdi]
+
+ dec ecx
+ jnz NEAR $L$oop_mul
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_mulx:
+ mulx r8,rbx,QWORD[rsi]
+ mov rcx,-6
+
+ mulx r9,rax,QWORD[8+rsi]
+ mov QWORD[8+rsp],rbx
+
+ mulx r10,rbx,QWORD[16+rsi]
+ adc r8,rax
+
+ mulx r11,rax,QWORD[24+rsi]
+ adc r9,rbx
+
+ mulx r12,rbx,QWORD[32+rsi]
+ adc r10,rax
+
+ mulx r13,rax,QWORD[40+rsi]
+ adc r11,rbx
+
+ mulx r14,rbx,QWORD[48+rsi]
+ adc r12,rax
+
+ mulx r15,rax,QWORD[56+rsi]
+ mov rdx,QWORD[8+rbp]
+ adc r13,rbx
+ adc r14,rax
+ adc r15,0
+
+ xor rdi,rdi
+ jmp NEAR $L$oop_mulx
+
+ALIGN 32
+$L$oop_mulx:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rsi]
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rsi]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rsi]
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r10,rax
+ adox r11,r12
+
+DB 0x3e,0xc4,0x62,0xfb,0xf6,0xa6,0x20,0x00,0x00,0x00
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rsi]
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rsi]
+ mov rdx,QWORD[64+rcx*8+rbp]
+ mov QWORD[((8+64-8))+rcx*8+rsp],rbx
+ adcx r14,rax
+ adox r15,rdi
+ adcx r15,rdi
+
+ inc rcx
+ jnz NEAR $L$oop_mulx
+
+ mov rbx,r8
+ mulx r8,rax,QWORD[rsi]
+ adcx rbx,rax
+ adox r8,r9
+
+DB 0xc4,0x62,0xfb,0xf6,0x8e,0x08,0x00,0x00,0x00
+ adcx r8,rax
+ adox r9,r10
+
+DB 0xc4,0x62,0xfb,0xf6,0x96,0x10,0x00,0x00,0x00
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r10,rax
+ adox r11,r12
+
+ mulx r12,rax,QWORD[32+rsi]
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r12,rax
+ adox r13,r14
+
+DB 0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+ adcx r13,rax
+ adox r14,r15
+
+DB 0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+ adcx r14,rax
+ adox r15,rdi
+ adcx r15,rdi
+
+ mov QWORD[((8+64-8))+rsp],rbx
+ mov QWORD[((8+64))+rsp],r8
+ mov QWORD[((8+64+8))+rsp],r9
+ mov QWORD[((8+64+16))+rsp],r10
+ mov QWORD[((8+64+24))+rsp],r11
+ mov QWORD[((8+64+32))+rsp],r12
+ mov QWORD[((8+64+40))+rsp],r13
+ mov QWORD[((8+64+48))+rsp],r14
+ mov QWORD[((8+64+56))+rsp],r15
+
+ DB 0F3h,0C3h ;repret
+
+global rsaz_512_scatter4
+
+ALIGN 16
+rsaz_512_scatter4:
+ lea rcx,[r8*8+rcx]
+ mov r9d,8
+ jmp NEAR $L$oop_scatter
+ALIGN 16
+$L$oop_scatter:
+ mov rax,QWORD[rdx]
+ lea rdx,[8+rdx]
+ mov QWORD[rcx],rax
+ lea rcx,[128+rcx]
+ dec r9d
+ jnz NEAR $L$oop_scatter
+ DB 0F3h,0C3h ;repret
+
+
+global rsaz_512_gather4
+
+ALIGN 16
+rsaz_512_gather4:
+$L$SEH_begin_rsaz_512_gather4:
+DB 0x48,0x81,0xec,0xa8,0x00,0x00,0x00
+DB 0x0f,0x29,0x34,0x24
+DB 0x0f,0x29,0x7c,0x24,0x10
+DB 0x44,0x0f,0x29,0x44,0x24,0x20
+DB 0x44,0x0f,0x29,0x4c,0x24,0x30
+DB 0x44,0x0f,0x29,0x54,0x24,0x40
+DB 0x44,0x0f,0x29,0x5c,0x24,0x50
+DB 0x44,0x0f,0x29,0x64,0x24,0x60
+DB 0x44,0x0f,0x29,0x6c,0x24,0x70
+DB 0x44,0x0f,0x29,0xb4,0x24,0x80,0,0,0
+DB 0x44,0x0f,0x29,0xbc,0x24,0x90,0,0,0
+ movd xmm8,r8d
+ movdqa xmm1,XMMWORD[(($L$inc+16))]
+ movdqa xmm0,XMMWORD[$L$inc]
+
+ pshufd xmm8,xmm8,0
+ movdqa xmm7,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm8
+ movdqa xmm3,xmm7
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm8
+ movdqa xmm4,xmm7
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm8
+ movdqa xmm5,xmm7
+ paddd xmm4,xmm3
+ pcmpeqd xmm3,xmm8
+ movdqa xmm6,xmm7
+ paddd xmm5,xmm4
+ pcmpeqd xmm4,xmm8
+ paddd xmm6,xmm5
+ pcmpeqd xmm5,xmm8
+ paddd xmm7,xmm6
+ pcmpeqd xmm6,xmm8
+ pcmpeqd xmm7,xmm8
+ mov r9d,8
+ jmp NEAR $L$oop_gather
+ALIGN 16
+$L$oop_gather:
+ movdqa xmm8,XMMWORD[rdx]
+ movdqa xmm9,XMMWORD[16+rdx]
+ movdqa xmm10,XMMWORD[32+rdx]
+ movdqa xmm11,XMMWORD[48+rdx]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rdx]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rdx]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rdx]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rdx]
+ lea rdx,[128+rdx]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+ movq QWORD[rcx],xmm8
+ lea rcx,[8+rcx]
+ dec r9d
+ jnz NEAR $L$oop_gather
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ add rsp,0xa8
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_rsaz_512_gather4:
+
+
+ALIGN 64
+$L$inc:
+ DD 0,0,1,1
+ DD 2,2,2,2
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rax,[((128+24+48))+rax]
+
+ lea rbx,[$L$mul_gather4_epilogue]
+ cmp rbx,r10
+ jne NEAR $L$se_not_in_mul_gather4
+
+ lea rax,[176+rax]
+
+ lea rsi,[((-48-168))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$se_not_in_mul_gather4:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_rsaz_512_sqr wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_sqr wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_sqr wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul_gather4 wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul_gather4 wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul_gather4 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul_scatter4 wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul_scatter4 wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul_scatter4 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul_by_one wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul_by_one wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul_by_one wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_gather4 wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_gather4 wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_gather4 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_rsaz_512_sqr:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$sqr_body wrt ..imagebase,$L$sqr_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_gather4:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_gather4_body wrt ..imagebase,$L$mul_gather4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_scatter4:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_scatter4_body wrt ..imagebase,$L$mul_scatter4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_by_one:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_by_one_body wrt ..imagebase,$L$mul_by_one_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_gather4:
+DB 0x01,0x46,0x16,0x00
+DB 0x46,0xf8,0x09,0x00
+DB 0x3d,0xe8,0x08,0x00
+DB 0x34,0xd8,0x07,0x00
+DB 0x2e,0xc8,0x06,0x00
+DB 0x28,0xb8,0x05,0x00
+DB 0x22,0xa8,0x04,0x00
+DB 0x1c,0x98,0x03,0x00
+DB 0x16,0x88,0x02,0x00
+DB 0x10,0x78,0x01,0x00
+DB 0x0b,0x68,0x00,0x00
+DB 0x07,0x01,0x15,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
new file mode 100644
index 0000000000..b96e85a35a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
@@ -0,0 +1,432 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN 16
+_mul_1x1:
+
+ sub rsp,128+8
+
+ mov r9,-1
+ lea rsi,[rax*1+rax]
+ shr r9,3
+ lea rdi,[rax*4]
+ and r9,rax
+ lea r12,[rax*8]
+ sar rax,63
+ lea r10,[r9*1+r9]
+ sar rsi,63
+ lea r11,[r9*4]
+ and rax,rbp
+ sar rdi,63
+ mov rdx,rax
+ shl rax,63
+ and rsi,rbp
+ shr rdx,1
+ mov rcx,rsi
+ shl rsi,62
+ and rdi,rbp
+ shr rcx,2
+ xor rax,rsi
+ mov rbx,rdi
+ shl rdi,61
+ xor rdx,rcx
+ shr rbx,3
+ xor rax,rdi
+ xor rdx,rbx
+
+ mov r13,r9
+ mov QWORD[rsp],0
+ xor r13,r10
+ mov QWORD[8+rsp],r9
+ mov r14,r11
+ mov QWORD[16+rsp],r10
+ xor r14,r12
+ mov QWORD[24+rsp],r13
+
+ xor r9,r11
+ mov QWORD[32+rsp],r11
+ xor r10,r11
+ mov QWORD[40+rsp],r9
+ xor r13,r11
+ mov QWORD[48+rsp],r10
+ xor r9,r14
+ mov QWORD[56+rsp],r13
+ xor r10,r14
+
+ mov QWORD[64+rsp],r12
+ xor r13,r14
+ mov QWORD[72+rsp],r9
+ xor r9,r11
+ mov QWORD[80+rsp],r10
+ xor r10,r11
+ mov QWORD[88+rsp],r13
+
+ xor r13,r11
+ mov QWORD[96+rsp],r14
+ mov rsi,r8
+ mov QWORD[104+rsp],r9
+ and rsi,rbp
+ mov QWORD[112+rsp],r10
+ shr rbp,4
+ mov QWORD[120+rsp],r13
+ mov rdi,r8
+ and rdi,rbp
+ shr rbp,4
+
+ movq xmm0,QWORD[rsi*8+rsp]
+ mov rsi,r8
+ and rsi,rbp
+ shr rbp,4
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,4
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,60
+ xor rax,rcx
+ pslldq xmm1,1
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,12
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,52
+ xor rax,rcx
+ pslldq xmm1,2
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,20
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,44
+ xor rax,rcx
+ pslldq xmm1,3
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,28
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,36
+ xor rax,rcx
+ pslldq xmm1,4
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,36
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,28
+ xor rax,rcx
+ pslldq xmm1,5
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,44
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,20
+ xor rax,rcx
+ pslldq xmm1,6
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,52
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,12
+ xor rax,rcx
+ pslldq xmm1,7
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rbx,rcx
+ shl rcx,60
+DB 102,72,15,126,198
+ shr rbx,4
+ xor rax,rcx
+ psrldq xmm0,8
+ xor rdx,rbx
+DB 102,72,15,126,199
+ xor rax,rsi
+ xor rdx,rdi
+
+ add rsp,128+8
+
+ DB 0F3h,0C3h ;repret
+$L$end_mul_1x1:
+
+
+EXTERN OPENSSL_ia32cap_P
+global bn_GF2m_mul_2x2
+
+ALIGN 16
+bn_GF2m_mul_2x2:
+
+ mov rax,rsp
+ mov r10,QWORD[OPENSSL_ia32cap_P]
+ bt r10,33
+ jnc NEAR $L$vanilla_mul_2x2
+
+DB 102,72,15,110,194
+DB 102,73,15,110,201
+DB 102,73,15,110,208
+ movq xmm3,QWORD[40+rsp]
+ movdqa xmm4,xmm0
+ movdqa xmm5,xmm1
+DB 102,15,58,68,193,0
+ pxor xmm4,xmm2
+ pxor xmm5,xmm3
+DB 102,15,58,68,211,0
+DB 102,15,58,68,229,0
+ xorps xmm4,xmm0
+ xorps xmm4,xmm2
+ movdqa xmm5,xmm4
+ pslldq xmm4,8
+ psrldq xmm5,8
+ pxor xmm2,xmm4
+ pxor xmm0,xmm5
+ movdqu XMMWORD[rcx],xmm2
+ movdqu XMMWORD[16+rcx],xmm0
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$vanilla_mul_2x2:
+ lea rsp,[((-136))+rsp]
+
+ mov r10,QWORD[176+rsp]
+ mov QWORD[120+rsp],rdi
+ mov QWORD[128+rsp],rsi
+ mov QWORD[80+rsp],r14
+
+ mov QWORD[88+rsp],r13
+
+ mov QWORD[96+rsp],r12
+
+ mov QWORD[104+rsp],rbp
+
+ mov QWORD[112+rsp],rbx
+
+$L$body_mul_2x2:
+ mov QWORD[32+rsp],rcx
+ mov QWORD[40+rsp],rdx
+ mov QWORD[48+rsp],r8
+ mov QWORD[56+rsp],r9
+ mov QWORD[64+rsp],r10
+
+ mov r8,0xf
+ mov rax,rdx
+ mov rbp,r9
+ call _mul_1x1
+ mov QWORD[16+rsp],rax
+ mov QWORD[24+rsp],rdx
+
+ mov rax,QWORD[48+rsp]
+ mov rbp,QWORD[64+rsp]
+ call _mul_1x1
+ mov QWORD[rsp],rax
+ mov QWORD[8+rsp],rdx
+
+ mov rax,QWORD[40+rsp]
+ mov rbp,QWORD[56+rsp]
+ xor rax,QWORD[48+rsp]
+ xor rbp,QWORD[64+rsp]
+ call _mul_1x1
+ mov rbx,QWORD[rsp]
+ mov rcx,QWORD[8+rsp]
+ mov rdi,QWORD[16+rsp]
+ mov rsi,QWORD[24+rsp]
+ mov rbp,QWORD[32+rsp]
+
+ xor rax,rdx
+ xor rdx,rcx
+ xor rax,rbx
+ mov QWORD[rbp],rbx
+ xor rdx,rdi
+ mov QWORD[24+rbp],rsi
+ xor rax,rsi
+ xor rdx,rsi
+ xor rax,rdx
+ mov QWORD[16+rbp],rdx
+ mov QWORD[8+rbp],rax
+
+ mov r14,QWORD[80+rsp]
+
+ mov r13,QWORD[88+rsp]
+
+ mov r12,QWORD[96+rsp]
+
+ mov rbp,QWORD[104+rsp]
+
+ mov rbx,QWORD[112+rsp]
+
+ mov rdi,QWORD[120+rsp]
+ mov rsi,QWORD[128+rsp]
+ lea rsp,[136+rsp]
+
+$L$epilogue_mul_2x2:
+ DB 0F3h,0C3h ;repret
+$L$end_mul_2x2:
+
+
+DB 71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+DB 99,97,116,105,111,110,32,102,111,114,32,120,56,54,95,54
+DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB 111,114,103,62,0
+ALIGN 16
+EXTERN __imp_RtlVirtualUnwind
+
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$body_mul_2x2]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue_mul_2x2]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov r14,QWORD[80+rax]
+ mov r13,QWORD[88+rax]
+ mov r12,QWORD[96+rax]
+ mov rbp,QWORD[104+rax]
+ mov rbx,QWORD[112+rax]
+ mov rdi,QWORD[120+rax]
+ mov rsi,QWORD[128+rax]
+
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+ lea rax,[136+rax]
+
+$L$in_prologue:
+ mov QWORD[152+r8],rax
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD _mul_1x1 wrt ..imagebase
+ DD $L$end_mul_1x1 wrt ..imagebase
+ DD $L$SEH_info_1x1 wrt ..imagebase
+
+ DD $L$vanilla_mul_2x2 wrt ..imagebase
+ DD $L$end_mul_2x2 wrt ..imagebase
+ DD $L$SEH_info_2x2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_1x1:
+DB 0x01,0x07,0x02,0x00
+DB 0x07,0x01,0x11,0x00
+$L$SEH_info_2x2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
new file mode 100644
index 0000000000..9ff8ec428f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
@@ -0,0 +1,1479 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global bn_mul_mont
+
+ALIGN 16
+bn_mul_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r9d,r9d
+ mov rax,rsp
+
+ test r9d,3
+ jnz NEAR $L$mul_enter
+ cmp r9d,8
+ jb NEAR $L$mul_enter
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp rdx,rsi
+ jne NEAR $L$mul4x_enter
+ test r9d,7
+ jz NEAR $L$sqr8x_enter
+ jmp NEAR $L$mul4x_enter
+
+ALIGN 16
+$L$mul_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ neg r9
+ mov r11,rsp
+ lea r10,[((-16))+r9*8+rsp]
+ neg r9
+ and r10,-1024
+
+
+
+
+
+
+
+
+
+ sub r11,r10
+ and r11,-4096
+ lea rsp,[r11*1+r10]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+ jmp NEAR $L$mul_page_walk_done
+
+ALIGN 16
+$L$mul_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+ mov QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+ mov r12,rdx
+ mov r8,QWORD[r8]
+ mov rbx,QWORD[r12]
+ mov rax,QWORD[rsi]
+
+ xor r14,r14
+ xor r15,r15
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$1st_enter
+
+ALIGN 16
+$L$1st:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r11
+ mov r11,r10
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$1st_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ lea r15,[1+r15]
+ mov r10,rdx
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$1st
+
+ add r13,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+ mov r11,r10
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ jmp NEAR $L$outer
+ALIGN 16
+$L$outer:
+ mov rbx,QWORD[r14*8+r12]
+ xor r15,r15
+ mov rbp,r8
+ mov r10,QWORD[rsp]
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r10,QWORD[8+rsp]
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$inner_enter
+
+ALIGN 16
+$L$inner:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$inner_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+ lea r15,[1+r15]
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$inner
+
+ add r13,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ cmp r14,r9
+ jb NEAR $L$outer
+
+ xor r14,r14
+ mov rax,QWORD[rsp]
+ mov r15,r9
+
+ALIGN 16
+$L$sub: sbb rax,QWORD[r14*8+rcx]
+ mov QWORD[r14*8+rdi],rax
+ mov rax,QWORD[8+r14*8+rsp]
+ lea r14,[1+r14]
+ dec r15
+ jnz NEAR $L$sub
+
+ sbb rax,0
+ mov rbx,-1
+ xor rbx,rax
+ xor r14,r14
+ mov r15,r9
+
+$L$copy:
+ mov rcx,QWORD[r14*8+rdi]
+ mov rdx,QWORD[r14*8+rsp]
+ and rcx,rbx
+ and rdx,rax
+ mov QWORD[r14*8+rsp],r9
+ or rdx,rcx
+ mov QWORD[r14*8+rdi],rdx
+ lea r14,[1+r14]
+ sub r15,1
+ jnz NEAR $L$copy
+
+ mov rsi,QWORD[8+r9*8+rsp]
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul_mont:
+
+ALIGN 16
+bn_mul4x_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul4x_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r9d,r9d
+ mov rax,rsp
+
+$L$mul4x_enter:
+ and r11d,0x80100
+ cmp r11d,0x80100
+ je NEAR $L$mulx4x_enter
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ neg r9
+ mov r11,rsp
+ lea r10,[((-32))+r9*8+rsp]
+ neg r9
+ and r10,-1024
+
+ sub r11,r10
+ and r11,-4096
+ lea rsp,[r11*1+r10]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul4x_page_walk
+ jmp NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+ mov QWORD[8+r9*8+rsp],rax
+
+$L$mul4x_body:
+ mov QWORD[16+r9*8+rsp],rdi
+ mov r12,rdx
+ mov r8,QWORD[r8]
+ mov rbx,QWORD[r12]
+ mov rax,QWORD[rsi]
+
+ xor r14,r14
+ xor r15,r15
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[4+r15]
+ adc rdx,0
+ mov QWORD[rsp],rdi
+ mov r13,rdx
+ jmp NEAR $L$1st4x
+ALIGN 16
+$L$1st4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+r15*8+rcx]
+ adc rdx,0
+ lea r15,[4+r15]
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[((-16))+r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-32))+r15*8+rsp],rdi
+ mov r13,rdx
+ cmp r15,r9
+ jb NEAR $L$1st4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov QWORD[r15*8+rsp],rdi
+
+ lea r14,[1+r14]
+ALIGN 4
+$L$outer4x:
+ mov rbx,QWORD[r14*8+r12]
+ xor r15,r15
+ mov r10,QWORD[rsp]
+ mov rbp,r8
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+rsp]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[4+r15]
+ adc rdx,0
+ mov QWORD[rsp],rdi
+ mov r13,rdx
+ jmp NEAR $L$inner4x
+ALIGN 16
+$L$inner4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ add r10,QWORD[((-16))+r15*8+rsp]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r15*8+rsp]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ add r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+r15*8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+r15*8+rsp]
+ adc rdx,0
+ lea r15,[4+r15]
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[((-16))+r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-32))+r15*8+rsp],rdi
+ mov r13,rdx
+ cmp r15,r9
+ jb NEAR $L$inner4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ add r10,QWORD[((-16))+r15*8+rsp]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r15*8+rsp]
+ adc rdx,0
+ lea r14,[1+r14]
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ add r13,QWORD[r9*8+rsp]
+ adc rdi,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov QWORD[r15*8+rsp],rdi
+
+ cmp r14,r9
+ jb NEAR $L$outer4x
+ mov rdi,QWORD[16+r9*8+rsp]
+ lea r15,[((-4))+r9]
+ mov rax,QWORD[rsp]
+ mov rdx,QWORD[8+rsp]
+ shr r15,2
+ lea rsi,[rsp]
+ xor r14,r14
+
+ sub rax,QWORD[rcx]
+ mov rbx,QWORD[16+rsi]
+ mov rbp,QWORD[24+rsi]
+ sbb rdx,QWORD[8+rcx]
+
+$L$sub4x:
+ mov QWORD[r14*8+rdi],rax
+ mov QWORD[8+r14*8+rdi],rdx
+ sbb rbx,QWORD[16+r14*8+rcx]
+ mov rax,QWORD[32+r14*8+rsi]
+ mov rdx,QWORD[40+r14*8+rsi]
+ sbb rbp,QWORD[24+r14*8+rcx]
+ mov QWORD[16+r14*8+rdi],rbx
+ mov QWORD[24+r14*8+rdi],rbp
+ sbb rax,QWORD[32+r14*8+rcx]
+ mov rbx,QWORD[48+r14*8+rsi]
+ mov rbp,QWORD[56+r14*8+rsi]
+ sbb rdx,QWORD[40+r14*8+rcx]
+ lea r14,[4+r14]
+ dec r15
+ jnz NEAR $L$sub4x
+
+ mov QWORD[r14*8+rdi],rax
+ mov rax,QWORD[32+r14*8+rsi]
+ sbb rbx,QWORD[16+r14*8+rcx]
+ mov QWORD[8+r14*8+rdi],rdx
+ sbb rbp,QWORD[24+r14*8+rcx]
+ mov QWORD[16+r14*8+rdi],rbx
+
+ sbb rax,0
+ mov QWORD[24+r14*8+rdi],rbp
+ pxor xmm0,xmm0
+DB 102,72,15,110,224
+ pcmpeqd xmm5,xmm5
+ pshufd xmm4,xmm4,0
+ mov r15,r9
+ pxor xmm5,xmm4
+ shr r15,2
+ xor eax,eax
+
+ jmp NEAR $L$copy4x
+ALIGN 16
+$L$copy4x:
+ movdqa xmm1,XMMWORD[rax*1+rsp]
+ movdqu xmm2,XMMWORD[rax*1+rdi]
+ pand xmm1,xmm4
+ pand xmm2,xmm5
+ movdqa xmm3,XMMWORD[16+rax*1+rsp]
+ movdqa XMMWORD[rax*1+rsp],xmm0
+ por xmm1,xmm2
+ movdqu xmm2,XMMWORD[16+rax*1+rdi]
+ movdqu XMMWORD[rax*1+rdi],xmm1
+ pand xmm3,xmm4
+ pand xmm2,xmm5
+ movdqa XMMWORD[16+rax*1+rsp],xmm0
+ por xmm3,xmm2
+ movdqu XMMWORD[16+rax*1+rdi],xmm3
+ lea rax,[32+rax]
+ dec r15
+ jnz NEAR $L$copy4x
+ mov rsi,QWORD[8+r9*8+rsp]
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul4x_mont:
+EXTERN bn_sqrx8x_internal
+EXTERN bn_sqr8x_internal
+
+
+ALIGN 32
+bn_sqr8x_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_sqr8x_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$sqr8x_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$sqr8x_prologue:
+
+ mov r10d,r9d
+ shl r9d,3
+ shl r10,3+2
+ neg r9
+
+
+
+
+
+
+ lea r11,[((-64))+r9*2+rsp]
+ mov rbp,rsp
+ mov r8,QWORD[r8]
+ sub r11,rsi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$sqr8x_sp_alt
+ sub rbp,r11
+ lea rbp,[((-64))+r9*2+rbp]
+ jmp NEAR $L$sqr8x_sp_done
+
+ALIGN 32
+$L$sqr8x_sp_alt:
+ lea r10,[((4096-64))+r9*2]
+ lea rbp,[((-64))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$sqr8x_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$sqr8x_page_walk
+ jmp NEAR $L$sqr8x_page_walk_done
+
+ALIGN 16
+$L$sqr8x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$sqr8x_page_walk
+$L$sqr8x_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$sqr8x_body:
+
+DB 102,72,15,110,209
+ pxor xmm0,xmm0
+DB 102,72,15,110,207
+DB 102,73,15,110,218
+ mov eax,DWORD[((OPENSSL_ia32cap_P+8))]
+ and eax,0x80100
+ cmp eax,0x80100
+ jne NEAR $L$sqr8x_nox
+
+ call bn_sqrx8x_internal
+
+
+
+
+ lea rbx,[rcx*1+r8]
+ mov r9,rcx
+ mov rdx,rcx
+DB 102,72,15,126,207
+ sar rcx,3+2
+ jmp NEAR $L$sqr8x_sub
+
+ALIGN 32
+$L$sqr8x_nox:
+ call bn_sqr8x_internal
+
+
+
+
+ lea rbx,[r9*1+rdi]
+ mov rcx,r9
+ mov rdx,r9
+DB 102,72,15,126,207
+ sar rcx,3+2
+ jmp NEAR $L$sqr8x_sub
+
+ALIGN 32
+$L$sqr8x_sub:
+ mov r12,QWORD[rbx]
+ mov r13,QWORD[8+rbx]
+ mov r14,QWORD[16+rbx]
+ mov r15,QWORD[24+rbx]
+ lea rbx,[32+rbx]
+ sbb r12,QWORD[rbp]
+ sbb r13,QWORD[8+rbp]
+ sbb r14,QWORD[16+rbp]
+ sbb r15,QWORD[24+rbp]
+ lea rbp,[32+rbp]
+ mov QWORD[rdi],r12
+ mov QWORD[8+rdi],r13
+ mov QWORD[16+rdi],r14
+ mov QWORD[24+rdi],r15
+ lea rdi,[32+rdi]
+ inc rcx
+ jnz NEAR $L$sqr8x_sub
+
+ sbb rax,0
+ lea rbx,[r9*1+rbx]
+ lea rdi,[r9*1+rdi]
+
+DB 102,72,15,110,200
+ pxor xmm0,xmm0
+ pshufd xmm1,xmm1,0
+ mov rsi,QWORD[40+rsp]
+
+ jmp NEAR $L$sqr8x_cond_copy
+
+ALIGN 32
+$L$sqr8x_cond_copy:
+ movdqa xmm2,XMMWORD[rbx]
+ movdqa xmm3,XMMWORD[16+rbx]
+ lea rbx,[32+rbx]
+ movdqu xmm4,XMMWORD[rdi]
+ movdqu xmm5,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ movdqa XMMWORD[(-32)+rbx],xmm0
+ movdqa XMMWORD[(-16)+rbx],xmm0
+ movdqa XMMWORD[(-32)+rdx*1+rbx],xmm0
+ movdqa XMMWORD[(-16)+rdx*1+rbx],xmm0
+ pcmpeqd xmm0,xmm1
+ pand xmm2,xmm1
+ pand xmm3,xmm1
+ pand xmm4,xmm0
+ pand xmm5,xmm0
+ pxor xmm0,xmm0
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqu XMMWORD[(-32)+rdi],xmm4
+ movdqu XMMWORD[(-16)+rdi],xmm5
+ add r9,32
+ jnz NEAR $L$sqr8x_cond_copy
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$sqr8x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_sqr8x_mont:
+
+ALIGN 32
+bn_mulx4x_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mulx4x_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$mulx4x_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$mulx4x_prologue:
+
+ shl r9d,3
+ xor r10,r10
+ sub r10,r9
+ mov r8,QWORD[r8]
+ lea rbp,[((-72))+r10*1+rsp]
+ and rbp,-128
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+ jmp NEAR $L$mulx4x_page_walk_done
+
+ALIGN 16
+$L$mulx4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+ lea r10,[r9*1+rdx]
+
+
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[rsp],r9
+ shr r9,5
+ mov QWORD[16+rsp],r10
+ sub r9,1
+ mov QWORD[24+rsp],r8
+ mov QWORD[32+rsp],rdi
+ mov QWORD[40+rsp],rax
+
+ mov QWORD[48+rsp],r9
+ jmp NEAR $L$mulx4x_body
+
+ALIGN 32
+$L$mulx4x_body:
+ lea rdi,[8+rdx]
+ mov rdx,QWORD[rdx]
+ lea rbx,[((64+32))+rsp]
+ mov r9,rdx
+
+ mulx rax,r8,QWORD[rsi]
+ mulx r14,r11,QWORD[8+rsi]
+ add r11,rax
+ mov QWORD[8+rsp],rdi
+ mulx r13,r12,QWORD[16+rsi]
+ adc r12,r14
+ adc r13,0
+
+ mov rdi,r8
+ imul r8,QWORD[24+rsp]
+ xor rbp,rbp
+
+ mulx r14,rax,QWORD[24+rsi]
+ mov rdx,r8
+ lea rsi,[32+rsi]
+ adcx r13,rax
+ adcx r14,rbp
+
+ mulx r10,rax,QWORD[rcx]
+ adcx rdi,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+DB 0xc4,0x62,0xfb,0xf6,0xa1,0x10,0x00,0x00,0x00
+ mov rdi,QWORD[48+rsp]
+ mov QWORD[((-32))+rbx],r10
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r11
+ adcx r12,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r12
+
+ jmp NEAR $L$mulx4x_1st
+
+ALIGN 32
+$L$mulx4x_1st:
+ adcx r15,rbp
+ mulx rax,r10,QWORD[rsi]
+ adcx r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+DB 0x67,0x67
+ mov rdx,r8
+ adcx r13,rax
+ adcx r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ mov QWORD[((-32))+rbx],r11
+ adox r13,r15
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_1st
+
+ mov rax,QWORD[rsp]
+ mov rdi,QWORD[8+rsp]
+ adc r15,rbp
+ add r14,r15
+ sbb r15,r15
+ mov QWORD[((-8))+rbx],r14
+ jmp NEAR $L$mulx4x_outer
+
+ALIGN 32
+$L$mulx4x_outer:
+ mov rdx,QWORD[rdi]
+ lea rdi,[8+rdi]
+ sub rsi,rax
+ mov QWORD[rbx],r15
+ lea rbx,[((64+32))+rsp]
+ sub rcx,rax
+
+ mulx r11,r8,QWORD[rsi]
+ xor ebp,ebp
+ mov r9,rdx
+ mulx r12,r14,QWORD[8+rsi]
+ adox r8,QWORD[((-32))+rbx]
+ adcx r11,r14
+ mulx r13,r15,QWORD[16+rsi]
+ adox r11,QWORD[((-24))+rbx]
+ adcx r12,r15
+ adox r12,QWORD[((-16))+rbx]
+ adcx r13,rbp
+ adox r13,rbp
+
+ mov QWORD[8+rsp],rdi
+ mov r15,r8
+ imul r8,QWORD[24+rsp]
+ xor ebp,ebp
+
+ mulx r14,rax,QWORD[24+rsi]
+ mov rdx,r8
+ adcx r13,rax
+ adox r13,QWORD[((-8))+rbx]
+ adcx r14,rbp
+ lea rsi,[32+rsi]
+ adox r14,rbp
+
+ mulx r10,rax,QWORD[rcx]
+ adcx r15,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+ mulx r12,rax,QWORD[16+rcx]
+ mov QWORD[((-32))+rbx],r10
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r11
+ lea rcx,[32+rcx]
+ adcx r12,rax
+ adox r15,rbp
+ mov rdi,QWORD[48+rsp]
+ mov QWORD[((-16))+rbx],r12
+
+ jmp NEAR $L$mulx4x_inner
+
+ALIGN 32
+$L$mulx4x_inner:
+ mulx rax,r10,QWORD[rsi]
+ adcx r15,rbp
+ adox r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r10,QWORD[rbx]
+ adox r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r11,QWORD[8+rbx]
+ adox r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+ mov rdx,r8
+ adcx r12,QWORD[16+rbx]
+ adox r13,rax
+ adcx r13,QWORD[24+rbx]
+ adox r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+ adcx r14,rbp
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ adox r13,r15
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-32))+rbx],r11
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_inner
+
+ mov rax,QWORD[rsp]
+ mov rdi,QWORD[8+rsp]
+ adc r15,rbp
+ sub rbp,QWORD[rbx]
+ adc r14,r15
+ sbb r15,r15
+ mov QWORD[((-8))+rbx],r14
+
+ cmp rdi,QWORD[16+rsp]
+ jne NEAR $L$mulx4x_outer
+
+ lea rbx,[64+rsp]
+ sub rcx,rax
+ neg r15
+ mov rdx,rax
+ shr rax,3+2
+ mov rdi,QWORD[32+rsp]
+ jmp NEAR $L$mulx4x_sub
+
+ALIGN 32
+$L$mulx4x_sub:
+ mov r11,QWORD[rbx]
+ mov r12,QWORD[8+rbx]
+ mov r13,QWORD[16+rbx]
+ mov r14,QWORD[24+rbx]
+ lea rbx,[32+rbx]
+ sbb r11,QWORD[rcx]
+ sbb r12,QWORD[8+rcx]
+ sbb r13,QWORD[16+rcx]
+ sbb r14,QWORD[24+rcx]
+ lea rcx,[32+rcx]
+ mov QWORD[rdi],r11
+ mov QWORD[8+rdi],r12
+ mov QWORD[16+rdi],r13
+ mov QWORD[24+rdi],r14
+ lea rdi,[32+rdi]
+ dec rax
+ jnz NEAR $L$mulx4x_sub
+
+ sbb r15,0
+ lea rbx,[64+rsp]
+ sub rdi,rdx
+
+DB 102,73,15,110,207
+ pxor xmm0,xmm0
+ pshufd xmm1,xmm1,0
+ mov rsi,QWORD[40+rsp]
+
+ jmp NEAR $L$mulx4x_cond_copy
+
+ALIGN 32
+$L$mulx4x_cond_copy:
+ movdqa xmm2,XMMWORD[rbx]
+ movdqa xmm3,XMMWORD[16+rbx]
+ lea rbx,[32+rbx]
+ movdqu xmm4,XMMWORD[rdi]
+ movdqu xmm5,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ movdqa XMMWORD[(-32)+rbx],xmm0
+ movdqa XMMWORD[(-16)+rbx],xmm0
+ pcmpeqd xmm0,xmm1
+ pand xmm2,xmm1
+ pand xmm3,xmm1
+ pand xmm4,xmm0
+ pand xmm5,xmm0
+ pxor xmm0,xmm0
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqu XMMWORD[(-32)+rdi],xmm4
+ movdqu XMMWORD[(-16)+rdi],xmm5
+ sub rdx,32
+ jnz NEAR $L$mulx4x_cond_copy
+
+ mov QWORD[rbx],rdx
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mulx4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mulx4x_mont:
+DB 77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB 112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+DB 54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83
+DB 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB 115,108,46,111,114,103,62,0
+ALIGN 16
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+mul_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov r10,QWORD[192+r8]
+ mov rax,QWORD[8+r10*8+rax]
+
+ jmp NEAR $L$common_pop_regs
+
+
+
+ALIGN 16
+sqr_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_pop_regs
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[40+rax]
+
+$L$common_pop_regs:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_bn_mul_mont wrt ..imagebase
+ DD $L$SEH_end_bn_mul_mont wrt ..imagebase
+ DD $L$SEH_info_bn_mul_mont wrt ..imagebase
+
+ DD $L$SEH_begin_bn_mul4x_mont wrt ..imagebase
+ DD $L$SEH_end_bn_mul4x_mont wrt ..imagebase
+ DD $L$SEH_info_bn_mul4x_mont wrt ..imagebase
+
+ DD $L$SEH_begin_bn_sqr8x_mont wrt ..imagebase
+ DD $L$SEH_end_bn_sqr8x_mont wrt ..imagebase
+ DD $L$SEH_info_bn_sqr8x_mont wrt ..imagebase
+ DD $L$SEH_begin_bn_mulx4x_mont wrt ..imagebase
+ DD $L$SEH_end_bn_mulx4x_mont wrt ..imagebase
+ DD $L$SEH_info_bn_mulx4x_mont wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_bn_mul_mont:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_bn_mul4x_mont:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+$L$SEH_info_bn_sqr8x_mont:
+DB 9,0,0,0
+ DD sqr_handler wrt ..imagebase
+ DD $L$sqr8x_prologue wrt ..imagebase,$L$sqr8x_body wrt ..imagebase,$L$sqr8x_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_mulx4x_mont:
+DB 9,0,0,0
+ DD sqr_handler wrt ..imagebase
+ DD $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN 8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
new file mode 100644
index 0000000000..f256a94476
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
@@ -0,0 +1,4033 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global bn_mul_mont_gather5
+
+ALIGN 64
+bn_mul_mont_gather5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul_mont_gather5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r9d,r9d
+ mov rax,rsp
+
+ test r9d,7
+ jnz NEAR $L$mul_enter
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ jmp NEAR $L$mul4x_enter
+
+ALIGN 16
+$L$mul_enter:
+ movd xmm5,DWORD[56+rsp]
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ neg r9
+ mov r11,rsp
+ lea r10,[((-280))+r9*8+rsp]
+ neg r9
+ and r10,-1024
+
+
+
+
+
+
+
+
+
+ sub r11,r10
+ and r11,-4096
+ lea rsp,[r11*1+r10]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+ jmp NEAR $L$mul_page_walk_done
+
+$L$mul_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+ lea r10,[$L$inc]
+ mov QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+
+ lea r12,[128+rdx]
+ movdqa xmm0,XMMWORD[r10]
+ movdqa xmm1,XMMWORD[16+r10]
+ lea r10,[((24-112))+r9*8+rsp]
+ and r10,-16
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+DB 0x67
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[112+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[128+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[144+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[160+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[176+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[192+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[208+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[224+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[240+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[256+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[272+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[288+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[304+r10],xmm0
+
+ paddd xmm3,xmm2
+DB 0x67
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[320+r10],xmm1
+
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[336+r10],xmm2
+ pand xmm0,XMMWORD[64+r12]
+
+ pand xmm1,XMMWORD[80+r12]
+ pand xmm2,XMMWORD[96+r12]
+ movdqa XMMWORD[352+r10],xmm3
+ pand xmm3,XMMWORD[112+r12]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-128))+r12]
+ movdqa xmm5,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ pand xmm4,XMMWORD[112+r10]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm5,XMMWORD[128+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[144+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[160+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-64))+r12]
+ movdqa xmm5,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ pand xmm4,XMMWORD[176+r10]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm5,XMMWORD[192+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[208+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[224+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[r12]
+ movdqa xmm5,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ pand xmm4,XMMWORD[240+r10]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm5,XMMWORD[256+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[272+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[288+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ por xmm0,xmm1
+ pshufd xmm1,xmm0,0x4e
+ por xmm0,xmm1
+ lea r12,[256+r12]
+DB 102,72,15,126,195
+
+ mov r8,QWORD[r8]
+ mov rax,QWORD[rsi]
+
+ xor r14,r14
+ xor r15,r15
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$1st_enter
+
+ALIGN 16
+$L$1st:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r11
+ mov r11,r10
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$1st_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ lea r15,[1+r15]
+ mov r10,rdx
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$1st
+
+
+ add r13,rax
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-16))+r9*8+rsp],r13
+ mov r13,rdx
+ mov r11,r10
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ jmp NEAR $L$outer
+ALIGN 16
+$L$outer:
+ lea rdx,[((24+128))+r9*8+rsp]
+ and rdx,-16
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+r12]
+ movdqa xmm1,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm0,XMMWORD[((-128))+rdx]
+ pand xmm1,XMMWORD[((-112))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-96))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-80))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+r12]
+ movdqa xmm1,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm0,XMMWORD[((-64))+rdx]
+ pand xmm1,XMMWORD[((-48))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-32))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-16))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[r12]
+ movdqa xmm1,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm0,XMMWORD[rdx]
+ pand xmm1,XMMWORD[16+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[32+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[48+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+r12]
+ movdqa xmm1,XMMWORD[80+r12]
+ movdqa xmm2,XMMWORD[96+r12]
+ movdqa xmm3,XMMWORD[112+r12]
+ pand xmm0,XMMWORD[64+rdx]
+ pand xmm1,XMMWORD[80+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[96+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[112+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ lea r12,[256+r12]
+
+ mov rax,QWORD[rsi]
+DB 102,72,15,126,195
+
+ xor r15,r15
+ mov rbp,r8
+ mov r10,QWORD[rsp]
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r10,QWORD[8+rsp]
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$inner_enter
+
+ALIGN 16
+$L$inner:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$inner_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+ lea r15,[1+r15]
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$inner
+
+ add r13,rax
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r9*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r9*8+rsp],r13
+ mov r13,rdx
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ cmp r14,r9
+ jb NEAR $L$outer
+
+ xor r14,r14
+ mov rax,QWORD[rsp]
+ lea rsi,[rsp]
+ mov r15,r9
+ jmp NEAR $L$sub
+ALIGN 16
+$L$sub: sbb rax,QWORD[r14*8+rcx]
+ mov QWORD[r14*8+rdi],rax
+ mov rax,QWORD[8+r14*8+rsi]
+ lea r14,[1+r14]
+ dec r15
+ jnz NEAR $L$sub
+
+ sbb rax,0
+ mov rbx,-1
+ xor rbx,rax
+ xor r14,r14
+ mov r15,r9
+
+$L$copy:
+ mov rcx,QWORD[r14*8+rdi]
+ mov rdx,QWORD[r14*8+rsp]
+ and rcx,rbx
+ and rdx,rax
+ mov QWORD[r14*8+rsp],r14
+ or rdx,rcx
+ mov QWORD[r14*8+rdi],rdx
+ lea r14,[1+r14]
+ sub r15,1
+ jnz NEAR $L$copy
+
+ mov rsi,QWORD[8+r9*8+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul_mont_gather5:
+
+ALIGN 32
+bn_mul4x_mont_gather5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul4x_mont_gather5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+DB 0x67
+ mov rax,rsp
+
+$L$mul4x_enter:
+ and r11d,0x80108
+ cmp r11d,0x80108
+ je NEAR $L$mulx4x_enter
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$mul4x_prologue:
+
+DB 0x67
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+
+
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$mul4xsp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$mul4xsp_done
+
+ALIGN 32
+$L$mul4xsp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$mul4xsp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mul4x_page_walk
+ jmp NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+ neg r9
+
+ mov QWORD[40+rsp],rax
+
+$L$mul4x_body:
+
+ call mul4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul4x_mont_gather5:
+
+
+ALIGN 32
+mul4x_internal:
+ shl r9,5
+ movd xmm5,DWORD[56+rax]
+ lea rax,[$L$inc]
+ lea r13,[128+r9*1+rdx]
+ shr r9,5
+ movdqa xmm0,XMMWORD[rax]
+ movdqa xmm1,XMMWORD[16+rax]
+ lea r10,[((88-112))+r9*1+rsp]
+ lea r12,[128+rdx]
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+DB 0x67,0x67
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+DB 0x67
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[112+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[128+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[144+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[160+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[176+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[192+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[208+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[224+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[240+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[256+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[272+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[288+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[304+r10],xmm0
+
+ paddd xmm3,xmm2
+DB 0x67
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[320+r10],xmm1
+
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[336+r10],xmm2
+ pand xmm0,XMMWORD[64+r12]
+
+ pand xmm1,XMMWORD[80+r12]
+ pand xmm2,XMMWORD[96+r12]
+ movdqa XMMWORD[352+r10],xmm3
+ pand xmm3,XMMWORD[112+r12]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-128))+r12]
+ movdqa xmm5,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ pand xmm4,XMMWORD[112+r10]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm5,XMMWORD[128+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[144+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[160+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-64))+r12]
+ movdqa xmm5,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ pand xmm4,XMMWORD[176+r10]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm5,XMMWORD[192+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[208+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[224+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[r12]
+ movdqa xmm5,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ pand xmm4,XMMWORD[240+r10]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm5,XMMWORD[256+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[272+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[288+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ por xmm0,xmm1
+ pshufd xmm1,xmm0,0x4e
+ por xmm0,xmm1
+ lea r12,[256+r12]
+DB 102,72,15,126,195
+
+ mov QWORD[((16+8))+rsp],r13
+ mov QWORD[((56+8))+rsp],rdi
+
+ mov r8,QWORD[r8]
+ mov rax,QWORD[rsi]
+ lea rsi,[r9*1+rsi]
+ neg r9
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ lea r14,[((64+8))+rsp]
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+r9*1+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[32+r9]
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov QWORD[r14],rdi
+ mov r13,rdx
+ jmp NEAR $L$1st4x
+
+ALIGN 32
+$L$1st4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r14],rdi
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r14],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov QWORD[r14],rdi
+ mov r13,rdx
+
+ add r15,32
+ jnz NEAR $L$1st4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r14],rdi
+ mov r13,rdx
+
+ lea rcx,[r9*1+rcx]
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ mov QWORD[((-8))+r14],r13
+
+ jmp NEAR $L$outer4x
+
+ALIGN 32
+$L$outer4x:
+ lea rdx,[((16+128))+r14]
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+r12]
+ movdqa xmm1,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm0,XMMWORD[((-128))+rdx]
+ pand xmm1,XMMWORD[((-112))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-96))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-80))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+r12]
+ movdqa xmm1,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm0,XMMWORD[((-64))+rdx]
+ pand xmm1,XMMWORD[((-48))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-32))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-16))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[r12]
+ movdqa xmm1,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm0,XMMWORD[rdx]
+ pand xmm1,XMMWORD[16+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[32+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[48+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+r12]
+ movdqa xmm1,XMMWORD[80+r12]
+ movdqa xmm2,XMMWORD[96+r12]
+ movdqa xmm3,XMMWORD[112+r12]
+ pand xmm0,XMMWORD[64+rdx]
+ pand xmm1,XMMWORD[80+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[96+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[112+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ lea r12,[256+r12]
+DB 102,72,15,126,195
+
+ mov r10,QWORD[r9*1+r14]
+ mov rbp,r8
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+ mov QWORD[r14],rdi
+
+ lea r14,[r9*1+r14]
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+r9*1+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[32+r9]
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov r13,rdx
+ jmp NEAR $L$inner4x
+
+ALIGN 32
+$L$inner4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ adc rdx,0
+ add r10,QWORD[16+r14]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-32))+r14],rdi
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+ add r10,QWORD[r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-16))+r14],rdi
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov QWORD[((-8))+r14],r13
+ mov r13,rdx
+
+ add r15,32
+ jnz NEAR $L$inner4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ adc rdx,0
+ add r10,QWORD[16+r14]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-32))+r14],rdi
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,rbp
+ mov rbp,QWORD[((-8))+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov r13,rdx
+
+ mov QWORD[((-16))+r14],rdi
+ lea rcx,[r9*1+rcx]
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ add r13,QWORD[r14]
+ adc rdi,0
+ mov QWORD[((-8))+r14],r13
+
+ cmp r12,QWORD[((16+8))+rsp]
+ jb NEAR $L$outer4x
+ xor rax,rax
+ sub rbp,r13
+ adc r15,r15
+ or rdi,r15
+ sub rax,rdi
+ lea rbx,[r9*1+r14]
+ mov r12,QWORD[rcx]
+ lea rbp,[rcx]
+ mov rcx,r9
+ sar rcx,3+2
+ mov rdi,QWORD[((56+8))+rsp]
+ dec r12
+ xor r10,r10
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqr4x_sub_entry
+
+global bn_power5
+
+ALIGN 32
+bn_power5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_power5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ and r11d,0x80108
+ cmp r11d,0x80108
+ je NEAR $L$powerx5_enter
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$power5_prologue:
+
+ shl r9d,3
+ lea r10d,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$pwr_sp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$pwr_sp_done
+
+ALIGN 32
+$L$pwr_sp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$pwr_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwr_page_walk
+ jmp NEAR $L$pwr_page_walk_done
+
+$L$pwr_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwr_page_walk
+$L$pwr_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$power5_body:
+DB 102,72,15,110,207
+DB 102,72,15,110,209
+DB 102,73,15,110,218
+DB 102,72,15,110,226
+
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+
+DB 102,72,15,126,209
+DB 102,72,15,126,226
+ mov rdi,rsi
+ mov rax,QWORD[40+rsp]
+ lea r8,[32+rsp]
+
+ call mul4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$power5_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_power5:
+
+global bn_sqr8x_internal
+
+
+ALIGN 32
+bn_sqr8x_internal:
+__bn_sqr8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ lea rbp,[32+r10]
+ lea rsi,[r9*1+rsi]
+
+ mov rcx,r9
+
+
+ mov r14,QWORD[((-32))+rbp*1+rsi]
+ lea rdi,[((48+8))+r9*2+rsp]
+ mov rax,QWORD[((-24))+rbp*1+rsi]
+ lea rdi,[((-32))+rbp*1+rdi]
+ mov rbx,QWORD[((-16))+rbp*1+rsi]
+ mov r15,rax
+
+ mul r14
+ mov r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ mov QWORD[((-24))+rbp*1+rdi],r10
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ adc rdx,0
+ mov QWORD[((-16))+rbp*1+rdi],r11
+ mov r10,rdx
+
+
+ mov rbx,QWORD[((-8))+rbp*1+rsi]
+ mul r15
+ mov r12,rax
+ mov rax,rbx
+ mov r13,rdx
+
+ lea rcx,[rbp]
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+ mov QWORD[((-8))+rcx*1+rdi],r10
+ jmp NEAR $L$sqr4x_1st
+
+ALIGN 32
+$L$sqr4x_1st:
+ mov rbx,QWORD[rcx*1+rsi]
+ mul r15
+ add r13,rax
+ mov rax,rbx
+ mov r12,rdx
+ adc r12,0
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov rbx,QWORD[8+rcx*1+rsi]
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ adc r10,0
+
+
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ mov QWORD[rcx*1+rdi],r11
+ mov r13,rdx
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov rbx,QWORD[16+rcx*1+rsi]
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+
+ mul r15
+ add r13,rax
+ mov rax,rbx
+ mov QWORD[8+rcx*1+rdi],r10
+ mov r12,rdx
+ adc r12,0
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov rbx,QWORD[24+rcx*1+rsi]
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ adc r10,0
+
+
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ mov QWORD[16+rcx*1+rdi],r11
+ mov r13,rdx
+ adc r13,0
+ lea rcx,[32+rcx]
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+ mov QWORD[((-8))+rcx*1+rdi],r10
+
+ cmp rcx,0
+ jne NEAR $L$sqr4x_1st
+
+ mul r15
+ add r13,rax
+ lea rbp,[16+rbp]
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+
+ mov QWORD[rdi],r13
+ mov r12,rdx
+ mov QWORD[8+rdi],rdx
+ jmp NEAR $L$sqr4x_outer
+
+ALIGN 32
+$L$sqr4x_outer:
+ mov r14,QWORD[((-32))+rbp*1+rsi]
+ lea rdi,[((48+8))+r9*2+rsp]
+ mov rax,QWORD[((-24))+rbp*1+rsi]
+ lea rdi,[((-32))+rbp*1+rdi]
+ mov rbx,QWORD[((-16))+rbp*1+rsi]
+ mov r15,rax
+
+ mul r14
+ mov r10,QWORD[((-24))+rbp*1+rdi]
+ add r10,rax
+ mov rax,rbx
+ adc rdx,0
+ mov QWORD[((-24))+rbp*1+rdi],r10
+ mov r11,rdx
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ adc rdx,0
+ add r11,QWORD[((-16))+rbp*1+rdi]
+ mov r10,rdx
+ adc r10,0
+ mov QWORD[((-16))+rbp*1+rdi],r11
+
+ xor r12,r12
+
+ mov rbx,QWORD[((-8))+rbp*1+rsi]
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ adc rdx,0
+ add r12,QWORD[((-8))+rbp*1+rdi]
+ mov r13,rdx
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ adc rdx,0
+ add r10,r12
+ mov r11,rdx
+ adc r11,0
+ mov QWORD[((-8))+rbp*1+rdi],r10
+
+ lea rcx,[rbp]
+ jmp NEAR $L$sqr4x_inner
+
+ALIGN 32
+$L$sqr4x_inner:
+ mov rbx,QWORD[rcx*1+rsi]
+ mul r15
+ add r13,rax
+ mov rax,rbx
+ mov r12,rdx
+ adc r12,0
+ add r13,QWORD[rcx*1+rdi]
+ adc r12,0
+
+DB 0x67
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov rbx,QWORD[8+rcx*1+rsi]
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ adc r10,0
+
+ mul r15
+ add r12,rax
+ mov QWORD[rcx*1+rdi],r11
+ mov rax,rbx
+ mov r13,rdx
+ adc r13,0
+ add r12,QWORD[8+rcx*1+rdi]
+ lea rcx,[16+rcx]
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ adc rdx,0
+ add r10,r12
+ mov r11,rdx
+ adc r11,0
+ mov QWORD[((-8))+rcx*1+rdi],r10
+
+ cmp rcx,0
+ jne NEAR $L$sqr4x_inner
+
+DB 0x67
+ mul r15
+ add r13,rax
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+
+ mov QWORD[rdi],r13
+ mov r12,rdx
+ mov QWORD[8+rdi],rdx
+
+ add rbp,16
+ jnz NEAR $L$sqr4x_outer
+
+
+ mov r14,QWORD[((-32))+rsi]
+ lea rdi,[((48+8))+r9*2+rsp]
+ mov rax,QWORD[((-24))+rsi]
+ lea rdi,[((-32))+rbp*1+rdi]
+ mov rbx,QWORD[((-16))+rsi]
+ mov r15,rax
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov QWORD[((-24))+rdi],r10
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ mov rbx,QWORD[((-8))+rsi]
+ adc r10,0
+
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ mov QWORD[((-16))+rdi],r11
+ mov r13,rdx
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+ mov QWORD[((-8))+rdi],r10
+
+ mul r15
+ add r13,rax
+ mov rax,QWORD[((-16))+rsi]
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+
+ mov QWORD[rdi],r13
+ mov r12,rdx
+ mov QWORD[8+rdi],rdx
+
+ mul rbx
+ add rbp,16
+ xor r14,r14
+ sub rbp,r9
+ xor r15,r15
+
+ add rax,r12
+ adc rdx,0
+ mov QWORD[8+rdi],rax
+ mov QWORD[16+rdi],rdx
+ mov QWORD[24+rdi],r15
+
+ mov rax,QWORD[((-16))+rbp*1+rsi]
+ lea rdi,[((48+8))+rsp]
+ xor r10,r10
+ mov r11,QWORD[8+rdi]
+
+ lea r12,[r10*2+r14]
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[16+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[24+rdi]
+ adc r12,rax
+ mov rax,QWORD[((-8))+rbp*1+rsi]
+ mov QWORD[rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[8+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mov r10,QWORD[32+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[40+rdi]
+ adc rbx,rax
+ mov rax,QWORD[rbp*1+rsi]
+ mov QWORD[16+rdi],rbx
+ adc r8,rdx
+ lea rbp,[16+rbp]
+ mov QWORD[24+rdi],r8
+ sbb r15,r15
+ lea rdi,[64+rdi]
+ jmp NEAR $L$sqr4x_shift_n_add
+
+ALIGN 32
+$L$sqr4x_shift_n_add:
+ lea r12,[r10*2+r14]
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[((-16))+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[((-8))+rdi]
+ adc r12,rax
+ mov rax,QWORD[((-8))+rbp*1+rsi]
+ mov QWORD[((-32))+rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[((-24))+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mov r10,QWORD[rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[8+rdi]
+ adc rbx,rax
+ mov rax,QWORD[rbp*1+rsi]
+ mov QWORD[((-16))+rdi],rbx
+ adc r8,rdx
+
+ lea r12,[r10*2+r14]
+ mov QWORD[((-8))+rdi],r8
+ sbb r15,r15
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[16+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[24+rdi]
+ adc r12,rax
+ mov rax,QWORD[8+rbp*1+rsi]
+ mov QWORD[rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[8+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mov r10,QWORD[32+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[40+rdi]
+ adc rbx,rax
+ mov rax,QWORD[16+rbp*1+rsi]
+ mov QWORD[16+rdi],rbx
+ adc r8,rdx
+ mov QWORD[24+rdi],r8
+ sbb r15,r15
+ lea rdi,[64+rdi]
+ add rbp,32
+ jnz NEAR $L$sqr4x_shift_n_add
+
+ lea r12,[r10*2+r14]
+DB 0x67
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[((-16))+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[((-8))+rdi]
+ adc r12,rax
+ mov rax,QWORD[((-8))+rsi]
+ mov QWORD[((-32))+rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[((-24))+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mul rax
+ neg r15
+ adc rbx,rax
+ adc r8,rdx
+ mov QWORD[((-16))+rdi],rbx
+ mov QWORD[((-8))+rdi],r8
+DB 102,72,15,126,213
+__bn_sqr8x_reduction:
+ xor rax,rax
+ lea rcx,[rbp*1+r9]
+ lea rdx,[((48+8))+r9*2+rsp]
+ mov QWORD[((0+8))+rsp],rcx
+ lea rdi,[((48+8))+r9*1+rsp]
+ mov QWORD[((8+8))+rsp],rdx
+ neg r9
+ jmp NEAR $L$8x_reduction_loop
+
+ALIGN 32
+$L$8x_reduction_loop:
+ lea rdi,[r9*1+rdi]
+DB 0x66
+ mov rbx,QWORD[rdi]
+ mov r9,QWORD[8+rdi]
+ mov r10,QWORD[16+rdi]
+ mov r11,QWORD[24+rdi]
+ mov r12,QWORD[32+rdi]
+ mov r13,QWORD[40+rdi]
+ mov r14,QWORD[48+rdi]
+ mov r15,QWORD[56+rdi]
+ mov QWORD[rdx],rax
+ lea rdi,[64+rdi]
+
+DB 0x67
+ mov r8,rbx
+ imul rbx,QWORD[((32+8))+rsp]
+ mov rax,QWORD[rbp]
+ mov ecx,8
+ jmp NEAR $L$8x_reduce
+
+ALIGN 32
+$L$8x_reduce:
+ mul rbx
+ mov rax,QWORD[8+rbp]
+ neg r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rbp]
+ adc rdx,0
+ add r8,r9
+ mov QWORD[((48-8+8))+rcx*8+rsp],rbx
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rbp]
+ adc rdx,0
+ add r9,r10
+ mov rsi,QWORD[((32+8))+rsp]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rbp]
+ adc rdx,0
+ imul rsi,r8
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rbp]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rbp]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rbp]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ mov rbx,rsi
+ add r15,rax
+ mov rax,QWORD[rbp]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ dec ecx
+ jnz NEAR $L$8x_reduce
+
+ lea rbp,[64+rbp]
+ xor rax,rax
+ mov rdx,QWORD[((8+8))+rsp]
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$8x_no_tail
+
+DB 0x66
+ add r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ sbb rsi,rsi
+
+ mov rbx,QWORD[((48+56+8))+rsp]
+ mov ecx,8
+ mov rax,QWORD[rbp]
+ jmp NEAR $L$8x_tail
+
+ALIGN 32
+$L$8x_tail:
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[8+rbp]
+ mov QWORD[rdi],r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rbp]
+ adc rdx,0
+ add r8,r9
+ lea rdi,[8+rdi]
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rbp]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rbp]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rbp]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rbp]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rbp]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ mov rbx,QWORD[((48-16+8))+rcx*8+rsp]
+ add r15,rax
+ adc rdx,0
+ add r14,r15
+ mov rax,QWORD[rbp]
+ mov r15,rdx
+ adc r15,0
+
+ dec ecx
+ jnz NEAR $L$8x_tail
+
+ lea rbp,[64+rbp]
+ mov rdx,QWORD[((8+8))+rsp]
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$8x_tail_done
+
+ mov rbx,QWORD[((48+56+8))+rsp]
+ neg rsi
+ mov rax,QWORD[rbp]
+ adc r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ sbb rsi,rsi
+
+ mov ecx,8
+ jmp NEAR $L$8x_tail
+
+ALIGN 32
+$L$8x_tail_done:
+ xor rax,rax
+ add r8,QWORD[rdx]
+ adc r9,0
+ adc r10,0
+ adc r11,0
+ adc r12,0
+ adc r13,0
+ adc r14,0
+ adc r15,0
+ adc rax,0
+
+ neg rsi
+$L$8x_no_tail:
+ adc r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ adc rax,0
+ mov rcx,QWORD[((-8))+rbp]
+ xor rsi,rsi
+
+DB 102,72,15,126,213
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+DB 102,73,15,126,217
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+ lea rdi,[64+rdi]
+
+ cmp rdi,rdx
+ jb NEAR $L$8x_reduction_loop
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__bn_post4x_internal:
+ mov r12,QWORD[rbp]
+ lea rbx,[r9*1+rdi]
+ mov rcx,r9
+DB 102,72,15,126,207
+ neg rax
+DB 102,72,15,126,206
+ sar rcx,3+2
+ dec r12
+ xor r10,r10
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqr4x_sub_entry
+
+ALIGN 16
+$L$sqr4x_sub:
+ mov r12,QWORD[rbp]
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+$L$sqr4x_sub_entry:
+ lea rbp,[32+rbp]
+ not r12
+ not r13
+ not r14
+ not r15
+ and r12,rax
+ and r13,rax
+ and r14,rax
+ and r15,rax
+
+ neg r10
+ adc r12,QWORD[rbx]
+ adc r13,QWORD[8+rbx]
+ adc r14,QWORD[16+rbx]
+ adc r15,QWORD[24+rbx]
+ mov QWORD[rdi],r12
+ lea rbx,[32+rbx]
+ mov QWORD[8+rdi],r13
+ sbb r10,r10
+ mov QWORD[16+rdi],r14
+ mov QWORD[24+rdi],r15
+ lea rdi,[32+rdi]
+
+ inc rcx
+ jnz NEAR $L$sqr4x_sub
+
+ mov r10,r9
+ neg r9
+ DB 0F3h,0C3h ;repret
+
+global bn_from_montgomery
+
+ALIGN 32
+bn_from_montgomery:
+ test DWORD[48+rsp],7
+ jz NEAR bn_from_mont8x
+ xor eax,eax
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+bn_from_mont8x:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_from_mont8x:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+DB 0x67
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$from_prologue:
+
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$from_sp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$from_sp_done
+
+ALIGN 32
+$L$from_sp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$from_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$from_page_walk
+ jmp NEAR $L$from_page_walk_done
+
+$L$from_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$from_page_walk
+$L$from_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$from_body:
+ mov r11,r9
+ lea rax,[48+rsp]
+ pxor xmm0,xmm0
+ jmp NEAR $L$mul_by_1
+
+ALIGN 32
+$L$mul_by_1:
+ movdqu xmm1,XMMWORD[rsi]
+ movdqu xmm2,XMMWORD[16+rsi]
+ movdqu xmm3,XMMWORD[32+rsi]
+ movdqa XMMWORD[r9*1+rax],xmm0
+ movdqu xmm4,XMMWORD[48+rsi]
+ movdqa XMMWORD[16+r9*1+rax],xmm0
+DB 0x48,0x8d,0xb6,0x40,0x00,0x00,0x00
+ movdqa XMMWORD[rax],xmm1
+ movdqa XMMWORD[32+r9*1+rax],xmm0
+ movdqa XMMWORD[16+rax],xmm2
+ movdqa XMMWORD[48+r9*1+rax],xmm0
+ movdqa XMMWORD[32+rax],xmm3
+ movdqa XMMWORD[48+rax],xmm4
+ lea rax,[64+rax]
+ sub r11,64
+ jnz NEAR $L$mul_by_1
+
+DB 102,72,15,110,207
+DB 102,72,15,110,209
+DB 0x67
+ mov rbp,rcx
+DB 102,73,15,110,218
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ and r11d,0x80108
+ cmp r11d,0x80108
+ jne NEAR $L$from_mont_nox
+
+ lea rdi,[r9*1+rax]
+ call __bn_sqrx8x_reduction
+ call __bn_postx4x_internal
+
+ pxor xmm0,xmm0
+ lea rax,[48+rsp]
+ jmp NEAR $L$from_mont_zero
+
+ALIGN 32
+$L$from_mont_nox:
+ call __bn_sqr8x_reduction
+ call __bn_post4x_internal
+
+ pxor xmm0,xmm0
+ lea rax,[48+rsp]
+ jmp NEAR $L$from_mont_zero
+
+ALIGN 32
+$L$from_mont_zero:
+ mov rsi,QWORD[40+rsp]
+
+ movdqa XMMWORD[rax],xmm0
+ movdqa XMMWORD[16+rax],xmm0
+ movdqa XMMWORD[32+rax],xmm0
+ movdqa XMMWORD[48+rax],xmm0
+ lea rax,[64+rax]
+ sub r9,32
+ jnz NEAR $L$from_mont_zero
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$from_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_from_mont8x:
+
+ALIGN 32
+bn_mulx4x_mont_gather5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mulx4x_mont_gather5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$mulx4x_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$mulx4x_prologue:
+
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$mulx4xsp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$mulx4xsp_done
+
+$L$mulx4xsp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$mulx4xsp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+ jmp NEAR $L$mulx4x_page_walk_done
+
+$L$mulx4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+
+
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$mulx4x_body:
+ call mulx4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mulx4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mulx4x_mont_gather5:
+
+
+ALIGN 32
+mulx4x_internal:
+ mov QWORD[8+rsp],r9
+ mov r10,r9
+ neg r9
+ shl r9,5
+ neg r10
+ lea r13,[128+r9*1+rdx]
+ shr r9,5+5
+ movd xmm5,DWORD[56+rax]
+ sub r9,1
+ lea rax,[$L$inc]
+ mov QWORD[((16+8))+rsp],r13
+ mov QWORD[((24+8))+rsp],r9
+ mov QWORD[((56+8))+rsp],rdi
+ movdqa xmm0,XMMWORD[rax]
+ movdqa xmm1,XMMWORD[16+rax]
+ lea r10,[((88-112))+r10*1+rsp]
+ lea rdi,[128+rdx]
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+DB 0x67
+ movdqa xmm2,xmm1
+DB 0x67
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[112+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[128+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[144+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[160+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[176+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[192+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[208+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[224+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[240+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[256+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[272+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[288+r10],xmm3
+ movdqa xmm3,xmm4
+DB 0x67
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[304+r10],xmm0
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[320+r10],xmm1
+
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[336+r10],xmm2
+
+ pand xmm0,XMMWORD[64+rdi]
+ pand xmm1,XMMWORD[80+rdi]
+ pand xmm2,XMMWORD[96+rdi]
+ movdqa XMMWORD[352+r10],xmm3
+ pand xmm3,XMMWORD[112+rdi]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-128))+rdi]
+ movdqa xmm5,XMMWORD[((-112))+rdi]
+ movdqa xmm2,XMMWORD[((-96))+rdi]
+ pand xmm4,XMMWORD[112+r10]
+ movdqa xmm3,XMMWORD[((-80))+rdi]
+ pand xmm5,XMMWORD[128+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[144+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[160+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-64))+rdi]
+ movdqa xmm5,XMMWORD[((-48))+rdi]
+ movdqa xmm2,XMMWORD[((-32))+rdi]
+ pand xmm4,XMMWORD[176+r10]
+ movdqa xmm3,XMMWORD[((-16))+rdi]
+ pand xmm5,XMMWORD[192+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[208+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[224+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[rdi]
+ movdqa xmm5,XMMWORD[16+rdi]
+ movdqa xmm2,XMMWORD[32+rdi]
+ pand xmm4,XMMWORD[240+r10]
+ movdqa xmm3,XMMWORD[48+rdi]
+ pand xmm5,XMMWORD[256+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[272+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[288+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ pxor xmm0,xmm1
+ pshufd xmm1,xmm0,0x4e
+ por xmm0,xmm1
+ lea rdi,[256+rdi]
+DB 102,72,15,126,194
+ lea rbx,[((64+32+8))+rsp]
+
+ mov r9,rdx
+ mulx rax,r8,QWORD[rsi]
+ mulx r12,r11,QWORD[8+rsi]
+ add r11,rax
+ mulx r13,rax,QWORD[16+rsi]
+ adc r12,rax
+ adc r13,0
+ mulx r14,rax,QWORD[24+rsi]
+
+ mov r15,r8
+ imul r8,QWORD[((32+8))+rsp]
+ xor rbp,rbp
+ mov rdx,r8
+
+ mov QWORD[((8+8))+rsp],rdi
+
+ lea rsi,[32+rsi]
+ adcx r13,rax
+ adcx r14,rbp
+
+ mulx r10,rax,QWORD[rcx]
+ adcx r15,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+ mulx r12,rax,QWORD[16+rcx]
+ mov rdi,QWORD[((24+8))+rsp]
+ mov QWORD[((-32))+rbx],r10
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r11
+ adcx r12,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r12
+ jmp NEAR $L$mulx4x_1st
+
+ALIGN 32
+$L$mulx4x_1st:
+ adcx r15,rbp
+ mulx rax,r10,QWORD[rsi]
+ adcx r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+DB 0x67,0x67
+ mov rdx,r8
+ adcx r13,rax
+ adcx r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ mov QWORD[((-32))+rbx],r11
+ adox r13,r15
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_1st
+
+ mov rax,QWORD[8+rsp]
+ adc r15,rbp
+ lea rsi,[rax*1+rsi]
+ add r14,r15
+ mov rdi,QWORD[((8+8))+rsp]
+ adc rbp,rbp
+ mov QWORD[((-8))+rbx],r14
+ jmp NEAR $L$mulx4x_outer
+
+ALIGN 32
+$L$mulx4x_outer:
+ lea r10,[((16-256))+rbx]
+ pxor xmm4,xmm4
+DB 0x67,0x67
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+rdi]
+ movdqa xmm1,XMMWORD[((-112))+rdi]
+ movdqa xmm2,XMMWORD[((-96))+rdi]
+ pand xmm0,XMMWORD[256+r10]
+ movdqa xmm3,XMMWORD[((-80))+rdi]
+ pand xmm1,XMMWORD[272+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[288+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[304+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+rdi]
+ movdqa xmm1,XMMWORD[((-48))+rdi]
+ movdqa xmm2,XMMWORD[((-32))+rdi]
+ pand xmm0,XMMWORD[320+r10]
+ movdqa xmm3,XMMWORD[((-16))+rdi]
+ pand xmm1,XMMWORD[336+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[352+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[368+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[rdi]
+ movdqa xmm1,XMMWORD[16+rdi]
+ movdqa xmm2,XMMWORD[32+rdi]
+ pand xmm0,XMMWORD[384+r10]
+ movdqa xmm3,XMMWORD[48+rdi]
+ pand xmm1,XMMWORD[400+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[416+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[432+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+rdi]
+ movdqa xmm1,XMMWORD[80+rdi]
+ movdqa xmm2,XMMWORD[96+rdi]
+ pand xmm0,XMMWORD[448+r10]
+ movdqa xmm3,XMMWORD[112+rdi]
+ pand xmm1,XMMWORD[464+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[480+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[496+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ lea rdi,[256+rdi]
+DB 102,72,15,126,194
+
+ mov QWORD[rbx],rbp
+ lea rbx,[32+rax*1+rbx]
+ mulx r11,r8,QWORD[rsi]
+ xor rbp,rbp
+ mov r9,rdx
+ mulx r12,r14,QWORD[8+rsi]
+ adox r8,QWORD[((-32))+rbx]
+ adcx r11,r14
+ mulx r13,r15,QWORD[16+rsi]
+ adox r11,QWORD[((-24))+rbx]
+ adcx r12,r15
+ mulx r14,rdx,QWORD[24+rsi]
+ adox r12,QWORD[((-16))+rbx]
+ adcx r13,rdx
+ lea rcx,[rax*1+rcx]
+ lea rsi,[32+rsi]
+ adox r13,QWORD[((-8))+rbx]
+ adcx r14,rbp
+ adox r14,rbp
+
+ mov r15,r8
+ imul r8,QWORD[((32+8))+rsp]
+
+ mov rdx,r8
+ xor rbp,rbp
+ mov QWORD[((8+8))+rsp],rdi
+
+ mulx r10,rax,QWORD[rcx]
+ adcx r15,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+ mulx r12,rax,QWORD[16+rcx]
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov rdi,QWORD[((24+8))+rsp]
+ mov QWORD[((-32))+rbx],r10
+ adcx r12,rax
+ mov QWORD[((-24))+rbx],r11
+ adox r15,rbp
+ mov QWORD[((-16))+rbx],r12
+ lea rcx,[32+rcx]
+ jmp NEAR $L$mulx4x_inner
+
+ALIGN 32
+$L$mulx4x_inner:
+ mulx rax,r10,QWORD[rsi]
+ adcx r15,rbp
+ adox r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r10,QWORD[rbx]
+ adox r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r11,QWORD[8+rbx]
+ adox r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+ mov rdx,r8
+ adcx r12,QWORD[16+rbx]
+ adox r13,rax
+ adcx r13,QWORD[24+rbx]
+ adox r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+ adcx r14,rbp
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ adox r13,r15
+ mov QWORD[((-32))+rbx],r11
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ lea rcx,[32+rcx]
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_inner
+
+ mov rax,QWORD[((0+8))+rsp]
+ adc r15,rbp
+ sub rdi,QWORD[rbx]
+ mov rdi,QWORD[((8+8))+rsp]
+ mov r10,QWORD[((16+8))+rsp]
+ adc r14,r15
+ lea rsi,[rax*1+rsi]
+ adc rbp,rbp
+ mov QWORD[((-8))+rbx],r14
+
+ cmp rdi,r10
+ jb NEAR $L$mulx4x_outer
+
+ mov r10,QWORD[((-8))+rcx]
+ mov r8,rbp
+ mov r12,QWORD[rax*1+rcx]
+ lea rbp,[rax*1+rcx]
+ mov rcx,rax
+ lea rdi,[rax*1+rbx]
+ xor eax,eax
+ xor r15,r15
+ sub r10,r14
+ adc r15,r15
+ or r8,r15
+ sar rcx,3+2
+ sub rax,r8
+ mov rdx,QWORD[((56+8))+rsp]
+ dec r12
+ mov r13,QWORD[8+rbp]
+ xor r8,r8
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqrx4x_sub_entry
+
+
+ALIGN 32
+bn_powerx5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_powerx5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$powerx5_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$powerx5_prologue:
+
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$pwrx_sp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$pwrx_sp_done
+
+ALIGN 32
+$L$pwrx_sp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$pwrx_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwrx_page_walk
+ jmp NEAR $L$pwrx_page_walk_done
+
+$L$pwrx_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwrx_page_walk
+$L$pwrx_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+
+
+
+
+
+
+
+
+
+
+
+ pxor xmm0,xmm0
+DB 102,72,15,110,207
+DB 102,72,15,110,209
+DB 102,73,15,110,218
+DB 102,72,15,110,226
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$powerx5_body:
+
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+
+ mov r9,r10
+ mov rdi,rsi
+DB 102,72,15,126,209
+DB 102,72,15,126,226
+ mov rax,QWORD[40+rsp]
+
+ call mulx4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$powerx5_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_powerx5:
+
+global bn_sqrx8x_internal
+
+
+ALIGN 32
+bn_sqrx8x_internal:
+__bn_sqrx8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ lea rdi,[((48+8))+rsp]
+ lea rbp,[r9*1+rsi]
+ mov QWORD[((0+8))+rsp],r9
+ mov QWORD[((8+8))+rsp],rbp
+ jmp NEAR $L$sqr8x_zero_start
+
+ALIGN 32
+DB 0x66,0x66,0x66,0x2e,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+$L$sqrx8x_zero:
+DB 0x3e
+ movdqa XMMWORD[rdi],xmm0
+ movdqa XMMWORD[16+rdi],xmm0
+ movdqa XMMWORD[32+rdi],xmm0
+ movdqa XMMWORD[48+rdi],xmm0
+$L$sqr8x_zero_start:
+ movdqa XMMWORD[64+rdi],xmm0
+ movdqa XMMWORD[80+rdi],xmm0
+ movdqa XMMWORD[96+rdi],xmm0
+ movdqa XMMWORD[112+rdi],xmm0
+ lea rdi,[128+rdi]
+ sub r9,64
+ jnz NEAR $L$sqrx8x_zero
+
+ mov rdx,QWORD[rsi]
+
+ xor r10,r10
+ xor r11,r11
+ xor r12,r12
+ xor r13,r13
+ xor r14,r14
+ xor r15,r15
+ lea rdi,[((48+8))+rsp]
+ xor rbp,rbp
+ jmp NEAR $L$sqrx8x_outer_loop
+
+ALIGN 32
+$L$sqrx8x_outer_loop:
+ mulx rax,r8,QWORD[8+rsi]
+ adcx r8,r9
+ adox r10,rax
+ mulx rax,r9,QWORD[16+rsi]
+ adcx r9,r10
+ adox r11,rax
+DB 0xc4,0xe2,0xab,0xf6,0x86,0x18,0x00,0x00,0x00
+ adcx r10,r11
+ adox r12,rax
+DB 0xc4,0xe2,0xa3,0xf6,0x86,0x20,0x00,0x00,0x00
+ adcx r11,r12
+ adox r13,rax
+ mulx rax,r12,QWORD[40+rsi]
+ adcx r12,r13
+ adox r14,rax
+ mulx rax,r13,QWORD[48+rsi]
+ adcx r13,r14
+ adox rax,r15
+ mulx r15,r14,QWORD[56+rsi]
+ mov rdx,QWORD[8+rsi]
+ adcx r14,rax
+ adox r15,rbp
+ adc r15,QWORD[64+rdi]
+ mov QWORD[8+rdi],r8
+ mov QWORD[16+rdi],r9
+ sbb rcx,rcx
+ xor rbp,rbp
+
+
+ mulx rbx,r8,QWORD[16+rsi]
+ mulx rax,r9,QWORD[24+rsi]
+ adcx r8,r10
+ adox r9,rbx
+ mulx rbx,r10,QWORD[32+rsi]
+ adcx r9,r11
+ adox r10,rax
+DB 0xc4,0xe2,0xa3,0xf6,0x86,0x28,0x00,0x00,0x00
+ adcx r10,r12
+ adox r11,rbx
+DB 0xc4,0xe2,0x9b,0xf6,0x9e,0x30,0x00,0x00,0x00
+ adcx r11,r13
+ adox r12,r14
+DB 0xc4,0x62,0x93,0xf6,0xb6,0x38,0x00,0x00,0x00
+ mov rdx,QWORD[16+rsi]
+ adcx r12,rax
+ adox r13,rbx
+ adcx r13,r15
+ adox r14,rbp
+ adcx r14,rbp
+
+ mov QWORD[24+rdi],r8
+ mov QWORD[32+rdi],r9
+
+ mulx rbx,r8,QWORD[24+rsi]
+ mulx rax,r9,QWORD[32+rsi]
+ adcx r8,r10
+ adox r9,rbx
+ mulx rbx,r10,QWORD[40+rsi]
+ adcx r9,r11
+ adox r10,rax
+DB 0xc4,0xe2,0xa3,0xf6,0x86,0x30,0x00,0x00,0x00
+ adcx r10,r12
+ adox r11,r13
+DB 0xc4,0x62,0x9b,0xf6,0xae,0x38,0x00,0x00,0x00
+DB 0x3e
+ mov rdx,QWORD[24+rsi]
+ adcx r11,rbx
+ adox r12,rax
+ adcx r12,r14
+ mov QWORD[40+rdi],r8
+ mov QWORD[48+rdi],r9
+ mulx rax,r8,QWORD[32+rsi]
+ adox r13,rbp
+ adcx r13,rbp
+
+ mulx rbx,r9,QWORD[40+rsi]
+ adcx r8,r10
+ adox r9,rax
+ mulx rax,r10,QWORD[48+rsi]
+ adcx r9,r11
+ adox r10,r12
+ mulx r12,r11,QWORD[56+rsi]
+ mov rdx,QWORD[32+rsi]
+ mov r14,QWORD[40+rsi]
+ adcx r10,rbx
+ adox r11,rax
+ mov r15,QWORD[48+rsi]
+ adcx r11,r13
+ adox r12,rbp
+ adcx r12,rbp
+
+ mov QWORD[56+rdi],r8
+ mov QWORD[64+rdi],r9
+
+ mulx rax,r9,r14
+ mov r8,QWORD[56+rsi]
+ adcx r9,r10
+ mulx rbx,r10,r15
+ adox r10,rax
+ adcx r10,r11
+ mulx rax,r11,r8
+ mov rdx,r14
+ adox r11,rbx
+ adcx r11,r12
+
+ adcx rax,rbp
+
+ mulx rbx,r14,r15
+ mulx r13,r12,r8
+ mov rdx,r15
+ lea rsi,[64+rsi]
+ adcx r11,r14
+ adox r12,rbx
+ adcx r12,rax
+ adox r13,rbp
+
+DB 0x67,0x67
+ mulx r14,r8,r8
+ adcx r13,r8
+ adcx r14,rbp
+
+ cmp rsi,QWORD[((8+8))+rsp]
+ je NEAR $L$sqrx8x_outer_break
+
+ neg rcx
+ mov rcx,-8
+ mov r15,rbp
+ mov r8,QWORD[64+rdi]
+ adcx r9,QWORD[72+rdi]
+ adcx r10,QWORD[80+rdi]
+ adcx r11,QWORD[88+rdi]
+ adc r12,QWORD[96+rdi]
+ adc r13,QWORD[104+rdi]
+ adc r14,QWORD[112+rdi]
+ adc r15,QWORD[120+rdi]
+ lea rbp,[rsi]
+ lea rdi,[128+rdi]
+ sbb rax,rax
+
+ mov rdx,QWORD[((-64))+rsi]
+ mov QWORD[((16+8))+rsp],rax
+ mov QWORD[((24+8))+rsp],rdi
+
+
+ xor eax,eax
+ jmp NEAR $L$sqrx8x_loop
+
+ALIGN 32
+$L$sqrx8x_loop:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rbp]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rbp]
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rbp]
+ adcx r10,rax
+ adox r11,r12
+
+DB 0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rbp]
+ mov QWORD[rcx*8+rdi],rbx
+ mov ebx,0
+ adcx r13,rax
+ adox r14,r15
+
+DB 0xc4,0x62,0xfb,0xf6,0xbd,0x38,0x00,0x00,0x00
+ mov rdx,QWORD[8+rcx*8+rsi]
+ adcx r14,rax
+ adox r15,rbx
+ adcx r15,rbx
+
+DB 0x67
+ inc rcx
+ jnz NEAR $L$sqrx8x_loop
+
+ lea rbp,[64+rbp]
+ mov rcx,-8
+ cmp rbp,QWORD[((8+8))+rsp]
+ je NEAR $L$sqrx8x_break
+
+ sub rbx,QWORD[((16+8))+rsp]
+DB 0x66
+ mov rdx,QWORD[((-64))+rsi]
+ adcx r8,QWORD[rdi]
+ adcx r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ lea rdi,[64+rdi]
+DB 0x67
+ sbb rax,rax
+ xor ebx,ebx
+ mov QWORD[((16+8))+rsp],rax
+ jmp NEAR $L$sqrx8x_loop
+
+ALIGN 32
+$L$sqrx8x_break:
+ xor rbp,rbp
+ sub rbx,QWORD[((16+8))+rsp]
+ adcx r8,rbp
+ mov rcx,QWORD[((24+8))+rsp]
+ adcx r9,rbp
+ mov rdx,QWORD[rsi]
+ adc r10,0
+ mov QWORD[rdi],r8
+ adc r11,0
+ adc r12,0
+ adc r13,0
+ adc r14,0
+ adc r15,0
+ cmp rdi,rcx
+ je NEAR $L$sqrx8x_outer_loop
+
+ mov QWORD[8+rdi],r9
+ mov r9,QWORD[8+rcx]
+ mov QWORD[16+rdi],r10
+ mov r10,QWORD[16+rcx]
+ mov QWORD[24+rdi],r11
+ mov r11,QWORD[24+rcx]
+ mov QWORD[32+rdi],r12
+ mov r12,QWORD[32+rcx]
+ mov QWORD[40+rdi],r13
+ mov r13,QWORD[40+rcx]
+ mov QWORD[48+rdi],r14
+ mov r14,QWORD[48+rcx]
+ mov QWORD[56+rdi],r15
+ mov r15,QWORD[56+rcx]
+ mov rdi,rcx
+ jmp NEAR $L$sqrx8x_outer_loop
+
+ALIGN 32
+$L$sqrx8x_outer_break:
+ mov QWORD[72+rdi],r9
+DB 102,72,15,126,217
+ mov QWORD[80+rdi],r10
+ mov QWORD[88+rdi],r11
+ mov QWORD[96+rdi],r12
+ mov QWORD[104+rdi],r13
+ mov QWORD[112+rdi],r14
+ lea rdi,[((48+8))+rsp]
+ mov rdx,QWORD[rcx*1+rsi]
+
+ mov r11,QWORD[8+rdi]
+ xor r10,r10
+ mov r9,QWORD[((0+8))+rsp]
+ adox r11,r11
+ mov r12,QWORD[16+rdi]
+ mov r13,QWORD[24+rdi]
+
+
+ALIGN 32
+$L$sqrx4x_shift_n_add:
+ mulx rbx,rax,rdx
+ adox r12,r12
+ adcx rax,r10
+DB 0x48,0x8b,0x94,0x0e,0x08,0x00,0x00,0x00
+DB 0x4c,0x8b,0x97,0x20,0x00,0x00,0x00
+ adox r13,r13
+ adcx rbx,r11
+ mov r11,QWORD[40+rdi]
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+
+ mulx rbx,rax,rdx
+ adox r10,r10
+ adcx rax,r12
+ mov rdx,QWORD[16+rcx*1+rsi]
+ mov r12,QWORD[48+rdi]
+ adox r11,r11
+ adcx rbx,r13
+ mov r13,QWORD[56+rdi]
+ mov QWORD[16+rdi],rax
+ mov QWORD[24+rdi],rbx
+
+ mulx rbx,rax,rdx
+ adox r12,r12
+ adcx rax,r10
+ mov rdx,QWORD[24+rcx*1+rsi]
+ lea rcx,[32+rcx]
+ mov r10,QWORD[64+rdi]
+ adox r13,r13
+ adcx rbx,r11
+ mov r11,QWORD[72+rdi]
+ mov QWORD[32+rdi],rax
+ mov QWORD[40+rdi],rbx
+
+ mulx rbx,rax,rdx
+ adox r10,r10
+ adcx rax,r12
+ jrcxz $L$sqrx4x_shift_n_add_break
+DB 0x48,0x8b,0x94,0x0e,0x00,0x00,0x00,0x00
+ adox r11,r11
+ adcx rbx,r13
+ mov r12,QWORD[80+rdi]
+ mov r13,QWORD[88+rdi]
+ mov QWORD[48+rdi],rax
+ mov QWORD[56+rdi],rbx
+ lea rdi,[64+rdi]
+ nop
+ jmp NEAR $L$sqrx4x_shift_n_add
+
+ALIGN 32
+$L$sqrx4x_shift_n_add_break:
+ adcx rbx,r13
+ mov QWORD[48+rdi],rax
+ mov QWORD[56+rdi],rbx
+ lea rdi,[64+rdi]
+DB 102,72,15,126,213
+__bn_sqrx8x_reduction:
+ xor eax,eax
+ mov rbx,QWORD[((32+8))+rsp]
+ mov rdx,QWORD[((48+8))+rsp]
+ lea rcx,[((-64))+r9*1+rbp]
+
+ mov QWORD[((0+8))+rsp],rcx
+ mov QWORD[((8+8))+rsp],rdi
+
+ lea rdi,[((48+8))+rsp]
+ jmp NEAR $L$sqrx8x_reduction_loop
+
+ALIGN 32
+$L$sqrx8x_reduction_loop:
+ mov r9,QWORD[8+rdi]
+ mov r10,QWORD[16+rdi]
+ mov r11,QWORD[24+rdi]
+ mov r12,QWORD[32+rdi]
+ mov r8,rdx
+ imul rdx,rbx
+ mov r13,QWORD[40+rdi]
+ mov r14,QWORD[48+rdi]
+ mov r15,QWORD[56+rdi]
+ mov QWORD[((24+8))+rsp],rax
+
+ lea rdi,[64+rdi]
+ xor rsi,rsi
+ mov rcx,-8
+ jmp NEAR $L$sqrx8x_reduce
+
+ALIGN 32
+$L$sqrx8x_reduce:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rax,rbx
+ adox r8,r9
+
+ mulx r9,rbx,QWORD[8+rbp]
+ adcx r8,rbx
+ adox r9,r10
+
+ mulx r10,rbx,QWORD[16+rbp]
+ adcx r9,rbx
+ adox r10,r11
+
+ mulx r11,rbx,QWORD[24+rbp]
+ adcx r10,rbx
+ adox r11,r12
+
+DB 0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+ mov rax,rdx
+ mov rdx,r8
+ adcx r11,rbx
+ adox r12,r13
+
+ mulx rdx,rbx,QWORD[((32+8))+rsp]
+ mov rdx,rax
+ mov QWORD[((64+48+8))+rcx*8+rsp],rax
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rbp]
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rbp]
+ mov rdx,rbx
+ adcx r14,rax
+ adox r15,rsi
+ adcx r15,rsi
+
+DB 0x67,0x67,0x67
+ inc rcx
+ jnz NEAR $L$sqrx8x_reduce
+
+ mov rax,rsi
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$sqrx8x_no_tail
+
+ mov rdx,QWORD[((48+8))+rsp]
+ add r8,QWORD[rdi]
+ lea rbp,[64+rbp]
+ mov rcx,-8
+ adcx r9,QWORD[8+rdi]
+ adcx r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ lea rdi,[64+rdi]
+ sbb rax,rax
+
+ xor rsi,rsi
+ mov QWORD[((16+8))+rsp],rax
+ jmp NEAR $L$sqrx8x_tail
+
+ALIGN 32
+$L$sqrx8x_tail:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rbp]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rbp]
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rbp]
+ adcx r10,rax
+ adox r11,r12
+
+DB 0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rbp]
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rbp]
+ mov rdx,QWORD[((72+48+8))+rcx*8+rsp]
+ adcx r14,rax
+ adox r15,rsi
+ mov QWORD[rcx*8+rdi],rbx
+ mov rbx,r8
+ adcx r15,rsi
+
+ inc rcx
+ jnz NEAR $L$sqrx8x_tail
+
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$sqrx8x_tail_done
+
+ sub rsi,QWORD[((16+8))+rsp]
+ mov rdx,QWORD[((48+8))+rsp]
+ lea rbp,[64+rbp]
+ adc r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ lea rdi,[64+rdi]
+ sbb rax,rax
+ sub rcx,8
+
+ xor rsi,rsi
+ mov QWORD[((16+8))+rsp],rax
+ jmp NEAR $L$sqrx8x_tail
+
+ALIGN 32
+$L$sqrx8x_tail_done:
+ xor rax,rax
+ add r8,QWORD[((24+8))+rsp]
+ adc r9,0
+ adc r10,0
+ adc r11,0
+ adc r12,0
+ adc r13,0
+ adc r14,0
+ adc r15,0
+ adc rax,0
+
+ sub rsi,QWORD[((16+8))+rsp]
+$L$sqrx8x_no_tail:
+ adc r8,QWORD[rdi]
+DB 102,72,15,126,217
+ adc r9,QWORD[8+rdi]
+ mov rsi,QWORD[56+rbp]
+DB 102,72,15,126,213
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ adc rax,0
+
+ mov rbx,QWORD[((32+8))+rsp]
+ mov rdx,QWORD[64+rcx*1+rdi]
+
+ mov QWORD[rdi],r8
+ lea r8,[64+rdi]
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ lea rdi,[64+rcx*1+rdi]
+ cmp r8,QWORD[((8+8))+rsp]
+ jb NEAR $L$sqrx8x_reduction_loop
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__bn_postx4x_internal:
+ mov r12,QWORD[rbp]
+ mov r10,rcx
+ mov r9,rcx
+ neg rax
+ sar rcx,3+2
+
+DB 102,72,15,126,202
+DB 102,72,15,126,206
+ dec r12
+ mov r13,QWORD[8+rbp]
+ xor r8,r8
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqrx4x_sub_entry
+
+ALIGN 16
+$L$sqrx4x_sub:
+ mov r12,QWORD[rbp]
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+$L$sqrx4x_sub_entry:
+ andn r12,r12,rax
+ lea rbp,[32+rbp]
+ andn r13,r13,rax
+ andn r14,r14,rax
+ andn r15,r15,rax
+
+ neg r8
+ adc r12,QWORD[rdi]
+ adc r13,QWORD[8+rdi]
+ adc r14,QWORD[16+rdi]
+ adc r15,QWORD[24+rdi]
+ mov QWORD[rdx],r12
+ lea rdi,[32+rdi]
+ mov QWORD[8+rdx],r13
+ sbb r8,r8
+ mov QWORD[16+rdx],r14
+ mov QWORD[24+rdx],r15
+ lea rdx,[32+rdx]
+
+ inc rcx
+ jnz NEAR $L$sqrx4x_sub
+
+ neg r9
+
+ DB 0F3h,0C3h ;repret
+
+global bn_get_bits5
+
+ALIGN 16
+bn_get_bits5:
+ lea r10,[rcx]
+ lea r11,[1+rcx]
+ mov ecx,edx
+ shr edx,4
+ and ecx,15
+ lea eax,[((-8))+rcx]
+ cmp ecx,11
+ cmova r10,r11
+ cmova ecx,eax
+ movzx eax,WORD[rdx*2+r10]
+ shr eax,cl
+ and eax,31
+ DB 0F3h,0C3h ;repret
+
+
+global bn_scatter5
+
+ALIGN 16
+bn_scatter5:
+ cmp edx,0
+ jz NEAR $L$scatter_epilogue
+ lea r8,[r9*8+r8]
+$L$scatter:
+ mov rax,QWORD[rcx]
+ lea rcx,[8+rcx]
+ mov QWORD[r8],rax
+ lea r8,[256+r8]
+ sub edx,1
+ jnz NEAR $L$scatter
+$L$scatter_epilogue:
+ DB 0F3h,0C3h ;repret
+
+
+global bn_gather5
+
+ALIGN 32
+bn_gather5:
+$L$SEH_begin_bn_gather5:
+
+DB 0x4c,0x8d,0x14,0x24
+DB 0x48,0x81,0xec,0x08,0x01,0x00,0x00
+ lea rax,[$L$inc]
+ and rsp,-16
+
+ movd xmm5,r9d
+ movdqa xmm0,XMMWORD[rax]
+ movdqa xmm1,XMMWORD[16+rax]
+ lea r11,[128+r8]
+ lea rax,[128+rsp]
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[(-128)+rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[(-112)+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[(-96)+rax],xmm2
+ movdqa xmm2,xmm4
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[(-80)+rax],xmm3
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[(-64)+rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[(-48)+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[(-32)+rax],xmm2
+ movdqa xmm2,xmm4
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[(-16)+rax],xmm3
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[16+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[32+rax],xmm2
+ movdqa xmm2,xmm4
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[48+rax],xmm3
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[64+rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[80+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[96+rax],xmm2
+ movdqa xmm2,xmm4
+ movdqa XMMWORD[112+rax],xmm3
+ jmp NEAR $L$gather
+
+ALIGN 32
+$L$gather:
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+r11]
+ movdqa xmm1,XMMWORD[((-112))+r11]
+ movdqa xmm2,XMMWORD[((-96))+r11]
+ pand xmm0,XMMWORD[((-128))+rax]
+ movdqa xmm3,XMMWORD[((-80))+r11]
+ pand xmm1,XMMWORD[((-112))+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-96))+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-80))+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+r11]
+ movdqa xmm1,XMMWORD[((-48))+r11]
+ movdqa xmm2,XMMWORD[((-32))+r11]
+ pand xmm0,XMMWORD[((-64))+rax]
+ movdqa xmm3,XMMWORD[((-16))+r11]
+ pand xmm1,XMMWORD[((-48))+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-32))+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-16))+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[r11]
+ movdqa xmm1,XMMWORD[16+r11]
+ movdqa xmm2,XMMWORD[32+r11]
+ pand xmm0,XMMWORD[rax]
+ movdqa xmm3,XMMWORD[48+r11]
+ pand xmm1,XMMWORD[16+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[32+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[48+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+r11]
+ movdqa xmm1,XMMWORD[80+r11]
+ movdqa xmm2,XMMWORD[96+r11]
+ pand xmm0,XMMWORD[64+rax]
+ movdqa xmm3,XMMWORD[112+r11]
+ pand xmm1,XMMWORD[80+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[96+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[112+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ lea r11,[256+r11]
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ movq QWORD[rcx],xmm0
+ lea rcx,[8+rcx]
+ sub edx,1
+ jnz NEAR $L$gather
+
+ lea rsp,[r10]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_bn_gather5:
+
+ALIGN 64
+$L$inc:
+ DD 0,0,1,1
+ DD 2,2,2,2
+DB 77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB 112,108,105,99,97,116,105,111,110,32,119,105,116,104,32,115
+DB 99,97,116,116,101,114,47,103,97,116,104,101,114,32,102,111
+DB 114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79
+DB 71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111
+DB 112,101,110,115,115,108,46,111,114,103,62,0
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+mul_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_pop_regs
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea r10,[$L$mul_epilogue]
+ cmp rbx,r10
+ ja NEAR $L$body_40
+
+ mov r10,QWORD[192+r8]
+ mov rax,QWORD[8+r10*8+rax]
+
+ jmp NEAR $L$common_pop_regs
+
+$L$body_40:
+ mov rax,QWORD[40+rax]
+$L$common_pop_regs:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_bn_mul_mont_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_mul_mont_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_mul_mont_gather5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_mul4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_mul4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_mul4x_mont_gather5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_power5 wrt ..imagebase
+ DD $L$SEH_end_bn_power5 wrt ..imagebase
+ DD $L$SEH_info_bn_power5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_from_mont8x wrt ..imagebase
+ DD $L$SEH_end_bn_from_mont8x wrt ..imagebase
+ DD $L$SEH_info_bn_from_mont8x wrt ..imagebase
+ DD $L$SEH_begin_bn_mulx4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_mulx4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_mulx4x_mont_gather5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_powerx5 wrt ..imagebase
+ DD $L$SEH_end_bn_powerx5 wrt ..imagebase
+ DD $L$SEH_info_bn_powerx5 wrt ..imagebase
+ DD $L$SEH_begin_bn_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_gather5 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_bn_mul_mont_gather5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul_body wrt ..imagebase,$L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_mul4x_mont_gather5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul4x_prologue wrt ..imagebase,$L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_power5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$power5_prologue wrt ..imagebase,$L$power5_body wrt ..imagebase,$L$power5_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_from_mont8x:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$from_prologue wrt ..imagebase,$L$from_body wrt ..imagebase,$L$from_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_mulx4x_mont_gather5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_powerx5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$powerx5_prologue wrt ..imagebase,$L$powerx5_body wrt ..imagebase,$L$powerx5_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_gather5:
+DB 0x01,0x0b,0x03,0x0a
+DB 0x0b,0x01,0x21,0x00
+DB 0x04,0xa3,0x00,0x00
+ALIGN 8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
new file mode 100644
index 0000000000..ff688eeb06
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
@@ -0,0 +1,794 @@
+; Author: Marc Bevand <bevand_m (at) epita.fr>
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN 16
+
+global md5_block_asm_data_order
+
+md5_block_asm_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_md5_block_asm_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ push rbp
+
+ push rbx
+
+ push r12
+
+ push r14
+
+ push r15
+
+$L$prologue:
+
+
+
+
+ mov rbp,rdi
+ shl rdx,6
+ lea rdi,[rdx*1+rsi]
+ mov eax,DWORD[rbp]
+ mov ebx,DWORD[4+rbp]
+ mov ecx,DWORD[8+rbp]
+ mov edx,DWORD[12+rbp]
+
+
+
+
+
+
+
+ cmp rsi,rdi
+ je NEAR $L$end
+
+
+$L$loop:
+ mov r8d,eax
+ mov r9d,ebx
+ mov r14d,ecx
+ mov r15d,edx
+ mov r10d,DWORD[rsi]
+ mov r11d,edx
+ xor r11d,ecx
+ lea eax,[((-680876936))+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[4+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[((-389564586))+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[8+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[606105819+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[12+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[((-1044525330))+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[16+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ xor r11d,ecx
+ lea eax,[((-176418897))+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[20+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[1200080426+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[24+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[((-1473231341))+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[28+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[((-45705983))+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[32+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ xor r11d,ecx
+ lea eax,[1770035416+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[36+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[((-1958414417))+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[40+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[((-42063))+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[44+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[((-1990404162))+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[48+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ xor r11d,ecx
+ lea eax,[1804603682+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[52+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[((-40341101))+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[56+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[((-1502002290))+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[60+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[1236535329+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[4+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ mov r11d,edx
+ mov r12d,edx
+ not r11d
+ and r12d,ebx
+ lea eax,[((-165796510))+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[24+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[((-1069501632))+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[44+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[643717713+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[((-373897302))+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[20+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ not r11d
+ and r12d,ebx
+ lea eax,[((-701558691))+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[40+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[38016083+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[60+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[((-660478335))+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[16+rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[((-405537848))+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[36+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ not r11d
+ and r12d,ebx
+ lea eax,[568446438+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[56+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[((-1019803690))+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[12+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[((-187363961))+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[32+rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[1163531501+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[52+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ not r11d
+ and r12d,ebx
+ lea eax,[((-1444681467))+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[8+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[((-51403784))+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[28+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[1735328473+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[48+rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[((-1926607734))+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[20+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ mov r11d,ecx
+ lea eax,[((-378558))+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[32+rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[((-2022574463))+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[44+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[1839030562+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[56+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[((-35309556))+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[4+rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ lea eax,[((-1530992060))+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[16+rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[1272893353+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[28+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[((-155497632))+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[40+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[((-1094730640))+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[52+rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ lea eax,[681279174+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[((-358537222))+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[12+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[((-722521979))+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[24+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[76029189+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[36+rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ lea eax,[((-640364487))+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[48+rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[((-421815835))+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[60+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[530742520+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[8+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[((-995338651))+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ mov r11d,0xffffffff
+ xor r11d,edx
+ lea eax,[((-198630844))+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[28+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[1126891415+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[56+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[((-1416354905))+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[20+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[((-57434055))+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[48+rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+ lea eax,[1700485571+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[12+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[((-1894986606))+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[40+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[((-1051523))+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[4+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[((-2054922799))+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[32+rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+ lea eax,[1873313359+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[60+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[((-30611744))+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[24+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[((-1560198380))+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[52+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[1309151649+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[16+rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+ lea eax,[((-145523070))+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[44+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[((-1120210379))+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[8+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[718787259+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[36+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[((-343485551))+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+
+ add eax,r8d
+ add ebx,r9d
+ add ecx,r14d
+ add edx,r15d
+
+
+ add rsi,64
+ cmp rsi,rdi
+ jb NEAR $L$loop
+
+
+$L$end:
+ mov DWORD[rbp],eax
+ mov DWORD[4+rbp],ebx
+ mov DWORD[8+rbp],ecx
+ mov DWORD[12+rbp],edx
+
+ mov r15,QWORD[rsp]
+
+ mov r14,QWORD[8+rsp]
+
+ mov r12,QWORD[16+rsp]
+
+ mov rbx,QWORD[24+rsp]
+
+ mov rbp,QWORD[32+rsp]
+
+ add rsp,40
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_md5_block_asm_data_order:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rax,[40+rax]
+
+ mov rbp,QWORD[((-8))+rax]
+ mov rbx,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r14,QWORD[((-32))+rax]
+ mov r15,QWORD[((-40))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_md5_block_asm_data_order wrt ..imagebase
+ DD $L$SEH_end_md5_block_asm_data_order wrt ..imagebase
+ DD $L$SEH_info_md5_block_asm_data_order wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_md5_block_asm_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
new file mode 100644
index 0000000000..3951121452
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
@@ -0,0 +1,984 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN 32
+_aesni_ctr32_ghash_6x:
+ vmovdqu xmm2,XMMWORD[32+r11]
+ sub rdx,6
+ vpxor xmm4,xmm4,xmm4
+ vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+ vpaddb xmm10,xmm1,xmm2
+ vpaddb xmm11,xmm10,xmm2
+ vpaddb xmm12,xmm11,xmm2
+ vpaddb xmm13,xmm12,xmm2
+ vpaddb xmm14,xmm13,xmm2
+ vpxor xmm9,xmm1,xmm15
+ vmovdqu XMMWORD[(16+8)+rsp],xmm4
+ jmp NEAR $L$oop6x
+
+ALIGN 32
+$L$oop6x:
+ add ebx,100663296
+ jc NEAR $L$handle_ctr32
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpaddb xmm1,xmm14,xmm2
+ vpxor xmm10,xmm10,xmm15
+ vpxor xmm11,xmm11,xmm15
+
+$L$resume_ctr32:
+ vmovdqu XMMWORD[r8],xmm1
+ vpclmulqdq xmm5,xmm7,xmm3,0x10
+ vpxor xmm12,xmm12,xmm15
+ vmovups xmm2,XMMWORD[((16-128))+rcx]
+ vpclmulqdq xmm6,xmm7,xmm3,0x01
+ xor r12,r12
+ cmp r15,r14
+
+ vaesenc xmm9,xmm9,xmm2
+ vmovdqu xmm0,XMMWORD[((48+8))+rsp]
+ vpxor xmm13,xmm13,xmm15
+ vpclmulqdq xmm1,xmm7,xmm3,0x00
+ vaesenc xmm10,xmm10,xmm2
+ vpxor xmm14,xmm14,xmm15
+ setnc r12b
+ vpclmulqdq xmm7,xmm7,xmm3,0x11
+ vaesenc xmm11,xmm11,xmm2
+ vmovdqu xmm3,XMMWORD[((16-32))+r9]
+ neg r12
+ vaesenc xmm12,xmm12,xmm2
+ vpxor xmm6,xmm6,xmm5
+ vpclmulqdq xmm5,xmm0,xmm3,0x00
+ vpxor xmm8,xmm8,xmm4
+ vaesenc xmm13,xmm13,xmm2
+ vpxor xmm4,xmm1,xmm5
+ and r12,0x60
+ vmovups xmm15,XMMWORD[((32-128))+rcx]
+ vpclmulqdq xmm1,xmm0,xmm3,0x10
+ vaesenc xmm14,xmm14,xmm2
+
+ vpclmulqdq xmm2,xmm0,xmm3,0x01
+ lea r14,[r12*1+r14]
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm8,xmm8,XMMWORD[((16+8))+rsp]
+ vpclmulqdq xmm3,xmm0,xmm3,0x11
+ vmovdqu xmm0,XMMWORD[((64+8))+rsp]
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[88+r14]
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[80+r14]
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((32+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((40+8))+rsp],r12
+ vmovdqu xmm5,XMMWORD[((48-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((48-128))+rcx]
+ vpxor xmm6,xmm6,xmm1
+ vpclmulqdq xmm1,xmm0,xmm5,0x00
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm2
+ vpclmulqdq xmm2,xmm0,xmm5,0x10
+ vaesenc xmm10,xmm10,xmm15
+ vpxor xmm7,xmm7,xmm3
+ vpclmulqdq xmm3,xmm0,xmm5,0x01
+ vaesenc xmm11,xmm11,xmm15
+ vpclmulqdq xmm5,xmm0,xmm5,0x11
+ vmovdqu xmm0,XMMWORD[((80+8))+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vpxor xmm4,xmm4,xmm1
+ vmovdqu xmm1,XMMWORD[((64-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((64-128))+rcx]
+ vpxor xmm6,xmm6,xmm2
+ vpclmulqdq xmm2,xmm0,xmm1,0x00
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm3
+ vpclmulqdq xmm3,xmm0,xmm1,0x10
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[72+r14]
+ vpxor xmm7,xmm7,xmm5
+ vpclmulqdq xmm5,xmm0,xmm1,0x01
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[64+r14]
+ vpclmulqdq xmm1,xmm0,xmm1,0x11
+ vmovdqu xmm0,XMMWORD[((96+8))+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((48+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((56+8))+rsp],r12
+ vpxor xmm4,xmm4,xmm2
+ vmovdqu xmm2,XMMWORD[((96-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((80-128))+rcx]
+ vpxor xmm6,xmm6,xmm3
+ vpclmulqdq xmm3,xmm0,xmm2,0x00
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm5
+ vpclmulqdq xmm5,xmm0,xmm2,0x10
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[56+r14]
+ vpxor xmm7,xmm7,xmm1
+ vpclmulqdq xmm1,xmm0,xmm2,0x01
+ vpxor xmm8,xmm8,XMMWORD[((112+8))+rsp]
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[48+r14]
+ vpclmulqdq xmm2,xmm0,xmm2,0x11
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((64+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((72+8))+rsp],r12
+ vpxor xmm4,xmm4,xmm3
+ vmovdqu xmm3,XMMWORD[((112-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((96-128))+rcx]
+ vpxor xmm6,xmm6,xmm5
+ vpclmulqdq xmm5,xmm8,xmm3,0x10
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm1
+ vpclmulqdq xmm1,xmm8,xmm3,0x01
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[40+r14]
+ vpxor xmm7,xmm7,xmm2
+ vpclmulqdq xmm2,xmm8,xmm3,0x00
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[32+r14]
+ vpclmulqdq xmm8,xmm8,xmm3,0x11
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((80+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((88+8))+rsp],r12
+ vpxor xmm6,xmm6,xmm5
+ vaesenc xmm14,xmm14,xmm15
+ vpxor xmm6,xmm6,xmm1
+
+ vmovups xmm15,XMMWORD[((112-128))+rcx]
+ vpslldq xmm5,xmm6,8
+ vpxor xmm4,xmm4,xmm2
+ vmovdqu xmm3,XMMWORD[16+r11]
+
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm7,xmm7,xmm8
+ vaesenc xmm10,xmm10,xmm15
+ vpxor xmm4,xmm4,xmm5
+ movbe r13,QWORD[24+r14]
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[16+r14]
+ vpalignr xmm0,xmm4,xmm4,8
+ vpclmulqdq xmm4,xmm4,xmm3,0x10
+ mov QWORD[((96+8))+rsp],r13
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((104+8))+rsp],r12
+ vaesenc xmm13,xmm13,xmm15
+ vmovups xmm1,XMMWORD[((128-128))+rcx]
+ vaesenc xmm14,xmm14,xmm15
+
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm15,XMMWORD[((144-128))+rcx]
+ vaesenc xmm10,xmm10,xmm1
+ vpsrldq xmm6,xmm6,8
+ vaesenc xmm11,xmm11,xmm1
+ vpxor xmm7,xmm7,xmm6
+ vaesenc xmm12,xmm12,xmm1
+ vpxor xmm4,xmm4,xmm0
+ movbe r13,QWORD[8+r14]
+ vaesenc xmm13,xmm13,xmm1
+ movbe r12,QWORD[r14]
+ vaesenc xmm14,xmm14,xmm1
+ vmovups xmm1,XMMWORD[((160-128))+rcx]
+ cmp ebp,11
+ jb NEAR $L$enc_tail
+
+ vaesenc xmm9,xmm9,xmm15
+ vaesenc xmm10,xmm10,xmm15
+ vaesenc xmm11,xmm11,xmm15
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vaesenc xmm14,xmm14,xmm15
+
+ vaesenc xmm9,xmm9,xmm1
+ vaesenc xmm10,xmm10,xmm1
+ vaesenc xmm11,xmm11,xmm1
+ vaesenc xmm12,xmm12,xmm1
+ vaesenc xmm13,xmm13,xmm1
+ vmovups xmm15,XMMWORD[((176-128))+rcx]
+ vaesenc xmm14,xmm14,xmm1
+ vmovups xmm1,XMMWORD[((192-128))+rcx]
+ je NEAR $L$enc_tail
+
+ vaesenc xmm9,xmm9,xmm15
+ vaesenc xmm10,xmm10,xmm15
+ vaesenc xmm11,xmm11,xmm15
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vaesenc xmm14,xmm14,xmm15
+
+ vaesenc xmm9,xmm9,xmm1
+ vaesenc xmm10,xmm10,xmm1
+ vaesenc xmm11,xmm11,xmm1
+ vaesenc xmm12,xmm12,xmm1
+ vaesenc xmm13,xmm13,xmm1
+ vmovups xmm15,XMMWORD[((208-128))+rcx]
+ vaesenc xmm14,xmm14,xmm1
+ vmovups xmm1,XMMWORD[((224-128))+rcx]
+ jmp NEAR $L$enc_tail
+
+ALIGN 32
+$L$handle_ctr32:
+ vmovdqu xmm0,XMMWORD[r11]
+ vpshufb xmm6,xmm1,xmm0
+ vmovdqu xmm5,XMMWORD[48+r11]
+ vpaddd xmm10,xmm6,XMMWORD[64+r11]
+ vpaddd xmm11,xmm6,xmm5
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpaddd xmm12,xmm10,xmm5
+ vpshufb xmm10,xmm10,xmm0
+ vpaddd xmm13,xmm11,xmm5
+ vpshufb xmm11,xmm11,xmm0
+ vpxor xmm10,xmm10,xmm15
+ vpaddd xmm14,xmm12,xmm5
+ vpshufb xmm12,xmm12,xmm0
+ vpxor xmm11,xmm11,xmm15
+ vpaddd xmm1,xmm13,xmm5
+ vpshufb xmm13,xmm13,xmm0
+ vpshufb xmm14,xmm14,xmm0
+ vpshufb xmm1,xmm1,xmm0
+ jmp NEAR $L$resume_ctr32
+
+ALIGN 32
+$L$enc_tail:
+ vaesenc xmm9,xmm9,xmm15
+ vmovdqu XMMWORD[(16+8)+rsp],xmm7
+ vpalignr xmm8,xmm4,xmm4,8
+ vaesenc xmm10,xmm10,xmm15
+ vpclmulqdq xmm4,xmm4,xmm3,0x10
+ vpxor xmm2,xmm1,XMMWORD[rdi]
+ vaesenc xmm11,xmm11,xmm15
+ vpxor xmm0,xmm1,XMMWORD[16+rdi]
+ vaesenc xmm12,xmm12,xmm15
+ vpxor xmm5,xmm1,XMMWORD[32+rdi]
+ vaesenc xmm13,xmm13,xmm15
+ vpxor xmm6,xmm1,XMMWORD[48+rdi]
+ vaesenc xmm14,xmm14,xmm15
+ vpxor xmm7,xmm1,XMMWORD[64+rdi]
+ vpxor xmm3,xmm1,XMMWORD[80+rdi]
+ vmovdqu xmm1,XMMWORD[r8]
+
+ vaesenclast xmm9,xmm9,xmm2
+ vmovdqu xmm2,XMMWORD[32+r11]
+ vaesenclast xmm10,xmm10,xmm0
+ vpaddb xmm0,xmm1,xmm2
+ mov QWORD[((112+8))+rsp],r13
+ lea rdi,[96+rdi]
+ vaesenclast xmm11,xmm11,xmm5
+ vpaddb xmm5,xmm0,xmm2
+ mov QWORD[((120+8))+rsp],r12
+ lea rsi,[96+rsi]
+ vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+ vaesenclast xmm12,xmm12,xmm6
+ vpaddb xmm6,xmm5,xmm2
+ vaesenclast xmm13,xmm13,xmm7
+ vpaddb xmm7,xmm6,xmm2
+ vaesenclast xmm14,xmm14,xmm3
+ vpaddb xmm3,xmm7,xmm2
+
+ add r10,0x60
+ sub rdx,0x6
+ jc NEAR $L$6x_done
+
+ vmovups XMMWORD[(-96)+rsi],xmm9
+ vpxor xmm9,xmm1,xmm15
+ vmovups XMMWORD[(-80)+rsi],xmm10
+ vmovdqa xmm10,xmm0
+ vmovups XMMWORD[(-64)+rsi],xmm11
+ vmovdqa xmm11,xmm5
+ vmovups XMMWORD[(-48)+rsi],xmm12
+ vmovdqa xmm12,xmm6
+ vmovups XMMWORD[(-32)+rsi],xmm13
+ vmovdqa xmm13,xmm7
+ vmovups XMMWORD[(-16)+rsi],xmm14
+ vmovdqa xmm14,xmm3
+ vmovdqu xmm7,XMMWORD[((32+8))+rsp]
+ jmp NEAR $L$oop6x
+
+$L$6x_done:
+ vpxor xmm8,xmm8,XMMWORD[((16+8))+rsp]
+ vpxor xmm8,xmm8,xmm4
+
+ DB 0F3h,0C3h ;repret
+
+global aesni_gcm_decrypt
+
+ALIGN 32
+aesni_gcm_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_gcm_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ xor r10,r10
+ cmp rdx,0x60
+ jb NEAR $L$gcm_dec_abort
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-216)+rax],xmm6
+ movaps XMMWORD[(-200)+rax],xmm7
+ movaps XMMWORD[(-184)+rax],xmm8
+ movaps XMMWORD[(-168)+rax],xmm9
+ movaps XMMWORD[(-152)+rax],xmm10
+ movaps XMMWORD[(-136)+rax],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+$L$gcm_dec_body:
+ vzeroupper
+
+ vmovdqu xmm1,XMMWORD[r8]
+ add rsp,-128
+ mov ebx,DWORD[12+r8]
+ lea r11,[$L$bswap_mask]
+ lea r14,[((-128))+rcx]
+ mov r15,0xf80
+ vmovdqu xmm8,XMMWORD[r9]
+ and rsp,-128
+ vmovdqu xmm0,XMMWORD[r11]
+ lea rcx,[128+rcx]
+ lea r9,[((32+32))+r9]
+ mov ebp,DWORD[((240-128))+rcx]
+ vpshufb xmm8,xmm8,xmm0
+
+ and r14,r15
+ and r15,rsp
+ sub r15,r14
+ jc NEAR $L$dec_no_key_aliasing
+ cmp r15,768
+ jnc NEAR $L$dec_no_key_aliasing
+ sub rsp,r15
+$L$dec_no_key_aliasing:
+
+ vmovdqu xmm7,XMMWORD[80+rdi]
+ lea r14,[rdi]
+ vmovdqu xmm4,XMMWORD[64+rdi]
+ lea r15,[((-192))+rdx*1+rdi]
+ vmovdqu xmm5,XMMWORD[48+rdi]
+ shr rdx,4
+ xor r10,r10
+ vmovdqu xmm6,XMMWORD[32+rdi]
+ vpshufb xmm7,xmm7,xmm0
+ vmovdqu xmm2,XMMWORD[16+rdi]
+ vpshufb xmm4,xmm4,xmm0
+ vmovdqu xmm3,XMMWORD[rdi]
+ vpshufb xmm5,xmm5,xmm0
+ vmovdqu XMMWORD[48+rsp],xmm4
+ vpshufb xmm6,xmm6,xmm0
+ vmovdqu XMMWORD[64+rsp],xmm5
+ vpshufb xmm2,xmm2,xmm0
+ vmovdqu XMMWORD[80+rsp],xmm6
+ vpshufb xmm3,xmm3,xmm0
+ vmovdqu XMMWORD[96+rsp],xmm2
+ vmovdqu XMMWORD[112+rsp],xmm3
+
+ call _aesni_ctr32_ghash_6x
+
+ vmovups XMMWORD[(-96)+rsi],xmm9
+ vmovups XMMWORD[(-80)+rsi],xmm10
+ vmovups XMMWORD[(-64)+rsi],xmm11
+ vmovups XMMWORD[(-48)+rsi],xmm12
+ vmovups XMMWORD[(-32)+rsi],xmm13
+ vmovups XMMWORD[(-16)+rsi],xmm14
+
+ vpshufb xmm8,xmm8,XMMWORD[r11]
+ vmovdqu XMMWORD[(-64)+r9],xmm8
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$gcm_dec_abort:
+ mov rax,r10
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_gcm_decrypt:
+
+ALIGN 32
+_aesni_ctr32_6x:
+ vmovdqu xmm4,XMMWORD[((0-128))+rcx]
+ vmovdqu xmm2,XMMWORD[32+r11]
+ lea r13,[((-1))+rbp]
+ vmovups xmm15,XMMWORD[((16-128))+rcx]
+ lea r12,[((32-128))+rcx]
+ vpxor xmm9,xmm1,xmm4
+ add ebx,100663296
+ jc NEAR $L$handle_ctr32_2
+ vpaddb xmm10,xmm1,xmm2
+ vpaddb xmm11,xmm10,xmm2
+ vpxor xmm10,xmm10,xmm4
+ vpaddb xmm12,xmm11,xmm2
+ vpxor xmm11,xmm11,xmm4
+ vpaddb xmm13,xmm12,xmm2
+ vpxor xmm12,xmm12,xmm4
+ vpaddb xmm14,xmm13,xmm2
+ vpxor xmm13,xmm13,xmm4
+ vpaddb xmm1,xmm14,xmm2
+ vpxor xmm14,xmm14,xmm4
+ jmp NEAR $L$oop_ctr32
+
+ALIGN 16
+$L$oop_ctr32:
+ vaesenc xmm9,xmm9,xmm15
+ vaesenc xmm10,xmm10,xmm15
+ vaesenc xmm11,xmm11,xmm15
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vaesenc xmm14,xmm14,xmm15
+ vmovups xmm15,XMMWORD[r12]
+ lea r12,[16+r12]
+ dec r13d
+ jnz NEAR $L$oop_ctr32
+
+ vmovdqu xmm3,XMMWORD[r12]
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm4,xmm3,XMMWORD[rdi]
+ vaesenc xmm10,xmm10,xmm15
+ vpxor xmm5,xmm3,XMMWORD[16+rdi]
+ vaesenc xmm11,xmm11,xmm15
+ vpxor xmm6,xmm3,XMMWORD[32+rdi]
+ vaesenc xmm12,xmm12,xmm15
+ vpxor xmm8,xmm3,XMMWORD[48+rdi]
+ vaesenc xmm13,xmm13,xmm15
+ vpxor xmm2,xmm3,XMMWORD[64+rdi]
+ vaesenc xmm14,xmm14,xmm15
+ vpxor xmm3,xmm3,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+
+ vaesenclast xmm9,xmm9,xmm4
+ vaesenclast xmm10,xmm10,xmm5
+ vaesenclast xmm11,xmm11,xmm6
+ vaesenclast xmm12,xmm12,xmm8
+ vaesenclast xmm13,xmm13,xmm2
+ vaesenclast xmm14,xmm14,xmm3
+ vmovups XMMWORD[rsi],xmm9
+ vmovups XMMWORD[16+rsi],xmm10
+ vmovups XMMWORD[32+rsi],xmm11
+ vmovups XMMWORD[48+rsi],xmm12
+ vmovups XMMWORD[64+rsi],xmm13
+ vmovups XMMWORD[80+rsi],xmm14
+ lea rsi,[96+rsi]
+
+ DB 0F3h,0C3h ;repret
+ALIGN 32
+$L$handle_ctr32_2:
+ vpshufb xmm6,xmm1,xmm0
+ vmovdqu xmm5,XMMWORD[48+r11]
+ vpaddd xmm10,xmm6,XMMWORD[64+r11]
+ vpaddd xmm11,xmm6,xmm5
+ vpaddd xmm12,xmm10,xmm5
+ vpshufb xmm10,xmm10,xmm0
+ vpaddd xmm13,xmm11,xmm5
+ vpshufb xmm11,xmm11,xmm0
+ vpxor xmm10,xmm10,xmm4
+ vpaddd xmm14,xmm12,xmm5
+ vpshufb xmm12,xmm12,xmm0
+ vpxor xmm11,xmm11,xmm4
+ vpaddd xmm1,xmm13,xmm5
+ vpshufb xmm13,xmm13,xmm0
+ vpxor xmm12,xmm12,xmm4
+ vpshufb xmm14,xmm14,xmm0
+ vpxor xmm13,xmm13,xmm4
+ vpshufb xmm1,xmm1,xmm0
+ vpxor xmm14,xmm14,xmm4
+ jmp NEAR $L$oop_ctr32
+
+
+global aesni_gcm_encrypt
+
+ALIGN 32
+aesni_gcm_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_gcm_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ xor r10,r10
+ cmp rdx,0x60*3
+ jb NEAR $L$gcm_enc_abort
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-216)+rax],xmm6
+ movaps XMMWORD[(-200)+rax],xmm7
+ movaps XMMWORD[(-184)+rax],xmm8
+ movaps XMMWORD[(-168)+rax],xmm9
+ movaps XMMWORD[(-152)+rax],xmm10
+ movaps XMMWORD[(-136)+rax],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+$L$gcm_enc_body:
+ vzeroupper
+
+ vmovdqu xmm1,XMMWORD[r8]
+ add rsp,-128
+ mov ebx,DWORD[12+r8]
+ lea r11,[$L$bswap_mask]
+ lea r14,[((-128))+rcx]
+ mov r15,0xf80
+ lea rcx,[128+rcx]
+ vmovdqu xmm0,XMMWORD[r11]
+ and rsp,-128
+ mov ebp,DWORD[((240-128))+rcx]
+
+ and r14,r15
+ and r15,rsp
+ sub r15,r14
+ jc NEAR $L$enc_no_key_aliasing
+ cmp r15,768
+ jnc NEAR $L$enc_no_key_aliasing
+ sub rsp,r15
+$L$enc_no_key_aliasing:
+
+ lea r14,[rsi]
+ lea r15,[((-192))+rdx*1+rsi]
+ shr rdx,4
+
+ call _aesni_ctr32_6x
+ vpshufb xmm8,xmm9,xmm0
+ vpshufb xmm2,xmm10,xmm0
+ vmovdqu XMMWORD[112+rsp],xmm8
+ vpshufb xmm4,xmm11,xmm0
+ vmovdqu XMMWORD[96+rsp],xmm2
+ vpshufb xmm5,xmm12,xmm0
+ vmovdqu XMMWORD[80+rsp],xmm4
+ vpshufb xmm6,xmm13,xmm0
+ vmovdqu XMMWORD[64+rsp],xmm5
+ vpshufb xmm7,xmm14,xmm0
+ vmovdqu XMMWORD[48+rsp],xmm6
+
+ call _aesni_ctr32_6x
+
+ vmovdqu xmm8,XMMWORD[r9]
+ lea r9,[((32+32))+r9]
+ sub rdx,12
+ mov r10,0x60*2
+ vpshufb xmm8,xmm8,xmm0
+
+ call _aesni_ctr32_ghash_6x
+ vmovdqu xmm7,XMMWORD[32+rsp]
+ vmovdqu xmm0,XMMWORD[r11]
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpunpckhqdq xmm1,xmm7,xmm7
+ vmovdqu xmm15,XMMWORD[((32-32))+r9]
+ vmovups XMMWORD[(-96)+rsi],xmm9
+ vpshufb xmm9,xmm9,xmm0
+ vpxor xmm1,xmm1,xmm7
+ vmovups XMMWORD[(-80)+rsi],xmm10
+ vpshufb xmm10,xmm10,xmm0
+ vmovups XMMWORD[(-64)+rsi],xmm11
+ vpshufb xmm11,xmm11,xmm0
+ vmovups XMMWORD[(-48)+rsi],xmm12
+ vpshufb xmm12,xmm12,xmm0
+ vmovups XMMWORD[(-32)+rsi],xmm13
+ vpshufb xmm13,xmm13,xmm0
+ vmovups XMMWORD[(-16)+rsi],xmm14
+ vpshufb xmm14,xmm14,xmm0
+ vmovdqu XMMWORD[16+rsp],xmm9
+ vmovdqu xmm6,XMMWORD[48+rsp]
+ vmovdqu xmm0,XMMWORD[((16-32))+r9]
+ vpunpckhqdq xmm2,xmm6,xmm6
+ vpclmulqdq xmm5,xmm7,xmm3,0x00
+ vpxor xmm2,xmm2,xmm6
+ vpclmulqdq xmm7,xmm7,xmm3,0x11
+ vpclmulqdq xmm1,xmm1,xmm15,0x00
+
+ vmovdqu xmm9,XMMWORD[64+rsp]
+ vpclmulqdq xmm4,xmm6,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((48-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+ vpunpckhqdq xmm5,xmm9,xmm9
+ vpclmulqdq xmm6,xmm6,xmm0,0x11
+ vpxor xmm5,xmm5,xmm9
+ vpxor xmm6,xmm6,xmm7
+ vpclmulqdq xmm2,xmm2,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((80-32))+r9]
+ vpxor xmm2,xmm2,xmm1
+
+ vmovdqu xmm1,XMMWORD[80+rsp]
+ vpclmulqdq xmm7,xmm9,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((64-32))+r9]
+ vpxor xmm7,xmm7,xmm4
+ vpunpckhqdq xmm4,xmm1,xmm1
+ vpclmulqdq xmm9,xmm9,xmm3,0x11
+ vpxor xmm4,xmm4,xmm1
+ vpxor xmm9,xmm9,xmm6
+ vpclmulqdq xmm5,xmm5,xmm15,0x00
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm2,XMMWORD[96+rsp]
+ vpclmulqdq xmm6,xmm1,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((96-32))+r9]
+ vpxor xmm6,xmm6,xmm7
+ vpunpckhqdq xmm7,xmm2,xmm2
+ vpclmulqdq xmm1,xmm1,xmm0,0x11
+ vpxor xmm7,xmm7,xmm2
+ vpxor xmm1,xmm1,xmm9
+ vpclmulqdq xmm4,xmm4,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((128-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+
+ vpxor xmm8,xmm8,XMMWORD[112+rsp]
+ vpclmulqdq xmm5,xmm2,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((112-32))+r9]
+ vpunpckhqdq xmm9,xmm8,xmm8
+ vpxor xmm5,xmm5,xmm6
+ vpclmulqdq xmm2,xmm2,xmm3,0x11
+ vpxor xmm9,xmm9,xmm8
+ vpxor xmm2,xmm2,xmm1
+ vpclmulqdq xmm7,xmm7,xmm15,0x00
+ vpxor xmm4,xmm7,xmm4
+
+ vpclmulqdq xmm6,xmm8,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpunpckhqdq xmm1,xmm14,xmm14
+ vpclmulqdq xmm8,xmm8,xmm0,0x11
+ vpxor xmm1,xmm1,xmm14
+ vpxor xmm5,xmm6,xmm5
+ vpclmulqdq xmm9,xmm9,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((32-32))+r9]
+ vpxor xmm7,xmm8,xmm2
+ vpxor xmm6,xmm9,xmm4
+
+ vmovdqu xmm0,XMMWORD[((16-32))+r9]
+ vpxor xmm9,xmm7,xmm5
+ vpclmulqdq xmm4,xmm14,xmm3,0x00
+ vpxor xmm6,xmm6,xmm9
+ vpunpckhqdq xmm2,xmm13,xmm13
+ vpclmulqdq xmm14,xmm14,xmm3,0x11
+ vpxor xmm2,xmm2,xmm13
+ vpslldq xmm9,xmm6,8
+ vpclmulqdq xmm1,xmm1,xmm15,0x00
+ vpxor xmm8,xmm5,xmm9
+ vpsrldq xmm6,xmm6,8
+ vpxor xmm7,xmm7,xmm6
+
+ vpclmulqdq xmm5,xmm13,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((48-32))+r9]
+ vpxor xmm5,xmm5,xmm4
+ vpunpckhqdq xmm9,xmm12,xmm12
+ vpclmulqdq xmm13,xmm13,xmm0,0x11
+ vpxor xmm9,xmm9,xmm12
+ vpxor xmm13,xmm13,xmm14
+ vpalignr xmm14,xmm8,xmm8,8
+ vpclmulqdq xmm2,xmm2,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((80-32))+r9]
+ vpxor xmm2,xmm2,xmm1
+
+ vpclmulqdq xmm4,xmm12,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((64-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+ vpunpckhqdq xmm1,xmm11,xmm11
+ vpclmulqdq xmm12,xmm12,xmm3,0x11
+ vpxor xmm1,xmm1,xmm11
+ vpxor xmm12,xmm12,xmm13
+ vxorps xmm7,xmm7,XMMWORD[16+rsp]
+ vpclmulqdq xmm9,xmm9,xmm15,0x00
+ vpxor xmm9,xmm9,xmm2
+
+ vpclmulqdq xmm8,xmm8,XMMWORD[16+r11],0x10
+ vxorps xmm8,xmm8,xmm14
+
+ vpclmulqdq xmm5,xmm11,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((96-32))+r9]
+ vpxor xmm5,xmm5,xmm4
+ vpunpckhqdq xmm2,xmm10,xmm10
+ vpclmulqdq xmm11,xmm11,xmm0,0x11
+ vpxor xmm2,xmm2,xmm10
+ vpalignr xmm14,xmm8,xmm8,8
+ vpxor xmm11,xmm11,xmm12
+ vpclmulqdq xmm1,xmm1,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((128-32))+r9]
+ vpxor xmm1,xmm1,xmm9
+
+ vxorps xmm14,xmm14,xmm7
+ vpclmulqdq xmm8,xmm8,XMMWORD[16+r11],0x10
+ vxorps xmm8,xmm8,xmm14
+
+ vpclmulqdq xmm4,xmm10,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((112-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+ vpunpckhqdq xmm9,xmm8,xmm8
+ vpclmulqdq xmm10,xmm10,xmm3,0x11
+ vpxor xmm9,xmm9,xmm8
+ vpxor xmm10,xmm10,xmm11
+ vpclmulqdq xmm2,xmm2,xmm15,0x00
+ vpxor xmm2,xmm2,xmm1
+
+ vpclmulqdq xmm5,xmm8,xmm0,0x00
+ vpclmulqdq xmm7,xmm8,xmm0,0x11
+ vpxor xmm5,xmm5,xmm4
+ vpclmulqdq xmm6,xmm9,xmm15,0x10
+ vpxor xmm7,xmm7,xmm10
+ vpxor xmm6,xmm6,xmm2
+
+ vpxor xmm4,xmm7,xmm5
+ vpxor xmm6,xmm6,xmm4
+ vpslldq xmm1,xmm6,8
+ vmovdqu xmm3,XMMWORD[16+r11]
+ vpsrldq xmm6,xmm6,8
+ vpxor xmm8,xmm5,xmm1
+ vpxor xmm7,xmm7,xmm6
+
+ vpalignr xmm2,xmm8,xmm8,8
+ vpclmulqdq xmm8,xmm8,xmm3,0x10
+ vpxor xmm8,xmm8,xmm2
+
+ vpalignr xmm2,xmm8,xmm8,8
+ vpclmulqdq xmm8,xmm8,xmm3,0x10
+ vpxor xmm2,xmm2,xmm7
+ vpxor xmm8,xmm8,xmm2
+ vpshufb xmm8,xmm8,XMMWORD[r11]
+ vmovdqu XMMWORD[(-64)+r9],xmm8
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$gcm_enc_abort:
+ mov rax,r10
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_gcm_encrypt:
+ALIGN 64
+$L$bswap_mask:
+DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$poly:
+DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$one_msb:
+DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$two_lsb:
+DB 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+$L$one_lsb:
+DB 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+DB 65,69,83,45,78,73,32,71,67,77,32,109,111,100,117,108
+DB 101,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82
+DB 89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+DB 114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+gcm_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[120+r8]
+
+ mov r15,QWORD[((-48))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov rbx,QWORD[((-8))+rax]
+ mov QWORD[240+r8],r15
+ mov QWORD[232+r8],r14
+ mov QWORD[224+r8],r13
+ mov QWORD[216+r8],r12
+ mov QWORD[160+r8],rbp
+ mov QWORD[144+r8],rbx
+
+ lea rsi,[((-216))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_gcm_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_gcm_decrypt wrt ..imagebase
+ DD $L$SEH_gcm_dec_info wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_gcm_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_gcm_encrypt wrt ..imagebase
+ DD $L$SEH_gcm_enc_info wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_gcm_dec_info:
+DB 9,0,0,0
+ DD gcm_se_handler wrt ..imagebase
+ DD $L$gcm_dec_body wrt ..imagebase,$L$gcm_dec_abort wrt ..imagebase
+$L$SEH_gcm_enc_info:
+DB 9,0,0,0
+ DD gcm_se_handler wrt ..imagebase
+ DD $L$gcm_enc_body wrt ..imagebase,$L$gcm_enc_abort wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
new file mode 100644
index 0000000000..3d67e12775
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
@@ -0,0 +1,2077 @@
+; Copyright 2010-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global gcm_gmult_4bit
+
+ALIGN 16
+gcm_gmult_4bit:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_gcm_gmult_4bit:
+ mov rdi,rcx
+ mov rsi,rdx
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,280
+
+$L$gmult_prologue:
+
+ movzx r8,BYTE[15+rdi]
+ lea r11,[$L$rem_4bit]
+ xor rax,rax
+ xor rbx,rbx
+ mov al,r8b
+ mov bl,r8b
+ shl al,4
+ mov rcx,14
+ mov r8,QWORD[8+rax*1+rsi]
+ mov r9,QWORD[rax*1+rsi]
+ and bl,0xf0
+ mov rdx,r8
+ jmp NEAR $L$oop1
+
+ALIGN 16
+$L$oop1:
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ mov al,BYTE[rcx*1+rdi]
+ shr r9,4
+ xor r8,QWORD[8+rbx*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rbx*1+rsi]
+ mov bl,al
+ xor r9,QWORD[rdx*8+r11]
+ mov rdx,r8
+ shl al,4
+ xor r8,r10
+ dec rcx
+ js NEAR $L$break1
+
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ shr r9,4
+ xor r8,QWORD[8+rax*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rax*1+rsi]
+ and bl,0xf0
+ xor r9,QWORD[rdx*8+r11]
+ mov rdx,r8
+ xor r8,r10
+ jmp NEAR $L$oop1
+
+ALIGN 16
+$L$break1:
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ shr r9,4
+ xor r8,QWORD[8+rax*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rax*1+rsi]
+ and bl,0xf0
+ xor r9,QWORD[rdx*8+r11]
+ mov rdx,r8
+ xor r8,r10
+
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ shr r9,4
+ xor r8,QWORD[8+rbx*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rbx*1+rsi]
+ xor r8,r10
+ xor r9,QWORD[rdx*8+r11]
+
+ bswap r8
+ bswap r9
+ mov QWORD[8+rdi],r8
+ mov QWORD[rdi],r9
+
+ lea rsi,[((280+48))+rsp]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$gmult_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_gcm_gmult_4bit:
+global gcm_ghash_4bit
+
+ALIGN 16
+gcm_ghash_4bit:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_gcm_ghash_4bit:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,280
+
+$L$ghash_prologue:
+ mov r14,rdx
+ mov r15,rcx
+ sub rsi,-128
+ lea rbp,[((16+128))+rsp]
+ xor edx,edx
+ mov r8,QWORD[((0+0-128))+rsi]
+ mov rax,QWORD[((0+8-128))+rsi]
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov r9,QWORD[((16+0-128))+rsi]
+ shl dl,4
+ mov rbx,QWORD[((16+8-128))+rsi]
+ shl r10,60
+ mov BYTE[rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[rbp],r8
+ mov r8,QWORD[((32+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((0-128))+rbp],rax
+ mov rax,QWORD[((32+8-128))+rsi]
+ shl r10,60
+ mov BYTE[1+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[8+rbp],r9
+ mov r9,QWORD[((48+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((8-128))+rbp],rbx
+ mov rbx,QWORD[((48+8-128))+rsi]
+ shl r10,60
+ mov BYTE[2+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[16+rbp],r8
+ mov r8,QWORD[((64+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((16-128))+rbp],rax
+ mov rax,QWORD[((64+8-128))+rsi]
+ shl r10,60
+ mov BYTE[3+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[24+rbp],r9
+ mov r9,QWORD[((80+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((24-128))+rbp],rbx
+ mov rbx,QWORD[((80+8-128))+rsi]
+ shl r10,60
+ mov BYTE[4+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[32+rbp],r8
+ mov r8,QWORD[((96+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((32-128))+rbp],rax
+ mov rax,QWORD[((96+8-128))+rsi]
+ shl r10,60
+ mov BYTE[5+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[40+rbp],r9
+ mov r9,QWORD[((112+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((40-128))+rbp],rbx
+ mov rbx,QWORD[((112+8-128))+rsi]
+ shl r10,60
+ mov BYTE[6+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[48+rbp],r8
+ mov r8,QWORD[((128+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((48-128))+rbp],rax
+ mov rax,QWORD[((128+8-128))+rsi]
+ shl r10,60
+ mov BYTE[7+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[56+rbp],r9
+ mov r9,QWORD[((144+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((56-128))+rbp],rbx
+ mov rbx,QWORD[((144+8-128))+rsi]
+ shl r10,60
+ mov BYTE[8+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[64+rbp],r8
+ mov r8,QWORD[((160+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((64-128))+rbp],rax
+ mov rax,QWORD[((160+8-128))+rsi]
+ shl r10,60
+ mov BYTE[9+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[72+rbp],r9
+ mov r9,QWORD[((176+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((72-128))+rbp],rbx
+ mov rbx,QWORD[((176+8-128))+rsi]
+ shl r10,60
+ mov BYTE[10+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[80+rbp],r8
+ mov r8,QWORD[((192+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((80-128))+rbp],rax
+ mov rax,QWORD[((192+8-128))+rsi]
+ shl r10,60
+ mov BYTE[11+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[88+rbp],r9
+ mov r9,QWORD[((208+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((88-128))+rbp],rbx
+ mov rbx,QWORD[((208+8-128))+rsi]
+ shl r10,60
+ mov BYTE[12+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[96+rbp],r8
+ mov r8,QWORD[((224+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((96-128))+rbp],rax
+ mov rax,QWORD[((224+8-128))+rsi]
+ shl r10,60
+ mov BYTE[13+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[104+rbp],r9
+ mov r9,QWORD[((240+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((104-128))+rbp],rbx
+ mov rbx,QWORD[((240+8-128))+rsi]
+ shl r10,60
+ mov BYTE[14+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[112+rbp],r8
+ shl dl,4
+ mov QWORD[((112-128))+rbp],rax
+ shl r10,60
+ mov BYTE[15+rsp],dl
+ or rbx,r10
+ mov QWORD[120+rbp],r9
+ mov QWORD[((120-128))+rbp],rbx
+ add rsi,-128
+ mov r8,QWORD[8+rdi]
+ mov r9,QWORD[rdi]
+ add r15,r14
+ lea r11,[$L$rem_8bit]
+ jmp NEAR $L$outer_loop
+ALIGN 16
+$L$outer_loop:
+ xor r9,QWORD[r14]
+ mov rdx,QWORD[8+r14]
+ lea r14,[16+r14]
+ xor rdx,r8
+ mov QWORD[rdi],r9
+ mov QWORD[8+rdi],rdx
+ shr rdx,32
+ xor rax,rax
+ rol edx,8
+ mov al,dl
+ movzx ebx,dl
+ shl al,4
+ shr ebx,4
+ rol edx,8
+ mov r8,QWORD[8+rax*1+rsi]
+ mov r9,QWORD[rax*1+rsi]
+ mov al,dl
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ xor r12,r8
+ mov r10,r9
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[8+rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[4+rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ and ecx,240
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[((-4))+rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ movzx r12,WORD[r12*2+r11]
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ shl r12,48
+ xor r8,r10
+ xor r9,r12
+ movzx r13,r8b
+ shr r8,4
+ mov r10,r9
+ shl r13b,4
+ shr r9,4
+ xor r8,QWORD[8+rcx*1+rsi]
+ movzx r13,WORD[r13*2+r11]
+ shl r10,60
+ xor r9,QWORD[rcx*1+rsi]
+ xor r8,r10
+ shl r13,48
+ bswap r8
+ xor r9,r13
+ bswap r9
+ cmp r14,r15
+ jb NEAR $L$outer_loop
+ mov QWORD[8+rdi],r8
+ mov QWORD[rdi],r9
+
+ lea rsi,[((280+48))+rsp]
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$ghash_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_gcm_ghash_4bit:
+global gcm_init_clmul
+
+ALIGN 16
+gcm_init_clmul:
+
+$L$_init_clmul:
+$L$SEH_begin_gcm_init_clmul:
+
+DB 0x48,0x83,0xec,0x18
+DB 0x0f,0x29,0x34,0x24
+ movdqu xmm2,XMMWORD[rdx]
+ pshufd xmm2,xmm2,78
+
+
+ pshufd xmm4,xmm2,255
+ movdqa xmm3,xmm2
+ psllq xmm2,1
+ pxor xmm5,xmm5
+ psrlq xmm3,63
+ pcmpgtd xmm5,xmm4
+ pslldq xmm3,8
+ por xmm2,xmm3
+
+
+ pand xmm5,XMMWORD[$L$0x1c2_polynomial]
+ pxor xmm2,xmm5
+
+
+ pshufd xmm6,xmm2,78
+ movdqa xmm0,xmm2
+ pxor xmm6,xmm2
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,222,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm2,78
+ pshufd xmm4,xmm0,78
+ pxor xmm3,xmm2
+ movdqu XMMWORD[rcx],xmm2
+ pxor xmm4,xmm0
+ movdqu XMMWORD[16+rcx],xmm0
+DB 102,15,58,15,227,8
+ movdqu XMMWORD[32+rcx],xmm4
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,222,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ movdqa xmm5,xmm0
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,222,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm5,78
+ pshufd xmm4,xmm0,78
+ pxor xmm3,xmm5
+ movdqu XMMWORD[48+rcx],xmm5
+ pxor xmm4,xmm0
+ movdqu XMMWORD[64+rcx],xmm0
+DB 102,15,58,15,227,8
+ movdqu XMMWORD[80+rcx],xmm4
+ movaps xmm6,XMMWORD[rsp]
+ lea rsp,[24+rsp]
+$L$SEH_end_gcm_init_clmul:
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_gmult_clmul
+
+ALIGN 16
+gcm_gmult_clmul:
+
+$L$_gmult_clmul:
+ movdqu xmm0,XMMWORD[rcx]
+ movdqa xmm5,XMMWORD[$L$bswap_mask]
+ movdqu xmm2,XMMWORD[rdx]
+ movdqu xmm4,XMMWORD[32+rdx]
+DB 102,15,56,0,197
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,220,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+DB 102,15,56,0,197
+ movdqu XMMWORD[rcx],xmm0
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_ghash_clmul
+
+ALIGN 32
+gcm_ghash_clmul:
+
+$L$_ghash_clmul:
+ lea rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_clmul:
+
+DB 0x48,0x8d,0x60,0xe0
+DB 0x0f,0x29,0x70,0xe0
+DB 0x0f,0x29,0x78,0xf0
+DB 0x44,0x0f,0x29,0x00
+DB 0x44,0x0f,0x29,0x48,0x10
+DB 0x44,0x0f,0x29,0x50,0x20
+DB 0x44,0x0f,0x29,0x58,0x30
+DB 0x44,0x0f,0x29,0x60,0x40
+DB 0x44,0x0f,0x29,0x68,0x50
+DB 0x44,0x0f,0x29,0x70,0x60
+DB 0x44,0x0f,0x29,0x78,0x70
+ movdqa xmm10,XMMWORD[$L$bswap_mask]
+
+ movdqu xmm0,XMMWORD[rcx]
+ movdqu xmm2,XMMWORD[rdx]
+ movdqu xmm7,XMMWORD[32+rdx]
+DB 102,65,15,56,0,194
+
+ sub r9,0x10
+ jz NEAR $L$odd_tail
+
+ movdqu xmm6,XMMWORD[16+rdx]
+ mov eax,DWORD[((OPENSSL_ia32cap_P+4))]
+ cmp r9,0x30
+ jb NEAR $L$skip4x
+
+ and eax,71303168
+ cmp eax,4194304
+ je NEAR $L$skip4x
+
+ sub r9,0x30
+ mov rax,0xA040608020C0E000
+ movdqu xmm14,XMMWORD[48+rdx]
+ movdqu xmm15,XMMWORD[64+rdx]
+
+
+
+
+ movdqu xmm3,XMMWORD[48+r8]
+ movdqu xmm11,XMMWORD[32+r8]
+DB 102,65,15,56,0,218
+DB 102,69,15,56,0,218
+ movdqa xmm5,xmm3
+ pshufd xmm4,xmm3,78
+ pxor xmm4,xmm3
+DB 102,15,58,68,218,0
+DB 102,15,58,68,234,17
+DB 102,15,58,68,231,0
+
+ movdqa xmm13,xmm11
+ pshufd xmm12,xmm11,78
+ pxor xmm12,xmm11
+DB 102,68,15,58,68,222,0
+DB 102,68,15,58,68,238,17
+DB 102,68,15,58,68,231,16
+ xorps xmm3,xmm11
+ xorps xmm5,xmm13
+ movups xmm7,XMMWORD[80+rdx]
+ xorps xmm4,xmm12
+
+ movdqu xmm11,XMMWORD[16+r8]
+ movdqu xmm8,XMMWORD[r8]
+DB 102,69,15,56,0,218
+DB 102,69,15,56,0,194
+ movdqa xmm13,xmm11
+ pshufd xmm12,xmm11,78
+ pxor xmm0,xmm8
+ pxor xmm12,xmm11
+DB 102,69,15,58,68,222,0
+ movdqa xmm1,xmm0
+ pshufd xmm8,xmm0,78
+ pxor xmm8,xmm0
+DB 102,69,15,58,68,238,17
+DB 102,68,15,58,68,231,0
+ xorps xmm3,xmm11
+ xorps xmm5,xmm13
+
+ lea r8,[64+r8]
+ sub r9,0x40
+ jc NEAR $L$tail4x
+
+ jmp NEAR $L$mod4_loop
+ALIGN 32
+$L$mod4_loop:
+DB 102,65,15,58,68,199,0
+ xorps xmm4,xmm12
+ movdqu xmm11,XMMWORD[48+r8]
+DB 102,69,15,56,0,218
+DB 102,65,15,58,68,207,17
+ xorps xmm0,xmm3
+ movdqu xmm3,XMMWORD[32+r8]
+ movdqa xmm13,xmm11
+DB 102,68,15,58,68,199,16
+ pshufd xmm12,xmm11,78
+ xorps xmm1,xmm5
+ pxor xmm12,xmm11
+DB 102,65,15,56,0,218
+ movups xmm7,XMMWORD[32+rdx]
+ xorps xmm8,xmm4
+DB 102,68,15,58,68,218,0
+ pshufd xmm4,xmm3,78
+
+ pxor xmm8,xmm0
+ movdqa xmm5,xmm3
+ pxor xmm8,xmm1
+ pxor xmm4,xmm3
+ movdqa xmm9,xmm8
+DB 102,68,15,58,68,234,17
+ pslldq xmm8,8
+ psrldq xmm9,8
+ pxor xmm0,xmm8
+ movdqa xmm8,XMMWORD[$L$7_mask]
+ pxor xmm1,xmm9
+DB 102,76,15,110,200
+
+ pand xmm8,xmm0
+DB 102,69,15,56,0,200
+ pxor xmm9,xmm0
+DB 102,68,15,58,68,231,0
+ psllq xmm9,57
+ movdqa xmm8,xmm9
+ pslldq xmm9,8
+DB 102,15,58,68,222,0
+ psrldq xmm8,8
+ pxor xmm0,xmm9
+ pxor xmm1,xmm8
+ movdqu xmm8,XMMWORD[r8]
+
+ movdqa xmm9,xmm0
+ psrlq xmm0,1
+DB 102,15,58,68,238,17
+ xorps xmm3,xmm11
+ movdqu xmm11,XMMWORD[16+r8]
+DB 102,69,15,56,0,218
+DB 102,15,58,68,231,16
+ xorps xmm5,xmm13
+ movups xmm7,XMMWORD[80+rdx]
+DB 102,69,15,56,0,194
+ pxor xmm1,xmm9
+ pxor xmm9,xmm0
+ psrlq xmm0,5
+
+ movdqa xmm13,xmm11
+ pxor xmm4,xmm12
+ pshufd xmm12,xmm11,78
+ pxor xmm0,xmm9
+ pxor xmm1,xmm8
+ pxor xmm12,xmm11
+DB 102,69,15,58,68,222,0
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ movdqa xmm1,xmm0
+DB 102,69,15,58,68,238,17
+ xorps xmm3,xmm11
+ pshufd xmm8,xmm0,78
+ pxor xmm8,xmm0
+
+DB 102,68,15,58,68,231,0
+ xorps xmm5,xmm13
+
+ lea r8,[64+r8]
+ sub r9,0x40
+ jnc NEAR $L$mod4_loop
+
+$L$tail4x:
+DB 102,65,15,58,68,199,0
+DB 102,65,15,58,68,207,17
+DB 102,68,15,58,68,199,16
+ xorps xmm4,xmm12
+ xorps xmm0,xmm3
+ xorps xmm1,xmm5
+ pxor xmm1,xmm0
+ pxor xmm8,xmm4
+
+ pxor xmm8,xmm1
+ pxor xmm1,xmm0
+
+ movdqa xmm9,xmm8
+ psrldq xmm8,8
+ pslldq xmm9,8
+ pxor xmm1,xmm8
+ pxor xmm0,xmm9
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ add r9,0x40
+ jz NEAR $L$done
+ movdqu xmm7,XMMWORD[32+rdx]
+ sub r9,0x10
+ jz NEAR $L$odd_tail
+$L$skip4x:
+
+
+
+
+
+ movdqu xmm8,XMMWORD[r8]
+ movdqu xmm3,XMMWORD[16+r8]
+DB 102,69,15,56,0,194
+DB 102,65,15,56,0,218
+ pxor xmm0,xmm8
+
+ movdqa xmm5,xmm3
+ pshufd xmm4,xmm3,78
+ pxor xmm4,xmm3
+DB 102,15,58,68,218,0
+DB 102,15,58,68,234,17
+DB 102,15,58,68,231,0
+
+ lea r8,[32+r8]
+ nop
+ sub r9,0x20
+ jbe NEAR $L$even_tail
+ nop
+ jmp NEAR $L$mod_loop
+
+ALIGN 32
+$L$mod_loop:
+ movdqa xmm1,xmm0
+ movdqa xmm8,xmm4
+ pshufd xmm4,xmm0,78
+ pxor xmm4,xmm0
+
+DB 102,15,58,68,198,0
+DB 102,15,58,68,206,17
+DB 102,15,58,68,231,16
+
+ pxor xmm0,xmm3
+ pxor xmm1,xmm5
+ movdqu xmm9,XMMWORD[r8]
+ pxor xmm8,xmm0
+DB 102,69,15,56,0,202
+ movdqu xmm3,XMMWORD[16+r8]
+
+ pxor xmm8,xmm1
+ pxor xmm1,xmm9
+ pxor xmm4,xmm8
+DB 102,65,15,56,0,218
+ movdqa xmm8,xmm4
+ psrldq xmm8,8
+ pslldq xmm4,8
+ pxor xmm1,xmm8
+ pxor xmm0,xmm4
+
+ movdqa xmm5,xmm3
+
+ movdqa xmm9,xmm0
+ movdqa xmm8,xmm0
+ psllq xmm0,5
+ pxor xmm8,xmm0
+DB 102,15,58,68,218,0
+ psllq xmm0,1
+ pxor xmm0,xmm8
+ psllq xmm0,57
+ movdqa xmm8,xmm0
+ pslldq xmm0,8
+ psrldq xmm8,8
+ pxor xmm0,xmm9
+ pshufd xmm4,xmm5,78
+ pxor xmm1,xmm8
+ pxor xmm4,xmm5
+
+ movdqa xmm9,xmm0
+ psrlq xmm0,1
+DB 102,15,58,68,234,17
+ pxor xmm1,xmm9
+ pxor xmm9,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm9
+ lea r8,[32+r8]
+ psrlq xmm0,1
+DB 102,15,58,68,231,0
+ pxor xmm0,xmm1
+
+ sub r9,0x20
+ ja NEAR $L$mod_loop
+
+$L$even_tail:
+ movdqa xmm1,xmm0
+ movdqa xmm8,xmm4
+ pshufd xmm4,xmm0,78
+ pxor xmm4,xmm0
+
+DB 102,15,58,68,198,0
+DB 102,15,58,68,206,17
+DB 102,15,58,68,231,16
+
+ pxor xmm0,xmm3
+ pxor xmm1,xmm5
+ pxor xmm8,xmm0
+ pxor xmm8,xmm1
+ pxor xmm4,xmm8
+ movdqa xmm8,xmm4
+ psrldq xmm8,8
+ pslldq xmm4,8
+ pxor xmm1,xmm8
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ test r9,r9
+ jnz NEAR $L$done
+
+$L$odd_tail:
+ movdqu xmm8,XMMWORD[r8]
+DB 102,69,15,56,0,194
+ pxor xmm0,xmm8
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,223,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+$L$done:
+DB 102,65,15,56,0,194
+ movdqu XMMWORD[rcx],xmm0
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ lea rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_clmul:
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_init_avx
+
+ALIGN 32
+gcm_init_avx:
+
+$L$SEH_begin_gcm_init_avx:
+
+DB 0x48,0x83,0xec,0x18
+DB 0x0f,0x29,0x34,0x24
+ vzeroupper
+
+ vmovdqu xmm2,XMMWORD[rdx]
+ vpshufd xmm2,xmm2,78
+
+
+ vpshufd xmm4,xmm2,255
+ vpsrlq xmm3,xmm2,63
+ vpsllq xmm2,xmm2,1
+ vpxor xmm5,xmm5,xmm5
+ vpcmpgtd xmm5,xmm5,xmm4
+ vpslldq xmm3,xmm3,8
+ vpor xmm2,xmm2,xmm3
+
+
+ vpand xmm5,xmm5,XMMWORD[$L$0x1c2_polynomial]
+ vpxor xmm2,xmm2,xmm5
+
+ vpunpckhqdq xmm6,xmm2,xmm2
+ vmovdqa xmm0,xmm2
+ vpxor xmm6,xmm6,xmm2
+ mov r10,4
+ jmp NEAR $L$init_start_avx
+ALIGN 32
+$L$init_loop_avx:
+ vpalignr xmm5,xmm4,xmm3,8
+ vmovdqu XMMWORD[(-16)+rcx],xmm5
+ vpunpckhqdq xmm3,xmm0,xmm0
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm1,xmm0,xmm2,0x11
+ vpclmulqdq xmm0,xmm0,xmm2,0x00
+ vpclmulqdq xmm3,xmm3,xmm6,0x00
+ vpxor xmm4,xmm1,xmm0
+ vpxor xmm3,xmm3,xmm4
+
+ vpslldq xmm4,xmm3,8
+ vpsrldq xmm3,xmm3,8
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm1,xmm1,xmm3
+ vpsllq xmm3,xmm0,57
+ vpsllq xmm4,xmm0,62
+ vpxor xmm4,xmm4,xmm3
+ vpsllq xmm3,xmm0,63
+ vpxor xmm4,xmm4,xmm3
+ vpslldq xmm3,xmm4,8
+ vpsrldq xmm4,xmm4,8
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm1,xmm1,xmm4
+
+ vpsrlq xmm4,xmm0,1
+ vpxor xmm1,xmm1,xmm0
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm4,xmm4,5
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm0,xmm0,1
+ vpxor xmm0,xmm0,xmm1
+$L$init_start_avx:
+ vmovdqa xmm5,xmm0
+ vpunpckhqdq xmm3,xmm0,xmm0
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm1,xmm0,xmm2,0x11
+ vpclmulqdq xmm0,xmm0,xmm2,0x00
+ vpclmulqdq xmm3,xmm3,xmm6,0x00
+ vpxor xmm4,xmm1,xmm0
+ vpxor xmm3,xmm3,xmm4
+
+ vpslldq xmm4,xmm3,8
+ vpsrldq xmm3,xmm3,8
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm1,xmm1,xmm3
+ vpsllq xmm3,xmm0,57
+ vpsllq xmm4,xmm0,62
+ vpxor xmm4,xmm4,xmm3
+ vpsllq xmm3,xmm0,63
+ vpxor xmm4,xmm4,xmm3
+ vpslldq xmm3,xmm4,8
+ vpsrldq xmm4,xmm4,8
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm1,xmm1,xmm4
+
+ vpsrlq xmm4,xmm0,1
+ vpxor xmm1,xmm1,xmm0
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm4,xmm4,5
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm0,xmm0,1
+ vpxor xmm0,xmm0,xmm1
+ vpshufd xmm3,xmm5,78
+ vpshufd xmm4,xmm0,78
+ vpxor xmm3,xmm3,xmm5
+ vmovdqu XMMWORD[rcx],xmm5
+ vpxor xmm4,xmm4,xmm0
+ vmovdqu XMMWORD[16+rcx],xmm0
+ lea rcx,[48+rcx]
+ sub r10,1
+ jnz NEAR $L$init_loop_avx
+
+ vpalignr xmm5,xmm3,xmm4,8
+ vmovdqu XMMWORD[(-16)+rcx],xmm5
+
+ vzeroupper
+ movaps xmm6,XMMWORD[rsp]
+ lea rsp,[24+rsp]
+$L$SEH_end_gcm_init_avx:
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_gmult_avx
+
+ALIGN 32
+gcm_gmult_avx:
+
+ jmp NEAR $L$_gmult_clmul
+
+
+global gcm_ghash_avx
+
+ALIGN 32
+gcm_ghash_avx:
+
+ lea rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_avx:
+
+DB 0x48,0x8d,0x60,0xe0
+DB 0x0f,0x29,0x70,0xe0
+DB 0x0f,0x29,0x78,0xf0
+DB 0x44,0x0f,0x29,0x00
+DB 0x44,0x0f,0x29,0x48,0x10
+DB 0x44,0x0f,0x29,0x50,0x20
+DB 0x44,0x0f,0x29,0x58,0x30
+DB 0x44,0x0f,0x29,0x60,0x40
+DB 0x44,0x0f,0x29,0x68,0x50
+DB 0x44,0x0f,0x29,0x70,0x60
+DB 0x44,0x0f,0x29,0x78,0x70
+ vzeroupper
+
+ vmovdqu xmm10,XMMWORD[rcx]
+ lea r10,[$L$0x1c2_polynomial]
+ lea rdx,[64+rdx]
+ vmovdqu xmm13,XMMWORD[$L$bswap_mask]
+ vpshufb xmm10,xmm10,xmm13
+ cmp r9,0x80
+ jb NEAR $L$short_avx
+ sub r9,0x80
+
+ vmovdqu xmm14,XMMWORD[112+r8]
+ vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+ vpshufb xmm14,xmm14,xmm13
+ vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vmovdqu xmm15,XMMWORD[96+r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm9,xmm9,xmm14
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vmovdqu xmm14,XMMWORD[80+r8]
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+
+ vpshufb xmm14,xmm14,xmm13
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vmovdqu xmm15,XMMWORD[64+r8]
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+
+ vpshufb xmm15,xmm15,xmm13
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm4,xmm4,xmm1
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+
+ vmovdqu xmm14,XMMWORD[48+r8]
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpxor xmm1,xmm1,xmm4
+ vpshufb xmm14,xmm14,xmm13
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+ vpxor xmm2,xmm2,xmm5
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+
+ vmovdqu xmm15,XMMWORD[32+r8]
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm4,xmm4,xmm1
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+ vpxor xmm5,xmm5,xmm2
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+
+ vmovdqu xmm14,XMMWORD[16+r8]
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpxor xmm1,xmm1,xmm4
+ vpshufb xmm14,xmm14,xmm13
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+ vpxor xmm2,xmm2,xmm5
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+
+ vmovdqu xmm15,XMMWORD[r8]
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm4,xmm4,xmm1
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm9,xmm7,0x10
+
+ lea r8,[128+r8]
+ cmp r9,0x80
+ jb NEAR $L$tail_avx
+
+ vpxor xmm15,xmm15,xmm10
+ sub r9,0x80
+ jmp NEAR $L$oop8x_avx
+
+ALIGN 32
+$L$oop8x_avx:
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vmovdqu xmm14,XMMWORD[112+r8]
+ vpxor xmm3,xmm3,xmm0
+ vpxor xmm8,xmm8,xmm15
+ vpclmulqdq xmm10,xmm15,xmm6,0x00
+ vpshufb xmm14,xmm14,xmm13
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm11,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm12,xmm8,xmm7,0x00
+ vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+
+ vmovdqu xmm15,XMMWORD[96+r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm10,xmm10,xmm3
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vxorps xmm11,xmm11,xmm4
+ vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm12,xmm12,xmm5
+ vxorps xmm8,xmm8,xmm15
+
+ vmovdqu xmm14,XMMWORD[80+r8]
+ vpxor xmm12,xmm12,xmm10
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpxor xmm12,xmm12,xmm11
+ vpslldq xmm9,xmm12,8
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vpsrldq xmm12,xmm12,8
+ vpxor xmm10,xmm10,xmm9
+ vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+ vpshufb xmm14,xmm14,xmm13
+ vxorps xmm11,xmm11,xmm12
+ vpxor xmm4,xmm4,xmm1
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm15,XMMWORD[64+r8]
+ vpalignr xmm12,xmm10,xmm10,8
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpshufb xmm15,xmm15,xmm13
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm1,xmm1,xmm4
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vxorps xmm8,xmm8,xmm15
+ vpxor xmm2,xmm2,xmm5
+
+ vmovdqu xmm14,XMMWORD[48+r8]
+ vpclmulqdq xmm10,xmm10,XMMWORD[r10],0x10
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpshufb xmm14,xmm14,xmm13
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm15,XMMWORD[32+r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpshufb xmm15,xmm15,xmm13
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm1,xmm1,xmm4
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+ vpxor xmm2,xmm2,xmm5
+ vxorps xmm10,xmm10,xmm12
+
+ vmovdqu xmm14,XMMWORD[16+r8]
+ vpalignr xmm12,xmm10,xmm10,8
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpshufb xmm14,xmm14,xmm13
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+ vpclmulqdq xmm10,xmm10,XMMWORD[r10],0x10
+ vxorps xmm12,xmm12,xmm11
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm15,XMMWORD[r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+ vpxor xmm15,xmm15,xmm12
+ vpclmulqdq xmm2,xmm9,xmm7,0x10
+ vpxor xmm15,xmm15,xmm10
+
+ lea r8,[128+r8]
+ sub r9,0x80
+ jnc NEAR $L$oop8x_avx
+
+ add r9,0x80
+ jmp NEAR $L$tail_no_xor_avx
+
+ALIGN 32
+$L$short_avx:
+ vmovdqu xmm14,XMMWORD[((-16))+r9*1+r8]
+ lea r8,[r9*1+r8]
+ vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+ vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+
+ vmovdqa xmm3,xmm0
+ vmovdqa xmm4,xmm1
+ vmovdqa xmm5,xmm2
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-32))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vpsrldq xmm7,xmm7,8
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-48))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-64))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vpsrldq xmm7,xmm7,8
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-80))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-96))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vpsrldq xmm7,xmm7,8
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-112))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vmovq xmm7,QWORD[((184-64))+rdx]
+ sub r9,0x10
+ jmp NEAR $L$tail_avx
+
+ALIGN 32
+$L$tail_avx:
+ vpxor xmm15,xmm15,xmm10
+$L$tail_no_xor_avx:
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+
+ vmovdqu xmm12,XMMWORD[r10]
+
+ vpxor xmm10,xmm3,xmm0
+ vpxor xmm11,xmm4,xmm1
+ vpxor xmm5,xmm5,xmm2
+
+ vpxor xmm5,xmm5,xmm10
+ vpxor xmm5,xmm5,xmm11
+ vpslldq xmm9,xmm5,8
+ vpsrldq xmm5,xmm5,8
+ vpxor xmm10,xmm10,xmm9
+ vpxor xmm11,xmm11,xmm5
+
+ vpclmulqdq xmm9,xmm10,xmm12,0x10
+ vpalignr xmm10,xmm10,xmm10,8
+ vpxor xmm10,xmm10,xmm9
+
+ vpclmulqdq xmm9,xmm10,xmm12,0x10
+ vpalignr xmm10,xmm10,xmm10,8
+ vpxor xmm10,xmm10,xmm11
+ vpxor xmm10,xmm10,xmm9
+
+ cmp r9,0
+ jne NEAR $L$short_avx
+
+ vpshufb xmm10,xmm10,xmm13
+ vmovdqu XMMWORD[rcx],xmm10
+ vzeroupper
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ lea rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_avx:
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+$L$bswap_mask:
+DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$0x1c2_polynomial:
+DB 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$7_mask:
+ DD 7,0,7,0
+$L$7_mask_poly:
+ DD 7,0,450,0
+ALIGN 64
+
+$L$rem_4bit:
+ DD 0,0,0,471859200,0,943718400,0,610271232
+ DD 0,1887436800,0,1822425088,0,1220542464,0,1423966208
+ DD 0,3774873600,0,4246732800,0,3644850176,0,3311403008
+ DD 0,2441084928,0,2376073216,0,2847932416,0,3051356160
+
+$L$rem_8bit:
+ DW 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E
+ DW 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E
+ DW 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E
+ DW 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E
+ DW 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E
+ DW 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E
+ DW 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E
+ DW 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E
+ DW 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE
+ DW 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE
+ DW 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE
+ DW 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE
+ DW 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E
+ DW 0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E
+ DW 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE
+ DW 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE
+ DW 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E
+ DW 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E
+ DW 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E
+ DW 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E
+ DW 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E
+ DW 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E
+ DW 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E
+ DW 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E
+ DW 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE
+ DW 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE
+ DW 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE
+ DW 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE
+ DW 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E
+ DW 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E
+ DW 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE
+ DW 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE
+
+DB 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52
+DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB 114,103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rax,[((48+280))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_gcm_gmult_4bit wrt ..imagebase
+ DD $L$SEH_end_gcm_gmult_4bit wrt ..imagebase
+ DD $L$SEH_info_gcm_gmult_4bit wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_ghash_4bit wrt ..imagebase
+ DD $L$SEH_end_gcm_ghash_4bit wrt ..imagebase
+ DD $L$SEH_info_gcm_ghash_4bit wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_init_clmul wrt ..imagebase
+ DD $L$SEH_end_gcm_init_clmul wrt ..imagebase
+ DD $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_ghash_clmul wrt ..imagebase
+ DD $L$SEH_end_gcm_ghash_clmul wrt ..imagebase
+ DD $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+ DD $L$SEH_begin_gcm_init_avx wrt ..imagebase
+ DD $L$SEH_end_gcm_init_avx wrt ..imagebase
+ DD $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_ghash_avx wrt ..imagebase
+ DD $L$SEH_end_gcm_ghash_avx wrt ..imagebase
+ DD $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_gcm_gmult_4bit:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$gmult_prologue wrt ..imagebase,$L$gmult_epilogue wrt ..imagebase
+$L$SEH_info_gcm_ghash_4bit:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$ghash_prologue wrt ..imagebase,$L$ghash_epilogue wrt ..imagebase
+$L$SEH_info_gcm_init_clmul:
+DB 0x01,0x08,0x03,0x00
+DB 0x08,0x68,0x00,0x00
+DB 0x04,0x22,0x00,0x00
+$L$SEH_info_gcm_ghash_clmul:
+DB 0x01,0x33,0x16,0x00
+DB 0x33,0xf8,0x09,0x00
+DB 0x2e,0xe8,0x08,0x00
+DB 0x29,0xd8,0x07,0x00
+DB 0x24,0xc8,0x06,0x00
+DB 0x1f,0xb8,0x05,0x00
+DB 0x1a,0xa8,0x04,0x00
+DB 0x15,0x98,0x03,0x00
+DB 0x10,0x88,0x02,0x00
+DB 0x0c,0x78,0x01,0x00
+DB 0x08,0x68,0x00,0x00
+DB 0x04,0x01,0x15,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
new file mode 100644
index 0000000000..c9a37a47c9
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
@@ -0,0 +1,1395 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN 16
+
+global rc4_md5_enc
+
+rc4_md5_enc:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rc4_md5_enc:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ cmp r9,0
+ je NEAR $L$abort
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,40
+
+$L$body:
+ mov r11,rcx
+ mov r12,r9
+ mov r13,rsi
+ mov r14,rdx
+ mov r15,r8
+ xor rbp,rbp
+ xor rcx,rcx
+
+ lea rdi,[8+rdi]
+ mov bpl,BYTE[((-8))+rdi]
+ mov cl,BYTE[((-4))+rdi]
+
+ inc bpl
+ sub r14,r13
+ mov eax,DWORD[rbp*4+rdi]
+ add cl,al
+ lea rsi,[rbp*4+rdi]
+ shl r12,6
+ add r12,r15
+ mov QWORD[16+rsp],r12
+
+ mov QWORD[24+rsp],r11
+ mov r8d,DWORD[r11]
+ mov r9d,DWORD[4+r11]
+ mov r10d,DWORD[8+r11]
+ mov r11d,DWORD[12+r11]
+ jmp NEAR $L$oop
+
+ALIGN 16
+$L$oop:
+ mov DWORD[rsp],r8d
+ mov DWORD[4+rsp],r9d
+ mov DWORD[8+rsp],r10d
+ mov r12d,r11d
+ mov DWORD[12+rsp],r11d
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[r15]
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ add r8d,3614090360
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[4+r15]
+ add bl,dl
+ mov eax,DWORD[8+rsi]
+ add r11d,3905402710
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[4+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[8+r15]
+ add al,dl
+ mov ebx,DWORD[12+rsi]
+ add r10d,606105819
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[8+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[12+r15]
+ add bl,dl
+ mov eax,DWORD[16+rsi]
+ add r9d,3250441966
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[12+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[16+r15]
+ add al,dl
+ mov ebx,DWORD[20+rsi]
+ add r8d,4118548399
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[16+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[20+r15]
+ add bl,dl
+ mov eax,DWORD[24+rsi]
+ add r11d,1200080426
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[20+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[24+r15]
+ add al,dl
+ mov ebx,DWORD[28+rsi]
+ add r10d,2821735955
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[24+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[28+r15]
+ add bl,dl
+ mov eax,DWORD[32+rsi]
+ add r9d,4249261313
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[28+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[32+r15]
+ add al,dl
+ mov ebx,DWORD[36+rsi]
+ add r8d,1770035416
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[32+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[36+r15]
+ add bl,dl
+ mov eax,DWORD[40+rsi]
+ add r11d,2336552879
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[36+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[40+r15]
+ add al,dl
+ mov ebx,DWORD[44+rsi]
+ add r10d,4294925233
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[40+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[44+r15]
+ add bl,dl
+ mov eax,DWORD[48+rsi]
+ add r9d,2304563134
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[44+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[48+r15]
+ add al,dl
+ mov ebx,DWORD[52+rsi]
+ add r8d,1804603682
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[48+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[52+r15]
+ add bl,dl
+ mov eax,DWORD[56+rsi]
+ add r11d,4254626195
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[52+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[56+r15]
+ add al,dl
+ mov ebx,DWORD[60+rsi]
+ add r10d,2792965006
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[56+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm2,XMMWORD[r13]
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[60+r15]
+ add bl,dl
+ mov eax,DWORD[64+rsi]
+ add r9d,1236535329
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[60+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ psllq xmm1,8
+ pxor xmm2,xmm0
+ pxor xmm2,xmm1
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[4+r15]
+ add al,dl
+ mov ebx,DWORD[68+rsi]
+ add r8d,4129170786
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[64+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[24+r15]
+ add bl,dl
+ mov eax,DWORD[72+rsi]
+ add r11d,3225465664
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[68+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[44+r15]
+ add al,dl
+ mov ebx,DWORD[76+rsi]
+ add r10d,643717713
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[72+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[r15]
+ add bl,dl
+ mov eax,DWORD[80+rsi]
+ add r9d,3921069994
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[76+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[20+r15]
+ add al,dl
+ mov ebx,DWORD[84+rsi]
+ add r8d,3593408605
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[80+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[40+r15]
+ add bl,dl
+ mov eax,DWORD[88+rsi]
+ add r11d,38016083
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[84+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[60+r15]
+ add al,dl
+ mov ebx,DWORD[92+rsi]
+ add r10d,3634488961
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[88+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[16+r15]
+ add bl,dl
+ mov eax,DWORD[96+rsi]
+ add r9d,3889429448
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[92+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[36+r15]
+ add al,dl
+ mov ebx,DWORD[100+rsi]
+ add r8d,568446438
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[96+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[56+r15]
+ add bl,dl
+ mov eax,DWORD[104+rsi]
+ add r11d,3275163606
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[100+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[12+r15]
+ add al,dl
+ mov ebx,DWORD[108+rsi]
+ add r10d,4107603335
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[104+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[32+r15]
+ add bl,dl
+ mov eax,DWORD[112+rsi]
+ add r9d,1163531501
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[108+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[52+r15]
+ add al,dl
+ mov ebx,DWORD[116+rsi]
+ add r8d,2850285829
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[112+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[8+r15]
+ add bl,dl
+ mov eax,DWORD[120+rsi]
+ add r11d,4243563512
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[116+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[28+r15]
+ add al,dl
+ mov ebx,DWORD[124+rsi]
+ add r10d,1735328473
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[120+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm3,XMMWORD[16+r13]
+ add bpl,32
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[48+r15]
+ add bl,dl
+ mov eax,DWORD[rbp*4+rdi]
+ add r9d,2368359562
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[124+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ mov rsi,rcx
+ xor rcx,rcx
+ mov cl,sil
+ lea rsi,[rbp*4+rdi]
+ psllq xmm1,8
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[20+r15]
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ add r8d,4294588738
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[32+r15]
+ add bl,dl
+ mov eax,DWORD[8+rsi]
+ add r11d,2272392833
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[4+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[44+r15]
+ add al,dl
+ mov ebx,DWORD[12+rsi]
+ add r10d,1839030562
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[8+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[56+r15]
+ add bl,dl
+ mov eax,DWORD[16+rsi]
+ add r9d,4259657740
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[12+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[4+r15]
+ add al,dl
+ mov ebx,DWORD[20+rsi]
+ add r8d,2763975236
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[16+rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[16+r15]
+ add bl,dl
+ mov eax,DWORD[24+rsi]
+ add r11d,1272893353
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[20+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[28+r15]
+ add al,dl
+ mov ebx,DWORD[28+rsi]
+ add r10d,4139469664
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[24+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[40+r15]
+ add bl,dl
+ mov eax,DWORD[32+rsi]
+ add r9d,3200236656
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[28+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[52+r15]
+ add al,dl
+ mov ebx,DWORD[36+rsi]
+ add r8d,681279174
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[32+rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[r15]
+ add bl,dl
+ mov eax,DWORD[40+rsi]
+ add r11d,3936430074
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[36+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[12+r15]
+ add al,dl
+ mov ebx,DWORD[44+rsi]
+ add r10d,3572445317
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[40+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[24+r15]
+ add bl,dl
+ mov eax,DWORD[48+rsi]
+ add r9d,76029189
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[44+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[36+r15]
+ add al,dl
+ mov ebx,DWORD[52+rsi]
+ add r8d,3654602809
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[48+rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[48+r15]
+ add bl,dl
+ mov eax,DWORD[56+rsi]
+ add r11d,3873151461
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[52+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[60+r15]
+ add al,dl
+ mov ebx,DWORD[60+rsi]
+ add r10d,530742520
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[56+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm4,XMMWORD[32+r13]
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[8+r15]
+ add bl,dl
+ mov eax,DWORD[64+rsi]
+ add r9d,3299628645
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[60+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ psllq xmm1,8
+ pxor xmm4,xmm0
+ pxor xmm4,xmm1
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[r15]
+ add al,dl
+ mov ebx,DWORD[68+rsi]
+ add r8d,4096336452
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[64+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[28+r15]
+ add bl,dl
+ mov eax,DWORD[72+rsi]
+ add r11d,1126891415
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[68+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[56+r15]
+ add al,dl
+ mov ebx,DWORD[76+rsi]
+ add r10d,2878612391
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[72+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[20+r15]
+ add bl,dl
+ mov eax,DWORD[80+rsi]
+ add r9d,4237533241
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[76+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[48+r15]
+ add al,dl
+ mov ebx,DWORD[84+rsi]
+ add r8d,1700485571
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[80+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[12+r15]
+ add bl,dl
+ mov eax,DWORD[88+rsi]
+ add r11d,2399980690
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[84+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[40+r15]
+ add al,dl
+ mov ebx,DWORD[92+rsi]
+ add r10d,4293915773
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[88+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[4+r15]
+ add bl,dl
+ mov eax,DWORD[96+rsi]
+ add r9d,2240044497
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[92+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[32+r15]
+ add al,dl
+ mov ebx,DWORD[100+rsi]
+ add r8d,1873313359
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[96+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[60+r15]
+ add bl,dl
+ mov eax,DWORD[104+rsi]
+ add r11d,4264355552
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[100+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[24+r15]
+ add al,dl
+ mov ebx,DWORD[108+rsi]
+ add r10d,2734768916
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[104+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[52+r15]
+ add bl,dl
+ mov eax,DWORD[112+rsi]
+ add r9d,1309151649
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[108+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[16+r15]
+ add al,dl
+ mov ebx,DWORD[116+rsi]
+ add r8d,4149444226
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[112+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[44+r15]
+ add bl,dl
+ mov eax,DWORD[120+rsi]
+ add r11d,3174756917
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[116+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[8+r15]
+ add al,dl
+ mov ebx,DWORD[124+rsi]
+ add r10d,718787259
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[120+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm5,XMMWORD[48+r13]
+ add bpl,32
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[36+r15]
+ add bl,dl
+ mov eax,DWORD[rbp*4+rdi]
+ add r9d,3951481745
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[124+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ mov rsi,rbp
+ xor rbp,rbp
+ mov bpl,sil
+ mov rsi,rcx
+ xor rcx,rcx
+ mov cl,sil
+ lea rsi,[rbp*4+rdi]
+ psllq xmm1,8
+ pxor xmm5,xmm0
+ pxor xmm5,xmm1
+ add r8d,DWORD[rsp]
+ add r9d,DWORD[4+rsp]
+ add r10d,DWORD[8+rsp]
+ add r11d,DWORD[12+rsp]
+
+ movdqu XMMWORD[r13*1+r14],xmm2
+ movdqu XMMWORD[16+r13*1+r14],xmm3
+ movdqu XMMWORD[32+r13*1+r14],xmm4
+ movdqu XMMWORD[48+r13*1+r14],xmm5
+ lea r15,[64+r15]
+ lea r13,[64+r13]
+ cmp r15,QWORD[16+rsp]
+ jb NEAR $L$oop
+
+ mov r12,QWORD[24+rsp]
+ sub cl,al
+ mov DWORD[r12],r8d
+ mov DWORD[4+r12],r9d
+ mov DWORD[8+r12],r10d
+ mov DWORD[12+r12],r11d
+ sub bpl,1
+ mov DWORD[((-8))+rdi],ebp
+ mov DWORD[((-4))+rdi],ecx
+
+ mov r15,QWORD[40+rsp]
+
+ mov r14,QWORD[48+rsp]
+
+ mov r13,QWORD[56+rsp]
+
+ mov r12,QWORD[64+rsp]
+
+ mov rbp,QWORD[72+rsp]
+
+ mov rbx,QWORD[80+rsp]
+
+ lea rsp,[88+rsp]
+
+$L$epilogue:
+$L$abort:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rc4_md5_enc:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$body]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov r15,QWORD[40+rax]
+ mov r14,QWORD[48+rax]
+ mov r13,QWORD[56+rax]
+ mov r12,QWORD[64+rax]
+ mov rbp,QWORD[72+rax]
+ mov rbx,QWORD[80+rax]
+ lea rax,[88+rax]
+
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_rc4_md5_enc wrt ..imagebase
+ DD $L$SEH_end_rc4_md5_enc wrt ..imagebase
+ DD $L$SEH_info_rc4_md5_enc wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_rc4_md5_enc:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
new file mode 100644
index 0000000000..72e3641649
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
@@ -0,0 +1,784 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global RC4
+
+ALIGN 16
+RC4:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_RC4:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+ or rsi,rsi
+ jne NEAR $L$entry
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$entry:
+
+ push rbx
+
+ push r12
+
+ push r13
+
+$L$prologue:
+ mov r11,rsi
+ mov r12,rdx
+ mov r13,rcx
+ xor r10,r10
+ xor rcx,rcx
+
+ lea rdi,[8+rdi]
+ mov r10b,BYTE[((-8))+rdi]
+ mov cl,BYTE[((-4))+rdi]
+ cmp DWORD[256+rdi],-1
+ je NEAR $L$RC4_CHAR
+ mov r8d,DWORD[OPENSSL_ia32cap_P]
+ xor rbx,rbx
+ inc r10b
+ sub rbx,r10
+ sub r13,r12
+ mov eax,DWORD[r10*4+rdi]
+ test r11,-16
+ jz NEAR $L$loop1
+ bt r8d,30
+ jc NEAR $L$intel
+ and rbx,7
+ lea rsi,[1+r10]
+ jz NEAR $L$oop8
+ sub r11,rbx
+$L$oop8_warmup:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov DWORD[r10*4+rdi],edx
+ add al,dl
+ inc r10b
+ mov edx,DWORD[rax*4+rdi]
+ mov eax,DWORD[r10*4+rdi]
+ xor dl,BYTE[r12]
+ mov BYTE[r13*1+r12],dl
+ lea r12,[1+r12]
+ dec rbx
+ jnz NEAR $L$oop8_warmup
+
+ lea rsi,[1+r10]
+ jmp NEAR $L$oop8
+ALIGN 16
+$L$oop8:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[rsi*4+rdi]
+ ror r8,8
+ mov DWORD[r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[4+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[4+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[8+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[8+r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[12+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[12+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[16+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[16+r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[20+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[20+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[24+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[24+r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add sil,8
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[((-4))+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[28+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add r10b,8
+ ror r8,8
+ sub r11,8
+
+ xor r8,QWORD[r12]
+ mov QWORD[r13*1+r12],r8
+ lea r12,[8+r12]
+
+ test r11,-8
+ jnz NEAR $L$oop8
+ cmp r11,0
+ jne NEAR $L$loop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$intel:
+ test r11,-32
+ jz NEAR $L$loop1
+ and rbx,15
+ jz NEAR $L$oop16_is_hot
+ sub r11,rbx
+$L$oop16_warmup:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov DWORD[r10*4+rdi],edx
+ add al,dl
+ inc r10b
+ mov edx,DWORD[rax*4+rdi]
+ mov eax,DWORD[r10*4+rdi]
+ xor dl,BYTE[r12]
+ mov BYTE[r13*1+r12],dl
+ lea r12,[1+r12]
+ dec rbx
+ jnz NEAR $L$oop16_warmup
+
+ mov rbx,rcx
+ xor rcx,rcx
+ mov cl,bl
+
+$L$oop16_is_hot:
+ lea rsi,[r10*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ pxor xmm0,xmm0
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ movzx eax,al
+ mov DWORD[rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],0
+ jmp NEAR $L$oop16_enter
+ALIGN 16
+$L$oop16:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ pxor xmm2,xmm0
+ psllq xmm1,8
+ pxor xmm0,xmm0
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ movzx eax,al
+ mov DWORD[rsi],edx
+ pxor xmm2,xmm1
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],0
+ movdqu XMMWORD[r13*1+r12],xmm2
+ lea r12,[16+r12]
+$L$oop16_enter:
+ mov edx,DWORD[rcx*4+rdi]
+ pxor xmm1,xmm1
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[8+rsi]
+ movzx ebx,bl
+ mov DWORD[4+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],0
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[12+rsi]
+ movzx eax,al
+ mov DWORD[8+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],1
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[16+rsi]
+ movzx ebx,bl
+ mov DWORD[12+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[20+rsi]
+ movzx eax,al
+ mov DWORD[16+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],2
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[24+rsi]
+ movzx ebx,bl
+ mov DWORD[20+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[28+rsi]
+ movzx eax,al
+ mov DWORD[24+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],3
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[32+rsi]
+ movzx ebx,bl
+ mov DWORD[28+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[36+rsi]
+ movzx eax,al
+ mov DWORD[32+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],4
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[40+rsi]
+ movzx ebx,bl
+ mov DWORD[36+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[44+rsi]
+ movzx eax,al
+ mov DWORD[40+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],5
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[48+rsi]
+ movzx ebx,bl
+ mov DWORD[44+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[52+rsi]
+ movzx eax,al
+ mov DWORD[48+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],6
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[56+rsi]
+ movzx ebx,bl
+ mov DWORD[52+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[60+rsi]
+ movzx eax,al
+ mov DWORD[56+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],7
+ add r10b,16
+ movdqu xmm2,XMMWORD[r12]
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ movzx ebx,bl
+ mov DWORD[60+rsi],edx
+ lea rsi,[r10*4+rdi]
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+ mov eax,DWORD[rsi]
+ mov rbx,rcx
+ xor rcx,rcx
+ sub r11,16
+ mov cl,bl
+ test r11,-16
+ jnz NEAR $L$oop16
+
+ psllq xmm1,8
+ pxor xmm2,xmm0
+ pxor xmm2,xmm1
+ movdqu XMMWORD[r13*1+r12],xmm2
+ lea r12,[16+r12]
+
+ cmp r11,0
+ jne NEAR $L$loop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$loop1:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov DWORD[r10*4+rdi],edx
+ add al,dl
+ inc r10b
+ mov edx,DWORD[rax*4+rdi]
+ mov eax,DWORD[r10*4+rdi]
+ xor dl,BYTE[r12]
+ mov BYTE[r13*1+r12],dl
+ lea r12,[1+r12]
+ dec r11
+ jnz NEAR $L$loop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$RC4_CHAR:
+ add r10b,1
+ movzx eax,BYTE[r10*1+rdi]
+ test r11,-8
+ jz NEAR $L$cloop1
+ jmp NEAR $L$cloop8
+ALIGN 16
+$L$cloop8:
+ mov r8d,DWORD[r12]
+ mov r9d,DWORD[4+r12]
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov0
+ mov rbx,rax
+$L$cmov0:
+ add dl,al
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov1
+ mov rax,rbx
+$L$cmov1:
+ add dl,bl
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov2
+ mov rbx,rax
+$L$cmov2:
+ add dl,al
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov3
+ mov rax,rbx
+$L$cmov3:
+ add dl,bl
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov4
+ mov rbx,rax
+$L$cmov4:
+ add dl,al
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov5
+ mov rax,rbx
+$L$cmov5:
+ add dl,bl
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov6
+ mov rbx,rax
+$L$cmov6:
+ add dl,al
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov7
+ mov rax,rbx
+$L$cmov7:
+ add dl,bl
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ lea r11,[((-8))+r11]
+ mov DWORD[r13],r8d
+ lea r12,[8+r12]
+ mov DWORD[4+r13],r9d
+ lea r13,[8+r13]
+
+ test r11,-8
+ jnz NEAR $L$cloop8
+ cmp r11,0
+ jne NEAR $L$cloop1
+ jmp NEAR $L$exit
+ALIGN 16
+$L$cloop1:
+ add cl,al
+ movzx ecx,cl
+ movzx edx,BYTE[rcx*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ mov BYTE[r10*1+rdi],dl
+ add dl,al
+ add r10b,1
+ movzx edx,dl
+ movzx r10d,r10b
+ movzx edx,BYTE[rdx*1+rdi]
+ movzx eax,BYTE[r10*1+rdi]
+ xor dl,BYTE[r12]
+ lea r12,[1+r12]
+ mov BYTE[r13],dl
+ lea r13,[1+r13]
+ sub r11,1
+ jnz NEAR $L$cloop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$exit:
+ sub r10b,1
+ mov DWORD[((-8))+rdi],r10d
+ mov DWORD[((-4))+rdi],ecx
+
+ mov r13,QWORD[rsp]
+
+ mov r12,QWORD[8+rsp]
+
+ mov rbx,QWORD[16+rsp]
+
+ add rsp,24
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_RC4:
+global RC4_set_key
+
+ALIGN 16
+RC4_set_key:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_RC4_set_key:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+ lea rdi,[8+rdi]
+ lea rdx,[rsi*1+rdx]
+ neg rsi
+ mov rcx,rsi
+ xor eax,eax
+ xor r9,r9
+ xor r10,r10
+ xor r11,r11
+
+ mov r8d,DWORD[OPENSSL_ia32cap_P]
+ bt r8d,20
+ jc NEAR $L$c1stloop
+ jmp NEAR $L$w1stloop
+
+ALIGN 16
+$L$w1stloop:
+ mov DWORD[rax*4+rdi],eax
+ add al,1
+ jnc NEAR $L$w1stloop
+
+ xor r9,r9
+ xor r8,r8
+ALIGN 16
+$L$w2ndloop:
+ mov r10d,DWORD[r9*4+rdi]
+ add r8b,BYTE[rsi*1+rdx]
+ add r8b,r10b
+ add rsi,1
+ mov r11d,DWORD[r8*4+rdi]
+ cmovz rsi,rcx
+ mov DWORD[r8*4+rdi],r10d
+ mov DWORD[r9*4+rdi],r11d
+ add r9b,1
+ jnc NEAR $L$w2ndloop
+ jmp NEAR $L$exit_key
+
+ALIGN 16
+$L$c1stloop:
+ mov BYTE[rax*1+rdi],al
+ add al,1
+ jnc NEAR $L$c1stloop
+
+ xor r9,r9
+ xor r8,r8
+ALIGN 16
+$L$c2ndloop:
+ mov r10b,BYTE[r9*1+rdi]
+ add r8b,BYTE[rsi*1+rdx]
+ add r8b,r10b
+ add rsi,1
+ mov r11b,BYTE[r8*1+rdi]
+ jnz NEAR $L$cnowrap
+ mov rsi,rcx
+$L$cnowrap:
+ mov BYTE[r8*1+rdi],r10b
+ mov BYTE[r9*1+rdi],r11b
+ add r9b,1
+ jnc NEAR $L$c2ndloop
+ mov DWORD[256+rdi],-1
+
+ALIGN 16
+$L$exit_key:
+ xor eax,eax
+ mov DWORD[((-8))+rdi],eax
+ mov DWORD[((-4))+rdi],eax
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_RC4_set_key:
+
+global RC4_options
+
+ALIGN 16
+RC4_options:
+ lea rax,[$L$opts]
+ mov edx,DWORD[OPENSSL_ia32cap_P]
+ bt edx,20
+ jc NEAR $L$8xchar
+ bt edx,30
+ jnc NEAR $L$done
+ add rax,25
+ DB 0F3h,0C3h ;repret
+$L$8xchar:
+ add rax,12
+$L$done:
+ DB 0F3h,0C3h ;repret
+ALIGN 64
+$L$opts:
+DB 114,99,52,40,56,120,44,105,110,116,41,0
+DB 114,99,52,40,56,120,44,99,104,97,114,41,0
+DB 114,99,52,40,49,54,120,44,105,110,116,41,0
+DB 82,67,52,32,102,111,114,32,120,56,54,95,54,52,44,32
+DB 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+DB 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+DB 62,0
+ALIGN 64
+
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+stream_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rax,[24+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov r12,QWORD[((-16))+rax]
+ mov r13,QWORD[((-24))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ jmp NEAR $L$common_seh_exit
+
+
+
+ALIGN 16
+key_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[152+r8]
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+$L$common_seh_exit:
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_RC4 wrt ..imagebase
+ DD $L$SEH_end_RC4 wrt ..imagebase
+ DD $L$SEH_info_RC4 wrt ..imagebase
+
+ DD $L$SEH_begin_RC4_set_key wrt ..imagebase
+ DD $L$SEH_end_RC4_set_key wrt ..imagebase
+ DD $L$SEH_info_RC4_set_key wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_RC4:
+DB 9,0,0,0
+ DD stream_se_handler wrt ..imagebase
+$L$SEH_info_RC4_set_key:
+DB 9,0,0,0
+ DD key_se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
new file mode 100644
index 0000000000..00eadebf68
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
@@ -0,0 +1,532 @@
+; Copyright 2017-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN 32
+__KeccakF1600:
+ mov rax,QWORD[60+rdi]
+ mov rbx,QWORD[68+rdi]
+ mov rcx,QWORD[76+rdi]
+ mov rdx,QWORD[84+rdi]
+ mov rbp,QWORD[92+rdi]
+ jmp NEAR $L$oop
+
+ALIGN 32
+$L$oop:
+ mov r8,QWORD[((-100))+rdi]
+ mov r9,QWORD[((-52))+rdi]
+ mov r10,QWORD[((-4))+rdi]
+ mov r11,QWORD[44+rdi]
+
+ xor rcx,QWORD[((-84))+rdi]
+ xor rdx,QWORD[((-76))+rdi]
+ xor rax,r8
+ xor rbx,QWORD[((-92))+rdi]
+ xor rcx,QWORD[((-44))+rdi]
+ xor rax,QWORD[((-60))+rdi]
+ mov r12,rbp
+ xor rbp,QWORD[((-68))+rdi]
+
+ xor rcx,r10
+ xor rax,QWORD[((-20))+rdi]
+ xor rdx,QWORD[((-36))+rdi]
+ xor rbx,r9
+ xor rbp,QWORD[((-28))+rdi]
+
+ xor rcx,QWORD[36+rdi]
+ xor rax,QWORD[20+rdi]
+ xor rdx,QWORD[4+rdi]
+ xor rbx,QWORD[((-12))+rdi]
+ xor rbp,QWORD[12+rdi]
+
+ mov r13,rcx
+ rol rcx,1
+ xor rcx,rax
+ xor rdx,r11
+
+ rol rax,1
+ xor rax,rdx
+ xor rbx,QWORD[28+rdi]
+
+ rol rdx,1
+ xor rdx,rbx
+ xor rbp,QWORD[52+rdi]
+
+ rol rbx,1
+ xor rbx,rbp
+
+ rol rbp,1
+ xor rbp,r13
+ xor r9,rcx
+ xor r10,rdx
+ rol r9,44
+ xor r11,rbp
+ xor r12,rax
+ rol r10,43
+ xor r8,rbx
+ mov r13,r9
+ rol r11,21
+ or r9,r10
+ xor r9,r8
+ rol r12,14
+
+ xor r9,QWORD[r15]
+ lea r15,[8+r15]
+
+ mov r14,r12
+ and r12,r11
+ mov QWORD[((-100))+rsi],r9
+ xor r12,r10
+ not r10
+ mov QWORD[((-84))+rsi],r12
+
+ or r10,r11
+ mov r12,QWORD[76+rdi]
+ xor r10,r13
+ mov QWORD[((-92))+rsi],r10
+
+ and r13,r8
+ mov r9,QWORD[((-28))+rdi]
+ xor r13,r14
+ mov r10,QWORD[((-20))+rdi]
+ mov QWORD[((-68))+rsi],r13
+
+ or r14,r8
+ mov r8,QWORD[((-76))+rdi]
+ xor r14,r11
+ mov r11,QWORD[28+rdi]
+ mov QWORD[((-76))+rsi],r14
+
+
+ xor r8,rbp
+ xor r12,rdx
+ rol r8,28
+ xor r11,rcx
+ xor r9,rax
+ rol r12,61
+ rol r11,45
+ xor r10,rbx
+ rol r9,20
+ mov r13,r8
+ or r8,r12
+ rol r10,3
+
+ xor r8,r11
+ mov QWORD[((-36))+rsi],r8
+
+ mov r14,r9
+ and r9,r13
+ mov r8,QWORD[((-92))+rdi]
+ xor r9,r12
+ not r12
+ mov QWORD[((-28))+rsi],r9
+
+ or r12,r11
+ mov r9,QWORD[((-44))+rdi]
+ xor r12,r10
+ mov QWORD[((-44))+rsi],r12
+
+ and r11,r10
+ mov r12,QWORD[60+rdi]
+ xor r11,r14
+ mov QWORD[((-52))+rsi],r11
+
+ or r14,r10
+ mov r10,QWORD[4+rdi]
+ xor r14,r13
+ mov r11,QWORD[52+rdi]
+ mov QWORD[((-60))+rsi],r14
+
+
+ xor r10,rbp
+ xor r11,rax
+ rol r10,25
+ xor r9,rdx
+ rol r11,8
+ xor r12,rbx
+ rol r9,6
+ xor r8,rcx
+ rol r12,18
+ mov r13,r10
+ and r10,r11
+ rol r8,1
+
+ not r11
+ xor r10,r9
+ mov QWORD[((-12))+rsi],r10
+
+ mov r14,r12
+ and r12,r11
+ mov r10,QWORD[((-12))+rdi]
+ xor r12,r13
+ mov QWORD[((-4))+rsi],r12
+
+ or r13,r9
+ mov r12,QWORD[84+rdi]
+ xor r13,r8
+ mov QWORD[((-20))+rsi],r13
+
+ and r9,r8
+ xor r9,r14
+ mov QWORD[12+rsi],r9
+
+ or r14,r8
+ mov r9,QWORD[((-60))+rdi]
+ xor r14,r11
+ mov r11,QWORD[36+rdi]
+ mov QWORD[4+rsi],r14
+
+
+ mov r8,QWORD[((-68))+rdi]
+
+ xor r10,rcx
+ xor r11,rdx
+ rol r10,10
+ xor r9,rbx
+ rol r11,15
+ xor r12,rbp
+ rol r9,36
+ xor r8,rax
+ rol r12,56
+ mov r13,r10
+ or r10,r11
+ rol r8,27
+
+ not r11
+ xor r10,r9
+ mov QWORD[28+rsi],r10
+
+ mov r14,r12
+ or r12,r11
+ xor r12,r13
+ mov QWORD[36+rsi],r12
+
+ and r13,r9
+ xor r13,r8
+ mov QWORD[20+rsi],r13
+
+ or r9,r8
+ xor r9,r14
+ mov QWORD[52+rsi],r9
+
+ and r8,r14
+ xor r8,r11
+ mov QWORD[44+rsi],r8
+
+
+ xor rdx,QWORD[((-84))+rdi]
+ xor rbp,QWORD[((-36))+rdi]
+ rol rdx,62
+ xor rcx,QWORD[68+rdi]
+ rol rbp,55
+ xor rax,QWORD[12+rdi]
+ rol rcx,2
+ xor rbx,QWORD[20+rdi]
+ xchg rdi,rsi
+ rol rax,39
+ rol rbx,41
+ mov r13,rdx
+ and rdx,rbp
+ not rbp
+ xor rdx,rcx
+ mov QWORD[92+rdi],rdx
+
+ mov r14,rax
+ and rax,rbp
+ xor rax,r13
+ mov QWORD[60+rdi],rax
+
+ or r13,rcx
+ xor r13,rbx
+ mov QWORD[84+rdi],r13
+
+ and rcx,rbx
+ xor rcx,r14
+ mov QWORD[76+rdi],rcx
+
+ or rbx,r14
+ xor rbx,rbp
+ mov QWORD[68+rdi],rbx
+
+ mov rbp,rdx
+ mov rdx,r13
+
+ test r15,255
+ jnz NEAR $L$oop
+
+ lea r15,[((-192))+r15]
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+KeccakF1600:
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ lea rdi,[100+rdi]
+ sub rsp,200
+
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+
+ lea r15,[iotas]
+ lea rsi,[100+rsp]
+
+ call __KeccakF1600
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+ lea rdi,[((-100))+rdi]
+
+ add rsp,200
+
+
+ pop r15
+
+ pop r14
+
+ pop r13
+
+ pop r12
+
+ pop rbp
+
+ pop rbx
+
+ DB 0F3h,0C3h ;repret
+
+
+global SHA3_absorb
+
+ALIGN 32
+SHA3_absorb:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_SHA3_absorb:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ lea rdi,[100+rdi]
+ sub rsp,232
+
+
+ mov r9,rsi
+ lea rsi,[100+rsp]
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+ lea r15,[iotas]
+
+ mov QWORD[((216-100))+rsi],rcx
+
+$L$oop_absorb:
+ cmp rdx,rcx
+ jc NEAR $L$done_absorb
+
+ shr rcx,3
+ lea r8,[((-100))+rdi]
+
+$L$block_absorb:
+ mov rax,QWORD[r9]
+ lea r9,[8+r9]
+ xor rax,QWORD[r8]
+ lea r8,[8+r8]
+ sub rdx,8
+ mov QWORD[((-8))+r8],rax
+ sub rcx,1
+ jnz NEAR $L$block_absorb
+
+ mov QWORD[((200-100))+rsi],r9
+ mov QWORD[((208-100))+rsi],rdx
+ call __KeccakF1600
+ mov r9,QWORD[((200-100))+rsi]
+ mov rdx,QWORD[((208-100))+rsi]
+ mov rcx,QWORD[((216-100))+rsi]
+ jmp NEAR $L$oop_absorb
+
+ALIGN 32
+$L$done_absorb:
+ mov rax,rdx
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+
+ add rsp,232
+
+
+ pop r15
+
+ pop r14
+
+ pop r13
+
+ pop r12
+
+ pop rbp
+
+ pop rbx
+
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_SHA3_absorb:
+global SHA3_squeeze
+
+ALIGN 32
+SHA3_squeeze:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_SHA3_squeeze:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push r12
+
+ push r13
+
+ push r14
+
+
+ shr rcx,3
+ mov r8,rdi
+ mov r12,rsi
+ mov r13,rdx
+ mov r14,rcx
+ jmp NEAR $L$oop_squeeze
+
+ALIGN 32
+$L$oop_squeeze:
+ cmp r13,8
+ jb NEAR $L$tail_squeeze
+
+ mov rax,QWORD[r8]
+ lea r8,[8+r8]
+ mov QWORD[r12],rax
+ lea r12,[8+r12]
+ sub r13,8
+ jz NEAR $L$done_squeeze
+
+ sub rcx,1
+ jnz NEAR $L$oop_squeeze
+
+ call KeccakF1600
+ mov r8,rdi
+ mov rcx,r14
+ jmp NEAR $L$oop_squeeze
+
+$L$tail_squeeze:
+ mov rsi,r8
+ mov rdi,r12
+ mov rcx,r13
+DB 0xf3,0xa4
+
+$L$done_squeeze:
+ pop r14
+
+ pop r13
+
+ pop r12
+
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_SHA3_squeeze:
+ALIGN 256
+ DQ 0,0,0,0,0,0,0,0
+
+iotas:
+ DQ 0x0000000000000001
+ DQ 0x0000000000008082
+ DQ 0x800000000000808a
+ DQ 0x8000000080008000
+ DQ 0x000000000000808b
+ DQ 0x0000000080000001
+ DQ 0x8000000080008081
+ DQ 0x8000000000008009
+ DQ 0x000000000000008a
+ DQ 0x0000000000000088
+ DQ 0x0000000080008009
+ DQ 0x000000008000000a
+ DQ 0x000000008000808b
+ DQ 0x800000000000008b
+ DQ 0x8000000000008089
+ DQ 0x8000000000008003
+ DQ 0x8000000000008002
+ DQ 0x8000000000000080
+ DQ 0x000000000000800a
+ DQ 0x800000008000000a
+ DQ 0x8000000080008081
+ DQ 0x8000000000008080
+ DQ 0x0000000080000001
+ DQ 0x8000000080008008
+
+DB 75,101,99,99,97,107,45,49,54,48,48,32,97,98,115,111
+DB 114,98,32,97,110,100,32,115,113,117,101,101,122,101,32,102
+DB 111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84
+DB 79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64
+DB 111,112,101,110,115,115,108,46,111,114,103,62,0
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
new file mode 100644
index 0000000000..ea394daa3b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
@@ -0,0 +1,7581 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global sha1_multi_block
+
+ALIGN 32
+sha1_multi_block:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ mov rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+ bt rcx,61
+ jc NEAR _shaext_shortcut
+ test ecx,268435456
+ jnz NEAR _avx_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body:
+ lea rbp,[K_XX_XX]
+ lea rbx,[256+rsp]
+
+$L$oop_grande:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done
+
+ movdqu xmm10,XMMWORD[rdi]
+ lea rax,[128+rsp]
+ movdqu xmm11,XMMWORD[32+rdi]
+ movdqu xmm12,XMMWORD[64+rdi]
+ movdqu xmm13,XMMWORD[96+rdi]
+ movdqu xmm14,XMMWORD[128+rdi]
+ movdqa xmm5,XMMWORD[96+rbp]
+ movdqa xmm15,XMMWORD[((-32))+rbp]
+ jmp NEAR $L$oop
+
+ALIGN 32
+$L$oop:
+ movd xmm0,DWORD[r8]
+ lea r8,[64+r8]
+ movd xmm2,DWORD[r9]
+ lea r9,[64+r9]
+ movd xmm3,DWORD[r10]
+ lea r10,[64+r10]
+ movd xmm4,DWORD[r11]
+ lea r11,[64+r11]
+ punpckldq xmm0,xmm3
+ movd xmm1,DWORD[((-60))+r8]
+ punpckldq xmm2,xmm4
+ movd xmm9,DWORD[((-60))+r9]
+ punpckldq xmm0,xmm2
+ movd xmm8,DWORD[((-60))+r10]
+DB 102,15,56,0,197
+ movd xmm7,DWORD[((-60))+r11]
+ punpckldq xmm1,xmm8
+ movdqa xmm8,xmm10
+ paddd xmm14,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm11
+ movdqa xmm6,xmm11
+ pslld xmm8,5
+ pandn xmm7,xmm13
+ pand xmm6,xmm12
+ punpckldq xmm1,xmm9
+ movdqa xmm9,xmm10
+
+ movdqa XMMWORD[(0-128)+rax],xmm0
+ paddd xmm14,xmm0
+ movd xmm2,DWORD[((-56))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm11
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-56))+r9]
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+DB 102,15,56,0,205
+ movd xmm8,DWORD[((-56))+r10]
+ por xmm11,xmm7
+ movd xmm7,DWORD[((-56))+r11]
+ punpckldq xmm2,xmm8
+ movdqa xmm8,xmm14
+ paddd xmm13,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm10
+ movdqa xmm6,xmm10
+ pslld xmm8,5
+ pandn xmm7,xmm12
+ pand xmm6,xmm11
+ punpckldq xmm2,xmm9
+ movdqa xmm9,xmm14
+
+ movdqa XMMWORD[(16-128)+rax],xmm1
+ paddd xmm13,xmm1
+ movd xmm3,DWORD[((-52))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm10
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-52))+r9]
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+DB 102,15,56,0,213
+ movd xmm8,DWORD[((-52))+r10]
+ por xmm10,xmm7
+ movd xmm7,DWORD[((-52))+r11]
+ punpckldq xmm3,xmm8
+ movdqa xmm8,xmm13
+ paddd xmm12,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm14
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pandn xmm7,xmm11
+ pand xmm6,xmm10
+ punpckldq xmm3,xmm9
+ movdqa xmm9,xmm13
+
+ movdqa XMMWORD[(32-128)+rax],xmm2
+ paddd xmm12,xmm2
+ movd xmm4,DWORD[((-48))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm14
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-48))+r9]
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+DB 102,15,56,0,221
+ movd xmm8,DWORD[((-48))+r10]
+ por xmm14,xmm7
+ movd xmm7,DWORD[((-48))+r11]
+ punpckldq xmm4,xmm8
+ movdqa xmm8,xmm12
+ paddd xmm11,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm13
+ movdqa xmm6,xmm13
+ pslld xmm8,5
+ pandn xmm7,xmm10
+ pand xmm6,xmm14
+ punpckldq xmm4,xmm9
+ movdqa xmm9,xmm12
+
+ movdqa XMMWORD[(48-128)+rax],xmm3
+ paddd xmm11,xmm3
+ movd xmm0,DWORD[((-44))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm13
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-44))+r9]
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+DB 102,15,56,0,229
+ movd xmm8,DWORD[((-44))+r10]
+ por xmm13,xmm7
+ movd xmm7,DWORD[((-44))+r11]
+ punpckldq xmm0,xmm8
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm12
+ movdqa xmm6,xmm12
+ pslld xmm8,5
+ pandn xmm7,xmm14
+ pand xmm6,xmm13
+ punpckldq xmm0,xmm9
+ movdqa xmm9,xmm11
+
+ movdqa XMMWORD[(64-128)+rax],xmm4
+ paddd xmm10,xmm4
+ movd xmm1,DWORD[((-40))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm12
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-40))+r9]
+ pslld xmm7,30
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+DB 102,15,56,0,197
+ movd xmm8,DWORD[((-40))+r10]
+ por xmm12,xmm7
+ movd xmm7,DWORD[((-40))+r11]
+ punpckldq xmm1,xmm8
+ movdqa xmm8,xmm10
+ paddd xmm14,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm11
+ movdqa xmm6,xmm11
+ pslld xmm8,5
+ pandn xmm7,xmm13
+ pand xmm6,xmm12
+ punpckldq xmm1,xmm9
+ movdqa xmm9,xmm10
+
+ movdqa XMMWORD[(80-128)+rax],xmm0
+ paddd xmm14,xmm0
+ movd xmm2,DWORD[((-36))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm11
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-36))+r9]
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+DB 102,15,56,0,205
+ movd xmm8,DWORD[((-36))+r10]
+ por xmm11,xmm7
+ movd xmm7,DWORD[((-36))+r11]
+ punpckldq xmm2,xmm8
+ movdqa xmm8,xmm14
+ paddd xmm13,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm10
+ movdqa xmm6,xmm10
+ pslld xmm8,5
+ pandn xmm7,xmm12
+ pand xmm6,xmm11
+ punpckldq xmm2,xmm9
+ movdqa xmm9,xmm14
+
+ movdqa XMMWORD[(96-128)+rax],xmm1
+ paddd xmm13,xmm1
+ movd xmm3,DWORD[((-32))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm10
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-32))+r9]
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+DB 102,15,56,0,213
+ movd xmm8,DWORD[((-32))+r10]
+ por xmm10,xmm7
+ movd xmm7,DWORD[((-32))+r11]
+ punpckldq xmm3,xmm8
+ movdqa xmm8,xmm13
+ paddd xmm12,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm14
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pandn xmm7,xmm11
+ pand xmm6,xmm10
+ punpckldq xmm3,xmm9
+ movdqa xmm9,xmm13
+
+ movdqa XMMWORD[(112-128)+rax],xmm2
+ paddd xmm12,xmm2
+ movd xmm4,DWORD[((-28))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm14
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-28))+r9]
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+DB 102,15,56,0,221
+ movd xmm8,DWORD[((-28))+r10]
+ por xmm14,xmm7
+ movd xmm7,DWORD[((-28))+r11]
+ punpckldq xmm4,xmm8
+ movdqa xmm8,xmm12
+ paddd xmm11,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm13
+ movdqa xmm6,xmm13
+ pslld xmm8,5
+ pandn xmm7,xmm10
+ pand xmm6,xmm14
+ punpckldq xmm4,xmm9
+ movdqa xmm9,xmm12
+
+ movdqa XMMWORD[(128-128)+rax],xmm3
+ paddd xmm11,xmm3
+ movd xmm0,DWORD[((-24))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm13
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-24))+r9]
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+DB 102,15,56,0,229
+ movd xmm8,DWORD[((-24))+r10]
+ por xmm13,xmm7
+ movd xmm7,DWORD[((-24))+r11]
+ punpckldq xmm0,xmm8
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm12
+ movdqa xmm6,xmm12
+ pslld xmm8,5
+ pandn xmm7,xmm14
+ pand xmm6,xmm13
+ punpckldq xmm0,xmm9
+ movdqa xmm9,xmm11
+
+ movdqa XMMWORD[(144-128)+rax],xmm4
+ paddd xmm10,xmm4
+ movd xmm1,DWORD[((-20))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm12
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-20))+r9]
+ pslld xmm7,30
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+DB 102,15,56,0,197
+ movd xmm8,DWORD[((-20))+r10]
+ por xmm12,xmm7
+ movd xmm7,DWORD[((-20))+r11]
+ punpckldq xmm1,xmm8
+ movdqa xmm8,xmm10
+ paddd xmm14,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm11
+ movdqa xmm6,xmm11
+ pslld xmm8,5
+ pandn xmm7,xmm13
+ pand xmm6,xmm12
+ punpckldq xmm1,xmm9
+ movdqa xmm9,xmm10
+
+ movdqa XMMWORD[(160-128)+rax],xmm0
+ paddd xmm14,xmm0
+ movd xmm2,DWORD[((-16))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm11
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-16))+r9]
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+DB 102,15,56,0,205
+ movd xmm8,DWORD[((-16))+r10]
+ por xmm11,xmm7
+ movd xmm7,DWORD[((-16))+r11]
+ punpckldq xmm2,xmm8
+ movdqa xmm8,xmm14
+ paddd xmm13,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm10
+ movdqa xmm6,xmm10
+ pslld xmm8,5
+ pandn xmm7,xmm12
+ pand xmm6,xmm11
+ punpckldq xmm2,xmm9
+ movdqa xmm9,xmm14
+
+ movdqa XMMWORD[(176-128)+rax],xmm1
+ paddd xmm13,xmm1
+ movd xmm3,DWORD[((-12))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm10
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-12))+r9]
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+DB 102,15,56,0,213
+ movd xmm8,DWORD[((-12))+r10]
+ por xmm10,xmm7
+ movd xmm7,DWORD[((-12))+r11]
+ punpckldq xmm3,xmm8
+ movdqa xmm8,xmm13
+ paddd xmm12,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm14
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pandn xmm7,xmm11
+ pand xmm6,xmm10
+ punpckldq xmm3,xmm9
+ movdqa xmm9,xmm13
+
+ movdqa XMMWORD[(192-128)+rax],xmm2
+ paddd xmm12,xmm2
+ movd xmm4,DWORD[((-8))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm14
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-8))+r9]
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+DB 102,15,56,0,221
+ movd xmm8,DWORD[((-8))+r10]
+ por xmm14,xmm7
+ movd xmm7,DWORD[((-8))+r11]
+ punpckldq xmm4,xmm8
+ movdqa xmm8,xmm12
+ paddd xmm11,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm13
+ movdqa xmm6,xmm13
+ pslld xmm8,5
+ pandn xmm7,xmm10
+ pand xmm6,xmm14
+ punpckldq xmm4,xmm9
+ movdqa xmm9,xmm12
+
+ movdqa XMMWORD[(208-128)+rax],xmm3
+ paddd xmm11,xmm3
+ movd xmm0,DWORD[((-4))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm13
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-4))+r9]
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+DB 102,15,56,0,229
+ movd xmm8,DWORD[((-4))+r10]
+ por xmm13,xmm7
+ movdqa xmm1,XMMWORD[((0-128))+rax]
+ movd xmm7,DWORD[((-4))+r11]
+ punpckldq xmm0,xmm8
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm12
+ movdqa xmm6,xmm12
+ pslld xmm8,5
+ prefetcht0 [63+r8]
+ pandn xmm7,xmm14
+ pand xmm6,xmm13
+ punpckldq xmm0,xmm9
+ movdqa xmm9,xmm11
+
+ movdqa XMMWORD[(224-128)+rax],xmm4
+ paddd xmm10,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm12
+ prefetcht0 [63+r9]
+
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm10,xmm6
+ prefetcht0 [63+r10]
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+DB 102,15,56,0,197
+ prefetcht0 [63+r11]
+ por xmm12,xmm7
+ movdqa xmm2,XMMWORD[((16-128))+rax]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm10
+ pxor xmm1,XMMWORD[((128-128))+rax]
+ paddd xmm14,xmm15
+ movdqa xmm7,xmm11
+ pslld xmm8,5
+ pxor xmm1,xmm3
+ movdqa xmm6,xmm11
+ pandn xmm7,xmm13
+ movdqa xmm5,xmm1
+ pand xmm6,xmm12
+ movdqa xmm9,xmm10
+ psrld xmm5,31
+ paddd xmm1,xmm1
+
+ movdqa XMMWORD[(240-128)+rax],xmm0
+ paddd xmm14,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm11
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm14
+ pxor xmm2,XMMWORD[((144-128))+rax]
+ paddd xmm13,xmm15
+ movdqa xmm7,xmm10
+ pslld xmm8,5
+ pxor xmm2,xmm4
+ movdqa xmm6,xmm10
+ pandn xmm7,xmm12
+ movdqa xmm5,xmm2
+ pand xmm6,xmm11
+ movdqa xmm9,xmm14
+ psrld xmm5,31
+ paddd xmm2,xmm2
+
+ movdqa XMMWORD[(0-128)+rax],xmm1
+ paddd xmm13,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm10
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm13
+ pxor xmm3,XMMWORD[((160-128))+rax]
+ paddd xmm12,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm8,5
+ pxor xmm3,xmm0
+ movdqa xmm6,xmm14
+ pandn xmm7,xmm11
+ movdqa xmm5,xmm3
+ pand xmm6,xmm10
+ movdqa xmm9,xmm13
+ psrld xmm5,31
+ paddd xmm3,xmm3
+
+ movdqa XMMWORD[(16-128)+rax],xmm2
+ paddd xmm12,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm14
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm12
+ pxor xmm4,XMMWORD[((176-128))+rax]
+ paddd xmm11,xmm15
+ movdqa xmm7,xmm13
+ pslld xmm8,5
+ pxor xmm4,xmm1
+ movdqa xmm6,xmm13
+ pandn xmm7,xmm10
+ movdqa xmm5,xmm4
+ pand xmm6,xmm14
+ movdqa xmm9,xmm12
+ psrld xmm5,31
+ paddd xmm4,xmm4
+
+ movdqa XMMWORD[(32-128)+rax],xmm3
+ paddd xmm11,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm13
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm11
+ pxor xmm0,XMMWORD[((192-128))+rax]
+ paddd xmm10,xmm15
+ movdqa xmm7,xmm12
+ pslld xmm8,5
+ pxor xmm0,xmm2
+ movdqa xmm6,xmm12
+ pandn xmm7,xmm14
+ movdqa xmm5,xmm0
+ pand xmm6,xmm13
+ movdqa xmm9,xmm11
+ psrld xmm5,31
+ paddd xmm0,xmm0
+
+ movdqa XMMWORD[(48-128)+rax],xmm4
+ paddd xmm10,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm12
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ movdqa xmm15,XMMWORD[rbp]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((208-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(64-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((224-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(80-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((240-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(96-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((0-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(112-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((16-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(128-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((32-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(144-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((48-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(160-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((64-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(176-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((80-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(192-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((96-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(208-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((112-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(224-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((128-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(240-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((144-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(0-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((160-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(16-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((176-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(32-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((192-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(48-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((208-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(64-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((224-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(80-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((240-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(96-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((0-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(112-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ movdqa xmm15,XMMWORD[32+rbp]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((16-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(128-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((32-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(144-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((48-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(160-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((64-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(176-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((80-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(192-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((96-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(208-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((112-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(224-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((128-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(240-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((144-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(0-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((160-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(16-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((176-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(32-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((192-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(48-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((208-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(64-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((224-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(80-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((240-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(96-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((0-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(112-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((16-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(128-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((32-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(144-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((48-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(160-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((64-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(176-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ movdqa xmm15,XMMWORD[64+rbp]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((80-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(192-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((96-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(208-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((112-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(224-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((128-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(240-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((144-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(0-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((160-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(16-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((176-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(32-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((192-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(48-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((208-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(64-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((224-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(80-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((240-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(96-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((0-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(112-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((16-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((32-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((48-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((64-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((80-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((96-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((112-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ paddd xmm10,xmm4
+ psrld xmm9,27
+ movdqa xmm7,xmm12
+ pxor xmm6,xmm13
+
+ pslld xmm7,30
+ por xmm8,xmm9
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm12,xmm7
+ movdqa xmm0,XMMWORD[rbx]
+ mov ecx,1
+ cmp ecx,DWORD[rbx]
+ pxor xmm8,xmm8
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ movdqa xmm1,xmm0
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ pcmpgtd xmm1,xmm8
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ paddd xmm0,xmm1
+ cmovge r11,rbp
+
+ movdqu xmm6,XMMWORD[rdi]
+ pand xmm10,xmm1
+ movdqu xmm7,XMMWORD[32+rdi]
+ pand xmm11,xmm1
+ paddd xmm10,xmm6
+ movdqu xmm8,XMMWORD[64+rdi]
+ pand xmm12,xmm1
+ paddd xmm11,xmm7
+ movdqu xmm9,XMMWORD[96+rdi]
+ pand xmm13,xmm1
+ paddd xmm12,xmm8
+ movdqu xmm5,XMMWORD[128+rdi]
+ pand xmm14,xmm1
+ movdqu XMMWORD[rdi],xmm10
+ paddd xmm13,xmm9
+ movdqu XMMWORD[32+rdi],xmm11
+ paddd xmm14,xmm5
+ movdqu XMMWORD[64+rdi],xmm12
+ movdqu XMMWORD[96+rdi],xmm13
+ movdqu XMMWORD[128+rdi],xmm14
+
+ movdqa XMMWORD[rbx],xmm0
+ movdqa xmm5,XMMWORD[96+rbp]
+ movdqa xmm15,XMMWORD[((-32))+rbp]
+ dec edx
+ jnz NEAR $L$oop
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande
+
+$L$done:
+ mov rax,QWORD[272+rsp]
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block:
+
+ALIGN 32
+sha1_multi_block_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_shaext_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ shl edx,1
+ and rsp,-256
+ lea rdi,[64+rdi]
+ mov QWORD[272+rsp],rax
+$L$body_shaext:
+ lea rbx,[256+rsp]
+ movdqa xmm3,XMMWORD[((K_XX_XX+128))]
+
+$L$oop_grande_shaext:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rsp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rsp
+ test edx,edx
+ jz NEAR $L$done_shaext
+
+ movq xmm0,QWORD[((0-64))+rdi]
+ movq xmm4,QWORD[((32-64))+rdi]
+ movq xmm5,QWORD[((64-64))+rdi]
+ movq xmm6,QWORD[((96-64))+rdi]
+ movq xmm7,QWORD[((128-64))+rdi]
+
+ punpckldq xmm0,xmm4
+ punpckldq xmm5,xmm6
+
+ movdqa xmm8,xmm0
+ punpcklqdq xmm0,xmm5
+ punpckhqdq xmm8,xmm5
+
+ pshufd xmm1,xmm7,63
+ pshufd xmm9,xmm7,127
+ pshufd xmm0,xmm0,27
+ pshufd xmm8,xmm8,27
+ jmp NEAR $L$oop_shaext
+
+ALIGN 32
+$L$oop_shaext:
+ movdqu xmm4,XMMWORD[r8]
+ movdqu xmm11,XMMWORD[r9]
+ movdqu xmm5,XMMWORD[16+r8]
+ movdqu xmm12,XMMWORD[16+r9]
+ movdqu xmm6,XMMWORD[32+r8]
+DB 102,15,56,0,227
+ movdqu xmm13,XMMWORD[32+r9]
+DB 102,68,15,56,0,219
+ movdqu xmm7,XMMWORD[48+r8]
+ lea r8,[64+r8]
+DB 102,15,56,0,235
+ movdqu xmm14,XMMWORD[48+r9]
+ lea r9,[64+r9]
+DB 102,68,15,56,0,227
+
+ movdqa XMMWORD[80+rsp],xmm1
+ paddd xmm1,xmm4
+ movdqa XMMWORD[112+rsp],xmm9
+ paddd xmm9,xmm11
+ movdqa XMMWORD[64+rsp],xmm0
+ movdqa xmm2,xmm0
+ movdqa XMMWORD[96+rsp],xmm8
+ movdqa xmm10,xmm8
+DB 15,58,204,193,0
+DB 15,56,200,213
+DB 69,15,58,204,193,0
+DB 69,15,56,200,212
+DB 102,15,56,0,243
+ prefetcht0 [127+r8]
+DB 15,56,201,229
+DB 102,68,15,56,0,235
+ prefetcht0 [127+r9]
+DB 69,15,56,201,220
+
+DB 102,15,56,0,251
+ movdqa xmm1,xmm0
+DB 102,68,15,56,0,243
+ movdqa xmm9,xmm8
+DB 15,58,204,194,0
+DB 15,56,200,206
+DB 69,15,58,204,194,0
+DB 69,15,56,200,205
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,0
+DB 15,56,200,215
+DB 69,15,58,204,193,0
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,0
+DB 15,56,200,204
+DB 69,15,58,204,194,0
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,0
+DB 15,56,200,213
+DB 69,15,58,204,193,0
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+DB 15,56,201,229
+ pxor xmm14,xmm12
+DB 69,15,56,201,220
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,1
+DB 15,56,200,206
+DB 69,15,58,204,194,1
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,1
+DB 15,56,200,215
+DB 69,15,58,204,193,1
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,1
+DB 15,56,200,204
+DB 69,15,58,204,194,1
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,1
+DB 15,56,200,213
+DB 69,15,58,204,193,1
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+DB 15,56,201,229
+ pxor xmm14,xmm12
+DB 69,15,56,201,220
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,1
+DB 15,56,200,206
+DB 69,15,58,204,194,1
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,2
+DB 15,56,200,215
+DB 69,15,58,204,193,2
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,2
+DB 15,56,200,204
+DB 69,15,58,204,194,2
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,2
+DB 15,56,200,213
+DB 69,15,58,204,193,2
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+DB 15,56,201,229
+ pxor xmm14,xmm12
+DB 69,15,56,201,220
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,2
+DB 15,56,200,206
+DB 69,15,58,204,194,2
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,2
+DB 15,56,200,215
+DB 69,15,58,204,193,2
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,3
+DB 15,56,200,204
+DB 69,15,58,204,194,3
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,3
+DB 15,56,200,213
+DB 69,15,58,204,193,3
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+ pxor xmm14,xmm12
+
+ mov ecx,1
+ pxor xmm4,xmm4
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rsp
+
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,3
+DB 15,56,200,206
+DB 69,15,58,204,194,3
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rsp
+ movq xmm6,QWORD[rbx]
+
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,3
+DB 15,56,200,215
+DB 69,15,58,204,193,3
+DB 69,15,56,200,214
+
+ pshufd xmm11,xmm6,0x00
+ pshufd xmm12,xmm6,0x55
+ movdqa xmm7,xmm6
+ pcmpgtd xmm11,xmm4
+ pcmpgtd xmm12,xmm4
+
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,3
+DB 15,56,200,204
+DB 69,15,58,204,194,3
+DB 68,15,56,200,204
+
+ pcmpgtd xmm7,xmm4
+ pand xmm0,xmm11
+ pand xmm1,xmm11
+ pand xmm8,xmm12
+ pand xmm9,xmm12
+ paddd xmm6,xmm7
+
+ paddd xmm0,XMMWORD[64+rsp]
+ paddd xmm1,XMMWORD[80+rsp]
+ paddd xmm8,XMMWORD[96+rsp]
+ paddd xmm9,XMMWORD[112+rsp]
+
+ movq QWORD[rbx],xmm6
+ dec edx
+ jnz NEAR $L$oop_shaext
+
+ mov edx,DWORD[280+rsp]
+
+ pshufd xmm0,xmm0,27
+ pshufd xmm8,xmm8,27
+
+ movdqa xmm6,xmm0
+ punpckldq xmm0,xmm8
+ punpckhdq xmm6,xmm8
+ punpckhdq xmm1,xmm9
+ movq QWORD[(0-64)+rdi],xmm0
+ psrldq xmm0,8
+ movq QWORD[(64-64)+rdi],xmm6
+ psrldq xmm6,8
+ movq QWORD[(32-64)+rdi],xmm0
+ psrldq xmm1,8
+ movq QWORD[(96-64)+rdi],xmm6
+ movq QWORD[(128-64)+rdi],xmm1
+
+ lea rdi,[8+rdi]
+ lea rsi,[32+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block_shaext:
+
+ALIGN 32
+sha1_multi_block_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_shortcut:
+ shr rcx,32
+ cmp edx,2
+ jb NEAR $L$avx
+ test ecx,32
+ jnz NEAR _avx2_shortcut
+ jmp NEAR $L$avx
+ALIGN 32
+$L$avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body_avx:
+ lea rbp,[K_XX_XX]
+ lea rbx,[256+rsp]
+
+ vzeroupper
+$L$oop_grande_avx:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done_avx
+
+ vmovdqu xmm10,XMMWORD[rdi]
+ lea rax,[128+rsp]
+ vmovdqu xmm11,XMMWORD[32+rdi]
+ vmovdqu xmm12,XMMWORD[64+rdi]
+ vmovdqu xmm13,XMMWORD[96+rdi]
+ vmovdqu xmm14,XMMWORD[128+rdi]
+ vmovdqu xmm5,XMMWORD[96+rbp]
+ jmp NEAR $L$oop_avx
+
+ALIGN 32
+$L$oop_avx:
+ vmovdqa xmm15,XMMWORD[((-32))+rbp]
+ vmovd xmm0,DWORD[r8]
+ lea r8,[64+r8]
+ vmovd xmm2,DWORD[r9]
+ lea r9,[64+r9]
+ vpinsrd xmm0,xmm0,DWORD[r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm2,xmm2,DWORD[r11],1
+ lea r11,[64+r11]
+ vmovd xmm1,DWORD[((-60))+r8]
+ vpunpckldq xmm0,xmm0,xmm2
+ vmovd xmm9,DWORD[((-60))+r9]
+ vpshufb xmm0,xmm0,xmm5
+ vpinsrd xmm1,xmm1,DWORD[((-60))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-60))+r11],1
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(0-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpunpckldq xmm1,xmm1,xmm9
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm2,DWORD[((-56))+r8]
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-56))+r9]
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpshufb xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpinsrd xmm2,xmm2,DWORD[((-56))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-56))+r11],1
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(16-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpunpckldq xmm2,xmm2,xmm9
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm3,DWORD[((-52))+r8]
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-52))+r9]
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpshufb xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpinsrd xmm3,xmm3,DWORD[((-52))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-52))+r11],1
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(32-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpunpckldq xmm3,xmm3,xmm9
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm4,DWORD[((-48))+r8]
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-48))+r9]
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpshufb xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpinsrd xmm4,xmm4,DWORD[((-48))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-48))+r11],1
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(48-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpunpckldq xmm4,xmm4,xmm9
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm0,DWORD[((-44))+r8]
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-44))+r9]
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpshufb xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpinsrd xmm0,xmm0,DWORD[((-44))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-44))+r11],1
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(64-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpunpckldq xmm0,xmm0,xmm9
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm1,DWORD[((-40))+r8]
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-40))+r9]
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpshufb xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpinsrd xmm1,xmm1,DWORD[((-40))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-40))+r11],1
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(80-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpunpckldq xmm1,xmm1,xmm9
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm2,DWORD[((-36))+r8]
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-36))+r9]
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpshufb xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpinsrd xmm2,xmm2,DWORD[((-36))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-36))+r11],1
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(96-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpunpckldq xmm2,xmm2,xmm9
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm3,DWORD[((-32))+r8]
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-32))+r9]
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpshufb xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpinsrd xmm3,xmm3,DWORD[((-32))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-32))+r11],1
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(112-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpunpckldq xmm3,xmm3,xmm9
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm4,DWORD[((-28))+r8]
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-28))+r9]
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpshufb xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpinsrd xmm4,xmm4,DWORD[((-28))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-28))+r11],1
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(128-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpunpckldq xmm4,xmm4,xmm9
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm0,DWORD[((-24))+r8]
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-24))+r9]
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpshufb xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpinsrd xmm0,xmm0,DWORD[((-24))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-24))+r11],1
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(144-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpunpckldq xmm0,xmm0,xmm9
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm1,DWORD[((-20))+r8]
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-20))+r9]
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpshufb xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpinsrd xmm1,xmm1,DWORD[((-20))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-20))+r11],1
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(160-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpunpckldq xmm1,xmm1,xmm9
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm2,DWORD[((-16))+r8]
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-16))+r9]
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpshufb xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpinsrd xmm2,xmm2,DWORD[((-16))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-16))+r11],1
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(176-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpunpckldq xmm2,xmm2,xmm9
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm3,DWORD[((-12))+r8]
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-12))+r9]
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpshufb xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpinsrd xmm3,xmm3,DWORD[((-12))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-12))+r11],1
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(192-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpunpckldq xmm3,xmm3,xmm9
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm4,DWORD[((-8))+r8]
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-8))+r9]
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpshufb xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpinsrd xmm4,xmm4,DWORD[((-8))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-8))+r11],1
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(208-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpunpckldq xmm4,xmm4,xmm9
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm0,DWORD[((-4))+r8]
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-4))+r9]
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpshufb xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vmovdqa xmm1,XMMWORD[((0-128))+rax]
+ vpinsrd xmm0,xmm0,DWORD[((-4))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-4))+r11],1
+ vpaddd xmm10,xmm10,xmm15
+ prefetcht0 [63+r8]
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(224-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpunpckldq xmm0,xmm0,xmm9
+ vpsrld xmm9,xmm11,27
+ prefetcht0 [63+r9]
+ vpxor xmm6,xmm6,xmm7
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ prefetcht0 [63+r10]
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ prefetcht0 [63+r11]
+ vpshufb xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm2,XMMWORD[((16-128))+rax]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(240-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((128-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm1,xmm1,xmm3
+
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(0-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((144-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm2,xmm2,xmm4
+
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(16-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((160-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm3,xmm3,xmm0
+
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((80-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(32-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((176-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm4,xmm4,xmm1
+
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((96-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(48-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((192-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm0,xmm0,xmm2
+
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm15,XMMWORD[rbp]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((112-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(64-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((208-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((128-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(80-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((224-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((144-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(96-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((240-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((160-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(112-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((0-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((176-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(128-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((16-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((192-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(144-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((32-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((208-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(160-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((48-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((224-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(176-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((64-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((240-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(192-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((80-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((0-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(208-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((96-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((16-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(224-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((112-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((32-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(240-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((128-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((48-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(0-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((144-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((64-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(16-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((160-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((80-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(32-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((176-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((96-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(48-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((192-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((112-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(64-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((208-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((128-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(80-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((224-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((144-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(96-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((240-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((160-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(112-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((0-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm15,XMMWORD[32+rbp]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((176-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((16-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(128-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((192-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(144-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((208-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(160-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((224-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(176-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((240-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((80-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(192-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((0-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((96-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(208-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((16-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((112-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(224-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((128-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(240-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((144-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(0-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((160-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(16-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((80-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((176-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(32-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((96-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((192-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(48-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((112-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((208-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(64-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((128-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((224-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(80-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((144-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((240-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(96-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((160-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((0-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(112-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((176-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((16-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(128-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((192-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(144-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((208-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(160-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((224-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(176-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm15,XMMWORD[64+rbp]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((240-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(192-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((80-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((0-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(208-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((96-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((16-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(224-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((112-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((32-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(240-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((128-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((48-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(0-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((144-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((64-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(16-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((160-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((80-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(32-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((176-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((96-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(48-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((192-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((112-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(64-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((208-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((128-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(80-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((224-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((144-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(96-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((240-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((160-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(112-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((0-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((176-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((16-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((192-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((32-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((208-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((48-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((224-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((64-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((240-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((80-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((0-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((96-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((16-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((112-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+
+ vpsrld xmm9,xmm11,27
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm6,xmm6,xmm13
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm12,xmm12,xmm7
+ mov ecx,1
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r11,rbp
+ vmovdqu xmm6,XMMWORD[rbx]
+ vpxor xmm8,xmm8,xmm8
+ vmovdqa xmm7,xmm6
+ vpcmpgtd xmm7,xmm7,xmm8
+ vpaddd xmm6,xmm6,xmm7
+
+ vpand xmm10,xmm10,xmm7
+ vpand xmm11,xmm11,xmm7
+ vpaddd xmm10,xmm10,XMMWORD[rdi]
+ vpand xmm12,xmm12,xmm7
+ vpaddd xmm11,xmm11,XMMWORD[32+rdi]
+ vpand xmm13,xmm13,xmm7
+ vpaddd xmm12,xmm12,XMMWORD[64+rdi]
+ vpand xmm14,xmm14,xmm7
+ vpaddd xmm13,xmm13,XMMWORD[96+rdi]
+ vpaddd xmm14,xmm14,XMMWORD[128+rdi]
+ vmovdqu XMMWORD[rdi],xmm10
+ vmovdqu XMMWORD[32+rdi],xmm11
+ vmovdqu XMMWORD[64+rdi],xmm12
+ vmovdqu XMMWORD[96+rdi],xmm13
+ vmovdqu XMMWORD[128+rdi],xmm14
+
+ vmovdqu XMMWORD[rbx],xmm6
+ vmovdqu xmm5,XMMWORD[96+rbp]
+ dec edx
+ jnz NEAR $L$oop_avx
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_avx
+
+$L$done_avx:
+ mov rax,QWORD[272+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block_avx:
+
+ALIGN 32
+sha1_multi_block_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+ sub rsp,576
+ and rsp,-256
+ mov QWORD[544+rsp],rax
+
+$L$body_avx2:
+ lea rbp,[K_XX_XX]
+ shr edx,1
+
+ vzeroupper
+$L$oop_grande_avx2:
+ mov DWORD[552+rsp],edx
+ xor edx,edx
+ lea rbx,[512+rsp]
+ mov r12,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r12,rbp
+ mov r13,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r13,rbp
+ mov r14,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r14,rbp
+ mov r15,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r15,rbp
+ mov r8,QWORD[64+rsi]
+ mov ecx,DWORD[72+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[16+rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[80+rsi]
+ mov ecx,DWORD[88+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[20+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[96+rsi]
+ mov ecx,DWORD[104+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[24+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[112+rsi]
+ mov ecx,DWORD[120+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[28+rbx],ecx
+ cmovle r11,rbp
+ vmovdqu ymm0,YMMWORD[rdi]
+ lea rax,[128+rsp]
+ vmovdqu ymm1,YMMWORD[32+rdi]
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm2,YMMWORD[64+rdi]
+ vmovdqu ymm3,YMMWORD[96+rdi]
+ vmovdqu ymm4,YMMWORD[128+rdi]
+ vmovdqu ymm9,YMMWORD[96+rbp]
+ jmp NEAR $L$oop_avx2
+
+ALIGN 32
+$L$oop_avx2:
+ vmovdqa ymm15,YMMWORD[((-32))+rbp]
+ vmovd xmm10,DWORD[r12]
+ lea r12,[64+r12]
+ vmovd xmm12,DWORD[r8]
+ lea r8,[64+r8]
+ vmovd xmm7,DWORD[r13]
+ lea r13,[64+r13]
+ vmovd xmm6,DWORD[r9]
+ lea r9,[64+r9]
+ vpinsrd xmm10,xmm10,DWORD[r14],1
+ lea r14,[64+r14]
+ vpinsrd xmm12,xmm12,DWORD[r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm7,xmm7,DWORD[r15],1
+ lea r15,[64+r15]
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[r11],1
+ lea r11,[64+r11]
+ vpunpckldq ymm12,ymm12,ymm6
+ vmovd xmm11,DWORD[((-60))+r12]
+ vinserti128 ymm10,ymm10,xmm12,1
+ vmovd xmm8,DWORD[((-60))+r8]
+ vpshufb ymm10,ymm10,ymm9
+ vmovd xmm7,DWORD[((-60))+r13]
+ vmovd xmm6,DWORD[((-60))+r9]
+ vpinsrd xmm11,xmm11,DWORD[((-60))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-60))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-60))+r15],1
+ vpunpckldq ymm11,ymm11,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-60))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(0-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vinserti128 ymm11,ymm11,xmm8,1
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm12,DWORD[((-56))+r12]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-56))+r8]
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpshufb ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vmovd xmm7,DWORD[((-56))+r13]
+ vmovd xmm6,DWORD[((-56))+r9]
+ vpinsrd xmm12,xmm12,DWORD[((-56))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-56))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-56))+r15],1
+ vpunpckldq ymm12,ymm12,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-56))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(32-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vinserti128 ymm12,ymm12,xmm8,1
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm13,DWORD[((-52))+r12]
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-52))+r8]
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpshufb ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vmovd xmm7,DWORD[((-52))+r13]
+ vmovd xmm6,DWORD[((-52))+r9]
+ vpinsrd xmm13,xmm13,DWORD[((-52))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-52))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-52))+r15],1
+ vpunpckldq ymm13,ymm13,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-52))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(64-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vinserti128 ymm13,ymm13,xmm8,1
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm14,DWORD[((-48))+r12]
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-48))+r8]
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpshufb ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vmovd xmm7,DWORD[((-48))+r13]
+ vmovd xmm6,DWORD[((-48))+r9]
+ vpinsrd xmm14,xmm14,DWORD[((-48))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-48))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-48))+r15],1
+ vpunpckldq ymm14,ymm14,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-48))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(96-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vinserti128 ymm14,ymm14,xmm8,1
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm10,DWORD[((-44))+r12]
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-44))+r8]
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpshufb ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vmovd xmm7,DWORD[((-44))+r13]
+ vmovd xmm6,DWORD[((-44))+r9]
+ vpinsrd xmm10,xmm10,DWORD[((-44))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-44))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-44))+r15],1
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-44))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(128-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vinserti128 ymm10,ymm10,xmm8,1
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm11,DWORD[((-40))+r12]
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-40))+r8]
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpshufb ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovd xmm7,DWORD[((-40))+r13]
+ vmovd xmm6,DWORD[((-40))+r9]
+ vpinsrd xmm11,xmm11,DWORD[((-40))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-40))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-40))+r15],1
+ vpunpckldq ymm11,ymm11,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-40))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(160-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vinserti128 ymm11,ymm11,xmm8,1
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm12,DWORD[((-36))+r12]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-36))+r8]
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpshufb ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vmovd xmm7,DWORD[((-36))+r13]
+ vmovd xmm6,DWORD[((-36))+r9]
+ vpinsrd xmm12,xmm12,DWORD[((-36))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-36))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-36))+r15],1
+ vpunpckldq ymm12,ymm12,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-36))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(192-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vinserti128 ymm12,ymm12,xmm8,1
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm13,DWORD[((-32))+r12]
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-32))+r8]
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpshufb ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vmovd xmm7,DWORD[((-32))+r13]
+ vmovd xmm6,DWORD[((-32))+r9]
+ vpinsrd xmm13,xmm13,DWORD[((-32))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-32))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-32))+r15],1
+ vpunpckldq ymm13,ymm13,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-32))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(224-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vinserti128 ymm13,ymm13,xmm8,1
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm14,DWORD[((-28))+r12]
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-28))+r8]
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpshufb ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vmovd xmm7,DWORD[((-28))+r13]
+ vmovd xmm6,DWORD[((-28))+r9]
+ vpinsrd xmm14,xmm14,DWORD[((-28))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-28))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-28))+r15],1
+ vpunpckldq ymm14,ymm14,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-28))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(256-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vinserti128 ymm14,ymm14,xmm8,1
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm10,DWORD[((-24))+r12]
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-24))+r8]
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpshufb ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vmovd xmm7,DWORD[((-24))+r13]
+ vmovd xmm6,DWORD[((-24))+r9]
+ vpinsrd xmm10,xmm10,DWORD[((-24))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-24))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-24))+r15],1
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-24))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(288-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vinserti128 ymm10,ymm10,xmm8,1
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm11,DWORD[((-20))+r12]
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-20))+r8]
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpshufb ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovd xmm7,DWORD[((-20))+r13]
+ vmovd xmm6,DWORD[((-20))+r9]
+ vpinsrd xmm11,xmm11,DWORD[((-20))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-20))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-20))+r15],1
+ vpunpckldq ymm11,ymm11,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-20))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(320-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vinserti128 ymm11,ymm11,xmm8,1
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm12,DWORD[((-16))+r12]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-16))+r8]
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpshufb ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vmovd xmm7,DWORD[((-16))+r13]
+ vmovd xmm6,DWORD[((-16))+r9]
+ vpinsrd xmm12,xmm12,DWORD[((-16))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-16))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-16))+r15],1
+ vpunpckldq ymm12,ymm12,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-16))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(352-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vinserti128 ymm12,ymm12,xmm8,1
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm13,DWORD[((-12))+r12]
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-12))+r8]
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpshufb ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vmovd xmm7,DWORD[((-12))+r13]
+ vmovd xmm6,DWORD[((-12))+r9]
+ vpinsrd xmm13,xmm13,DWORD[((-12))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-12))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-12))+r15],1
+ vpunpckldq ymm13,ymm13,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-12))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(384-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vinserti128 ymm13,ymm13,xmm8,1
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm14,DWORD[((-8))+r12]
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-8))+r8]
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpshufb ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vmovd xmm7,DWORD[((-8))+r13]
+ vmovd xmm6,DWORD[((-8))+r9]
+ vpinsrd xmm14,xmm14,DWORD[((-8))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-8))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-8))+r15],1
+ vpunpckldq ymm14,ymm14,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-8))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(416-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vinserti128 ymm14,ymm14,xmm8,1
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm10,DWORD[((-4))+r12]
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-4))+r8]
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpshufb ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vmovdqa ymm11,YMMWORD[((0-128))+rax]
+ vmovd xmm7,DWORD[((-4))+r13]
+ vmovd xmm6,DWORD[((-4))+r9]
+ vpinsrd xmm10,xmm10,DWORD[((-4))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-4))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-4))+r15],1
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-4))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm0,ymm0,ymm15
+ prefetcht0 [63+r12]
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(448-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vinserti128 ymm10,ymm10,xmm8,1
+ vpsrld ymm8,ymm1,27
+ prefetcht0 [63+r13]
+ vpxor ymm5,ymm5,ymm6
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ prefetcht0 [63+r14]
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ prefetcht0 [63+r15]
+ vpshufb ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm12,YMMWORD[((32-128))+rax]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ prefetcht0 [63+r8]
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(480-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm11,ymm11,ymm13
+ prefetcht0 [63+r9]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ prefetcht0 [63+r10]
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ prefetcht0 [63+r11]
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(0-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm12,ymm12,ymm14
+
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(32-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm13,ymm13,ymm10
+
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((160-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(64-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm14,ymm14,ymm11
+
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((192-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(96-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm10,ymm10,ymm12
+
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm15,YMMWORD[rbp]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((224-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(128-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((256-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(160-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((288-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(192-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((320-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(224-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((0-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((352-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(256-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((32-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((384-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(288-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((64-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((416-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(320-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((96-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((448-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(352-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((128-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((480-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(384-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((160-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((0-128))+rax]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(416-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((192-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((32-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(448-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((224-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((64-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(480-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((96-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(0-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((128-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(32-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((160-128))+rax]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(64-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((192-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(96-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((224-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(128-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((256-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(160-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((288-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(192-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((320-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(224-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((0-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm15,YMMWORD[32+rbp]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((352-256-128))+rbx]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((32-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((384-256-128))+rbx]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((416-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((448-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((480-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((160-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(384-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((0-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((192-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(416-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((32-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((224-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(448-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((256-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(480-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((288-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(0-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((320-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(32-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((160-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((352-256-128))+rbx]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(64-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((192-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((384-256-128))+rbx]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(96-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((224-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((416-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(128-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((256-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((448-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(160-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((288-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((480-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(192-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((320-256-128))+rbx]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((0-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(224-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((352-256-128))+rbx]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((32-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((384-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((416-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((448-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm15,YMMWORD[64+rbp]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((480-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(384-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((160-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((0-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(416-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((192-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((32-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(448-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((224-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((64-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(480-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((96-128))+rax]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(0-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((128-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(32-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((160-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(64-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((192-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(96-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((224-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(128-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((256-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(160-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((288-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(192-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((320-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(224-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((0-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((352-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((32-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((384-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((64-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((416-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((96-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((448-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((128-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((480-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((160-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((0-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((192-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((32-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((224-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+
+ vpsrld ymm8,ymm1,27
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm5,ymm5,ymm3
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm2,ymm2,ymm6
+ mov ecx,1
+ lea rbx,[512+rsp]
+ cmp ecx,DWORD[rbx]
+ cmovge r12,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r13,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r14,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r15,rbp
+ cmp ecx,DWORD[16+rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[20+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[24+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[28+rbx]
+ cmovge r11,rbp
+ vmovdqu ymm5,YMMWORD[rbx]
+ vpxor ymm7,ymm7,ymm7
+ vmovdqa ymm6,ymm5
+ vpcmpgtd ymm6,ymm6,ymm7
+ vpaddd ymm5,ymm5,ymm6
+
+ vpand ymm0,ymm0,ymm6
+ vpand ymm1,ymm1,ymm6
+ vpaddd ymm0,ymm0,YMMWORD[rdi]
+ vpand ymm2,ymm2,ymm6
+ vpaddd ymm1,ymm1,YMMWORD[32+rdi]
+ vpand ymm3,ymm3,ymm6
+ vpaddd ymm2,ymm2,YMMWORD[64+rdi]
+ vpand ymm4,ymm4,ymm6
+ vpaddd ymm3,ymm3,YMMWORD[96+rdi]
+ vpaddd ymm4,ymm4,YMMWORD[128+rdi]
+ vmovdqu YMMWORD[rdi],ymm0
+ vmovdqu YMMWORD[32+rdi],ymm1
+ vmovdqu YMMWORD[64+rdi],ymm2
+ vmovdqu YMMWORD[96+rdi],ymm3
+ vmovdqu YMMWORD[128+rdi],ymm4
+
+ vmovdqu YMMWORD[rbx],ymm5
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm9,YMMWORD[96+rbp]
+ dec edx
+ jnz NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+ mov rax,QWORD[544+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block_avx2:
+
+ALIGN 256
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+K_XX_XX:
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107
+DB 32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120
+DB 56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77
+DB 83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110
+DB 115,115,108,46,111,114,103,62,0
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[272+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+
+ lea rsi,[((-24-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 16
+avx2_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[544+r8]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((-56-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ jmp NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_begin_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha1_multi_block_avx wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block_avx wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block_avx wrt ..imagebase
+ DD $L$SEH_begin_sha1_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha1_multi_block:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha1_multi_block_shaext:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx2:
+DB 9,0,0,0
+ DD avx2_handler wrt ..imagebase
+ DD $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
new file mode 100644
index 0000000000..3a7655b27f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
@@ -0,0 +1,5773 @@
+; Copyright 2006-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global sha1_block_data_order
+
+ALIGN 16
+sha1_block_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ mov r9d,DWORD[((OPENSSL_ia32cap_P+0))]
+ mov r8d,DWORD[((OPENSSL_ia32cap_P+4))]
+ mov r10d,DWORD[((OPENSSL_ia32cap_P+8))]
+ test r8d,512
+ jz NEAR $L$ialu
+ test r10d,536870912
+ jnz NEAR _shaext_shortcut
+ and r10d,296
+ cmp r10d,296
+ je NEAR _avx2_shortcut
+ and r8d,268435456
+ and r9d,1073741824
+ or r8d,r9d
+ cmp r8d,1342177280
+ je NEAR _avx_shortcut
+ jmp NEAR _ssse3_shortcut
+
+ALIGN 16
+$L$ialu:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ mov r8,rdi
+ sub rsp,72
+ mov r9,rsi
+ and rsp,-64
+ mov r10,rdx
+ mov QWORD[64+rsp],rax
+
+$L$prologue:
+
+ mov esi,DWORD[r8]
+ mov edi,DWORD[4+r8]
+ mov r11d,DWORD[8+r8]
+ mov r12d,DWORD[12+r8]
+ mov r13d,DWORD[16+r8]
+ jmp NEAR $L$loop
+
+ALIGN 16
+$L$loop:
+ mov edx,DWORD[r9]
+ bswap edx
+ mov ebp,DWORD[4+r9]
+ mov eax,r12d
+ mov DWORD[rsp],edx
+ mov ecx,esi
+ bswap ebp
+ xor eax,r11d
+ rol ecx,5
+ and eax,edi
+ lea r13d,[1518500249+r13*1+rdx]
+ add r13d,ecx
+ xor eax,r12d
+ rol edi,30
+ add r13d,eax
+ mov r14d,DWORD[8+r9]
+ mov eax,r11d
+ mov DWORD[4+rsp],ebp
+ mov ecx,r13d
+ bswap r14d
+ xor eax,edi
+ rol ecx,5
+ and eax,esi
+ lea r12d,[1518500249+r12*1+rbp]
+ add r12d,ecx
+ xor eax,r11d
+ rol esi,30
+ add r12d,eax
+ mov edx,DWORD[12+r9]
+ mov eax,edi
+ mov DWORD[8+rsp],r14d
+ mov ecx,r12d
+ bswap edx
+ xor eax,esi
+ rol ecx,5
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+r14]
+ add r11d,ecx
+ xor eax,edi
+ rol r13d,30
+ add r11d,eax
+ mov ebp,DWORD[16+r9]
+ mov eax,esi
+ mov DWORD[12+rsp],edx
+ mov ecx,r11d
+ bswap ebp
+ xor eax,r13d
+ rol ecx,5
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+rdx]
+ add edi,ecx
+ xor eax,esi
+ rol r12d,30
+ add edi,eax
+ mov r14d,DWORD[20+r9]
+ mov eax,r13d
+ mov DWORD[16+rsp],ebp
+ mov ecx,edi
+ bswap r14d
+ xor eax,r12d
+ rol ecx,5
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+rbp]
+ add esi,ecx
+ xor eax,r13d
+ rol r11d,30
+ add esi,eax
+ mov edx,DWORD[24+r9]
+ mov eax,r12d
+ mov DWORD[20+rsp],r14d
+ mov ecx,esi
+ bswap edx
+ xor eax,r11d
+ rol ecx,5
+ and eax,edi
+ lea r13d,[1518500249+r13*1+r14]
+ add r13d,ecx
+ xor eax,r12d
+ rol edi,30
+ add r13d,eax
+ mov ebp,DWORD[28+r9]
+ mov eax,r11d
+ mov DWORD[24+rsp],edx
+ mov ecx,r13d
+ bswap ebp
+ xor eax,edi
+ rol ecx,5
+ and eax,esi
+ lea r12d,[1518500249+r12*1+rdx]
+ add r12d,ecx
+ xor eax,r11d
+ rol esi,30
+ add r12d,eax
+ mov r14d,DWORD[32+r9]
+ mov eax,edi
+ mov DWORD[28+rsp],ebp
+ mov ecx,r12d
+ bswap r14d
+ xor eax,esi
+ rol ecx,5
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+rbp]
+ add r11d,ecx
+ xor eax,edi
+ rol r13d,30
+ add r11d,eax
+ mov edx,DWORD[36+r9]
+ mov eax,esi
+ mov DWORD[32+rsp],r14d
+ mov ecx,r11d
+ bswap edx
+ xor eax,r13d
+ rol ecx,5
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+r14]
+ add edi,ecx
+ xor eax,esi
+ rol r12d,30
+ add edi,eax
+ mov ebp,DWORD[40+r9]
+ mov eax,r13d
+ mov DWORD[36+rsp],edx
+ mov ecx,edi
+ bswap ebp
+ xor eax,r12d
+ rol ecx,5
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+rdx]
+ add esi,ecx
+ xor eax,r13d
+ rol r11d,30
+ add esi,eax
+ mov r14d,DWORD[44+r9]
+ mov eax,r12d
+ mov DWORD[40+rsp],ebp
+ mov ecx,esi
+ bswap r14d
+ xor eax,r11d
+ rol ecx,5
+ and eax,edi
+ lea r13d,[1518500249+r13*1+rbp]
+ add r13d,ecx
+ xor eax,r12d
+ rol edi,30
+ add r13d,eax
+ mov edx,DWORD[48+r9]
+ mov eax,r11d
+ mov DWORD[44+rsp],r14d
+ mov ecx,r13d
+ bswap edx
+ xor eax,edi
+ rol ecx,5
+ and eax,esi
+ lea r12d,[1518500249+r12*1+r14]
+ add r12d,ecx
+ xor eax,r11d
+ rol esi,30
+ add r12d,eax
+ mov ebp,DWORD[52+r9]
+ mov eax,edi
+ mov DWORD[48+rsp],edx
+ mov ecx,r12d
+ bswap ebp
+ xor eax,esi
+ rol ecx,5
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+rdx]
+ add r11d,ecx
+ xor eax,edi
+ rol r13d,30
+ add r11d,eax
+ mov r14d,DWORD[56+r9]
+ mov eax,esi
+ mov DWORD[52+rsp],ebp
+ mov ecx,r11d
+ bswap r14d
+ xor eax,r13d
+ rol ecx,5
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+rbp]
+ add edi,ecx
+ xor eax,esi
+ rol r12d,30
+ add edi,eax
+ mov edx,DWORD[60+r9]
+ mov eax,r13d
+ mov DWORD[56+rsp],r14d
+ mov ecx,edi
+ bswap edx
+ xor eax,r12d
+ rol ecx,5
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+r14]
+ add esi,ecx
+ xor eax,r13d
+ rol r11d,30
+ add esi,eax
+ xor ebp,DWORD[rsp]
+ mov eax,r12d
+ mov DWORD[60+rsp],edx
+ mov ecx,esi
+ xor ebp,DWORD[8+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[32+rsp]
+ and eax,edi
+ lea r13d,[1518500249+r13*1+rdx]
+ rol edi,30
+ xor eax,r12d
+ add r13d,ecx
+ rol ebp,1
+ add r13d,eax
+ xor r14d,DWORD[4+rsp]
+ mov eax,r11d
+ mov DWORD[rsp],ebp
+ mov ecx,r13d
+ xor r14d,DWORD[12+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[36+rsp]
+ and eax,esi
+ lea r12d,[1518500249+r12*1+rbp]
+ rol esi,30
+ xor eax,r11d
+ add r12d,ecx
+ rol r14d,1
+ add r12d,eax
+ xor edx,DWORD[8+rsp]
+ mov eax,edi
+ mov DWORD[4+rsp],r14d
+ mov ecx,r12d
+ xor edx,DWORD[16+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[40+rsp]
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+r14]
+ rol r13d,30
+ xor eax,edi
+ add r11d,ecx
+ rol edx,1
+ add r11d,eax
+ xor ebp,DWORD[12+rsp]
+ mov eax,esi
+ mov DWORD[8+rsp],edx
+ mov ecx,r11d
+ xor ebp,DWORD[20+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[44+rsp]
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+rdx]
+ rol r12d,30
+ xor eax,esi
+ add edi,ecx
+ rol ebp,1
+ add edi,eax
+ xor r14d,DWORD[16+rsp]
+ mov eax,r13d
+ mov DWORD[12+rsp],ebp
+ mov ecx,edi
+ xor r14d,DWORD[24+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor r14d,DWORD[48+rsp]
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+rbp]
+ rol r11d,30
+ xor eax,r13d
+ add esi,ecx
+ rol r14d,1
+ add esi,eax
+ xor edx,DWORD[20+rsp]
+ mov eax,edi
+ mov DWORD[16+rsp],r14d
+ mov ecx,esi
+ xor edx,DWORD[28+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor edx,DWORD[52+rsp]
+ lea r13d,[1859775393+r13*1+r14]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol edx,1
+ xor ebp,DWORD[24+rsp]
+ mov eax,esi
+ mov DWORD[20+rsp],edx
+ mov ecx,r13d
+ xor ebp,DWORD[32+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[56+rsp]
+ lea r12d,[1859775393+r12*1+rdx]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol ebp,1
+ xor r14d,DWORD[28+rsp]
+ mov eax,r13d
+ mov DWORD[24+rsp],ebp
+ mov ecx,r12d
+ xor r14d,DWORD[36+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[60+rsp]
+ lea r11d,[1859775393+r11*1+rbp]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol r14d,1
+ xor edx,DWORD[32+rsp]
+ mov eax,r12d
+ mov DWORD[28+rsp],r14d
+ mov ecx,r11d
+ xor edx,DWORD[40+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[rsp]
+ lea edi,[1859775393+rdi*1+r14]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol edx,1
+ xor ebp,DWORD[36+rsp]
+ mov eax,r11d
+ mov DWORD[32+rsp],edx
+ mov ecx,edi
+ xor ebp,DWORD[44+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[4+rsp]
+ lea esi,[1859775393+rsi*1+rdx]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol ebp,1
+ xor r14d,DWORD[40+rsp]
+ mov eax,edi
+ mov DWORD[36+rsp],ebp
+ mov ecx,esi
+ xor r14d,DWORD[48+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor r14d,DWORD[8+rsp]
+ lea r13d,[1859775393+r13*1+rbp]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol r14d,1
+ xor edx,DWORD[44+rsp]
+ mov eax,esi
+ mov DWORD[40+rsp],r14d
+ mov ecx,r13d
+ xor edx,DWORD[52+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor edx,DWORD[12+rsp]
+ lea r12d,[1859775393+r12*1+r14]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol edx,1
+ xor ebp,DWORD[48+rsp]
+ mov eax,r13d
+ mov DWORD[44+rsp],edx
+ mov ecx,r12d
+ xor ebp,DWORD[56+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor ebp,DWORD[16+rsp]
+ lea r11d,[1859775393+r11*1+rdx]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol ebp,1
+ xor r14d,DWORD[52+rsp]
+ mov eax,r12d
+ mov DWORD[48+rsp],ebp
+ mov ecx,r11d
+ xor r14d,DWORD[60+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor r14d,DWORD[20+rsp]
+ lea edi,[1859775393+rdi*1+rbp]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol r14d,1
+ xor edx,DWORD[56+rsp]
+ mov eax,r11d
+ mov DWORD[52+rsp],r14d
+ mov ecx,edi
+ xor edx,DWORD[rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor edx,DWORD[24+rsp]
+ lea esi,[1859775393+rsi*1+r14]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol edx,1
+ xor ebp,DWORD[60+rsp]
+ mov eax,edi
+ mov DWORD[56+rsp],edx
+ mov ecx,esi
+ xor ebp,DWORD[4+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor ebp,DWORD[28+rsp]
+ lea r13d,[1859775393+r13*1+rdx]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol ebp,1
+ xor r14d,DWORD[rsp]
+ mov eax,esi
+ mov DWORD[60+rsp],ebp
+ mov ecx,r13d
+ xor r14d,DWORD[8+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor r14d,DWORD[32+rsp]
+ lea r12d,[1859775393+r12*1+rbp]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol r14d,1
+ xor edx,DWORD[4+rsp]
+ mov eax,r13d
+ mov DWORD[rsp],r14d
+ mov ecx,r12d
+ xor edx,DWORD[12+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor edx,DWORD[36+rsp]
+ lea r11d,[1859775393+r11*1+r14]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol edx,1
+ xor ebp,DWORD[8+rsp]
+ mov eax,r12d
+ mov DWORD[4+rsp],edx
+ mov ecx,r11d
+ xor ebp,DWORD[16+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor ebp,DWORD[40+rsp]
+ lea edi,[1859775393+rdi*1+rdx]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol ebp,1
+ xor r14d,DWORD[12+rsp]
+ mov eax,r11d
+ mov DWORD[8+rsp],ebp
+ mov ecx,edi
+ xor r14d,DWORD[20+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor r14d,DWORD[44+rsp]
+ lea esi,[1859775393+rsi*1+rbp]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol r14d,1
+ xor edx,DWORD[16+rsp]
+ mov eax,edi
+ mov DWORD[12+rsp],r14d
+ mov ecx,esi
+ xor edx,DWORD[24+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor edx,DWORD[48+rsp]
+ lea r13d,[1859775393+r13*1+r14]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol edx,1
+ xor ebp,DWORD[20+rsp]
+ mov eax,esi
+ mov DWORD[16+rsp],edx
+ mov ecx,r13d
+ xor ebp,DWORD[28+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[52+rsp]
+ lea r12d,[1859775393+r12*1+rdx]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol ebp,1
+ xor r14d,DWORD[24+rsp]
+ mov eax,r13d
+ mov DWORD[20+rsp],ebp
+ mov ecx,r12d
+ xor r14d,DWORD[32+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[56+rsp]
+ lea r11d,[1859775393+r11*1+rbp]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol r14d,1
+ xor edx,DWORD[28+rsp]
+ mov eax,r12d
+ mov DWORD[24+rsp],r14d
+ mov ecx,r11d
+ xor edx,DWORD[36+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[60+rsp]
+ lea edi,[1859775393+rdi*1+r14]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol edx,1
+ xor ebp,DWORD[32+rsp]
+ mov eax,r11d
+ mov DWORD[28+rsp],edx
+ mov ecx,edi
+ xor ebp,DWORD[40+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[rsp]
+ lea esi,[1859775393+rsi*1+rdx]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol ebp,1
+ xor r14d,DWORD[36+rsp]
+ mov eax,r12d
+ mov DWORD[32+rsp],ebp
+ mov ebx,r12d
+ xor r14d,DWORD[44+rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor r14d,DWORD[4+rsp]
+ lea r13d,[((-1894007588))+r13*1+rbp]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol r14d,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor edx,DWORD[40+rsp]
+ mov eax,r11d
+ mov DWORD[36+rsp],r14d
+ mov ebx,r11d
+ xor edx,DWORD[48+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor edx,DWORD[8+rsp]
+ lea r12d,[((-1894007588))+r12*1+r14]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol edx,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor ebp,DWORD[44+rsp]
+ mov eax,edi
+ mov DWORD[40+rsp],edx
+ mov ebx,edi
+ xor ebp,DWORD[52+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor ebp,DWORD[12+rsp]
+ lea r11d,[((-1894007588))+r11*1+rdx]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol ebp,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor r14d,DWORD[48+rsp]
+ mov eax,esi
+ mov DWORD[44+rsp],ebp
+ mov ebx,esi
+ xor r14d,DWORD[56+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor r14d,DWORD[16+rsp]
+ lea edi,[((-1894007588))+rdi*1+rbp]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol r14d,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor edx,DWORD[52+rsp]
+ mov eax,r13d
+ mov DWORD[48+rsp],r14d
+ mov ebx,r13d
+ xor edx,DWORD[60+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor edx,DWORD[20+rsp]
+ lea esi,[((-1894007588))+rsi*1+r14]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol edx,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor ebp,DWORD[56+rsp]
+ mov eax,r12d
+ mov DWORD[52+rsp],edx
+ mov ebx,r12d
+ xor ebp,DWORD[rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor ebp,DWORD[24+rsp]
+ lea r13d,[((-1894007588))+r13*1+rdx]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol ebp,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor r14d,DWORD[60+rsp]
+ mov eax,r11d
+ mov DWORD[56+rsp],ebp
+ mov ebx,r11d
+ xor r14d,DWORD[4+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor r14d,DWORD[28+rsp]
+ lea r12d,[((-1894007588))+r12*1+rbp]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol r14d,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor edx,DWORD[rsp]
+ mov eax,edi
+ mov DWORD[60+rsp],r14d
+ mov ebx,edi
+ xor edx,DWORD[8+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor edx,DWORD[32+rsp]
+ lea r11d,[((-1894007588))+r11*1+r14]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol edx,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor ebp,DWORD[4+rsp]
+ mov eax,esi
+ mov DWORD[rsp],edx
+ mov ebx,esi
+ xor ebp,DWORD[12+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor ebp,DWORD[36+rsp]
+ lea edi,[((-1894007588))+rdi*1+rdx]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol ebp,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor r14d,DWORD[8+rsp]
+ mov eax,r13d
+ mov DWORD[4+rsp],ebp
+ mov ebx,r13d
+ xor r14d,DWORD[16+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor r14d,DWORD[40+rsp]
+ lea esi,[((-1894007588))+rsi*1+rbp]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol r14d,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor edx,DWORD[12+rsp]
+ mov eax,r12d
+ mov DWORD[8+rsp],r14d
+ mov ebx,r12d
+ xor edx,DWORD[20+rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor edx,DWORD[44+rsp]
+ lea r13d,[((-1894007588))+r13*1+r14]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol edx,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor ebp,DWORD[16+rsp]
+ mov eax,r11d
+ mov DWORD[12+rsp],edx
+ mov ebx,r11d
+ xor ebp,DWORD[24+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor ebp,DWORD[48+rsp]
+ lea r12d,[((-1894007588))+r12*1+rdx]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol ebp,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor r14d,DWORD[20+rsp]
+ mov eax,edi
+ mov DWORD[16+rsp],ebp
+ mov ebx,edi
+ xor r14d,DWORD[28+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor r14d,DWORD[52+rsp]
+ lea r11d,[((-1894007588))+r11*1+rbp]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol r14d,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor edx,DWORD[24+rsp]
+ mov eax,esi
+ mov DWORD[20+rsp],r14d
+ mov ebx,esi
+ xor edx,DWORD[32+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor edx,DWORD[56+rsp]
+ lea edi,[((-1894007588))+rdi*1+r14]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol edx,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor ebp,DWORD[28+rsp]
+ mov eax,r13d
+ mov DWORD[24+rsp],edx
+ mov ebx,r13d
+ xor ebp,DWORD[36+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor ebp,DWORD[60+rsp]
+ lea esi,[((-1894007588))+rsi*1+rdx]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol ebp,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor r14d,DWORD[32+rsp]
+ mov eax,r12d
+ mov DWORD[28+rsp],ebp
+ mov ebx,r12d
+ xor r14d,DWORD[40+rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor r14d,DWORD[rsp]
+ lea r13d,[((-1894007588))+r13*1+rbp]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol r14d,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor edx,DWORD[36+rsp]
+ mov eax,r11d
+ mov DWORD[32+rsp],r14d
+ mov ebx,r11d
+ xor edx,DWORD[44+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor edx,DWORD[4+rsp]
+ lea r12d,[((-1894007588))+r12*1+r14]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol edx,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor ebp,DWORD[40+rsp]
+ mov eax,edi
+ mov DWORD[36+rsp],edx
+ mov ebx,edi
+ xor ebp,DWORD[48+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor ebp,DWORD[8+rsp]
+ lea r11d,[((-1894007588))+r11*1+rdx]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol ebp,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor r14d,DWORD[44+rsp]
+ mov eax,esi
+ mov DWORD[40+rsp],ebp
+ mov ebx,esi
+ xor r14d,DWORD[52+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor r14d,DWORD[12+rsp]
+ lea edi,[((-1894007588))+rdi*1+rbp]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol r14d,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor edx,DWORD[48+rsp]
+ mov eax,r13d
+ mov DWORD[44+rsp],r14d
+ mov ebx,r13d
+ xor edx,DWORD[56+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor edx,DWORD[16+rsp]
+ lea esi,[((-1894007588))+rsi*1+r14]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol edx,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor ebp,DWORD[52+rsp]
+ mov eax,edi
+ mov DWORD[48+rsp],edx
+ mov ecx,esi
+ xor ebp,DWORD[60+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor ebp,DWORD[20+rsp]
+ lea r13d,[((-899497514))+r13*1+rdx]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol ebp,1
+ xor r14d,DWORD[56+rsp]
+ mov eax,esi
+ mov DWORD[52+rsp],ebp
+ mov ecx,r13d
+ xor r14d,DWORD[rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor r14d,DWORD[24+rsp]
+ lea r12d,[((-899497514))+r12*1+rbp]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol r14d,1
+ xor edx,DWORD[60+rsp]
+ mov eax,r13d
+ mov DWORD[56+rsp],r14d
+ mov ecx,r12d
+ xor edx,DWORD[4+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor edx,DWORD[28+rsp]
+ lea r11d,[((-899497514))+r11*1+r14]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol edx,1
+ xor ebp,DWORD[rsp]
+ mov eax,r12d
+ mov DWORD[60+rsp],edx
+ mov ecx,r11d
+ xor ebp,DWORD[8+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor ebp,DWORD[32+rsp]
+ lea edi,[((-899497514))+rdi*1+rdx]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol ebp,1
+ xor r14d,DWORD[4+rsp]
+ mov eax,r11d
+ mov DWORD[rsp],ebp
+ mov ecx,edi
+ xor r14d,DWORD[12+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor r14d,DWORD[36+rsp]
+ lea esi,[((-899497514))+rsi*1+rbp]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol r14d,1
+ xor edx,DWORD[8+rsp]
+ mov eax,edi
+ mov DWORD[4+rsp],r14d
+ mov ecx,esi
+ xor edx,DWORD[16+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor edx,DWORD[40+rsp]
+ lea r13d,[((-899497514))+r13*1+r14]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol edx,1
+ xor ebp,DWORD[12+rsp]
+ mov eax,esi
+ mov DWORD[8+rsp],edx
+ mov ecx,r13d
+ xor ebp,DWORD[20+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[44+rsp]
+ lea r12d,[((-899497514))+r12*1+rdx]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol ebp,1
+ xor r14d,DWORD[16+rsp]
+ mov eax,r13d
+ mov DWORD[12+rsp],ebp
+ mov ecx,r12d
+ xor r14d,DWORD[24+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[48+rsp]
+ lea r11d,[((-899497514))+r11*1+rbp]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol r14d,1
+ xor edx,DWORD[20+rsp]
+ mov eax,r12d
+ mov DWORD[16+rsp],r14d
+ mov ecx,r11d
+ xor edx,DWORD[28+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[52+rsp]
+ lea edi,[((-899497514))+rdi*1+r14]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol edx,1
+ xor ebp,DWORD[24+rsp]
+ mov eax,r11d
+ mov DWORD[20+rsp],edx
+ mov ecx,edi
+ xor ebp,DWORD[32+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[56+rsp]
+ lea esi,[((-899497514))+rsi*1+rdx]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol ebp,1
+ xor r14d,DWORD[28+rsp]
+ mov eax,edi
+ mov DWORD[24+rsp],ebp
+ mov ecx,esi
+ xor r14d,DWORD[36+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor r14d,DWORD[60+rsp]
+ lea r13d,[((-899497514))+r13*1+rbp]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol r14d,1
+ xor edx,DWORD[32+rsp]
+ mov eax,esi
+ mov DWORD[28+rsp],r14d
+ mov ecx,r13d
+ xor edx,DWORD[40+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor edx,DWORD[rsp]
+ lea r12d,[((-899497514))+r12*1+r14]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol edx,1
+ xor ebp,DWORD[36+rsp]
+ mov eax,r13d
+
+ mov ecx,r12d
+ xor ebp,DWORD[44+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor ebp,DWORD[4+rsp]
+ lea r11d,[((-899497514))+r11*1+rdx]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol ebp,1
+ xor r14d,DWORD[40+rsp]
+ mov eax,r12d
+
+ mov ecx,r11d
+ xor r14d,DWORD[48+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor r14d,DWORD[8+rsp]
+ lea edi,[((-899497514))+rdi*1+rbp]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol r14d,1
+ xor edx,DWORD[44+rsp]
+ mov eax,r11d
+
+ mov ecx,edi
+ xor edx,DWORD[52+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor edx,DWORD[12+rsp]
+ lea esi,[((-899497514))+rsi*1+r14]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol edx,1
+ xor ebp,DWORD[48+rsp]
+ mov eax,edi
+
+ mov ecx,esi
+ xor ebp,DWORD[56+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor ebp,DWORD[16+rsp]
+ lea r13d,[((-899497514))+r13*1+rdx]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol ebp,1
+ xor r14d,DWORD[52+rsp]
+ mov eax,esi
+
+ mov ecx,r13d
+ xor r14d,DWORD[60+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor r14d,DWORD[20+rsp]
+ lea r12d,[((-899497514))+r12*1+rbp]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol r14d,1
+ xor edx,DWORD[56+rsp]
+ mov eax,r13d
+
+ mov ecx,r12d
+ xor edx,DWORD[rsp]
+ xor eax,edi
+ rol ecx,5
+ xor edx,DWORD[24+rsp]
+ lea r11d,[((-899497514))+r11*1+r14]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol edx,1
+ xor ebp,DWORD[60+rsp]
+ mov eax,r12d
+
+ mov ecx,r11d
+ xor ebp,DWORD[4+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor ebp,DWORD[28+rsp]
+ lea edi,[((-899497514))+rdi*1+rdx]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol ebp,1
+ mov eax,r11d
+ mov ecx,edi
+ xor eax,r13d
+ lea esi,[((-899497514))+rsi*1+rbp]
+ rol ecx,5
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ add esi,DWORD[r8]
+ add edi,DWORD[4+r8]
+ add r11d,DWORD[8+r8]
+ add r12d,DWORD[12+r8]
+ add r13d,DWORD[16+r8]
+ mov DWORD[r8],esi
+ mov DWORD[4+r8],edi
+ mov DWORD[8+r8],r11d
+ mov DWORD[12+r8],r12d
+ mov DWORD[16+r8],r13d
+
+ sub r10,1
+ lea r9,[64+r9]
+ jnz NEAR $L$loop
+
+ mov rsi,QWORD[64+rsp]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order:
+
+ALIGN 32
+sha1_block_data_order_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_shaext_shortcut:
+
+ lea rsp,[((-72))+rsp]
+ movaps XMMWORD[(-8-64)+rax],xmm6
+ movaps XMMWORD[(-8-48)+rax],xmm7
+ movaps XMMWORD[(-8-32)+rax],xmm8
+ movaps XMMWORD[(-8-16)+rax],xmm9
+$L$prologue_shaext:
+ movdqu xmm0,XMMWORD[rdi]
+ movd xmm1,DWORD[16+rdi]
+ movdqa xmm3,XMMWORD[((K_XX_XX+160))]
+
+ movdqu xmm4,XMMWORD[rsi]
+ pshufd xmm0,xmm0,27
+ movdqu xmm5,XMMWORD[16+rsi]
+ pshufd xmm1,xmm1,27
+ movdqu xmm6,XMMWORD[32+rsi]
+DB 102,15,56,0,227
+ movdqu xmm7,XMMWORD[48+rsi]
+DB 102,15,56,0,235
+DB 102,15,56,0,243
+ movdqa xmm9,xmm1
+DB 102,15,56,0,251
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ dec rdx
+ lea r8,[64+rsi]
+ paddd xmm1,xmm4
+ cmovne rsi,r8
+ movdqa xmm8,xmm0
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,0
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,0
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,0
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,0
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,0
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,1
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,1
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,1
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,1
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,1
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,2
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,2
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,2
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,2
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,2
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,3
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+ movdqu xmm4,XMMWORD[rsi]
+ movdqa xmm2,xmm0
+DB 15,58,204,193,3
+DB 15,56,200,213
+ movdqu xmm5,XMMWORD[16+rsi]
+DB 102,15,56,0,227
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,3
+DB 15,56,200,206
+ movdqu xmm6,XMMWORD[32+rsi]
+DB 102,15,56,0,235
+
+ movdqa xmm2,xmm0
+DB 15,58,204,193,3
+DB 15,56,200,215
+ movdqu xmm7,XMMWORD[48+rsi]
+DB 102,15,56,0,243
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,3
+DB 65,15,56,200,201
+DB 102,15,56,0,251
+
+ paddd xmm0,xmm8
+ movdqa xmm9,xmm1
+
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm0,xmm0,27
+ pshufd xmm1,xmm1,27
+ movdqu XMMWORD[rdi],xmm0
+ movd DWORD[16+rdi],xmm1
+ movaps xmm6,XMMWORD[((-8-64))+rax]
+ movaps xmm7,XMMWORD[((-8-48))+rax]
+ movaps xmm8,XMMWORD[((-8-32))+rax]
+ movaps xmm9,XMMWORD[((-8-16))+rax]
+ mov rsp,rax
+$L$epilogue_shaext:
+
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_sha1_block_data_order_shaext:
+
+ALIGN 16
+sha1_block_data_order_ssse3:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_ssse3:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_ssse3_shortcut:
+
+ mov r11,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ movaps XMMWORD[(-40-96)+r11],xmm6
+ movaps XMMWORD[(-40-80)+r11],xmm7
+ movaps XMMWORD[(-40-64)+r11],xmm8
+ movaps XMMWORD[(-40-48)+r11],xmm9
+ movaps XMMWORD[(-40-32)+r11],xmm10
+ movaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_ssse3:
+ and rsp,-64
+ mov r8,rdi
+ mov r9,rsi
+ mov r10,rdx
+
+ shl r10,6
+ add r10,r9
+ lea r14,[((K_XX_XX+64))]
+
+ mov eax,DWORD[r8]
+ mov ebx,DWORD[4+r8]
+ mov ecx,DWORD[8+r8]
+ mov edx,DWORD[12+r8]
+ mov esi,ebx
+ mov ebp,DWORD[16+r8]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ movdqa xmm6,XMMWORD[64+r14]
+ movdqa xmm9,XMMWORD[((-64))+r14]
+ movdqu xmm0,XMMWORD[r9]
+ movdqu xmm1,XMMWORD[16+r9]
+ movdqu xmm2,XMMWORD[32+r9]
+ movdqu xmm3,XMMWORD[48+r9]
+DB 102,15,56,0,198
+DB 102,15,56,0,206
+DB 102,15,56,0,214
+ add r9,64
+ paddd xmm0,xmm9
+DB 102,15,56,0,222
+ paddd xmm1,xmm9
+ paddd xmm2,xmm9
+ movdqa XMMWORD[rsp],xmm0
+ psubd xmm0,xmm9
+ movdqa XMMWORD[16+rsp],xmm1
+ psubd xmm1,xmm9
+ movdqa XMMWORD[32+rsp],xmm2
+ psubd xmm2,xmm9
+ jmp NEAR $L$oop_ssse3
+ALIGN 16
+$L$oop_ssse3:
+ ror ebx,2
+ pshufd xmm4,xmm0,238
+ xor esi,edx
+ movdqa xmm8,xmm3
+ paddd xmm9,xmm3
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ punpcklqdq xmm4,xmm1
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ psrldq xmm8,4
+ and edi,ebx
+ xor ebx,ecx
+ pxor xmm4,xmm0
+ add ebp,eax
+ ror eax,7
+ pxor xmm8,xmm2
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ pxor xmm4,xmm8
+ xor eax,ebx
+ rol ebp,5
+ movdqa XMMWORD[48+rsp],xmm9
+ add edx,edi
+ and esi,eax
+ movdqa xmm10,xmm4
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ movdqa xmm8,xmm4
+ xor esi,ebx
+ pslldq xmm10,12
+ paddd xmm4,xmm4
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ psrld xmm8,31
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ movdqa xmm9,xmm10
+ and edi,ebp
+ xor ebp,eax
+ psrld xmm10,30
+ add ecx,edx
+ ror edx,7
+ por xmm4,xmm8
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ pslld xmm9,2
+ pxor xmm4,xmm10
+ xor edx,ebp
+ movdqa xmm10,XMMWORD[((-64))+r14]
+ rol ecx,5
+ add ebx,edi
+ and esi,edx
+ pxor xmm4,xmm9
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ pshufd xmm5,xmm1,238
+ xor esi,ebp
+ movdqa xmm9,xmm4
+ paddd xmm10,xmm4
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ punpcklqdq xmm5,xmm2
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ psrldq xmm9,4
+ and edi,ecx
+ xor ecx,edx
+ pxor xmm5,xmm1
+ add eax,ebx
+ ror ebx,7
+ pxor xmm9,xmm3
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ pxor xmm5,xmm9
+ xor ebx,ecx
+ rol eax,5
+ movdqa XMMWORD[rsp],xmm10
+ add ebp,edi
+ and esi,ebx
+ movdqa xmm8,xmm5
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ movdqa xmm9,xmm5
+ xor esi,ecx
+ pslldq xmm8,12
+ paddd xmm5,xmm5
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ psrld xmm9,31
+ xor eax,ebx
+ rol ebp,5
+ add edx,esi
+ movdqa xmm10,xmm8
+ and edi,eax
+ xor eax,ebx
+ psrld xmm8,30
+ add edx,ebp
+ ror ebp,7
+ por xmm5,xmm9
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ pslld xmm10,2
+ pxor xmm5,xmm8
+ xor ebp,eax
+ movdqa xmm8,XMMWORD[((-32))+r14]
+ rol edx,5
+ add ecx,edi
+ and esi,ebp
+ pxor xmm5,xmm10
+ xor ebp,eax
+ add ecx,edx
+ ror edx,7
+ pshufd xmm6,xmm2,238
+ xor esi,eax
+ movdqa xmm10,xmm5
+ paddd xmm8,xmm5
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ punpcklqdq xmm6,xmm3
+ xor edx,ebp
+ rol ecx,5
+ add ebx,esi
+ psrldq xmm10,4
+ and edi,edx
+ xor edx,ebp
+ pxor xmm6,xmm2
+ add ebx,ecx
+ ror ecx,7
+ pxor xmm10,xmm4
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ pxor xmm6,xmm10
+ xor ecx,edx
+ rol ebx,5
+ movdqa XMMWORD[16+rsp],xmm8
+ add eax,edi
+ and esi,ecx
+ movdqa xmm9,xmm6
+ xor ecx,edx
+ add eax,ebx
+ ror ebx,7
+ movdqa xmm10,xmm6
+ xor esi,edx
+ pslldq xmm9,12
+ paddd xmm6,xmm6
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ psrld xmm10,31
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ movdqa xmm8,xmm9
+ and edi,ebx
+ xor ebx,ecx
+ psrld xmm9,30
+ add ebp,eax
+ ror eax,7
+ por xmm6,xmm10
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ pslld xmm8,2
+ pxor xmm6,xmm9
+ xor eax,ebx
+ movdqa xmm9,XMMWORD[((-32))+r14]
+ rol ebp,5
+ add edx,edi
+ and esi,eax
+ pxor xmm6,xmm8
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ pshufd xmm7,xmm3,238
+ xor esi,ebx
+ movdqa xmm8,xmm6
+ paddd xmm9,xmm6
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ punpcklqdq xmm7,xmm4
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ psrldq xmm8,4
+ and edi,ebp
+ xor ebp,eax
+ pxor xmm7,xmm3
+ add ecx,edx
+ ror edx,7
+ pxor xmm8,xmm5
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ pxor xmm7,xmm8
+ xor edx,ebp
+ rol ecx,5
+ movdqa XMMWORD[32+rsp],xmm9
+ add ebx,edi
+ and esi,edx
+ movdqa xmm10,xmm7
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ movdqa xmm8,xmm7
+ xor esi,ebp
+ pslldq xmm10,12
+ paddd xmm7,xmm7
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ psrld xmm8,31
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ movdqa xmm9,xmm10
+ and edi,ecx
+ xor ecx,edx
+ psrld xmm10,30
+ add eax,ebx
+ ror ebx,7
+ por xmm7,xmm8
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ pslld xmm9,2
+ pxor xmm7,xmm10
+ xor ebx,ecx
+ movdqa xmm10,XMMWORD[((-32))+r14]
+ rol eax,5
+ add ebp,edi
+ and esi,ebx
+ pxor xmm7,xmm9
+ pshufd xmm9,xmm6,238
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ pxor xmm0,xmm4
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ punpcklqdq xmm9,xmm7
+ xor eax,ebx
+ rol ebp,5
+ pxor xmm0,xmm1
+ add edx,esi
+ and edi,eax
+ movdqa xmm8,xmm10
+ xor eax,ebx
+ paddd xmm10,xmm7
+ add edx,ebp
+ pxor xmm0,xmm9
+ ror ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ movdqa xmm9,xmm0
+ xor ebp,eax
+ rol edx,5
+ movdqa XMMWORD[48+rsp],xmm10
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ pslld xmm0,2
+ add ecx,edx
+ ror edx,7
+ psrld xmm9,30
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ por xmm0,xmm9
+ xor edx,ebp
+ rol ecx,5
+ pshufd xmm10,xmm7,238
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ pxor xmm1,xmm5
+ add ebp,DWORD[16+rsp]
+ xor esi,ecx
+ punpcklqdq xmm10,xmm0
+ mov edi,eax
+ rol eax,5
+ pxor xmm1,xmm2
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm9,xmm8
+ ror ebx,7
+ paddd xmm8,xmm0
+ add ebp,eax
+ pxor xmm1,xmm10
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm10,xmm1
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[rsp],xmm8
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[24+rsp]
+ pslld xmm1,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm10,30
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ por xmm1,xmm10
+ add ecx,edx
+ add ebx,DWORD[28+rsp]
+ pshufd xmm8,xmm0,238
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ pxor xmm2,xmm6
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ punpcklqdq xmm8,xmm1
+ mov edi,ebx
+ rol ebx,5
+ pxor xmm2,xmm3
+ add eax,esi
+ xor edi,edx
+ movdqa xmm10,XMMWORD[r14]
+ ror ecx,7
+ paddd xmm9,xmm1
+ add eax,ebx
+ pxor xmm2,xmm8
+ add ebp,DWORD[36+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ movdqa xmm8,xmm2
+ add ebp,edi
+ xor esi,ecx
+ movdqa XMMWORD[16+rsp],xmm9
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[40+rsp]
+ pslld xmm2,2
+ xor esi,ebx
+ mov edi,ebp
+ psrld xmm8,30
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ por xmm2,xmm8
+ add edx,ebp
+ add ecx,DWORD[44+rsp]
+ pshufd xmm9,xmm1,238
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ pxor xmm3,xmm7
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ punpcklqdq xmm9,xmm2
+ mov edi,ecx
+ rol ecx,5
+ pxor xmm3,xmm4
+ add ebx,esi
+ xor edi,ebp
+ movdqa xmm8,xmm10
+ ror edx,7
+ paddd xmm10,xmm2
+ add ebx,ecx
+ pxor xmm3,xmm9
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ movdqa xmm9,xmm3
+ add eax,edi
+ xor esi,edx
+ movdqa XMMWORD[32+rsp],xmm10
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[56+rsp]
+ pslld xmm3,2
+ xor esi,ecx
+ mov edi,eax
+ psrld xmm9,30
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ por xmm3,xmm9
+ add ebp,eax
+ add edx,DWORD[60+rsp]
+ pshufd xmm10,xmm2,238
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ pxor xmm4,xmm0
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ punpcklqdq xmm10,xmm3
+ mov edi,edx
+ rol edx,5
+ pxor xmm4,xmm5
+ add ecx,esi
+ xor edi,eax
+ movdqa xmm9,xmm8
+ ror ebp,7
+ paddd xmm8,xmm3
+ add ecx,edx
+ pxor xmm4,xmm10
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ movdqa xmm10,xmm4
+ add ebx,edi
+ xor esi,ebp
+ movdqa XMMWORD[48+rsp],xmm8
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[8+rsp]
+ pslld xmm4,2
+ xor esi,edx
+ mov edi,ebx
+ psrld xmm10,30
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ por xmm4,xmm10
+ add eax,ebx
+ add ebp,DWORD[12+rsp]
+ pshufd xmm8,xmm3,238
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ pxor xmm5,xmm1
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ punpcklqdq xmm8,xmm4
+ mov edi,ebp
+ rol ebp,5
+ pxor xmm5,xmm6
+ add edx,esi
+ xor edi,ebx
+ movdqa xmm10,xmm9
+ ror eax,7
+ paddd xmm9,xmm4
+ add edx,ebp
+ pxor xmm5,xmm8
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ movdqa xmm8,xmm5
+ add ecx,edi
+ xor esi,eax
+ movdqa XMMWORD[rsp],xmm9
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[24+rsp]
+ pslld xmm5,2
+ xor esi,ebp
+ mov edi,ecx
+ psrld xmm8,30
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ por xmm5,xmm8
+ add ebx,ecx
+ add eax,DWORD[28+rsp]
+ pshufd xmm9,xmm4,238
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ pxor xmm6,xmm2
+ add ebp,DWORD[32+rsp]
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ punpcklqdq xmm9,xmm5
+ mov edi,eax
+ xor esi,ecx
+ pxor xmm6,xmm7
+ rol eax,5
+ add ebp,esi
+ movdqa xmm8,xmm10
+ xor edi,ebx
+ paddd xmm10,xmm5
+ xor ebx,ecx
+ pxor xmm6,xmm9
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ movdqa xmm9,xmm6
+ mov esi,ebp
+ xor edi,ebx
+ movdqa XMMWORD[16+rsp],xmm10
+ rol ebp,5
+ add edx,edi
+ xor esi,eax
+ pslld xmm6,2
+ xor eax,ebx
+ add edx,ebp
+ psrld xmm9,30
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ xor eax,ebx
+ por xmm6,xmm9
+ ror ebp,7
+ mov edi,edx
+ xor esi,eax
+ rol edx,5
+ pshufd xmm10,xmm5,238
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ mov esi,ecx
+ xor edi,ebp
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ pxor xmm7,xmm3
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ ror ecx,7
+ punpcklqdq xmm10,xmm6
+ mov edi,ebx
+ xor esi,edx
+ pxor xmm7,xmm0
+ rol ebx,5
+ add eax,esi
+ movdqa xmm9,XMMWORD[32+r14]
+ xor edi,ecx
+ paddd xmm8,xmm6
+ xor ecx,edx
+ pxor xmm7,xmm10
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ movdqa xmm10,xmm7
+ mov esi,eax
+ xor edi,ecx
+ movdqa XMMWORD[32+rsp],xmm8
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ pslld xmm7,2
+ xor ebx,ecx
+ add ebp,eax
+ psrld xmm10,30
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ por xmm7,xmm10
+ ror eax,7
+ mov edi,ebp
+ xor esi,ebx
+ rol ebp,5
+ pshufd xmm8,xmm6,238
+ add edx,esi
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ mov esi,edx
+ xor edi,eax
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ pxor xmm0,xmm4
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ ror edx,7
+ punpcklqdq xmm8,xmm7
+ mov edi,ecx
+ xor esi,ebp
+ pxor xmm0,xmm1
+ rol ecx,5
+ add ebx,esi
+ movdqa xmm10,xmm9
+ xor edi,edx
+ paddd xmm9,xmm7
+ xor edx,ebp
+ pxor xmm0,xmm8
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ movdqa xmm8,xmm0
+ mov esi,ebx
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm9
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ pslld xmm0,2
+ xor ecx,edx
+ add eax,ebx
+ psrld xmm8,30
+ add ebp,DWORD[8+rsp]
+ and esi,ecx
+ xor ecx,edx
+ por xmm0,xmm8
+ ror ebx,7
+ mov edi,eax
+ xor esi,ecx
+ rol eax,5
+ pshufd xmm9,xmm7,238
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ mov esi,ebp
+ xor edi,ebx
+ rol ebp,5
+ add edx,edi
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ pxor xmm1,xmm5
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ ror ebp,7
+ punpcklqdq xmm9,xmm0
+ mov edi,edx
+ xor esi,eax
+ pxor xmm1,xmm2
+ rol edx,5
+ add ecx,esi
+ movdqa xmm8,xmm10
+ xor edi,ebp
+ paddd xmm10,xmm0
+ xor ebp,eax
+ pxor xmm1,xmm9
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ movdqa xmm9,xmm1
+ mov esi,ecx
+ xor edi,ebp
+ movdqa XMMWORD[rsp],xmm10
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ pslld xmm1,2
+ xor edx,ebp
+ add ebx,ecx
+ psrld xmm9,30
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ xor edx,ebp
+ por xmm1,xmm9
+ ror ecx,7
+ mov edi,ebx
+ xor esi,edx
+ rol ebx,5
+ pshufd xmm10,xmm0,238
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor edi,ecx
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ pxor xmm2,xmm6
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ punpcklqdq xmm10,xmm1
+ mov edi,ebp
+ xor esi,ebx
+ pxor xmm2,xmm3
+ rol ebp,5
+ add edx,esi
+ movdqa xmm9,xmm8
+ xor edi,eax
+ paddd xmm8,xmm1
+ xor eax,ebx
+ pxor xmm2,xmm10
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ movdqa xmm10,xmm2
+ mov esi,edx
+ xor edi,eax
+ movdqa XMMWORD[16+rsp],xmm8
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ pslld xmm2,2
+ xor ebp,eax
+ add ecx,edx
+ psrld xmm10,30
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ xor ebp,eax
+ por xmm2,xmm10
+ ror edx,7
+ mov edi,ecx
+ xor esi,ebp
+ rol ecx,5
+ pshufd xmm8,xmm1,238
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ pxor xmm3,xmm7
+ add ebp,DWORD[48+rsp]
+ xor esi,ecx
+ punpcklqdq xmm8,xmm2
+ mov edi,eax
+ rol eax,5
+ pxor xmm3,xmm4
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm10,xmm9
+ ror ebx,7
+ paddd xmm9,xmm2
+ add ebp,eax
+ pxor xmm3,xmm8
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm8,xmm3
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[32+rsp],xmm9
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[56+rsp]
+ pslld xmm3,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm8,30
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ por xmm3,xmm8
+ add ecx,edx
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ paddd xmm10,xmm3
+ add eax,esi
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm10
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ cmp r9,r10
+ je NEAR $L$done_ssse3
+ movdqa xmm6,XMMWORD[64+r14]
+ movdqa xmm9,XMMWORD[((-64))+r14]
+ movdqu xmm0,XMMWORD[r9]
+ movdqu xmm1,XMMWORD[16+r9]
+ movdqu xmm2,XMMWORD[32+r9]
+ movdqu xmm3,XMMWORD[48+r9]
+DB 102,15,56,0,198
+ add r9,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+DB 102,15,56,0,206
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ paddd xmm0,xmm9
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ movdqa XMMWORD[rsp],xmm0
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ psubd xmm0,xmm9
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+DB 102,15,56,0,214
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ paddd xmm1,xmm9
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ movdqa XMMWORD[16+rsp],xmm1
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ psubd xmm1,xmm9
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+DB 102,15,56,0,222
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ paddd xmm2,xmm9
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ movdqa XMMWORD[32+rsp],xmm2
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ psubd xmm2,xmm9
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ add edx,DWORD[12+r8]
+ mov DWORD[r8],eax
+ add ebp,DWORD[16+r8]
+ mov DWORD[4+r8],esi
+ mov ebx,esi
+ mov DWORD[8+r8],ecx
+ mov edi,ecx
+ mov DWORD[12+r8],edx
+ xor edi,edx
+ mov DWORD[16+r8],ebp
+ and esi,edi
+ jmp NEAR $L$oop_ssse3
+
+ALIGN 16
+$L$done_ssse3:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ mov DWORD[r8],eax
+ add edx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ add ebp,DWORD[16+r8]
+ mov DWORD[8+r8],ecx
+ mov DWORD[12+r8],edx
+ mov DWORD[16+r8],ebp
+ movaps xmm6,XMMWORD[((-40-96))+r11]
+ movaps xmm7,XMMWORD[((-40-80))+r11]
+ movaps xmm8,XMMWORD[((-40-64))+r11]
+ movaps xmm9,XMMWORD[((-40-48))+r11]
+ movaps xmm10,XMMWORD[((-40-32))+r11]
+ movaps xmm11,XMMWORD[((-40-16))+r11]
+ mov r14,QWORD[((-40))+r11]
+
+ mov r13,QWORD[((-32))+r11]
+
+ mov r12,QWORD[((-24))+r11]
+
+ mov rbp,QWORD[((-16))+r11]
+
+ mov rbx,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$epilogue_ssse3:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order_ssse3:
+
+ALIGN 16
+sha1_block_data_order_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_avx_shortcut:
+
+ mov r11,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ vzeroupper
+ vmovaps XMMWORD[(-40-96)+r11],xmm6
+ vmovaps XMMWORD[(-40-80)+r11],xmm7
+ vmovaps XMMWORD[(-40-64)+r11],xmm8
+ vmovaps XMMWORD[(-40-48)+r11],xmm9
+ vmovaps XMMWORD[(-40-32)+r11],xmm10
+ vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx:
+ and rsp,-64
+ mov r8,rdi
+ mov r9,rsi
+ mov r10,rdx
+
+ shl r10,6
+ add r10,r9
+ lea r14,[((K_XX_XX+64))]
+
+ mov eax,DWORD[r8]
+ mov ebx,DWORD[4+r8]
+ mov ecx,DWORD[8+r8]
+ mov edx,DWORD[12+r8]
+ mov esi,ebx
+ mov ebp,DWORD[16+r8]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ vmovdqa xmm6,XMMWORD[64+r14]
+ vmovdqa xmm11,XMMWORD[((-64))+r14]
+ vmovdqu xmm0,XMMWORD[r9]
+ vmovdqu xmm1,XMMWORD[16+r9]
+ vmovdqu xmm2,XMMWORD[32+r9]
+ vmovdqu xmm3,XMMWORD[48+r9]
+ vpshufb xmm0,xmm0,xmm6
+ add r9,64
+ vpshufb xmm1,xmm1,xmm6
+ vpshufb xmm2,xmm2,xmm6
+ vpshufb xmm3,xmm3,xmm6
+ vpaddd xmm4,xmm0,xmm11
+ vpaddd xmm5,xmm1,xmm11
+ vpaddd xmm6,xmm2,xmm11
+ vmovdqa XMMWORD[rsp],xmm4
+ vmovdqa XMMWORD[16+rsp],xmm5
+ vmovdqa XMMWORD[32+rsp],xmm6
+ jmp NEAR $L$oop_avx
+ALIGN 16
+$L$oop_avx:
+ shrd ebx,ebx,2
+ xor esi,edx
+ vpalignr xmm4,xmm1,xmm0,8
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ vpaddd xmm9,xmm11,xmm3
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrldq xmm8,xmm3,4
+ add ebp,esi
+ and edi,ebx
+ vpxor xmm4,xmm4,xmm0
+ xor ebx,ecx
+ add ebp,eax
+ vpxor xmm8,xmm8,xmm2
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ vpxor xmm4,xmm4,xmm8
+ xor eax,ebx
+ shld ebp,ebp,5
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add edx,edi
+ and esi,eax
+ vpsrld xmm8,xmm4,31
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpslldq xmm10,xmm4,12
+ vpaddd xmm4,xmm4,xmm4
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm4,xmm4,xmm8
+ add ecx,esi
+ and edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpslld xmm10,xmm10,2
+ vpxor xmm4,xmm4,xmm9
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ vpxor xmm4,xmm4,xmm10
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ and esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpalignr xmm5,xmm2,xmm1,8
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ vpaddd xmm9,xmm11,xmm4
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrldq xmm8,xmm4,4
+ add eax,esi
+ and edi,ecx
+ vpxor xmm5,xmm5,xmm1
+ xor ecx,edx
+ add eax,ebx
+ vpxor xmm8,xmm8,xmm3
+ shrd ebx,ebx,7
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ vpxor xmm5,xmm5,xmm8
+ xor ebx,ecx
+ shld eax,eax,5
+ vmovdqa XMMWORD[rsp],xmm9
+ add ebp,edi
+ and esi,ebx
+ vpsrld xmm8,xmm5,31
+ xor ebx,ecx
+ add ebp,eax
+ shrd eax,eax,7
+ xor esi,ecx
+ vpslldq xmm10,xmm5,12
+ vpaddd xmm5,xmm5,xmm5
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm5,xmm5,xmm8
+ add edx,esi
+ and edi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpslld xmm10,xmm10,2
+ vpxor xmm5,xmm5,xmm9
+ shrd ebp,ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ vpxor xmm5,xmm5,xmm10
+ xor ebp,eax
+ shld edx,edx,5
+ vmovdqa xmm11,XMMWORD[((-32))+r14]
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ vpalignr xmm6,xmm3,xmm2,8
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ vpaddd xmm9,xmm11,xmm5
+ xor edx,ebp
+ shld ecx,ecx,5
+ vpsrldq xmm8,xmm5,4
+ add ebx,esi
+ and edi,edx
+ vpxor xmm6,xmm6,xmm2
+ xor edx,ebp
+ add ebx,ecx
+ vpxor xmm8,xmm8,xmm4
+ shrd ecx,ecx,7
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ vpxor xmm6,xmm6,xmm8
+ xor ecx,edx
+ shld ebx,ebx,5
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add eax,edi
+ and esi,ecx
+ vpsrld xmm8,xmm6,31
+ xor ecx,edx
+ add eax,ebx
+ shrd ebx,ebx,7
+ xor esi,edx
+ vpslldq xmm10,xmm6,12
+ vpaddd xmm6,xmm6,xmm6
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm6,xmm6,xmm8
+ add ebp,esi
+ and edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpslld xmm10,xmm10,2
+ vpxor xmm6,xmm6,xmm9
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ vpxor xmm6,xmm6,xmm10
+ xor eax,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ and esi,eax
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpalignr xmm7,xmm4,xmm3,8
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ vpaddd xmm9,xmm11,xmm6
+ xor ebp,eax
+ shld edx,edx,5
+ vpsrldq xmm8,xmm6,4
+ add ecx,esi
+ and edi,ebp
+ vpxor xmm7,xmm7,xmm3
+ xor ebp,eax
+ add ecx,edx
+ vpxor xmm8,xmm8,xmm5
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ vpxor xmm7,xmm7,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add ebx,edi
+ and esi,edx
+ vpsrld xmm8,xmm7,31
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpslldq xmm10,xmm7,12
+ vpaddd xmm7,xmm7,xmm7
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm7,xmm7,xmm8
+ add eax,esi
+ and edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpslld xmm10,xmm10,2
+ vpxor xmm7,xmm7,xmm9
+ shrd ebx,ebx,7
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ vpxor xmm7,xmm7,xmm10
+ xor ebx,ecx
+ shld eax,eax,5
+ add ebp,edi
+ and esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ vpxor xmm0,xmm0,xmm1
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpaddd xmm9,xmm11,xmm7
+ add edx,esi
+ and edi,eax
+ vpxor xmm0,xmm0,xmm8
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor edi,ebx
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpslld xmm0,xmm0,2
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ vpor xmm0,xmm0,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ebp,DWORD[16+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm1,xmm1,xmm2
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm11,xmm0
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm1,xmm1,xmm8
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm1,xmm1,2
+ add ecx,DWORD[24+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm1,xmm1,xmm8
+ add ebx,DWORD[28+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ vpxor xmm2,xmm2,xmm3
+ add eax,esi
+ xor edi,edx
+ vpaddd xmm9,xmm11,xmm1
+ vmovdqa xmm11,XMMWORD[r14]
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpxor xmm2,xmm2,xmm8
+ add ebp,DWORD[36+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpslld xmm2,xmm2,2
+ add edx,DWORD[40+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpor xmm2,xmm2,xmm8
+ add ecx,DWORD[44+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpxor xmm3,xmm3,xmm4
+ add ebx,esi
+ xor edi,ebp
+ vpaddd xmm9,xmm11,xmm2
+ shrd edx,edx,7
+ add ebx,ecx
+ vpxor xmm3,xmm3,xmm8
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ add ebp,DWORD[56+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpor xmm3,xmm3,xmm8
+ add edx,DWORD[60+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpalignr xmm8,xmm3,xmm2,8
+ vpxor xmm4,xmm4,xmm0
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ vpxor xmm4,xmm4,xmm5
+ add ecx,esi
+ xor edi,eax
+ vpaddd xmm9,xmm11,xmm3
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpxor xmm4,xmm4,xmm8
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ vpsrld xmm8,xmm4,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpslld xmm4,xmm4,2
+ add eax,DWORD[8+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpor xmm4,xmm4,xmm8
+ add ebp,DWORD[12+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpalignr xmm8,xmm4,xmm3,8
+ vpxor xmm5,xmm5,xmm1
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpxor xmm5,xmm5,xmm6
+ add edx,esi
+ xor edi,ebx
+ vpaddd xmm9,xmm11,xmm4
+ shrd eax,eax,7
+ add edx,ebp
+ vpxor xmm5,xmm5,xmm8
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ vpsrld xmm8,xmm5,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpslld xmm5,xmm5,2
+ add ebx,DWORD[24+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpor xmm5,xmm5,xmm8
+ add eax,DWORD[28+rsp]
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpalignr xmm8,xmm5,xmm4,8
+ vpxor xmm6,xmm6,xmm2
+ add ebp,DWORD[32+rsp]
+ and esi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vpxor xmm6,xmm6,xmm7
+ mov edi,eax
+ xor esi,ecx
+ vpaddd xmm9,xmm11,xmm5
+ shld eax,eax,5
+ add ebp,esi
+ vpxor xmm6,xmm6,xmm8
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ vpsrld xmm8,xmm6,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ vpslld xmm6,xmm6,2
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ vpor xmm6,xmm6,xmm8
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov edi,edx
+ xor esi,eax
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ vpalignr xmm8,xmm6,xmm5,8
+ vpxor xmm7,xmm7,xmm3
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ vpxor xmm7,xmm7,xmm0
+ mov edi,ebx
+ xor esi,edx
+ vpaddd xmm9,xmm11,xmm6
+ vmovdqa xmm11,XMMWORD[32+r14]
+ shld ebx,ebx,5
+ add eax,esi
+ vpxor xmm7,xmm7,xmm8
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ vpsrld xmm8,xmm7,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ vpslld xmm7,xmm7,2
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ vpor xmm7,xmm7,xmm8
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov edi,ebp
+ xor esi,ebx
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vpxor xmm0,xmm0,xmm1
+ mov edi,ecx
+ xor esi,ebp
+ vpaddd xmm9,xmm11,xmm7
+ shld ecx,ecx,5
+ add ebx,esi
+ vpxor xmm0,xmm0,xmm8
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ vpslld xmm0,xmm0,2
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[8+rsp]
+ and esi,ecx
+ vpor xmm0,xmm0,xmm8
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov edi,eax
+ xor esi,ecx
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ vpxor xmm1,xmm1,xmm2
+ mov edi,edx
+ xor esi,eax
+ vpaddd xmm9,xmm11,xmm0
+ shld edx,edx,5
+ add ecx,esi
+ vpxor xmm1,xmm1,xmm8
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ vpslld xmm1,xmm1,2
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ vpor xmm1,xmm1,xmm8
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov edi,ebx
+ xor esi,edx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ vpxor xmm2,xmm2,xmm3
+ mov edi,ebp
+ xor esi,ebx
+ vpaddd xmm9,xmm11,xmm1
+ shld ebp,ebp,5
+ add edx,esi
+ vpxor xmm2,xmm2,xmm8
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ vpslld xmm2,xmm2,2
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ vpor xmm2,xmm2,xmm8
+ xor ebp,eax
+ shrd edx,edx,7
+ mov edi,ecx
+ xor esi,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebp,DWORD[48+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm3,xmm3,xmm4
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm11,xmm2
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm3,xmm3,xmm8
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm3,xmm3,2
+ add ecx,DWORD[56+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm3,xmm3,xmm8
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ vpaddd xmm9,xmm11,xmm3
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ vmovdqa XMMWORD[48+rsp],xmm9
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ cmp r9,r10
+ je NEAR $L$done_avx
+ vmovdqa xmm6,XMMWORD[64+r14]
+ vmovdqa xmm11,XMMWORD[((-64))+r14]
+ vmovdqu xmm0,XMMWORD[r9]
+ vmovdqu xmm1,XMMWORD[16+r9]
+ vmovdqu xmm2,XMMWORD[32+r9]
+ vmovdqu xmm3,XMMWORD[48+r9]
+ vpshufb xmm0,xmm0,xmm6
+ add r9,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ vpshufb xmm1,xmm1,xmm6
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpaddd xmm4,xmm0,xmm11
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vmovdqa XMMWORD[rsp],xmm4
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ vpshufb xmm2,xmm2,xmm6
+ mov edi,edx
+ shld edx,edx,5
+ vpaddd xmm5,xmm1,xmm11
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vmovdqa XMMWORD[16+rsp],xmm5
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ vpshufb xmm3,xmm3,xmm6
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpaddd xmm6,xmm2,xmm11
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vmovdqa XMMWORD[32+rsp],xmm6
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ add edx,DWORD[12+r8]
+ mov DWORD[r8],eax
+ add ebp,DWORD[16+r8]
+ mov DWORD[4+r8],esi
+ mov ebx,esi
+ mov DWORD[8+r8],ecx
+ mov edi,ecx
+ mov DWORD[12+r8],edx
+ xor edi,edx
+ mov DWORD[16+r8],ebp
+ and esi,edi
+ jmp NEAR $L$oop_avx
+
+ALIGN 16
+$L$done_avx:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ vzeroupper
+
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ mov DWORD[r8],eax
+ add edx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ add ebp,DWORD[16+r8]
+ mov DWORD[8+r8],ecx
+ mov DWORD[12+r8],edx
+ mov DWORD[16+r8],ebp
+ movaps xmm6,XMMWORD[((-40-96))+r11]
+ movaps xmm7,XMMWORD[((-40-80))+r11]
+ movaps xmm8,XMMWORD[((-40-64))+r11]
+ movaps xmm9,XMMWORD[((-40-48))+r11]
+ movaps xmm10,XMMWORD[((-40-32))+r11]
+ movaps xmm11,XMMWORD[((-40-16))+r11]
+ mov r14,QWORD[((-40))+r11]
+
+ mov r13,QWORD[((-32))+r11]
+
+ mov r12,QWORD[((-24))+r11]
+
+ mov rbp,QWORD[((-16))+r11]
+
+ mov rbx,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order_avx:
+
+ALIGN 16
+sha1_block_data_order_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_avx2_shortcut:
+
+ mov r11,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ vzeroupper
+ lea rsp,[((-96))+rsp]
+ vmovaps XMMWORD[(-40-96)+r11],xmm6
+ vmovaps XMMWORD[(-40-80)+r11],xmm7
+ vmovaps XMMWORD[(-40-64)+r11],xmm8
+ vmovaps XMMWORD[(-40-48)+r11],xmm9
+ vmovaps XMMWORD[(-40-32)+r11],xmm10
+ vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx2:
+ mov r8,rdi
+ mov r9,rsi
+ mov r10,rdx
+
+ lea rsp,[((-640))+rsp]
+ shl r10,6
+ lea r13,[64+r9]
+ and rsp,-128
+ add r10,r9
+ lea r14,[((K_XX_XX+64))]
+
+ mov eax,DWORD[r8]
+ cmp r13,r10
+ cmovae r13,r9
+ mov ebp,DWORD[4+r8]
+ mov ecx,DWORD[8+r8]
+ mov edx,DWORD[12+r8]
+ mov esi,DWORD[16+r8]
+ vmovdqu ymm6,YMMWORD[64+r14]
+
+ vmovdqu xmm0,XMMWORD[r9]
+ vmovdqu xmm1,XMMWORD[16+r9]
+ vmovdqu xmm2,XMMWORD[32+r9]
+ vmovdqu xmm3,XMMWORD[48+r9]
+ lea r9,[64+r9]
+ vinserti128 ymm0,ymm0,XMMWORD[r13],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r13],1
+ vpshufb ymm0,ymm0,ymm6
+ vinserti128 ymm2,ymm2,XMMWORD[32+r13],1
+ vpshufb ymm1,ymm1,ymm6
+ vinserti128 ymm3,ymm3,XMMWORD[48+r13],1
+ vpshufb ymm2,ymm2,ymm6
+ vmovdqu ymm11,YMMWORD[((-64))+r14]
+ vpshufb ymm3,ymm3,ymm6
+
+ vpaddd ymm4,ymm0,ymm11
+ vpaddd ymm5,ymm1,ymm11
+ vmovdqu YMMWORD[rsp],ymm4
+ vpaddd ymm6,ymm2,ymm11
+ vmovdqu YMMWORD[32+rsp],ymm5
+ vpaddd ymm7,ymm3,ymm11
+ vmovdqu YMMWORD[64+rsp],ymm6
+ vmovdqu YMMWORD[96+rsp],ymm7
+ vpalignr ymm4,ymm1,ymm0,8
+ vpsrldq ymm8,ymm3,4
+ vpxor ymm4,ymm4,ymm0
+ vpxor ymm8,ymm8,ymm2
+ vpxor ymm4,ymm4,ymm8
+ vpsrld ymm8,ymm4,31
+ vpslldq ymm10,ymm4,12
+ vpaddd ymm4,ymm4,ymm4
+ vpsrld ymm9,ymm10,30
+ vpor ymm4,ymm4,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm4,ymm4,ymm9
+ vpxor ymm4,ymm4,ymm10
+ vpaddd ymm9,ymm4,ymm11
+ vmovdqu YMMWORD[128+rsp],ymm9
+ vpalignr ymm5,ymm2,ymm1,8
+ vpsrldq ymm8,ymm4,4
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm8,ymm8,ymm3
+ vpxor ymm5,ymm5,ymm8
+ vpsrld ymm8,ymm5,31
+ vmovdqu ymm11,YMMWORD[((-32))+r14]
+ vpslldq ymm10,ymm5,12
+ vpaddd ymm5,ymm5,ymm5
+ vpsrld ymm9,ymm10,30
+ vpor ymm5,ymm5,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm5,ymm5,ymm9
+ vpxor ymm5,ymm5,ymm10
+ vpaddd ymm9,ymm5,ymm11
+ vmovdqu YMMWORD[160+rsp],ymm9
+ vpalignr ymm6,ymm3,ymm2,8
+ vpsrldq ymm8,ymm5,4
+ vpxor ymm6,ymm6,ymm2
+ vpxor ymm8,ymm8,ymm4
+ vpxor ymm6,ymm6,ymm8
+ vpsrld ymm8,ymm6,31
+ vpslldq ymm10,ymm6,12
+ vpaddd ymm6,ymm6,ymm6
+ vpsrld ymm9,ymm10,30
+ vpor ymm6,ymm6,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm6,ymm6,ymm9
+ vpxor ymm6,ymm6,ymm10
+ vpaddd ymm9,ymm6,ymm11
+ vmovdqu YMMWORD[192+rsp],ymm9
+ vpalignr ymm7,ymm4,ymm3,8
+ vpsrldq ymm8,ymm6,4
+ vpxor ymm7,ymm7,ymm3
+ vpxor ymm8,ymm8,ymm5
+ vpxor ymm7,ymm7,ymm8
+ vpsrld ymm8,ymm7,31
+ vpslldq ymm10,ymm7,12
+ vpaddd ymm7,ymm7,ymm7
+ vpsrld ymm9,ymm10,30
+ vpor ymm7,ymm7,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm7,ymm7,ymm9
+ vpxor ymm7,ymm7,ymm10
+ vpaddd ymm9,ymm7,ymm11
+ vmovdqu YMMWORD[224+rsp],ymm9
+ lea r13,[128+rsp]
+ jmp NEAR $L$oop_avx2
+ALIGN 32
+$L$oop_avx2:
+ rorx ebx,ebp,2
+ andn edi,ebp,edx
+ and ebp,ecx
+ xor ebp,edi
+ jmp NEAR $L$align32_1
+ALIGN 32
+$L$align32_1:
+ vpalignr ymm8,ymm7,ymm6,8
+ vpxor ymm0,ymm0,ymm4
+ add esi,DWORD[((-128))+r13]
+ andn edi,eax,ecx
+ vpxor ymm0,ymm0,ymm1
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ vpxor ymm0,ymm0,ymm8
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ vpsrld ymm8,ymm0,30
+ vpslld ymm0,ymm0,2
+ add edx,DWORD[((-124))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ vpor ymm0,ymm0,ymm8
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-120))+r13]
+ andn edi,edx,ebp
+ vpaddd ymm9,ymm0,ymm11
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ vmovdqu YMMWORD[256+rsp],ymm9
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-116))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[((-96))+r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ vpalignr ymm8,ymm0,ymm7,8
+ vpxor ymm1,ymm1,ymm5
+ add eax,DWORD[((-92))+r13]
+ andn edi,ebp,edx
+ vpxor ymm1,ymm1,ymm2
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ vpxor ymm1,ymm1,ymm8
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ vpsrld ymm8,ymm1,30
+ vpslld ymm1,ymm1,2
+ add esi,DWORD[((-88))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ vpor ymm1,ymm1,ymm8
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-84))+r13]
+ andn edi,esi,ebx
+ vpaddd ymm9,ymm1,ymm11
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ vmovdqu YMMWORD[288+rsp],ymm9
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-64))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-60))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ vpalignr ymm8,ymm1,ymm0,8
+ vpxor ymm2,ymm2,ymm6
+ add ebp,DWORD[((-56))+r13]
+ andn edi,ebx,esi
+ vpxor ymm2,ymm2,ymm3
+ vmovdqu ymm11,YMMWORD[r14]
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ vpxor ymm2,ymm2,ymm8
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ vpsrld ymm8,ymm2,30
+ vpslld ymm2,ymm2,2
+ add eax,DWORD[((-52))+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ vpor ymm2,ymm2,ymm8
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[((-32))+r13]
+ andn edi,eax,ecx
+ vpaddd ymm9,ymm2,ymm11
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ vmovdqu YMMWORD[320+rsp],ymm9
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-28))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-24))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ vpalignr ymm8,ymm2,ymm1,8
+ vpxor ymm3,ymm3,ymm7
+ add ebx,DWORD[((-20))+r13]
+ andn edi,ecx,eax
+ vpxor ymm3,ymm3,ymm4
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ vpxor ymm3,ymm3,ymm8
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ vpsrld ymm8,ymm3,30
+ vpslld ymm3,ymm3,2
+ add ebp,DWORD[r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ vpor ymm3,ymm3,ymm8
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[4+r13]
+ andn edi,ebp,edx
+ vpaddd ymm9,ymm3,ymm11
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ vmovdqu YMMWORD[352+rsp],ymm9
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[8+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[12+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ vpalignr ymm8,ymm3,ymm2,8
+ vpxor ymm4,ymm4,ymm0
+ add ecx,DWORD[32+r13]
+ lea ecx,[rsi*1+rcx]
+ vpxor ymm4,ymm4,ymm5
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ vpxor ymm4,ymm4,ymm8
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[36+r13]
+ vpsrld ymm8,ymm4,30
+ vpslld ymm4,ymm4,2
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ vpor ymm4,ymm4,ymm8
+ add ebp,DWORD[40+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ vpaddd ymm9,ymm4,ymm11
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[44+r13]
+ vmovdqu YMMWORD[384+rsp],ymm9
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[64+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vpalignr ymm8,ymm4,ymm3,8
+ vpxor ymm5,ymm5,ymm1
+ add edx,DWORD[68+r13]
+ lea edx,[rax*1+rdx]
+ vpxor ymm5,ymm5,ymm6
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ vpxor ymm5,ymm5,ymm8
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[72+r13]
+ vpsrld ymm8,ymm5,30
+ vpslld ymm5,ymm5,2
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ vpor ymm5,ymm5,ymm8
+ add ebx,DWORD[76+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ vpaddd ymm9,ymm5,ymm11
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[96+r13]
+ vmovdqu YMMWORD[416+rsp],ymm9
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[100+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpalignr ymm8,ymm5,ymm4,8
+ vpxor ymm6,ymm6,ymm2
+ add esi,DWORD[104+r13]
+ lea esi,[rbp*1+rsi]
+ vpxor ymm6,ymm6,ymm7
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ vpxor ymm6,ymm6,ymm8
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[108+r13]
+ lea r13,[256+r13]
+ vpsrld ymm8,ymm6,30
+ vpslld ymm6,ymm6,2
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ vpor ymm6,ymm6,ymm8
+ add ecx,DWORD[((-128))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ vpaddd ymm9,ymm6,ymm11
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-124))+r13]
+ vmovdqu YMMWORD[448+rsp],ymm9
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-120))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vpalignr ymm8,ymm6,ymm5,8
+ vpxor ymm7,ymm7,ymm3
+ add eax,DWORD[((-116))+r13]
+ lea eax,[rbx*1+rax]
+ vpxor ymm7,ymm7,ymm0
+ vmovdqu ymm11,YMMWORD[32+r14]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ vpxor ymm7,ymm7,ymm8
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-96))+r13]
+ vpsrld ymm8,ymm7,30
+ vpslld ymm7,ymm7,2
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vpor ymm7,ymm7,ymm8
+ add edx,DWORD[((-92))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpaddd ymm9,ymm7,ymm11
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-88))+r13]
+ vmovdqu YMMWORD[480+rsp],ymm9
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-84))+r13]
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ jmp NEAR $L$align32_2
+ALIGN 32
+$L$align32_2:
+ vpalignr ymm8,ymm7,ymm6,8
+ vpxor ymm0,ymm0,ymm4
+ add ebp,DWORD[((-64))+r13]
+ xor ecx,esi
+ vpxor ymm0,ymm0,ymm1
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ vpxor ymm0,ymm0,ymm8
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ vpsrld ymm8,ymm0,30
+ vpslld ymm0,ymm0,2
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-60))+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ vpor ymm0,ymm0,ymm8
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ vpaddd ymm9,ymm0,ymm11
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[((-56))+r13]
+ xor ebp,ecx
+ vmovdqu YMMWORD[512+rsp],ymm9
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[((-52))+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ add ecx,DWORD[((-32))+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ and edx,edi
+ vpalignr ymm8,ymm0,ymm7,8
+ vpxor ymm1,ymm1,ymm5
+ add ebx,DWORD[((-28))+r13]
+ xor edx,eax
+ vpxor ymm1,ymm1,ymm2
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ vpxor ymm1,ymm1,ymm8
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ vpsrld ymm8,ymm1,30
+ vpslld ymm1,ymm1,2
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[((-24))+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ vpor ymm1,ymm1,ymm8
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ vpaddd ymm9,ymm1,ymm11
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-20))+r13]
+ xor ebx,edx
+ vmovdqu YMMWORD[544+rsp],ymm9
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[4+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ vpalignr ymm8,ymm1,ymm0,8
+ vpxor ymm2,ymm2,ymm6
+ add ecx,DWORD[8+r13]
+ xor esi,ebp
+ vpxor ymm2,ymm2,ymm3
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ vpxor ymm2,ymm2,ymm8
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ vpsrld ymm8,ymm2,30
+ vpslld ymm2,ymm2,2
+ add ecx,r12d
+ and edx,edi
+ add ebx,DWORD[12+r13]
+ xor edx,eax
+ mov edi,esi
+ xor edi,eax
+ vpor ymm2,ymm2,ymm8
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ vpaddd ymm9,ymm2,ymm11
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[32+r13]
+ xor ecx,esi
+ vmovdqu YMMWORD[576+rsp],ymm9
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[36+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[40+r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ vpalignr ymm8,ymm2,ymm1,8
+ vpxor ymm3,ymm3,ymm7
+ add edx,DWORD[44+r13]
+ xor eax,ebx
+ vpxor ymm3,ymm3,ymm4
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ vpxor ymm3,ymm3,ymm8
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ vpsrld ymm8,ymm3,30
+ vpslld ymm3,ymm3,2
+ add edx,r12d
+ and esi,edi
+ add ecx,DWORD[64+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ vpor ymm3,ymm3,ymm8
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ vpaddd ymm9,ymm3,ymm11
+ add ecx,r12d
+ and edx,edi
+ add ebx,DWORD[68+r13]
+ xor edx,eax
+ vmovdqu YMMWORD[608+rsp],ymm9
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[72+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[76+r13]
+ xor ebx,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[96+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[100+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[104+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[108+r13]
+ lea r13,[256+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-128))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-124))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-120))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-116))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-96))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-92))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-88))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-84))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-64))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-60))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-56))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-52))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-32))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-28))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-24))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-20))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ add edx,r12d
+ lea r13,[128+r9]
+ lea rdi,[128+r9]
+ cmp r13,r10
+ cmovae r13,r9
+
+
+ add edx,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ebp,DWORD[8+r8]
+ mov DWORD[r8],edx
+ add ebx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ mov eax,edx
+ add ecx,DWORD[16+r8]
+ mov r12d,ebp
+ mov DWORD[8+r8],ebp
+ mov edx,ebx
+
+ mov DWORD[12+r8],ebx
+ mov ebp,esi
+ mov DWORD[16+r8],ecx
+
+ mov esi,ecx
+ mov ecx,r12d
+
+
+ cmp r9,r10
+ je NEAR $L$done_avx2
+ vmovdqu ymm6,YMMWORD[64+r14]
+ cmp rdi,r10
+ ja NEAR $L$ast_avx2
+
+ vmovdqu xmm0,XMMWORD[((-64))+rdi]
+ vmovdqu xmm1,XMMWORD[((-48))+rdi]
+ vmovdqu xmm2,XMMWORD[((-32))+rdi]
+ vmovdqu xmm3,XMMWORD[((-16))+rdi]
+ vinserti128 ymm0,ymm0,XMMWORD[r13],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r13],1
+ vinserti128 ymm2,ymm2,XMMWORD[32+r13],1
+ vinserti128 ymm3,ymm3,XMMWORD[48+r13],1
+ jmp NEAR $L$ast_avx2
+
+ALIGN 32
+$L$ast_avx2:
+ lea r13,[((128+16))+rsp]
+ rorx ebx,ebp,2
+ andn edi,ebp,edx
+ and ebp,ecx
+ xor ebp,edi
+ sub r9,-128
+ add esi,DWORD[((-128))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-124))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-120))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-116))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[((-96))+r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[((-92))+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[((-88))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-84))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-64))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-60))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[((-56))+r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[((-52))+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[((-32))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-28))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-24))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-20))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[4+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[8+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[12+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[32+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[36+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[40+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[44+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[64+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vmovdqu ymm11,YMMWORD[((-64))+r14]
+ vpshufb ymm0,ymm0,ymm6
+ add edx,DWORD[68+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[72+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[76+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[96+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[100+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpshufb ymm1,ymm1,ymm6
+ vpaddd ymm8,ymm0,ymm11
+ add esi,DWORD[104+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[108+r13]
+ lea r13,[256+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-128))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-124))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-120))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vmovdqu YMMWORD[rsp],ymm8
+ vpshufb ymm2,ymm2,ymm6
+ vpaddd ymm9,ymm1,ymm11
+ add eax,DWORD[((-116))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-96))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-92))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-88))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-84))+r13]
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ vmovdqu YMMWORD[32+rsp],ymm9
+ vpshufb ymm3,ymm3,ymm6
+ vpaddd ymm6,ymm2,ymm11
+ add ebp,DWORD[((-64))+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-60))+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[((-56))+r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[((-52))+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ add ecx,DWORD[((-32))+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ and edx,edi
+ jmp NEAR $L$align32_3
+ALIGN 32
+$L$align32_3:
+ vmovdqu YMMWORD[64+rsp],ymm6
+ vpaddd ymm7,ymm3,ymm11
+ add ebx,DWORD[((-28))+r13]
+ xor edx,eax
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[((-24))+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-20))+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[4+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ vmovdqu YMMWORD[96+rsp],ymm7
+ add ecx,DWORD[8+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ and edx,edi
+ add ebx,DWORD[12+r13]
+ xor edx,eax
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[32+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[36+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[40+r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ vpalignr ymm4,ymm1,ymm0,8
+ add edx,DWORD[44+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ vpsrldq ymm8,ymm3,4
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpxor ymm4,ymm4,ymm0
+ vpxor ymm8,ymm8,ymm2
+ xor esi,ebp
+ add edx,r12d
+ vpxor ymm4,ymm4,ymm8
+ and esi,edi
+ add ecx,DWORD[64+r13]
+ xor esi,ebp
+ mov edi,eax
+ vpsrld ymm8,ymm4,31
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ vpslldq ymm10,ymm4,12
+ vpaddd ymm4,ymm4,ymm4
+ rorx esi,edx,2
+ xor edx,eax
+ vpsrld ymm9,ymm10,30
+ vpor ymm4,ymm4,ymm8
+ add ecx,r12d
+ and edx,edi
+ vpslld ymm10,ymm10,2
+ vpxor ymm4,ymm4,ymm9
+ add ebx,DWORD[68+r13]
+ xor edx,eax
+ vpxor ymm4,ymm4,ymm10
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ vpaddd ymm9,ymm4,ymm11
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ vmovdqu YMMWORD[128+rsp],ymm9
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[72+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[76+r13]
+ xor ebx,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpalignr ymm5,ymm2,ymm1,8
+ add esi,DWORD[96+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ vpsrldq ymm8,ymm4,4
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm8,ymm8,ymm3
+ add edx,DWORD[100+r13]
+ lea edx,[rax*1+rdx]
+ vpxor ymm5,ymm5,ymm8
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ vpsrld ymm8,ymm5,31
+ vmovdqu ymm11,YMMWORD[((-32))+r14]
+ xor esi,ebx
+ add ecx,DWORD[104+r13]
+ lea ecx,[rsi*1+rcx]
+ vpslldq ymm10,ymm5,12
+ vpaddd ymm5,ymm5,ymm5
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ vpsrld ymm9,ymm10,30
+ vpor ymm5,ymm5,ymm8
+ xor edx,eax
+ add ecx,r12d
+ vpslld ymm10,ymm10,2
+ vpxor ymm5,ymm5,ymm9
+ xor edx,ebp
+ add ebx,DWORD[108+r13]
+ lea r13,[256+r13]
+ vpxor ymm5,ymm5,ymm10
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ vpaddd ymm9,ymm5,ymm11
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ vmovdqu YMMWORD[160+rsp],ymm9
+ add ebp,DWORD[((-128))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vpalignr ymm6,ymm3,ymm2,8
+ add eax,DWORD[((-124))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ vpsrldq ymm8,ymm5,4
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpxor ymm6,ymm6,ymm2
+ vpxor ymm8,ymm8,ymm4
+ add esi,DWORD[((-120))+r13]
+ lea esi,[rbp*1+rsi]
+ vpxor ymm6,ymm6,ymm8
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ vpsrld ymm8,ymm6,31
+ xor eax,ecx
+ add edx,DWORD[((-116))+r13]
+ lea edx,[rax*1+rdx]
+ vpslldq ymm10,ymm6,12
+ vpaddd ymm6,ymm6,ymm6
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpsrld ymm9,ymm10,30
+ vpor ymm6,ymm6,ymm8
+ xor esi,ebp
+ add edx,r12d
+ vpslld ymm10,ymm10,2
+ vpxor ymm6,ymm6,ymm9
+ xor esi,ebx
+ add ecx,DWORD[((-96))+r13]
+ vpxor ymm6,ymm6,ymm10
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ vpaddd ymm9,ymm6,ymm11
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ vmovdqu YMMWORD[192+rsp],ymm9
+ add ebx,DWORD[((-92))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ vpalignr ymm7,ymm4,ymm3,8
+ add ebp,DWORD[((-88))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ vpsrldq ymm8,ymm6,4
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vpxor ymm7,ymm7,ymm3
+ vpxor ymm8,ymm8,ymm5
+ add eax,DWORD[((-84))+r13]
+ lea eax,[rbx*1+rax]
+ vpxor ymm7,ymm7,ymm8
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ vpsrld ymm8,ymm7,31
+ xor ebp,edx
+ add esi,DWORD[((-64))+r13]
+ lea esi,[rbp*1+rsi]
+ vpslldq ymm10,ymm7,12
+ vpaddd ymm7,ymm7,ymm7
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ vpsrld ymm9,ymm10,30
+ vpor ymm7,ymm7,ymm8
+ xor eax,ebx
+ add esi,r12d
+ vpslld ymm10,ymm10,2
+ vpxor ymm7,ymm7,ymm9
+ xor eax,ecx
+ add edx,DWORD[((-60))+r13]
+ vpxor ymm7,ymm7,ymm10
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpaddd ymm9,ymm7,ymm11
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ vmovdqu YMMWORD[224+rsp],ymm9
+ add ecx,DWORD[((-56))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-52))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-32))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-28))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-24))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-20))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ add edx,r12d
+ lea r13,[128+rsp]
+
+
+ add edx,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ebp,DWORD[8+r8]
+ mov DWORD[r8],edx
+ add ebx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ mov eax,edx
+ add ecx,DWORD[16+r8]
+ mov r12d,ebp
+ mov DWORD[8+r8],ebp
+ mov edx,ebx
+
+ mov DWORD[12+r8],ebx
+ mov ebp,esi
+ mov DWORD[16+r8],ecx
+
+ mov esi,ecx
+ mov ecx,r12d
+
+
+ cmp r9,r10
+ jbe NEAR $L$oop_avx2
+
+$L$done_avx2:
+ vzeroupper
+ movaps xmm6,XMMWORD[((-40-96))+r11]
+ movaps xmm7,XMMWORD[((-40-80))+r11]
+ movaps xmm8,XMMWORD[((-40-64))+r11]
+ movaps xmm9,XMMWORD[((-40-48))+r11]
+ movaps xmm10,XMMWORD[((-40-32))+r11]
+ movaps xmm11,XMMWORD[((-40-16))+r11]
+ mov r14,QWORD[((-40))+r11]
+
+ mov r13,QWORD[((-32))+r11]
+
+ mov r12,QWORD[((-24))+r11]
+
+ mov rbp,QWORD[((-16))+r11]
+
+ mov rbx,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order_avx2:
+ALIGN 64
+K_XX_XX:
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+DB 102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44
+DB 32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60
+DB 97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114
+DB 103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[64+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+ jmp NEAR $L$common_seh_tail
+
+
+ALIGN 16
+shaext_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue_shaext]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ lea r10,[$L$epilogue_shaext]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[((-8-64))+rax]
+ lea rdi,[512+r8]
+ mov ecx,8
+ DD 0xa548f3fc
+
+ jmp NEAR $L$common_seh_tail
+
+
+ALIGN 16
+ssse3_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[208+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[((-40-96))+rax]
+ lea rdi,[512+r8]
+ mov ecx,12
+ DD 0xa548f3fc
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha1_block_data_order wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha1_block_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_shaext:
+DB 9,0,0,0
+ DD shaext_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_ssse3:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx2:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
new file mode 100644
index 0000000000..5940112c1f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
@@ -0,0 +1,8262 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global sha256_multi_block
+
+ALIGN 32
+sha256_multi_block:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ mov rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+ bt rcx,61
+ jc NEAR _shaext_shortcut
+ test ecx,268435456
+ jnz NEAR _avx_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body:
+ lea rbp,[((K256+128))]
+ lea rbx,[256+rsp]
+ lea rdi,[128+rdi]
+
+$L$oop_grande:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done
+
+ movdqu xmm8,XMMWORD[((0-128))+rdi]
+ lea rax,[128+rsp]
+ movdqu xmm9,XMMWORD[((32-128))+rdi]
+ movdqu xmm10,XMMWORD[((64-128))+rdi]
+ movdqu xmm11,XMMWORD[((96-128))+rdi]
+ movdqu xmm12,XMMWORD[((128-128))+rdi]
+ movdqu xmm13,XMMWORD[((160-128))+rdi]
+ movdqu xmm14,XMMWORD[((192-128))+rdi]
+ movdqu xmm15,XMMWORD[((224-128))+rdi]
+ movdqu xmm6,XMMWORD[$L$pbswap]
+ jmp NEAR $L$oop
+
+ALIGN 32
+$L$oop:
+ movdqa xmm4,xmm10
+ pxor xmm4,xmm9
+ movd xmm5,DWORD[r8]
+ movd xmm0,DWORD[r9]
+ movd xmm1,DWORD[r10]
+ movd xmm2,DWORD[r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm12
+DB 102,15,56,0,238
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(0-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movd xmm5,DWORD[4+r8]
+ movd xmm0,DWORD[4+r9]
+ movd xmm1,DWORD[4+r10]
+ movd xmm2,DWORD[4+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(16-128)+rax],xmm5
+ paddd xmm5,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm5
+ paddd xmm14,xmm7
+ movd xmm5,DWORD[8+r8]
+ movd xmm0,DWORD[8+r9]
+ movd xmm1,DWORD[8+r10]
+ movd xmm2,DWORD[8+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm10
+DB 102,15,56,0,238
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(32-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movd xmm5,DWORD[12+r8]
+ movd xmm0,DWORD[12+r9]
+ movd xmm1,DWORD[12+r10]
+ movd xmm2,DWORD[12+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(48-128)+rax],xmm5
+ paddd xmm5,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm5
+ paddd xmm12,xmm7
+ movd xmm5,DWORD[16+r8]
+ movd xmm0,DWORD[16+r9]
+ movd xmm1,DWORD[16+r10]
+ movd xmm2,DWORD[16+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm8
+DB 102,15,56,0,238
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(64-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movd xmm5,DWORD[20+r8]
+ movd xmm0,DWORD[20+r9]
+ movd xmm1,DWORD[20+r10]
+ movd xmm2,DWORD[20+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(80-128)+rax],xmm5
+ paddd xmm5,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm5
+ paddd xmm10,xmm7
+ movd xmm5,DWORD[24+r8]
+ movd xmm0,DWORD[24+r9]
+ movd xmm1,DWORD[24+r10]
+ movd xmm2,DWORD[24+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm14
+DB 102,15,56,0,238
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(96-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movd xmm5,DWORD[28+r8]
+ movd xmm0,DWORD[28+r9]
+ movd xmm1,DWORD[28+r10]
+ movd xmm2,DWORD[28+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(112-128)+rax],xmm5
+ paddd xmm5,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm5
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ movd xmm5,DWORD[32+r8]
+ movd xmm0,DWORD[32+r9]
+ movd xmm1,DWORD[32+r10]
+ movd xmm2,DWORD[32+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm12
+DB 102,15,56,0,238
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(128-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movd xmm5,DWORD[36+r8]
+ movd xmm0,DWORD[36+r9]
+ movd xmm1,DWORD[36+r10]
+ movd xmm2,DWORD[36+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(144-128)+rax],xmm5
+ paddd xmm5,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm5
+ paddd xmm14,xmm7
+ movd xmm5,DWORD[40+r8]
+ movd xmm0,DWORD[40+r9]
+ movd xmm1,DWORD[40+r10]
+ movd xmm2,DWORD[40+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm10
+DB 102,15,56,0,238
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(160-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movd xmm5,DWORD[44+r8]
+ movd xmm0,DWORD[44+r9]
+ movd xmm1,DWORD[44+r10]
+ movd xmm2,DWORD[44+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(176-128)+rax],xmm5
+ paddd xmm5,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm5
+ paddd xmm12,xmm7
+ movd xmm5,DWORD[48+r8]
+ movd xmm0,DWORD[48+r9]
+ movd xmm1,DWORD[48+r10]
+ movd xmm2,DWORD[48+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm8
+DB 102,15,56,0,238
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(192-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movd xmm5,DWORD[52+r8]
+ movd xmm0,DWORD[52+r9]
+ movd xmm1,DWORD[52+r10]
+ movd xmm2,DWORD[52+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(208-128)+rax],xmm5
+ paddd xmm5,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm5
+ paddd xmm10,xmm7
+ movd xmm5,DWORD[56+r8]
+ movd xmm0,DWORD[56+r9]
+ movd xmm1,DWORD[56+r10]
+ movd xmm2,DWORD[56+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm14
+DB 102,15,56,0,238
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(224-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movd xmm5,DWORD[60+r8]
+ lea r8,[64+r8]
+ movd xmm0,DWORD[60+r9]
+ lea r9,[64+r9]
+ movd xmm1,DWORD[60+r10]
+ lea r10,[64+r10]
+ movd xmm2,DWORD[60+r11]
+ lea r11,[64+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(240-128)+rax],xmm5
+ paddd xmm5,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+ prefetcht0 [63+r8]
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+ prefetcht0 [63+r9]
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+ prefetcht0 [63+r10]
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+ prefetcht0 [63+r11]
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm5
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ movdqu xmm5,XMMWORD[((0-128))+rax]
+ mov ecx,3
+ jmp NEAR $L$oop_16_xx
+ALIGN 32
+$L$oop_16_xx:
+ movdqa xmm6,XMMWORD[((16-128))+rax]
+ paddd xmm5,XMMWORD[((144-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((224-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm12
+
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(0-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movdqa xmm5,XMMWORD[((32-128))+rax]
+ paddd xmm6,XMMWORD[((160-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((240-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(16-128)+rax],xmm6
+ paddd xmm6,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm6
+ paddd xmm14,xmm7
+ movdqa xmm6,XMMWORD[((48-128))+rax]
+ paddd xmm5,XMMWORD[((176-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((0-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm10
+
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(32-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movdqa xmm5,XMMWORD[((64-128))+rax]
+ paddd xmm6,XMMWORD[((192-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((16-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(48-128)+rax],xmm6
+ paddd xmm6,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm6
+ paddd xmm12,xmm7
+ movdqa xmm6,XMMWORD[((80-128))+rax]
+ paddd xmm5,XMMWORD[((208-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((32-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm8
+
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(64-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movdqa xmm5,XMMWORD[((96-128))+rax]
+ paddd xmm6,XMMWORD[((224-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((48-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(80-128)+rax],xmm6
+ paddd xmm6,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm6
+ paddd xmm10,xmm7
+ movdqa xmm6,XMMWORD[((112-128))+rax]
+ paddd xmm5,XMMWORD[((240-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((64-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm14
+
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(96-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movdqa xmm5,XMMWORD[((128-128))+rax]
+ paddd xmm6,XMMWORD[((0-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((80-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(112-128)+rax],xmm6
+ paddd xmm6,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm6
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ movdqa xmm6,XMMWORD[((144-128))+rax]
+ paddd xmm5,XMMWORD[((16-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((96-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm12
+
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(128-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movdqa xmm5,XMMWORD[((160-128))+rax]
+ paddd xmm6,XMMWORD[((32-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((112-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(144-128)+rax],xmm6
+ paddd xmm6,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm6
+ paddd xmm14,xmm7
+ movdqa xmm6,XMMWORD[((176-128))+rax]
+ paddd xmm5,XMMWORD[((48-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((128-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm10
+
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(160-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movdqa xmm5,XMMWORD[((192-128))+rax]
+ paddd xmm6,XMMWORD[((64-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((144-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(176-128)+rax],xmm6
+ paddd xmm6,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm6
+ paddd xmm12,xmm7
+ movdqa xmm6,XMMWORD[((208-128))+rax]
+ paddd xmm5,XMMWORD[((80-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((160-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm8
+
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(192-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movdqa xmm5,XMMWORD[((224-128))+rax]
+ paddd xmm6,XMMWORD[((96-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((176-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(208-128)+rax],xmm6
+ paddd xmm6,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm6
+ paddd xmm10,xmm7
+ movdqa xmm6,XMMWORD[((240-128))+rax]
+ paddd xmm5,XMMWORD[((112-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((192-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm14
+
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(224-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movdqa xmm5,XMMWORD[((0-128))+rax]
+ paddd xmm6,XMMWORD[((128-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((208-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(240-128)+rax],xmm6
+ paddd xmm6,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm6
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ dec ecx
+ jnz NEAR $L$oop_16_xx
+
+ mov ecx,1
+ lea rbp,[((K256+128))]
+
+ movdqa xmm7,XMMWORD[rbx]
+ cmp ecx,DWORD[rbx]
+ pxor xmm0,xmm0
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ movdqa xmm6,xmm7
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ pcmpgtd xmm6,xmm0
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ paddd xmm7,xmm6
+ cmovge r11,rbp
+
+ movdqu xmm0,XMMWORD[((0-128))+rdi]
+ pand xmm8,xmm6
+ movdqu xmm1,XMMWORD[((32-128))+rdi]
+ pand xmm9,xmm6
+ movdqu xmm2,XMMWORD[((64-128))+rdi]
+ pand xmm10,xmm6
+ movdqu xmm5,XMMWORD[((96-128))+rdi]
+ pand xmm11,xmm6
+ paddd xmm8,xmm0
+ movdqu xmm0,XMMWORD[((128-128))+rdi]
+ pand xmm12,xmm6
+ paddd xmm9,xmm1
+ movdqu xmm1,XMMWORD[((160-128))+rdi]
+ pand xmm13,xmm6
+ paddd xmm10,xmm2
+ movdqu xmm2,XMMWORD[((192-128))+rdi]
+ pand xmm14,xmm6
+ paddd xmm11,xmm5
+ movdqu xmm5,XMMWORD[((224-128))+rdi]
+ pand xmm15,xmm6
+ paddd xmm12,xmm0
+ paddd xmm13,xmm1
+ movdqu XMMWORD[(0-128)+rdi],xmm8
+ paddd xmm14,xmm2
+ movdqu XMMWORD[(32-128)+rdi],xmm9
+ paddd xmm15,xmm5
+ movdqu XMMWORD[(64-128)+rdi],xmm10
+ movdqu XMMWORD[(96-128)+rdi],xmm11
+ movdqu XMMWORD[(128-128)+rdi],xmm12
+ movdqu XMMWORD[(160-128)+rdi],xmm13
+ movdqu XMMWORD[(192-128)+rdi],xmm14
+ movdqu XMMWORD[(224-128)+rdi],xmm15
+
+ movdqa XMMWORD[rbx],xmm7
+ movdqa xmm6,XMMWORD[$L$pbswap]
+ dec edx
+ jnz NEAR $L$oop
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande
+
+$L$done:
+ mov rax,QWORD[272+rsp]
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block:
+
+ALIGN 32
+sha256_multi_block_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_shaext_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ shl edx,1
+ and rsp,-256
+ lea rdi,[128+rdi]
+ mov QWORD[272+rsp],rax
+$L$body_shaext:
+ lea rbx,[256+rsp]
+ lea rbp,[((K256_shaext+128))]
+
+$L$oop_grande_shaext:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rsp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rsp
+ test edx,edx
+ jz NEAR $L$done_shaext
+
+ movq xmm12,QWORD[((0-128))+rdi]
+ movq xmm4,QWORD[((32-128))+rdi]
+ movq xmm13,QWORD[((64-128))+rdi]
+ movq xmm5,QWORD[((96-128))+rdi]
+ movq xmm8,QWORD[((128-128))+rdi]
+ movq xmm9,QWORD[((160-128))+rdi]
+ movq xmm10,QWORD[((192-128))+rdi]
+ movq xmm11,QWORD[((224-128))+rdi]
+
+ punpckldq xmm12,xmm4
+ punpckldq xmm13,xmm5
+ punpckldq xmm8,xmm9
+ punpckldq xmm10,xmm11
+ movdqa xmm3,XMMWORD[((K256_shaext-16))]
+
+ movdqa xmm14,xmm12
+ movdqa xmm15,xmm13
+ punpcklqdq xmm12,xmm8
+ punpcklqdq xmm13,xmm10
+ punpckhqdq xmm14,xmm8
+ punpckhqdq xmm15,xmm10
+
+ pshufd xmm12,xmm12,27
+ pshufd xmm13,xmm13,27
+ pshufd xmm14,xmm14,27
+ pshufd xmm15,xmm15,27
+ jmp NEAR $L$oop_shaext
+
+ALIGN 32
+$L$oop_shaext:
+ movdqu xmm4,XMMWORD[r8]
+ movdqu xmm8,XMMWORD[r9]
+ movdqu xmm5,XMMWORD[16+r8]
+ movdqu xmm9,XMMWORD[16+r9]
+ movdqu xmm6,XMMWORD[32+r8]
+DB 102,15,56,0,227
+ movdqu xmm10,XMMWORD[32+r9]
+DB 102,68,15,56,0,195
+ movdqu xmm7,XMMWORD[48+r8]
+ lea r8,[64+r8]
+ movdqu xmm11,XMMWORD[48+r9]
+ lea r9,[64+r9]
+
+ movdqa xmm0,XMMWORD[((0-128))+rbp]
+DB 102,15,56,0,235
+ paddd xmm0,xmm4
+ pxor xmm4,xmm12
+ movdqa xmm1,xmm0
+ movdqa xmm2,XMMWORD[((0-128))+rbp]
+DB 102,68,15,56,0,203
+ paddd xmm2,xmm8
+ movdqa XMMWORD[80+rsp],xmm13
+DB 69,15,56,203,236
+ pxor xmm8,xmm14
+ movdqa xmm0,xmm2
+ movdqa XMMWORD[112+rsp],xmm15
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+ pxor xmm4,xmm12
+ movdqa XMMWORD[64+rsp],xmm12
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ pxor xmm8,xmm14
+ movdqa XMMWORD[96+rsp],xmm14
+ movdqa xmm1,XMMWORD[((16-128))+rbp]
+ paddd xmm1,xmm5
+DB 102,15,56,0,243
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((16-128))+rbp]
+ paddd xmm2,xmm9
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ prefetcht0 [127+r8]
+DB 102,15,56,0,251
+DB 102,68,15,56,0,211
+ prefetcht0 [127+r9]
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+DB 102,68,15,56,0,219
+DB 15,56,204,229
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((32-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((32-128))+rbp]
+ paddd xmm2,xmm10
+DB 69,15,56,203,236
+DB 69,15,56,204,193
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm7
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+DB 102,15,58,15,222,4
+ paddd xmm4,xmm3
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+DB 15,56,204,238
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((48-128))+rbp]
+ paddd xmm1,xmm7
+DB 69,15,56,203,247
+DB 69,15,56,204,202
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((48-128))+rbp]
+ paddd xmm8,xmm3
+ paddd xmm2,xmm11
+DB 15,56,205,231
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm4
+DB 102,15,58,15,223,4
+DB 69,15,56,203,254
+DB 69,15,56,205,195
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm5,xmm3
+ movdqa xmm3,xmm8
+DB 102,65,15,58,15,219,4
+DB 15,56,204,247
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((64-128))+rbp]
+ paddd xmm1,xmm4
+DB 69,15,56,203,247
+DB 69,15,56,204,211
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((64-128))+rbp]
+ paddd xmm9,xmm3
+ paddd xmm2,xmm8
+DB 15,56,205,236
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm5
+DB 102,15,58,15,220,4
+DB 69,15,56,203,254
+DB 69,15,56,205,200
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm6,xmm3
+ movdqa xmm3,xmm9
+DB 102,65,15,58,15,216,4
+DB 15,56,204,252
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((80-128))+rbp]
+ paddd xmm1,xmm5
+DB 69,15,56,203,247
+DB 69,15,56,204,216
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((80-128))+rbp]
+ paddd xmm10,xmm3
+ paddd xmm2,xmm9
+DB 15,56,205,245
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm6
+DB 102,15,58,15,221,4
+DB 69,15,56,203,254
+DB 69,15,56,205,209
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm7,xmm3
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,217,4
+DB 15,56,204,229
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((96-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+DB 69,15,56,204,193
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((96-128))+rbp]
+ paddd xmm11,xmm3
+ paddd xmm2,xmm10
+DB 15,56,205,254
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm7
+DB 102,15,58,15,222,4
+DB 69,15,56,203,254
+DB 69,15,56,205,218
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm4,xmm3
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+DB 15,56,204,238
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((112-128))+rbp]
+ paddd xmm1,xmm7
+DB 69,15,56,203,247
+DB 69,15,56,204,202
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((112-128))+rbp]
+ paddd xmm8,xmm3
+ paddd xmm2,xmm11
+DB 15,56,205,231
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm4
+DB 102,15,58,15,223,4
+DB 69,15,56,203,254
+DB 69,15,56,205,195
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm5,xmm3
+ movdqa xmm3,xmm8
+DB 102,65,15,58,15,219,4
+DB 15,56,204,247
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((128-128))+rbp]
+ paddd xmm1,xmm4
+DB 69,15,56,203,247
+DB 69,15,56,204,211
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((128-128))+rbp]
+ paddd xmm9,xmm3
+ paddd xmm2,xmm8
+DB 15,56,205,236
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm5
+DB 102,15,58,15,220,4
+DB 69,15,56,203,254
+DB 69,15,56,205,200
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm6,xmm3
+ movdqa xmm3,xmm9
+DB 102,65,15,58,15,216,4
+DB 15,56,204,252
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((144-128))+rbp]
+ paddd xmm1,xmm5
+DB 69,15,56,203,247
+DB 69,15,56,204,216
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((144-128))+rbp]
+ paddd xmm10,xmm3
+ paddd xmm2,xmm9
+DB 15,56,205,245
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm6
+DB 102,15,58,15,221,4
+DB 69,15,56,203,254
+DB 69,15,56,205,209
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm7,xmm3
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,217,4
+DB 15,56,204,229
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((160-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+DB 69,15,56,204,193
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((160-128))+rbp]
+ paddd xmm11,xmm3
+ paddd xmm2,xmm10
+DB 15,56,205,254
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm7
+DB 102,15,58,15,222,4
+DB 69,15,56,203,254
+DB 69,15,56,205,218
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm4,xmm3
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+DB 15,56,204,238
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((176-128))+rbp]
+ paddd xmm1,xmm7
+DB 69,15,56,203,247
+DB 69,15,56,204,202
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((176-128))+rbp]
+ paddd xmm8,xmm3
+ paddd xmm2,xmm11
+DB 15,56,205,231
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm4
+DB 102,15,58,15,223,4
+DB 69,15,56,203,254
+DB 69,15,56,205,195
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm5,xmm3
+ movdqa xmm3,xmm8
+DB 102,65,15,58,15,219,4
+DB 15,56,204,247
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((192-128))+rbp]
+ paddd xmm1,xmm4
+DB 69,15,56,203,247
+DB 69,15,56,204,211
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((192-128))+rbp]
+ paddd xmm9,xmm3
+ paddd xmm2,xmm8
+DB 15,56,205,236
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm5
+DB 102,15,58,15,220,4
+DB 69,15,56,203,254
+DB 69,15,56,205,200
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm6,xmm3
+ movdqa xmm3,xmm9
+DB 102,65,15,58,15,216,4
+DB 15,56,204,252
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((208-128))+rbp]
+ paddd xmm1,xmm5
+DB 69,15,56,203,247
+DB 69,15,56,204,216
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((208-128))+rbp]
+ paddd xmm10,xmm3
+ paddd xmm2,xmm9
+DB 15,56,205,245
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm6
+DB 102,15,58,15,221,4
+DB 69,15,56,203,254
+DB 69,15,56,205,209
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm7,xmm3
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,217,4
+ nop
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((224-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((224-128))+rbp]
+ paddd xmm11,xmm3
+ paddd xmm2,xmm10
+DB 15,56,205,254
+ nop
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ mov ecx,1
+ pxor xmm6,xmm6
+DB 69,15,56,203,254
+DB 69,15,56,205,218
+ pshufd xmm0,xmm1,0x0e
+ movdqa xmm1,XMMWORD[((240-128))+rbp]
+ paddd xmm1,xmm7
+ movq xmm7,QWORD[rbx]
+ nop
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm2,XMMWORD[((240-128))+rbp]
+ paddd xmm2,xmm11
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rsp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rsp
+ pshufd xmm9,xmm7,0x00
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ pshufd xmm10,xmm7,0x55
+ movdqa xmm11,xmm7
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+ pcmpgtd xmm9,xmm6
+ pcmpgtd xmm10,xmm6
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ pcmpgtd xmm11,xmm6
+ movdqa xmm3,XMMWORD[((K256_shaext-16))]
+DB 69,15,56,203,247
+
+ pand xmm13,xmm9
+ pand xmm15,xmm10
+ pand xmm12,xmm9
+ pand xmm14,xmm10
+ paddd xmm11,xmm7
+
+ paddd xmm13,XMMWORD[80+rsp]
+ paddd xmm15,XMMWORD[112+rsp]
+ paddd xmm12,XMMWORD[64+rsp]
+ paddd xmm14,XMMWORD[96+rsp]
+
+ movq QWORD[rbx],xmm11
+ dec edx
+ jnz NEAR $L$oop_shaext
+
+ mov edx,DWORD[280+rsp]
+
+ pshufd xmm12,xmm12,27
+ pshufd xmm13,xmm13,27
+ pshufd xmm14,xmm14,27
+ pshufd xmm15,xmm15,27
+
+ movdqa xmm5,xmm12
+ movdqa xmm6,xmm13
+ punpckldq xmm12,xmm14
+ punpckhdq xmm5,xmm14
+ punpckldq xmm13,xmm15
+ punpckhdq xmm6,xmm15
+
+ movq QWORD[(0-128)+rdi],xmm12
+ psrldq xmm12,8
+ movq QWORD[(128-128)+rdi],xmm5
+ psrldq xmm5,8
+ movq QWORD[(32-128)+rdi],xmm12
+ movq QWORD[(160-128)+rdi],xmm5
+
+ movq QWORD[(64-128)+rdi],xmm13
+ psrldq xmm13,8
+ movq QWORD[(192-128)+rdi],xmm6
+ psrldq xmm6,8
+ movq QWORD[(96-128)+rdi],xmm13
+ movq QWORD[(224-128)+rdi],xmm6
+
+ lea rdi,[8+rdi]
+ lea rsi,[32+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block_shaext:
+
+ALIGN 32
+sha256_multi_block_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_shortcut:
+ shr rcx,32
+ cmp edx,2
+ jb NEAR $L$avx
+ test ecx,32
+ jnz NEAR _avx2_shortcut
+ jmp NEAR $L$avx
+ALIGN 32
+$L$avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body_avx:
+ lea rbp,[((K256+128))]
+ lea rbx,[256+rsp]
+ lea rdi,[128+rdi]
+
+$L$oop_grande_avx:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done_avx
+
+ vmovdqu xmm8,XMMWORD[((0-128))+rdi]
+ lea rax,[128+rsp]
+ vmovdqu xmm9,XMMWORD[((32-128))+rdi]
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ vmovdqu xmm11,XMMWORD[((96-128))+rdi]
+ vmovdqu xmm12,XMMWORD[((128-128))+rdi]
+ vmovdqu xmm13,XMMWORD[((160-128))+rdi]
+ vmovdqu xmm14,XMMWORD[((192-128))+rdi]
+ vmovdqu xmm15,XMMWORD[((224-128))+rdi]
+ vmovdqu xmm6,XMMWORD[$L$pbswap]
+ jmp NEAR $L$oop_avx
+
+ALIGN 32
+$L$oop_avx:
+ vpxor xmm4,xmm10,xmm9
+ vmovd xmm5,DWORD[r8]
+ vmovd xmm0,DWORD[r9]
+ vpinsrd xmm5,xmm5,DWORD[r10],1
+ vpinsrd xmm0,xmm0,DWORD[r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(0-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovd xmm5,DWORD[4+r8]
+ vmovd xmm0,DWORD[4+r9]
+ vpinsrd xmm5,xmm5,DWORD[4+r10],1
+ vpinsrd xmm0,xmm0,DWORD[4+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(16-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm5,xmm5,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm5
+ vpaddd xmm14,xmm14,xmm7
+ vmovd xmm5,DWORD[8+r8]
+ vmovd xmm0,DWORD[8+r9]
+ vpinsrd xmm5,xmm5,DWORD[8+r10],1
+ vpinsrd xmm0,xmm0,DWORD[8+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(32-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovd xmm5,DWORD[12+r8]
+ vmovd xmm0,DWORD[12+r9]
+ vpinsrd xmm5,xmm5,DWORD[12+r10],1
+ vpinsrd xmm0,xmm0,DWORD[12+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(48-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm5,xmm5,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm5
+ vpaddd xmm12,xmm12,xmm7
+ vmovd xmm5,DWORD[16+r8]
+ vmovd xmm0,DWORD[16+r9]
+ vpinsrd xmm5,xmm5,DWORD[16+r10],1
+ vpinsrd xmm0,xmm0,DWORD[16+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(64-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovd xmm5,DWORD[20+r8]
+ vmovd xmm0,DWORD[20+r9]
+ vpinsrd xmm5,xmm5,DWORD[20+r10],1
+ vpinsrd xmm0,xmm0,DWORD[20+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(80-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm5,xmm5,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm5
+ vpaddd xmm10,xmm10,xmm7
+ vmovd xmm5,DWORD[24+r8]
+ vmovd xmm0,DWORD[24+r9]
+ vpinsrd xmm5,xmm5,DWORD[24+r10],1
+ vpinsrd xmm0,xmm0,DWORD[24+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(96-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovd xmm5,DWORD[28+r8]
+ vmovd xmm0,DWORD[28+r9]
+ vpinsrd xmm5,xmm5,DWORD[28+r10],1
+ vpinsrd xmm0,xmm0,DWORD[28+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(112-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm5,xmm5,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm5
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ vmovd xmm5,DWORD[32+r8]
+ vmovd xmm0,DWORD[32+r9]
+ vpinsrd xmm5,xmm5,DWORD[32+r10],1
+ vpinsrd xmm0,xmm0,DWORD[32+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(128-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovd xmm5,DWORD[36+r8]
+ vmovd xmm0,DWORD[36+r9]
+ vpinsrd xmm5,xmm5,DWORD[36+r10],1
+ vpinsrd xmm0,xmm0,DWORD[36+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(144-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm5,xmm5,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm5
+ vpaddd xmm14,xmm14,xmm7
+ vmovd xmm5,DWORD[40+r8]
+ vmovd xmm0,DWORD[40+r9]
+ vpinsrd xmm5,xmm5,DWORD[40+r10],1
+ vpinsrd xmm0,xmm0,DWORD[40+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(160-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovd xmm5,DWORD[44+r8]
+ vmovd xmm0,DWORD[44+r9]
+ vpinsrd xmm5,xmm5,DWORD[44+r10],1
+ vpinsrd xmm0,xmm0,DWORD[44+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(176-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm5,xmm5,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm5
+ vpaddd xmm12,xmm12,xmm7
+ vmovd xmm5,DWORD[48+r8]
+ vmovd xmm0,DWORD[48+r9]
+ vpinsrd xmm5,xmm5,DWORD[48+r10],1
+ vpinsrd xmm0,xmm0,DWORD[48+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(192-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovd xmm5,DWORD[52+r8]
+ vmovd xmm0,DWORD[52+r9]
+ vpinsrd xmm5,xmm5,DWORD[52+r10],1
+ vpinsrd xmm0,xmm0,DWORD[52+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(208-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm5,xmm5,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm5
+ vpaddd xmm10,xmm10,xmm7
+ vmovd xmm5,DWORD[56+r8]
+ vmovd xmm0,DWORD[56+r9]
+ vpinsrd xmm5,xmm5,DWORD[56+r10],1
+ vpinsrd xmm0,xmm0,DWORD[56+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(224-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovd xmm5,DWORD[60+r8]
+ lea r8,[64+r8]
+ vmovd xmm0,DWORD[60+r9]
+ lea r9,[64+r9]
+ vpinsrd xmm5,xmm5,DWORD[60+r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm0,xmm0,DWORD[60+r11],1
+ lea r11,[64+r11]
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(240-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm5,xmm5,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+ prefetcht0 [63+r8]
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+ prefetcht0 [63+r9]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+ prefetcht0 [63+r10]
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+ prefetcht0 [63+r11]
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm5
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ vmovdqu xmm5,XMMWORD[((0-128))+rax]
+ mov ecx,3
+ jmp NEAR $L$oop_16_xx_avx
+ALIGN 32
+$L$oop_16_xx_avx:
+ vmovdqu xmm6,XMMWORD[((16-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((144-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((224-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(0-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovdqu xmm5,XMMWORD[((32-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((160-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((240-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(16-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm6,xmm6,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm6
+ vpaddd xmm14,xmm14,xmm7
+ vmovdqu xmm6,XMMWORD[((48-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((176-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((0-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(32-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovdqu xmm5,XMMWORD[((64-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((192-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((16-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(48-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm6,xmm6,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm6
+ vpaddd xmm12,xmm12,xmm7
+ vmovdqu xmm6,XMMWORD[((80-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((208-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((32-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(64-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovdqu xmm5,XMMWORD[((96-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((224-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((48-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(80-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm6,xmm6,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm6
+ vpaddd xmm10,xmm10,xmm7
+ vmovdqu xmm6,XMMWORD[((112-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((240-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((64-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(96-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovdqu xmm5,XMMWORD[((128-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((0-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((80-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(112-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm6,xmm6,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm6
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ vmovdqu xmm6,XMMWORD[((144-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((16-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((96-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(128-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovdqu xmm5,XMMWORD[((160-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((32-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((112-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(144-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm6,xmm6,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm6
+ vpaddd xmm14,xmm14,xmm7
+ vmovdqu xmm6,XMMWORD[((176-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((48-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((128-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(160-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovdqu xmm5,XMMWORD[((192-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((64-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((144-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(176-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm6,xmm6,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm6
+ vpaddd xmm12,xmm12,xmm7
+ vmovdqu xmm6,XMMWORD[((208-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((80-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((160-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(192-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovdqu xmm5,XMMWORD[((224-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((96-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((176-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(208-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm6,xmm6,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm6
+ vpaddd xmm10,xmm10,xmm7
+ vmovdqu xmm6,XMMWORD[((240-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((112-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((192-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(224-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovdqu xmm5,XMMWORD[((0-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((128-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((208-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(240-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm6,xmm6,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm6
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ dec ecx
+ jnz NEAR $L$oop_16_xx_avx
+
+ mov ecx,1
+ lea rbp,[((K256+128))]
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r11,rbp
+ vmovdqa xmm7,XMMWORD[rbx]
+ vpxor xmm0,xmm0,xmm0
+ vmovdqa xmm6,xmm7
+ vpcmpgtd xmm6,xmm6,xmm0
+ vpaddd xmm7,xmm7,xmm6
+
+ vmovdqu xmm0,XMMWORD[((0-128))+rdi]
+ vpand xmm8,xmm8,xmm6
+ vmovdqu xmm1,XMMWORD[((32-128))+rdi]
+ vpand xmm9,xmm9,xmm6
+ vmovdqu xmm2,XMMWORD[((64-128))+rdi]
+ vpand xmm10,xmm10,xmm6
+ vmovdqu xmm5,XMMWORD[((96-128))+rdi]
+ vpand xmm11,xmm11,xmm6
+ vpaddd xmm8,xmm8,xmm0
+ vmovdqu xmm0,XMMWORD[((128-128))+rdi]
+ vpand xmm12,xmm12,xmm6
+ vpaddd xmm9,xmm9,xmm1
+ vmovdqu xmm1,XMMWORD[((160-128))+rdi]
+ vpand xmm13,xmm13,xmm6
+ vpaddd xmm10,xmm10,xmm2
+ vmovdqu xmm2,XMMWORD[((192-128))+rdi]
+ vpand xmm14,xmm14,xmm6
+ vpaddd xmm11,xmm11,xmm5
+ vmovdqu xmm5,XMMWORD[((224-128))+rdi]
+ vpand xmm15,xmm15,xmm6
+ vpaddd xmm12,xmm12,xmm0
+ vpaddd xmm13,xmm13,xmm1
+ vmovdqu XMMWORD[(0-128)+rdi],xmm8
+ vpaddd xmm14,xmm14,xmm2
+ vmovdqu XMMWORD[(32-128)+rdi],xmm9
+ vpaddd xmm15,xmm15,xmm5
+ vmovdqu XMMWORD[(64-128)+rdi],xmm10
+ vmovdqu XMMWORD[(96-128)+rdi],xmm11
+ vmovdqu XMMWORD[(128-128)+rdi],xmm12
+ vmovdqu XMMWORD[(160-128)+rdi],xmm13
+ vmovdqu XMMWORD[(192-128)+rdi],xmm14
+ vmovdqu XMMWORD[(224-128)+rdi],xmm15
+
+ vmovdqu XMMWORD[rbx],xmm7
+ vmovdqu xmm6,XMMWORD[$L$pbswap]
+ dec edx
+ jnz NEAR $L$oop_avx
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_avx
+
+$L$done_avx:
+ mov rax,QWORD[272+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block_avx:
+
+ALIGN 32
+sha256_multi_block_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+ sub rsp,576
+ and rsp,-256
+ mov QWORD[544+rsp],rax
+
+$L$body_avx2:
+ lea rbp,[((K256+128))]
+ lea rdi,[128+rdi]
+
+$L$oop_grande_avx2:
+ mov DWORD[552+rsp],edx
+ xor edx,edx
+ lea rbx,[512+rsp]
+ mov r12,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r12,rbp
+ mov r13,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r13,rbp
+ mov r14,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r14,rbp
+ mov r15,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r15,rbp
+ mov r8,QWORD[64+rsi]
+ mov ecx,DWORD[72+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[16+rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[80+rsi]
+ mov ecx,DWORD[88+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[20+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[96+rsi]
+ mov ecx,DWORD[104+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[24+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[112+rsi]
+ mov ecx,DWORD[120+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[28+rbx],ecx
+ cmovle r11,rbp
+ vmovdqu ymm8,YMMWORD[((0-128))+rdi]
+ lea rax,[128+rsp]
+ vmovdqu ymm9,YMMWORD[((32-128))+rdi]
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm10,YMMWORD[((64-128))+rdi]
+ vmovdqu ymm11,YMMWORD[((96-128))+rdi]
+ vmovdqu ymm12,YMMWORD[((128-128))+rdi]
+ vmovdqu ymm13,YMMWORD[((160-128))+rdi]
+ vmovdqu ymm14,YMMWORD[((192-128))+rdi]
+ vmovdqu ymm15,YMMWORD[((224-128))+rdi]
+ vmovdqu ymm6,YMMWORD[$L$pbswap]
+ jmp NEAR $L$oop_avx2
+
+ALIGN 32
+$L$oop_avx2:
+ vpxor ymm4,ymm10,ymm9
+ vmovd xmm5,DWORD[r12]
+ vmovd xmm0,DWORD[r8]
+ vmovd xmm1,DWORD[r13]
+ vmovd xmm2,DWORD[r9]
+ vpinsrd xmm5,xmm5,DWORD[r14],1
+ vpinsrd xmm0,xmm0,DWORD[r10],1
+ vpinsrd xmm1,xmm1,DWORD[r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(0-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovd xmm5,DWORD[4+r12]
+ vmovd xmm0,DWORD[4+r8]
+ vmovd xmm1,DWORD[4+r13]
+ vmovd xmm2,DWORD[4+r9]
+ vpinsrd xmm5,xmm5,DWORD[4+r14],1
+ vpinsrd xmm0,xmm0,DWORD[4+r10],1
+ vpinsrd xmm1,xmm1,DWORD[4+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[4+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(32-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm5,ymm5,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm5
+ vpaddd ymm14,ymm14,ymm7
+ vmovd xmm5,DWORD[8+r12]
+ vmovd xmm0,DWORD[8+r8]
+ vmovd xmm1,DWORD[8+r13]
+ vmovd xmm2,DWORD[8+r9]
+ vpinsrd xmm5,xmm5,DWORD[8+r14],1
+ vpinsrd xmm0,xmm0,DWORD[8+r10],1
+ vpinsrd xmm1,xmm1,DWORD[8+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[8+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(64-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovd xmm5,DWORD[12+r12]
+ vmovd xmm0,DWORD[12+r8]
+ vmovd xmm1,DWORD[12+r13]
+ vmovd xmm2,DWORD[12+r9]
+ vpinsrd xmm5,xmm5,DWORD[12+r14],1
+ vpinsrd xmm0,xmm0,DWORD[12+r10],1
+ vpinsrd xmm1,xmm1,DWORD[12+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[12+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(96-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm5,ymm5,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm5
+ vpaddd ymm12,ymm12,ymm7
+ vmovd xmm5,DWORD[16+r12]
+ vmovd xmm0,DWORD[16+r8]
+ vmovd xmm1,DWORD[16+r13]
+ vmovd xmm2,DWORD[16+r9]
+ vpinsrd xmm5,xmm5,DWORD[16+r14],1
+ vpinsrd xmm0,xmm0,DWORD[16+r10],1
+ vpinsrd xmm1,xmm1,DWORD[16+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[16+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(128-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovd xmm5,DWORD[20+r12]
+ vmovd xmm0,DWORD[20+r8]
+ vmovd xmm1,DWORD[20+r13]
+ vmovd xmm2,DWORD[20+r9]
+ vpinsrd xmm5,xmm5,DWORD[20+r14],1
+ vpinsrd xmm0,xmm0,DWORD[20+r10],1
+ vpinsrd xmm1,xmm1,DWORD[20+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[20+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(160-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm5,ymm5,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm5
+ vpaddd ymm10,ymm10,ymm7
+ vmovd xmm5,DWORD[24+r12]
+ vmovd xmm0,DWORD[24+r8]
+ vmovd xmm1,DWORD[24+r13]
+ vmovd xmm2,DWORD[24+r9]
+ vpinsrd xmm5,xmm5,DWORD[24+r14],1
+ vpinsrd xmm0,xmm0,DWORD[24+r10],1
+ vpinsrd xmm1,xmm1,DWORD[24+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[24+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(192-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovd xmm5,DWORD[28+r12]
+ vmovd xmm0,DWORD[28+r8]
+ vmovd xmm1,DWORD[28+r13]
+ vmovd xmm2,DWORD[28+r9]
+ vpinsrd xmm5,xmm5,DWORD[28+r14],1
+ vpinsrd xmm0,xmm0,DWORD[28+r10],1
+ vpinsrd xmm1,xmm1,DWORD[28+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[28+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(224-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm5,ymm5,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm9,13
+
+ vpslld ymm2,ymm9,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm5
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ vmovd xmm5,DWORD[32+r12]
+ vmovd xmm0,DWORD[32+r8]
+ vmovd xmm1,DWORD[32+r13]
+ vmovd xmm2,DWORD[32+r9]
+ vpinsrd xmm5,xmm5,DWORD[32+r14],1
+ vpinsrd xmm0,xmm0,DWORD[32+r10],1
+ vpinsrd xmm1,xmm1,DWORD[32+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[32+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovd xmm5,DWORD[36+r12]
+ vmovd xmm0,DWORD[36+r8]
+ vmovd xmm1,DWORD[36+r13]
+ vmovd xmm2,DWORD[36+r9]
+ vpinsrd xmm5,xmm5,DWORD[36+r14],1
+ vpinsrd xmm0,xmm0,DWORD[36+r10],1
+ vpinsrd xmm1,xmm1,DWORD[36+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[36+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm5,ymm5,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm5
+ vpaddd ymm14,ymm14,ymm7
+ vmovd xmm5,DWORD[40+r12]
+ vmovd xmm0,DWORD[40+r8]
+ vmovd xmm1,DWORD[40+r13]
+ vmovd xmm2,DWORD[40+r9]
+ vpinsrd xmm5,xmm5,DWORD[40+r14],1
+ vpinsrd xmm0,xmm0,DWORD[40+r10],1
+ vpinsrd xmm1,xmm1,DWORD[40+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[40+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovd xmm5,DWORD[44+r12]
+ vmovd xmm0,DWORD[44+r8]
+ vmovd xmm1,DWORD[44+r13]
+ vmovd xmm2,DWORD[44+r9]
+ vpinsrd xmm5,xmm5,DWORD[44+r14],1
+ vpinsrd xmm0,xmm0,DWORD[44+r10],1
+ vpinsrd xmm1,xmm1,DWORD[44+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[44+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm5,ymm5,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm5
+ vpaddd ymm12,ymm12,ymm7
+ vmovd xmm5,DWORD[48+r12]
+ vmovd xmm0,DWORD[48+r8]
+ vmovd xmm1,DWORD[48+r13]
+ vmovd xmm2,DWORD[48+r9]
+ vpinsrd xmm5,xmm5,DWORD[48+r14],1
+ vpinsrd xmm0,xmm0,DWORD[48+r10],1
+ vpinsrd xmm1,xmm1,DWORD[48+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[48+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovd xmm5,DWORD[52+r12]
+ vmovd xmm0,DWORD[52+r8]
+ vmovd xmm1,DWORD[52+r13]
+ vmovd xmm2,DWORD[52+r9]
+ vpinsrd xmm5,xmm5,DWORD[52+r14],1
+ vpinsrd xmm0,xmm0,DWORD[52+r10],1
+ vpinsrd xmm1,xmm1,DWORD[52+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[52+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(416-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm5,ymm5,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm5
+ vpaddd ymm10,ymm10,ymm7
+ vmovd xmm5,DWORD[56+r12]
+ vmovd xmm0,DWORD[56+r8]
+ vmovd xmm1,DWORD[56+r13]
+ vmovd xmm2,DWORD[56+r9]
+ vpinsrd xmm5,xmm5,DWORD[56+r14],1
+ vpinsrd xmm0,xmm0,DWORD[56+r10],1
+ vpinsrd xmm1,xmm1,DWORD[56+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[56+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovd xmm5,DWORD[60+r12]
+ lea r12,[64+r12]
+ vmovd xmm0,DWORD[60+r8]
+ lea r8,[64+r8]
+ vmovd xmm1,DWORD[60+r13]
+ lea r13,[64+r13]
+ vmovd xmm2,DWORD[60+r9]
+ lea r9,[64+r9]
+ vpinsrd xmm5,xmm5,DWORD[60+r14],1
+ lea r14,[64+r14]
+ vpinsrd xmm0,xmm0,DWORD[60+r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm1,xmm1,DWORD[60+r15],1
+ lea r15,[64+r15]
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[60+r11],1
+ lea r11,[64+r11]
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(480-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm5,ymm5,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+ prefetcht0 [63+r12]
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+ prefetcht0 [63+r13]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+ prefetcht0 [63+r14]
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+ prefetcht0 [63+r15]
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm9,13
+ prefetcht0 [63+r8]
+ vpslld ymm2,ymm9,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+ prefetcht0 [63+r9]
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+ prefetcht0 [63+r10]
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm5
+ prefetcht0 [63+r11]
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm5
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ vmovdqu ymm5,YMMWORD[((0-128))+rax]
+ mov ecx,3
+ jmp NEAR $L$oop_16_xx_avx2
+ALIGN 32
+$L$oop_16_xx_avx2:
+ vmovdqu ymm6,YMMWORD[((32-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((288-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(0-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovdqu ymm5,YMMWORD[((64-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((320-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(32-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm6,ymm6,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm6
+ vpaddd ymm14,ymm14,ymm7
+ vmovdqu ymm6,YMMWORD[((96-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((352-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((0-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(64-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovdqu ymm5,YMMWORD[((128-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((384-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((32-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(96-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm6,ymm6,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm6
+ vpaddd ymm12,ymm12,ymm7
+ vmovdqu ymm6,YMMWORD[((160-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((416-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((64-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(128-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovdqu ymm5,YMMWORD[((192-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((448-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((96-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(160-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm6,ymm6,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm6
+ vpaddd ymm10,ymm10,ymm7
+ vmovdqu ymm6,YMMWORD[((224-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((480-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((128-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(192-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovdqu ymm5,YMMWORD[((256-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((0-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((160-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(224-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm6,ymm6,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm9,13
+
+ vpslld ymm2,ymm9,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm6
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ vmovdqu ymm6,YMMWORD[((288-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((32-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((192-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovdqu ymm5,YMMWORD[((320-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((64-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((224-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm6,ymm6,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm6
+ vpaddd ymm14,ymm14,ymm7
+ vmovdqu ymm6,YMMWORD[((352-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((96-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovdqu ymm5,YMMWORD[((384-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((128-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm6,ymm6,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm6
+ vpaddd ymm12,ymm12,ymm7
+ vmovdqu ymm6,YMMWORD[((416-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((160-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovdqu ymm5,YMMWORD[((448-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((192-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(416-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm6,ymm6,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm6
+ vpaddd ymm10,ymm10,ymm7
+ vmovdqu ymm6,YMMWORD[((480-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((224-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovdqu ymm5,YMMWORD[((0-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((256-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(480-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm6,ymm6,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm9,13
+
+ vpslld ymm2,ymm9,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm6
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ dec ecx
+ jnz NEAR $L$oop_16_xx_avx2
+
+ mov ecx,1
+ lea rbx,[512+rsp]
+ lea rbp,[((K256+128))]
+ cmp ecx,DWORD[rbx]
+ cmovge r12,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r13,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r14,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r15,rbp
+ cmp ecx,DWORD[16+rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[20+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[24+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[28+rbx]
+ cmovge r11,rbp
+ vmovdqa ymm7,YMMWORD[rbx]
+ vpxor ymm0,ymm0,ymm0
+ vmovdqa ymm6,ymm7
+ vpcmpgtd ymm6,ymm6,ymm0
+ vpaddd ymm7,ymm7,ymm6
+
+ vmovdqu ymm0,YMMWORD[((0-128))+rdi]
+ vpand ymm8,ymm8,ymm6
+ vmovdqu ymm1,YMMWORD[((32-128))+rdi]
+ vpand ymm9,ymm9,ymm6
+ vmovdqu ymm2,YMMWORD[((64-128))+rdi]
+ vpand ymm10,ymm10,ymm6
+ vmovdqu ymm5,YMMWORD[((96-128))+rdi]
+ vpand ymm11,ymm11,ymm6
+ vpaddd ymm8,ymm8,ymm0
+ vmovdqu ymm0,YMMWORD[((128-128))+rdi]
+ vpand ymm12,ymm12,ymm6
+ vpaddd ymm9,ymm9,ymm1
+ vmovdqu ymm1,YMMWORD[((160-128))+rdi]
+ vpand ymm13,ymm13,ymm6
+ vpaddd ymm10,ymm10,ymm2
+ vmovdqu ymm2,YMMWORD[((192-128))+rdi]
+ vpand ymm14,ymm14,ymm6
+ vpaddd ymm11,ymm11,ymm5
+ vmovdqu ymm5,YMMWORD[((224-128))+rdi]
+ vpand ymm15,ymm15,ymm6
+ vpaddd ymm12,ymm12,ymm0
+ vpaddd ymm13,ymm13,ymm1
+ vmovdqu YMMWORD[(0-128)+rdi],ymm8
+ vpaddd ymm14,ymm14,ymm2
+ vmovdqu YMMWORD[(32-128)+rdi],ymm9
+ vpaddd ymm15,ymm15,ymm5
+ vmovdqu YMMWORD[(64-128)+rdi],ymm10
+ vmovdqu YMMWORD[(96-128)+rdi],ymm11
+ vmovdqu YMMWORD[(128-128)+rdi],ymm12
+ vmovdqu YMMWORD[(160-128)+rdi],ymm13
+ vmovdqu YMMWORD[(192-128)+rdi],ymm14
+ vmovdqu YMMWORD[(224-128)+rdi],ymm15
+
+ vmovdqu YMMWORD[rbx],ymm7
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm6,YMMWORD[$L$pbswap]
+ dec edx
+ jnz NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+ mov rax,QWORD[544+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block_avx2:
+ALIGN 256
+K256:
+ DD 1116352408,1116352408,1116352408,1116352408
+ DD 1116352408,1116352408,1116352408,1116352408
+ DD 1899447441,1899447441,1899447441,1899447441
+ DD 1899447441,1899447441,1899447441,1899447441
+ DD 3049323471,3049323471,3049323471,3049323471
+ DD 3049323471,3049323471,3049323471,3049323471
+ DD 3921009573,3921009573,3921009573,3921009573
+ DD 3921009573,3921009573,3921009573,3921009573
+ DD 961987163,961987163,961987163,961987163
+ DD 961987163,961987163,961987163,961987163
+ DD 1508970993,1508970993,1508970993,1508970993
+ DD 1508970993,1508970993,1508970993,1508970993
+ DD 2453635748,2453635748,2453635748,2453635748
+ DD 2453635748,2453635748,2453635748,2453635748
+ DD 2870763221,2870763221,2870763221,2870763221
+ DD 2870763221,2870763221,2870763221,2870763221
+ DD 3624381080,3624381080,3624381080,3624381080
+ DD 3624381080,3624381080,3624381080,3624381080
+ DD 310598401,310598401,310598401,310598401
+ DD 310598401,310598401,310598401,310598401
+ DD 607225278,607225278,607225278,607225278
+ DD 607225278,607225278,607225278,607225278
+ DD 1426881987,1426881987,1426881987,1426881987
+ DD 1426881987,1426881987,1426881987,1426881987
+ DD 1925078388,1925078388,1925078388,1925078388
+ DD 1925078388,1925078388,1925078388,1925078388
+ DD 2162078206,2162078206,2162078206,2162078206
+ DD 2162078206,2162078206,2162078206,2162078206
+ DD 2614888103,2614888103,2614888103,2614888103
+ DD 2614888103,2614888103,2614888103,2614888103
+ DD 3248222580,3248222580,3248222580,3248222580
+ DD 3248222580,3248222580,3248222580,3248222580
+ DD 3835390401,3835390401,3835390401,3835390401
+ DD 3835390401,3835390401,3835390401,3835390401
+ DD 4022224774,4022224774,4022224774,4022224774
+ DD 4022224774,4022224774,4022224774,4022224774
+ DD 264347078,264347078,264347078,264347078
+ DD 264347078,264347078,264347078,264347078
+ DD 604807628,604807628,604807628,604807628
+ DD 604807628,604807628,604807628,604807628
+ DD 770255983,770255983,770255983,770255983
+ DD 770255983,770255983,770255983,770255983
+ DD 1249150122,1249150122,1249150122,1249150122
+ DD 1249150122,1249150122,1249150122,1249150122
+ DD 1555081692,1555081692,1555081692,1555081692
+ DD 1555081692,1555081692,1555081692,1555081692
+ DD 1996064986,1996064986,1996064986,1996064986
+ DD 1996064986,1996064986,1996064986,1996064986
+ DD 2554220882,2554220882,2554220882,2554220882
+ DD 2554220882,2554220882,2554220882,2554220882
+ DD 2821834349,2821834349,2821834349,2821834349
+ DD 2821834349,2821834349,2821834349,2821834349
+ DD 2952996808,2952996808,2952996808,2952996808
+ DD 2952996808,2952996808,2952996808,2952996808
+ DD 3210313671,3210313671,3210313671,3210313671
+ DD 3210313671,3210313671,3210313671,3210313671
+ DD 3336571891,3336571891,3336571891,3336571891
+ DD 3336571891,3336571891,3336571891,3336571891
+ DD 3584528711,3584528711,3584528711,3584528711
+ DD 3584528711,3584528711,3584528711,3584528711
+ DD 113926993,113926993,113926993,113926993
+ DD 113926993,113926993,113926993,113926993
+ DD 338241895,338241895,338241895,338241895
+ DD 338241895,338241895,338241895,338241895
+ DD 666307205,666307205,666307205,666307205
+ DD 666307205,666307205,666307205,666307205
+ DD 773529912,773529912,773529912,773529912
+ DD 773529912,773529912,773529912,773529912
+ DD 1294757372,1294757372,1294757372,1294757372
+ DD 1294757372,1294757372,1294757372,1294757372
+ DD 1396182291,1396182291,1396182291,1396182291
+ DD 1396182291,1396182291,1396182291,1396182291
+ DD 1695183700,1695183700,1695183700,1695183700
+ DD 1695183700,1695183700,1695183700,1695183700
+ DD 1986661051,1986661051,1986661051,1986661051
+ DD 1986661051,1986661051,1986661051,1986661051
+ DD 2177026350,2177026350,2177026350,2177026350
+ DD 2177026350,2177026350,2177026350,2177026350
+ DD 2456956037,2456956037,2456956037,2456956037
+ DD 2456956037,2456956037,2456956037,2456956037
+ DD 2730485921,2730485921,2730485921,2730485921
+ DD 2730485921,2730485921,2730485921,2730485921
+ DD 2820302411,2820302411,2820302411,2820302411
+ DD 2820302411,2820302411,2820302411,2820302411
+ DD 3259730800,3259730800,3259730800,3259730800
+ DD 3259730800,3259730800,3259730800,3259730800
+ DD 3345764771,3345764771,3345764771,3345764771
+ DD 3345764771,3345764771,3345764771,3345764771
+ DD 3516065817,3516065817,3516065817,3516065817
+ DD 3516065817,3516065817,3516065817,3516065817
+ DD 3600352804,3600352804,3600352804,3600352804
+ DD 3600352804,3600352804,3600352804,3600352804
+ DD 4094571909,4094571909,4094571909,4094571909
+ DD 4094571909,4094571909,4094571909,4094571909
+ DD 275423344,275423344,275423344,275423344
+ DD 275423344,275423344,275423344,275423344
+ DD 430227734,430227734,430227734,430227734
+ DD 430227734,430227734,430227734,430227734
+ DD 506948616,506948616,506948616,506948616
+ DD 506948616,506948616,506948616,506948616
+ DD 659060556,659060556,659060556,659060556
+ DD 659060556,659060556,659060556,659060556
+ DD 883997877,883997877,883997877,883997877
+ DD 883997877,883997877,883997877,883997877
+ DD 958139571,958139571,958139571,958139571
+ DD 958139571,958139571,958139571,958139571
+ DD 1322822218,1322822218,1322822218,1322822218
+ DD 1322822218,1322822218,1322822218,1322822218
+ DD 1537002063,1537002063,1537002063,1537002063
+ DD 1537002063,1537002063,1537002063,1537002063
+ DD 1747873779,1747873779,1747873779,1747873779
+ DD 1747873779,1747873779,1747873779,1747873779
+ DD 1955562222,1955562222,1955562222,1955562222
+ DD 1955562222,1955562222,1955562222,1955562222
+ DD 2024104815,2024104815,2024104815,2024104815
+ DD 2024104815,2024104815,2024104815,2024104815
+ DD 2227730452,2227730452,2227730452,2227730452
+ DD 2227730452,2227730452,2227730452,2227730452
+ DD 2361852424,2361852424,2361852424,2361852424
+ DD 2361852424,2361852424,2361852424,2361852424
+ DD 2428436474,2428436474,2428436474,2428436474
+ DD 2428436474,2428436474,2428436474,2428436474
+ DD 2756734187,2756734187,2756734187,2756734187
+ DD 2756734187,2756734187,2756734187,2756734187
+ DD 3204031479,3204031479,3204031479,3204031479
+ DD 3204031479,3204031479,3204031479,3204031479
+ DD 3329325298,3329325298,3329325298,3329325298
+ DD 3329325298,3329325298,3329325298,3329325298
+$L$pbswap:
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+K256_shaext:
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+DB 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111
+DB 99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114
+DB 32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71
+DB 65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112
+DB 101,110,115,115,108,46,111,114,103,62,0
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[272+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+
+ lea rsi,[((-24-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 16
+avx2_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[544+r8]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((-56-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ jmp NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha256_multi_block wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block wrt ..imagebase
+ DD $L$SEH_begin_sha256_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha256_multi_block_avx wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block_avx wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block_avx wrt ..imagebase
+ DD $L$SEH_begin_sha256_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha256_multi_block:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_multi_block_shaext:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx2:
+DB 9,0,0,0
+ DD avx2_handler wrt ..imagebase
+ DD $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
new file mode 100644
index 0000000000..e8abeaa668
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
@@ -0,0 +1,5712 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+global sha256_block_data_order
+
+ALIGN 16
+sha256_block_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea r11,[OPENSSL_ia32cap_P]
+ mov r9d,DWORD[r11]
+ mov r10d,DWORD[4+r11]
+ mov r11d,DWORD[8+r11]
+ test r11d,536870912
+ jnz NEAR _shaext_shortcut
+ and r11d,296
+ cmp r11d,296
+ je NEAR $L$avx2_shortcut
+ and r9d,1073741824
+ and r10d,268435968
+ or r10d,r9d
+ cmp r10d,1342177792
+ je NEAR $L$avx_shortcut
+ test r10d,512
+ jnz NEAR $L$ssse3_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,16*4+4*8
+ lea rdx,[rdx*4+rsi]
+ and rsp,-64
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+$L$prologue:
+
+ mov eax,DWORD[rdi]
+ mov ebx,DWORD[4+rdi]
+ mov ecx,DWORD[8+rdi]
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+ jmp NEAR $L$loop
+
+ALIGN 16
+$L$loop:
+ mov edi,ebx
+ lea rbp,[K256]
+ xor edi,ecx
+ mov r12d,DWORD[rsi]
+ mov r13d,r8d
+ mov r14d,eax
+ bswap r12d
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ add r11d,r14d
+ mov r12d,DWORD[4+rsi]
+ mov r13d,edx
+ mov r14d,r11d
+ bswap r12d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[4+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ add r10d,r14d
+ mov r12d,DWORD[8+rsi]
+ mov r13d,ecx
+ mov r14d,r10d
+ bswap r12d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[8+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ add r9d,r14d
+ mov r12d,DWORD[12+rsi]
+ mov r13d,ebx
+ mov r14d,r9d
+ bswap r12d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[12+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ add r8d,r14d
+ mov r12d,DWORD[16+rsi]
+ mov r13d,eax
+ mov r14d,r8d
+ bswap r12d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[16+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ add edx,r14d
+ mov r12d,DWORD[20+rsi]
+ mov r13d,r11d
+ mov r14d,edx
+ bswap r12d
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[20+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ add ecx,r14d
+ mov r12d,DWORD[24+rsi]
+ mov r13d,r10d
+ mov r14d,ecx
+ bswap r12d
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[24+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ add ebx,r14d
+ mov r12d,DWORD[28+rsi]
+ mov r13d,r9d
+ mov r14d,ebx
+ bswap r12d
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[28+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ add eax,r14d
+ mov r12d,DWORD[32+rsi]
+ mov r13d,r8d
+ mov r14d,eax
+ bswap r12d
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[32+rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ add r11d,r14d
+ mov r12d,DWORD[36+rsi]
+ mov r13d,edx
+ mov r14d,r11d
+ bswap r12d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[36+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ add r10d,r14d
+ mov r12d,DWORD[40+rsi]
+ mov r13d,ecx
+ mov r14d,r10d
+ bswap r12d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[40+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ add r9d,r14d
+ mov r12d,DWORD[44+rsi]
+ mov r13d,ebx
+ mov r14d,r9d
+ bswap r12d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[44+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ add r8d,r14d
+ mov r12d,DWORD[48+rsi]
+ mov r13d,eax
+ mov r14d,r8d
+ bswap r12d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[48+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ add edx,r14d
+ mov r12d,DWORD[52+rsi]
+ mov r13d,r11d
+ mov r14d,edx
+ bswap r12d
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[52+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ add ecx,r14d
+ mov r12d,DWORD[56+rsi]
+ mov r13d,r10d
+ mov r14d,ecx
+ bswap r12d
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[56+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ add ebx,r14d
+ mov r12d,DWORD[60+rsi]
+ mov r13d,r9d
+ mov r14d,ebx
+ bswap r12d
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[60+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ jmp NEAR $L$rounds_16_xx
+ALIGN 16
+$L$rounds_16_xx:
+ mov r13d,DWORD[4+rsp]
+ mov r15d,DWORD[56+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add eax,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[36+rsp]
+
+ add r12d,DWORD[rsp]
+ mov r13d,r8d
+ add r12d,r15d
+ mov r14d,eax
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[8+rsp]
+ mov edi,DWORD[60+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r11d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[40+rsp]
+
+ add r12d,DWORD[4+rsp]
+ mov r13d,edx
+ add r12d,edi
+ mov r14d,r11d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[4+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[12+rsp]
+ mov r15d,DWORD[rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r10d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[44+rsp]
+
+ add r12d,DWORD[8+rsp]
+ mov r13d,ecx
+ add r12d,r15d
+ mov r14d,r10d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[8+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[16+rsp]
+ mov edi,DWORD[4+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r9d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[48+rsp]
+
+ add r12d,DWORD[12+rsp]
+ mov r13d,ebx
+ add r12d,edi
+ mov r14d,r9d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[12+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ mov r13d,DWORD[20+rsp]
+ mov r15d,DWORD[8+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r8d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[52+rsp]
+
+ add r12d,DWORD[16+rsp]
+ mov r13d,eax
+ add r12d,r15d
+ mov r14d,r8d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[16+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[24+rsp]
+ mov edi,DWORD[12+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add edx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[56+rsp]
+
+ add r12d,DWORD[20+rsp]
+ mov r13d,r11d
+ add r12d,edi
+ mov r14d,edx
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[20+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[28+rsp]
+ mov r15d,DWORD[16+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ecx,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[60+rsp]
+
+ add r12d,DWORD[24+rsp]
+ mov r13d,r10d
+ add r12d,r15d
+ mov r14d,ecx
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[24+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[32+rsp]
+ mov edi,DWORD[20+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ebx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[rsp]
+
+ add r12d,DWORD[28+rsp]
+ mov r13d,r9d
+ add r12d,edi
+ mov r14d,ebx
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[28+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ mov r13d,DWORD[36+rsp]
+ mov r15d,DWORD[24+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add eax,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[4+rsp]
+
+ add r12d,DWORD[32+rsp]
+ mov r13d,r8d
+ add r12d,r15d
+ mov r14d,eax
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[32+rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[40+rsp]
+ mov edi,DWORD[28+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r11d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[8+rsp]
+
+ add r12d,DWORD[36+rsp]
+ mov r13d,edx
+ add r12d,edi
+ mov r14d,r11d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[36+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[44+rsp]
+ mov r15d,DWORD[32+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r10d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[12+rsp]
+
+ add r12d,DWORD[40+rsp]
+ mov r13d,ecx
+ add r12d,r15d
+ mov r14d,r10d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[40+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[48+rsp]
+ mov edi,DWORD[36+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r9d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[16+rsp]
+
+ add r12d,DWORD[44+rsp]
+ mov r13d,ebx
+ add r12d,edi
+ mov r14d,r9d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[44+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ mov r13d,DWORD[52+rsp]
+ mov r15d,DWORD[40+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r8d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[20+rsp]
+
+ add r12d,DWORD[48+rsp]
+ mov r13d,eax
+ add r12d,r15d
+ mov r14d,r8d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[48+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[56+rsp]
+ mov edi,DWORD[44+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add edx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[24+rsp]
+
+ add r12d,DWORD[52+rsp]
+ mov r13d,r11d
+ add r12d,edi
+ mov r14d,edx
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[52+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[60+rsp]
+ mov r15d,DWORD[48+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ecx,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[28+rsp]
+
+ add r12d,DWORD[56+rsp]
+ mov r13d,r10d
+ add r12d,r15d
+ mov r14d,ecx
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[56+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[rsp]
+ mov edi,DWORD[52+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ebx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[32+rsp]
+
+ add r12d,DWORD[60+rsp]
+ mov r13d,r9d
+ add r12d,edi
+ mov r14d,ebx
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[60+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ cmp BYTE[3+rbp],0
+ jnz NEAR $L$rounds_16_xx
+
+ mov rdi,QWORD[((64+0))+rsp]
+ add eax,r14d
+ lea rsi,[64+rsi]
+
+ add eax,DWORD[rdi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+ jb NEAR $L$loop
+
+ mov rsi,QWORD[88+rsp]
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order:
+ALIGN 64
+
+K256:
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+ DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+ DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908
+ DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908
+DB 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+DB 110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB 111,114,103,62,0
+
+ALIGN 64
+sha256_block_data_order_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_shaext_shortcut:
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[(-8-80)+rax],xmm6
+ movaps XMMWORD[(-8-64)+rax],xmm7
+ movaps XMMWORD[(-8-48)+rax],xmm8
+ movaps XMMWORD[(-8-32)+rax],xmm9
+ movaps XMMWORD[(-8-16)+rax],xmm10
+$L$prologue_shaext:
+ lea rcx,[((K256+128))]
+ movdqu xmm1,XMMWORD[rdi]
+ movdqu xmm2,XMMWORD[16+rdi]
+ movdqa xmm7,XMMWORD[((512-128))+rcx]
+
+ pshufd xmm0,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ pshufd xmm2,xmm2,0x1b
+ movdqa xmm8,xmm7
+DB 102,15,58,15,202,8
+ punpcklqdq xmm2,xmm0
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ movdqu xmm3,XMMWORD[rsi]
+ movdqu xmm4,XMMWORD[16+rsi]
+ movdqu xmm5,XMMWORD[32+rsi]
+DB 102,15,56,0,223
+ movdqu xmm6,XMMWORD[48+rsi]
+
+ movdqa xmm0,XMMWORD[((0-128))+rcx]
+ paddd xmm0,xmm3
+DB 102,15,56,0,231
+ movdqa xmm10,xmm2
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ nop
+ movdqa xmm9,xmm1
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((32-128))+rcx]
+ paddd xmm0,xmm4
+DB 102,15,56,0,239
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ lea rsi,[64+rsi]
+DB 15,56,204,220
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((64-128))+rcx]
+ paddd xmm0,xmm5
+DB 102,15,56,0,247
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm6
+DB 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+DB 15,56,204,229
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((96-128))+rcx]
+ paddd xmm0,xmm6
+DB 15,56,205,222
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm3
+DB 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+DB 15,56,204,238
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((128-128))+rcx]
+ paddd xmm0,xmm3
+DB 15,56,205,227
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm4
+DB 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+DB 15,56,204,243
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((160-128))+rcx]
+ paddd xmm0,xmm4
+DB 15,56,205,236
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm5
+DB 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+DB 15,56,204,220
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((192-128))+rcx]
+ paddd xmm0,xmm5
+DB 15,56,205,245
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm6
+DB 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+DB 15,56,204,229
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((224-128))+rcx]
+ paddd xmm0,xmm6
+DB 15,56,205,222
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm3
+DB 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+DB 15,56,204,238
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((256-128))+rcx]
+ paddd xmm0,xmm3
+DB 15,56,205,227
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm4
+DB 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+DB 15,56,204,243
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((288-128))+rcx]
+ paddd xmm0,xmm4
+DB 15,56,205,236
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm5
+DB 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+DB 15,56,204,220
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((320-128))+rcx]
+ paddd xmm0,xmm5
+DB 15,56,205,245
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm6
+DB 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+DB 15,56,204,229
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((352-128))+rcx]
+ paddd xmm0,xmm6
+DB 15,56,205,222
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm3
+DB 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+DB 15,56,204,238
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((384-128))+rcx]
+ paddd xmm0,xmm3
+DB 15,56,205,227
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm4
+DB 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+DB 15,56,204,243
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((416-128))+rcx]
+ paddd xmm0,xmm4
+DB 15,56,205,236
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm5
+DB 102,15,58,15,252,4
+DB 15,56,203,202
+ paddd xmm6,xmm7
+
+ movdqa xmm0,XMMWORD[((448-128))+rcx]
+ paddd xmm0,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+DB 15,56,205,245
+ movdqa xmm7,xmm8
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((480-128))+rcx]
+ paddd xmm0,xmm6
+ nop
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ dec rdx
+ nop
+DB 15,56,203,202
+
+ paddd xmm2,xmm10
+ paddd xmm1,xmm9
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm2,xmm2,0xb1
+ pshufd xmm7,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ punpckhqdq xmm1,xmm2
+DB 102,15,58,15,215,8
+
+ movdqu XMMWORD[rdi],xmm1
+ movdqu XMMWORD[16+rdi],xmm2
+ movaps xmm6,XMMWORD[((-8-80))+rax]
+ movaps xmm7,XMMWORD[((-8-64))+rax]
+ movaps xmm8,XMMWORD[((-8-48))+rax]
+ movaps xmm9,XMMWORD[((-8-32))+rax]
+ movaps xmm10,XMMWORD[((-8-16))+rax]
+ mov rsp,rax
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_sha256_block_data_order_shaext:
+
+ALIGN 64
+sha256_block_data_order_ssse3:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_ssse3:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$ssse3_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,160
+ lea rdx,[rdx*4+rsi]
+ and rsp,-64
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+ movaps XMMWORD[(64+32)+rsp],xmm6
+ movaps XMMWORD[(64+48)+rsp],xmm7
+ movaps XMMWORD[(64+64)+rsp],xmm8
+ movaps XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_ssse3:
+
+ mov eax,DWORD[rdi]
+ mov ebx,DWORD[4+rdi]
+ mov ecx,DWORD[8+rdi]
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+
+
+ jmp NEAR $L$loop_ssse3
+ALIGN 16
+$L$loop_ssse3:
+ movdqa xmm7,XMMWORD[((K256+512))]
+ movdqu xmm0,XMMWORD[rsi]
+ movdqu xmm1,XMMWORD[16+rsi]
+ movdqu xmm2,XMMWORD[32+rsi]
+DB 102,15,56,0,199
+ movdqu xmm3,XMMWORD[48+rsi]
+ lea rbp,[K256]
+DB 102,15,56,0,207
+ movdqa xmm4,XMMWORD[rbp]
+ movdqa xmm5,XMMWORD[32+rbp]
+DB 102,15,56,0,215
+ paddd xmm4,xmm0
+ movdqa xmm6,XMMWORD[64+rbp]
+DB 102,15,56,0,223
+ movdqa xmm7,XMMWORD[96+rbp]
+ paddd xmm5,xmm1
+ paddd xmm6,xmm2
+ paddd xmm7,xmm3
+ movdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ movdqa XMMWORD[16+rsp],xmm5
+ mov edi,ebx
+ movdqa XMMWORD[32+rsp],xmm6
+ xor edi,ecx
+ movdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$ssse3_00_47
+
+ALIGN 16
+$L$ssse3_00_47:
+ sub rbp,-128
+ ror r13d,14
+ movdqa xmm4,xmm1
+ mov eax,r14d
+ mov r12d,r9d
+ movdqa xmm7,xmm3
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+DB 102,15,58,15,224,4
+ and r12d,r8d
+ xor r13d,r8d
+DB 102,15,58,15,250,4
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,ebx
+ add r11d,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ paddd xmm0,xmm7
+ ror r14d,2
+ add edx,r11d
+ psrld xmm6,7
+ add r11d,edi
+ mov r13d,edx
+ pshufd xmm7,xmm3,250
+ add r14d,r11d
+ ror r13d,14
+ pslld xmm5,14
+ mov r11d,r14d
+ mov r12d,r8d
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,r11d
+ pxor xmm4,xmm5
+ and r12d,edx
+ xor r13d,edx
+ pslld xmm5,11
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ pxor xmm4,xmm6
+ xor r12d,r9d
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,eax
+ add r10d,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ psrld xmm7,10
+ add r10d,r13d
+ xor r15d,eax
+ paddd xmm0,xmm4
+ ror r14d,2
+ add ecx,r10d
+ psrlq xmm6,17
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,ecx
+ xor r12d,r8d
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ pshufd xmm7,xmm7,128
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ psrldq xmm7,8
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ paddd xmm0,xmm7
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ pshufd xmm7,xmm0,80
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ movdqa xmm6,xmm7
+ add r9d,edi
+ mov r13d,ebx
+ psrld xmm7,10
+ add r14d,r9d
+ ror r13d,14
+ psrlq xmm6,17
+ mov r9d,r14d
+ mov r12d,ecx
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ psrlq xmm6,2
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ pxor xmm7,xmm6
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,r10d
+ add r8d,r12d
+ movdqa xmm6,XMMWORD[rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ paddd xmm0,xmm7
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ paddd xmm6,xmm0
+ mov r13d,eax
+ add r14d,r8d
+ movdqa XMMWORD[rsp],xmm6
+ ror r13d,14
+ movdqa xmm4,xmm2
+ mov r8d,r14d
+ mov r12d,ebx
+ movdqa xmm7,xmm0
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+DB 102,15,58,15,225,4
+ and r12d,eax
+ xor r13d,eax
+DB 102,15,58,15,251,4
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,r9d
+ add edx,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ paddd xmm1,xmm7
+ ror r14d,2
+ add r11d,edx
+ psrld xmm6,7
+ add edx,edi
+ mov r13d,r11d
+ pshufd xmm7,xmm0,250
+ add r14d,edx
+ ror r13d,14
+ pslld xmm5,14
+ mov edx,r14d
+ mov r12d,eax
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,edx
+ pxor xmm4,xmm5
+ and r12d,r11d
+ xor r13d,r11d
+ pslld xmm5,11
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ pxor xmm4,xmm6
+ xor r12d,ebx
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,r8d
+ add ecx,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ psrld xmm7,10
+ add ecx,r13d
+ xor r15d,r8d
+ paddd xmm1,xmm4
+ ror r14d,2
+ add r10d,ecx
+ psrlq xmm6,17
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,r10d
+ xor r12d,eax
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ pshufd xmm7,xmm7,128
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ psrldq xmm7,8
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ paddd xmm1,xmm7
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ pshufd xmm7,xmm1,80
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ movdqa xmm6,xmm7
+ add ebx,edi
+ mov r13d,r9d
+ psrld xmm7,10
+ add r14d,ebx
+ ror r13d,14
+ psrlq xmm6,17
+ mov ebx,r14d
+ mov r12d,r10d
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ psrlq xmm6,2
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ pxor xmm7,xmm6
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,ecx
+ add eax,r12d
+ movdqa xmm6,XMMWORD[32+rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ paddd xmm1,xmm7
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ paddd xmm6,xmm1
+ mov r13d,r8d
+ add r14d,eax
+ movdqa XMMWORD[16+rsp],xmm6
+ ror r13d,14
+ movdqa xmm4,xmm3
+ mov eax,r14d
+ mov r12d,r9d
+ movdqa xmm7,xmm1
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+DB 102,15,58,15,226,4
+ and r12d,r8d
+ xor r13d,r8d
+DB 102,15,58,15,248,4
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,ebx
+ add r11d,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ paddd xmm2,xmm7
+ ror r14d,2
+ add edx,r11d
+ psrld xmm6,7
+ add r11d,edi
+ mov r13d,edx
+ pshufd xmm7,xmm1,250
+ add r14d,r11d
+ ror r13d,14
+ pslld xmm5,14
+ mov r11d,r14d
+ mov r12d,r8d
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,r11d
+ pxor xmm4,xmm5
+ and r12d,edx
+ xor r13d,edx
+ pslld xmm5,11
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ pxor xmm4,xmm6
+ xor r12d,r9d
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,eax
+ add r10d,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ psrld xmm7,10
+ add r10d,r13d
+ xor r15d,eax
+ paddd xmm2,xmm4
+ ror r14d,2
+ add ecx,r10d
+ psrlq xmm6,17
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,ecx
+ xor r12d,r8d
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ pshufd xmm7,xmm7,128
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ psrldq xmm7,8
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ paddd xmm2,xmm7
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ pshufd xmm7,xmm2,80
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ movdqa xmm6,xmm7
+ add r9d,edi
+ mov r13d,ebx
+ psrld xmm7,10
+ add r14d,r9d
+ ror r13d,14
+ psrlq xmm6,17
+ mov r9d,r14d
+ mov r12d,ecx
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ psrlq xmm6,2
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ pxor xmm7,xmm6
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,r10d
+ add r8d,r12d
+ movdqa xmm6,XMMWORD[64+rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ paddd xmm2,xmm7
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ paddd xmm6,xmm2
+ mov r13d,eax
+ add r14d,r8d
+ movdqa XMMWORD[32+rsp],xmm6
+ ror r13d,14
+ movdqa xmm4,xmm0
+ mov r8d,r14d
+ mov r12d,ebx
+ movdqa xmm7,xmm2
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+DB 102,15,58,15,227,4
+ and r12d,eax
+ xor r13d,eax
+DB 102,15,58,15,249,4
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,r9d
+ add edx,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ paddd xmm3,xmm7
+ ror r14d,2
+ add r11d,edx
+ psrld xmm6,7
+ add edx,edi
+ mov r13d,r11d
+ pshufd xmm7,xmm2,250
+ add r14d,edx
+ ror r13d,14
+ pslld xmm5,14
+ mov edx,r14d
+ mov r12d,eax
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,edx
+ pxor xmm4,xmm5
+ and r12d,r11d
+ xor r13d,r11d
+ pslld xmm5,11
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ pxor xmm4,xmm6
+ xor r12d,ebx
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,r8d
+ add ecx,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ psrld xmm7,10
+ add ecx,r13d
+ xor r15d,r8d
+ paddd xmm3,xmm4
+ ror r14d,2
+ add r10d,ecx
+ psrlq xmm6,17
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,r10d
+ xor r12d,eax
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ pshufd xmm7,xmm7,128
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ psrldq xmm7,8
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ paddd xmm3,xmm7
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ pshufd xmm7,xmm3,80
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ movdqa xmm6,xmm7
+ add ebx,edi
+ mov r13d,r9d
+ psrld xmm7,10
+ add r14d,ebx
+ ror r13d,14
+ psrlq xmm6,17
+ mov ebx,r14d
+ mov r12d,r10d
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ psrlq xmm6,2
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ pxor xmm7,xmm6
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,ecx
+ add eax,r12d
+ movdqa xmm6,XMMWORD[96+rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ paddd xmm3,xmm7
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ paddd xmm6,xmm3
+ mov r13d,r8d
+ add r14d,eax
+ movdqa XMMWORD[48+rsp],xmm6
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$ssse3_00_47
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ ror r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ ror r14d,11
+ xor edi,eax
+ add r10d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ ror r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ ror r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ ror r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ ror r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ xor edi,ecx
+ add eax,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ ror r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ ror r14d,11
+ xor edi,eax
+ add r10d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ ror r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ ror r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ ror r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ ror r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ xor edi,ecx
+ add eax,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov rdi,QWORD[((64+0))+rsp]
+ mov eax,r14d
+
+ add eax,DWORD[rdi]
+ lea rsi,[64+rsi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+ jb NEAR $L$loop_ssse3
+
+ mov rsi,QWORD[88+rsp]
+
+ movaps xmm6,XMMWORD[((64+32))+rsp]
+ movaps xmm7,XMMWORD[((64+48))+rsp]
+ movaps xmm8,XMMWORD[((64+64))+rsp]
+ movaps xmm9,XMMWORD[((64+80))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_ssse3:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order_ssse3:
+
+ALIGN 64
+sha256_block_data_order_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,160
+ lea rdx,[rdx*4+rsi]
+ and rsp,-64
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+ movaps XMMWORD[(64+32)+rsp],xmm6
+ movaps XMMWORD[(64+48)+rsp],xmm7
+ movaps XMMWORD[(64+64)+rsp],xmm8
+ movaps XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx:
+
+ vzeroupper
+ mov eax,DWORD[rdi]
+ mov ebx,DWORD[4+rdi]
+ mov ecx,DWORD[8+rdi]
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+ vmovdqa xmm8,XMMWORD[((K256+512+32))]
+ vmovdqa xmm9,XMMWORD[((K256+512+64))]
+ jmp NEAR $L$loop_avx
+ALIGN 16
+$L$loop_avx:
+ vmovdqa xmm7,XMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[rsi]
+ vmovdqu xmm1,XMMWORD[16+rsi]
+ vmovdqu xmm2,XMMWORD[32+rsi]
+ vmovdqu xmm3,XMMWORD[48+rsi]
+ vpshufb xmm0,xmm0,xmm7
+ lea rbp,[K256]
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,XMMWORD[rbp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,XMMWORD[32+rbp]
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ vpaddd xmm7,xmm3,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm5
+ mov edi,ebx
+ vmovdqa XMMWORD[32+rsp],xmm6
+ xor edi,ecx
+ vmovdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$avx_00_47
+
+ALIGN 16
+$L$avx_00_47:
+ sub rbp,-128
+ vpalignr xmm4,xmm1,xmm0,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm3,xmm2,4
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm0,xmm0,xmm7
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ vpshufd xmm7,xmm3,250
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ vpaddd xmm0,xmm0,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpaddd xmm0,xmm0,xmm6
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ vpshufd xmm7,xmm0,80
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ vpsrlq xmm7,xmm7,2
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ vpaddd xmm0,xmm0,xmm6
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpaddd xmm6,xmm0,XMMWORD[rbp]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[rsp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm0,xmm3,4
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm1,xmm1,xmm7
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ vpshufd xmm7,xmm0,250
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ vpaddd xmm1,xmm1,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpaddd xmm1,xmm1,xmm6
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ vpshufd xmm7,xmm1,80
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ vpsrlq xmm7,xmm7,2
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ vpaddd xmm1,xmm1,xmm6
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpaddd xmm6,xmm1,XMMWORD[32+rbp]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm1,xmm0,4
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm2,xmm2,xmm7
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ vpshufd xmm7,xmm1,250
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ vpaddd xmm2,xmm2,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpaddd xmm2,xmm2,xmm6
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ vpshufd xmm7,xmm2,80
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ vpsrlq xmm7,xmm7,2
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ vpaddd xmm2,xmm2,xmm6
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm2,xmm1,4
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm3,xmm3,xmm7
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ vpshufd xmm7,xmm2,250
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ vpaddd xmm3,xmm3,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpaddd xmm3,xmm3,xmm6
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ vpshufd xmm7,xmm3,80
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ vpsrlq xmm7,xmm7,2
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ vpaddd xmm3,xmm3,xmm6
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpaddd xmm6,xmm3,XMMWORD[96+rbp]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[48+rsp],xmm6
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$avx_00_47
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov rdi,QWORD[((64+0))+rsp]
+ mov eax,r14d
+
+ add eax,DWORD[rdi]
+ lea rsi,[64+rsi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+ jb NEAR $L$loop_avx
+
+ mov rsi,QWORD[88+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((64+32))+rsp]
+ movaps xmm7,XMMWORD[((64+48))+rsp]
+ movaps xmm8,XMMWORD[((64+64))+rsp]
+ movaps xmm9,XMMWORD[((64+80))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order_avx:
+
+ALIGN 64
+sha256_block_data_order_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,608
+ shl rdx,4
+ and rsp,-256*4
+ lea rdx,[rdx*4+rsi]
+ add rsp,448
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+ movaps XMMWORD[(64+32)+rsp],xmm6
+ movaps XMMWORD[(64+48)+rsp],xmm7
+ movaps XMMWORD[(64+64)+rsp],xmm8
+ movaps XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx2:
+
+ vzeroupper
+ sub rsi,-16*4
+ mov eax,DWORD[rdi]
+ mov r12,rsi
+ mov ebx,DWORD[4+rdi]
+ cmp rsi,rdx
+ mov ecx,DWORD[8+rdi]
+ cmove r12,rsp
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+ vmovdqa ymm8,YMMWORD[((K256+512+32))]
+ vmovdqa ymm9,YMMWORD[((K256+512+64))]
+ jmp NEAR $L$oop_avx2
+ALIGN 16
+$L$oop_avx2:
+ vmovdqa ymm7,YMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[((-64+0))+rsi]
+ vmovdqu xmm1,XMMWORD[((-64+16))+rsi]
+ vmovdqu xmm2,XMMWORD[((-64+32))+rsi]
+ vmovdqu xmm3,XMMWORD[((-64+48))+rsi]
+
+ vinserti128 ymm0,ymm0,XMMWORD[r12],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r12],1
+ vpshufb ymm0,ymm0,ymm7
+ vinserti128 ymm2,ymm2,XMMWORD[32+r12],1
+ vpshufb ymm1,ymm1,ymm7
+ vinserti128 ymm3,ymm3,XMMWORD[48+r12],1
+
+ lea rbp,[K256]
+ vpshufb ymm2,ymm2,ymm7
+ vpaddd ymm4,ymm0,YMMWORD[rbp]
+ vpshufb ymm3,ymm3,ymm7
+ vpaddd ymm5,ymm1,YMMWORD[32+rbp]
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ vpaddd ymm7,ymm3,YMMWORD[96+rbp]
+ vmovdqa YMMWORD[rsp],ymm4
+ xor r14d,r14d
+ vmovdqa YMMWORD[32+rsp],ymm5
+ lea rsp,[((-64))+rsp]
+ mov edi,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ xor edi,ecx
+ vmovdqa YMMWORD[32+rsp],ymm7
+ mov r12d,r9d
+ sub rbp,-16*2*4
+ jmp NEAR $L$avx2_00_47
+
+ALIGN 16
+$L$avx2_00_47:
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm1,ymm0,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm3,ymm2,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm0,ymm0,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ vpshufd ymm7,ymm3,250
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm0,ymm0,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufb ymm6,ymm6,ymm8
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpaddd ymm0,ymm0,ymm6
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpshufd ymm7,ymm0,80
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ vpxor ymm6,ymm6,ymm7
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ vpshufb ymm6,ymm6,ymm9
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ vpaddd ymm0,ymm0,ymm6
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ vpaddd ymm6,ymm0,YMMWORD[rbp]
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm2,ymm1,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm0,ymm3,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm1,ymm1,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ vpshufd ymm7,ymm0,250
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm1,ymm1,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufb ymm6,ymm6,ymm8
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpaddd ymm1,ymm1,ymm6
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpshufd ymm7,ymm1,80
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ vpxor ymm6,ymm6,ymm7
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ vpshufb ymm6,ymm6,ymm9
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ vpaddd ymm1,ymm1,ymm6
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ vpaddd ymm6,ymm1,YMMWORD[32+rbp]
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm3,ymm2,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm1,ymm0,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm2,ymm2,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ vpshufd ymm7,ymm1,250
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm2,ymm2,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufb ymm6,ymm6,ymm8
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpaddd ymm2,ymm2,ymm6
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpshufd ymm7,ymm2,80
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ vpxor ymm6,ymm6,ymm7
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ vpshufb ymm6,ymm6,ymm9
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ vpaddd ymm2,ymm2,ymm6
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm0,ymm3,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm2,ymm1,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm3,ymm3,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ vpshufd ymm7,ymm2,250
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm3,ymm3,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufb ymm6,ymm6,ymm8
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpaddd ymm3,ymm3,ymm6
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpshufd ymm7,ymm3,80
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ vpxor ymm6,ymm6,ymm7
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ vpshufb ymm6,ymm6,ymm9
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ vpaddd ymm3,ymm3,ymm6
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ vpaddd ymm6,ymm3,YMMWORD[96+rbp]
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ lea rbp,[128+rbp]
+ cmp BYTE[3+rbp],0
+ jne NEAR $L$avx2_00_47
+ add r11d,DWORD[((0+64))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+64))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+64))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+64))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+64))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+64))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+64))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+64))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ add r11d,DWORD[rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[4+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[8+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[12+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[32+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[36+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[40+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[44+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ mov rdi,QWORD[512+rsp]
+ add eax,r14d
+
+ lea rbp,[448+rsp]
+
+ add eax,DWORD[rdi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+
+ cmp rsi,QWORD[80+rbp]
+ je NEAR $L$done_avx2
+
+ xor r14d,r14d
+ mov edi,ebx
+ xor edi,ecx
+ mov r12d,r9d
+ jmp NEAR $L$ower_avx2
+ALIGN 16
+$L$ower_avx2:
+ add r11d,DWORD[((0+16))+rbp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+16))+rbp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+16))+rbp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+16))+rbp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+16))+rbp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+16))+rbp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+16))+rbp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+16))+rbp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ lea rbp,[((-64))+rbp]
+ cmp rbp,rsp
+ jae NEAR $L$ower_avx2
+
+ mov rdi,QWORD[512+rsp]
+ add eax,r14d
+
+ lea rsp,[448+rsp]
+
+ add eax,DWORD[rdi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ lea rsi,[128+rsi]
+ add r10d,DWORD[24+rdi]
+ mov r12,rsi
+ add r11d,DWORD[28+rdi]
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ cmove r12,rsp
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+
+ jbe NEAR $L$oop_avx2
+ lea rbp,[rsp]
+
+$L$done_avx2:
+ lea rsp,[rbp]
+ mov rsi,QWORD[88+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((64+32))+rsp]
+ movaps xmm7,XMMWORD[((64+48))+rsp]
+ movaps xmm8,XMMWORD[((64+64))+rsp]
+ movaps xmm9,XMMWORD[((64+80))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order_avx2:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+ lea r10,[$L$avx2_shortcut]
+ cmp rbx,r10
+ jb NEAR $L$not_in_avx2
+
+ and rax,-256*4
+ add rax,448
+$L$not_in_avx2:
+ mov rsi,rax
+ mov rax,QWORD[((64+24))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ lea rsi,[((64+32))+rsi]
+ lea rdi,[512+r8]
+ mov ecx,8
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 16
+shaext_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue_shaext]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ lea r10,[$L$epilogue_shaext]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rsi,[((-8-80))+rax]
+ lea rdi,[512+r8]
+ mov ecx,10
+ DD 0xa548f3fc
+
+ jmp NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha256_block_data_order wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha256_block_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_shaext:
+DB 9,0,0,0
+ DD shaext_handler wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_ssse3:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
new file mode 100644
index 0000000000..6d48b93b84
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
@@ -0,0 +1,5668 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+global sha512_block_data_order
+
+ALIGN 16
+sha512_block_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea r11,[OPENSSL_ia32cap_P]
+ mov r9d,DWORD[r11]
+ mov r10d,DWORD[4+r11]
+ mov r11d,DWORD[8+r11]
+ test r10d,2048
+ jnz NEAR $L$xop_shortcut
+ and r11d,296
+ cmp r11d,296
+ je NEAR $L$avx2_shortcut
+ and r9d,1073741824
+ and r10d,268435968
+ or r10d,r9d
+ cmp r10d,1342177792
+ je NEAR $L$avx_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,16*8+4*8
+ lea rdx,[rdx*8+rsi]
+ and rsp,-64
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+$L$prologue:
+
+ mov rax,QWORD[rdi]
+ mov rbx,QWORD[8+rdi]
+ mov rcx,QWORD[16+rdi]
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$loop
+
+ALIGN 16
+$L$loop:
+ mov rdi,rbx
+ lea rbp,[K512]
+ xor rdi,rcx
+ mov r12,QWORD[rsi]
+ mov r13,r8
+ mov r14,rax
+ bswap r12
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ add r11,r14
+ mov r12,QWORD[8+rsi]
+ mov r13,rdx
+ mov r14,r11
+ bswap r12
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[8+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ add r10,r14
+ mov r12,QWORD[16+rsi]
+ mov r13,rcx
+ mov r14,r10
+ bswap r12
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[16+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ add r9,r14
+ mov r12,QWORD[24+rsi]
+ mov r13,rbx
+ mov r14,r9
+ bswap r12
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[24+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ add r8,r14
+ mov r12,QWORD[32+rsi]
+ mov r13,rax
+ mov r14,r8
+ bswap r12
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[32+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ add rdx,r14
+ mov r12,QWORD[40+rsi]
+ mov r13,r11
+ mov r14,rdx
+ bswap r12
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[40+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ add rcx,r14
+ mov r12,QWORD[48+rsi]
+ mov r13,r10
+ mov r14,rcx
+ bswap r12
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[48+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ add rbx,r14
+ mov r12,QWORD[56+rsi]
+ mov r13,r9
+ mov r14,rbx
+ bswap r12
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[56+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ add rax,r14
+ mov r12,QWORD[64+rsi]
+ mov r13,r8
+ mov r14,rax
+ bswap r12
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[64+rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ add r11,r14
+ mov r12,QWORD[72+rsi]
+ mov r13,rdx
+ mov r14,r11
+ bswap r12
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[72+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ add r10,r14
+ mov r12,QWORD[80+rsi]
+ mov r13,rcx
+ mov r14,r10
+ bswap r12
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[80+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ add r9,r14
+ mov r12,QWORD[88+rsi]
+ mov r13,rbx
+ mov r14,r9
+ bswap r12
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[88+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ add r8,r14
+ mov r12,QWORD[96+rsi]
+ mov r13,rax
+ mov r14,r8
+ bswap r12
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[96+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ add rdx,r14
+ mov r12,QWORD[104+rsi]
+ mov r13,r11
+ mov r14,rdx
+ bswap r12
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[104+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ add rcx,r14
+ mov r12,QWORD[112+rsi]
+ mov r13,r10
+ mov r14,rcx
+ bswap r12
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[112+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ add rbx,r14
+ mov r12,QWORD[120+rsi]
+ mov r13,r9
+ mov r14,rbx
+ bswap r12
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[120+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ jmp NEAR $L$rounds_16_xx
+ALIGN 16
+$L$rounds_16_xx:
+ mov r13,QWORD[8+rsp]
+ mov r15,QWORD[112+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rax,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[72+rsp]
+
+ add r12,QWORD[rsp]
+ mov r13,r8
+ add r12,r15
+ mov r14,rax
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[16+rsp]
+ mov rdi,QWORD[120+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r11,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[80+rsp]
+
+ add r12,QWORD[8+rsp]
+ mov r13,rdx
+ add r12,rdi
+ mov r14,r11
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[8+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[24+rsp]
+ mov r15,QWORD[rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r10,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[88+rsp]
+
+ add r12,QWORD[16+rsp]
+ mov r13,rcx
+ add r12,r15
+ mov r14,r10
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[16+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[32+rsp]
+ mov rdi,QWORD[8+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r9,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[96+rsp]
+
+ add r12,QWORD[24+rsp]
+ mov r13,rbx
+ add r12,rdi
+ mov r14,r9
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[24+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[40+rsp]
+ mov r15,QWORD[16+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r8,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[104+rsp]
+
+ add r12,QWORD[32+rsp]
+ mov r13,rax
+ add r12,r15
+ mov r14,r8
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[32+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[48+rsp]
+ mov rdi,QWORD[24+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rdx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[112+rsp]
+
+ add r12,QWORD[40+rsp]
+ mov r13,r11
+ add r12,rdi
+ mov r14,rdx
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[40+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[56+rsp]
+ mov r15,QWORD[32+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rcx,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[120+rsp]
+
+ add r12,QWORD[48+rsp]
+ mov r13,r10
+ add r12,r15
+ mov r14,rcx
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[48+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[64+rsp]
+ mov rdi,QWORD[40+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rbx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[rsp]
+
+ add r12,QWORD[56+rsp]
+ mov r13,r9
+ add r12,rdi
+ mov r14,rbx
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[56+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[72+rsp]
+ mov r15,QWORD[48+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rax,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[8+rsp]
+
+ add r12,QWORD[64+rsp]
+ mov r13,r8
+ add r12,r15
+ mov r14,rax
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[64+rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[80+rsp]
+ mov rdi,QWORD[56+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r11,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[16+rsp]
+
+ add r12,QWORD[72+rsp]
+ mov r13,rdx
+ add r12,rdi
+ mov r14,r11
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[72+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[88+rsp]
+ mov r15,QWORD[64+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r10,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[24+rsp]
+
+ add r12,QWORD[80+rsp]
+ mov r13,rcx
+ add r12,r15
+ mov r14,r10
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[80+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[96+rsp]
+ mov rdi,QWORD[72+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r9,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[32+rsp]
+
+ add r12,QWORD[88+rsp]
+ mov r13,rbx
+ add r12,rdi
+ mov r14,r9
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[88+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[104+rsp]
+ mov r15,QWORD[80+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r8,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[40+rsp]
+
+ add r12,QWORD[96+rsp]
+ mov r13,rax
+ add r12,r15
+ mov r14,r8
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[96+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[112+rsp]
+ mov rdi,QWORD[88+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rdx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[48+rsp]
+
+ add r12,QWORD[104+rsp]
+ mov r13,r11
+ add r12,rdi
+ mov r14,rdx
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[104+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[120+rsp]
+ mov r15,QWORD[96+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rcx,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[56+rsp]
+
+ add r12,QWORD[112+rsp]
+ mov r13,r10
+ add r12,r15
+ mov r14,rcx
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[112+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[rsp]
+ mov rdi,QWORD[104+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rbx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[64+rsp]
+
+ add r12,QWORD[120+rsp]
+ mov r13,r9
+ add r12,rdi
+ mov r14,rbx
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[120+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ cmp BYTE[7+rbp],0
+ jnz NEAR $L$rounds_16_xx
+
+ mov rdi,QWORD[((128+0))+rsp]
+ add rax,r14
+ lea rsi,[128+rsi]
+
+ add rax,QWORD[rdi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+ jb NEAR $L$loop
+
+ mov rsi,QWORD[152+rsp]
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order:
+ALIGN 64
+
+K512:
+ DQ 0x428a2f98d728ae22,0x7137449123ef65cd
+ DQ 0x428a2f98d728ae22,0x7137449123ef65cd
+ DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+ DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+ DQ 0x3956c25bf348b538,0x59f111f1b605d019
+ DQ 0x3956c25bf348b538,0x59f111f1b605d019
+ DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118
+ DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118
+ DQ 0xd807aa98a3030242,0x12835b0145706fbe
+ DQ 0xd807aa98a3030242,0x12835b0145706fbe
+ DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+ DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+ DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1
+ DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1
+ DQ 0x9bdc06a725c71235,0xc19bf174cf692694
+ DQ 0x9bdc06a725c71235,0xc19bf174cf692694
+ DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3
+ DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3
+ DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+ DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+ DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483
+ DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483
+ DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+ DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+ DQ 0x983e5152ee66dfab,0xa831c66d2db43210
+ DQ 0x983e5152ee66dfab,0xa831c66d2db43210
+ DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4
+ DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4
+ DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725
+ DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725
+ DQ 0x06ca6351e003826f,0x142929670a0e6e70
+ DQ 0x06ca6351e003826f,0x142929670a0e6e70
+ DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926
+ DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926
+ DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+ DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+ DQ 0x650a73548baf63de,0x766a0abb3c77b2a8
+ DQ 0x650a73548baf63de,0x766a0abb3c77b2a8
+ DQ 0x81c2c92e47edaee6,0x92722c851482353b
+ DQ 0x81c2c92e47edaee6,0x92722c851482353b
+ DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001
+ DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001
+ DQ 0xc24b8b70d0f89791,0xc76c51a30654be30
+ DQ 0xc24b8b70d0f89791,0xc76c51a30654be30
+ DQ 0xd192e819d6ef5218,0xd69906245565a910
+ DQ 0xd192e819d6ef5218,0xd69906245565a910
+ DQ 0xf40e35855771202a,0x106aa07032bbd1b8
+ DQ 0xf40e35855771202a,0x106aa07032bbd1b8
+ DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53
+ DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53
+ DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+ DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+ DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+ DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+ DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+ DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+ DQ 0x748f82ee5defb2fc,0x78a5636f43172f60
+ DQ 0x748f82ee5defb2fc,0x78a5636f43172f60
+ DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec
+ DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec
+ DQ 0x90befffa23631e28,0xa4506cebde82bde9
+ DQ 0x90befffa23631e28,0xa4506cebde82bde9
+ DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b
+ DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b
+ DQ 0xca273eceea26619c,0xd186b8c721c0c207
+ DQ 0xca273eceea26619c,0xd186b8c721c0c207
+ DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+ DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+ DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6
+ DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6
+ DQ 0x113f9804bef90dae,0x1b710b35131c471b
+ DQ 0x113f9804bef90dae,0x1b710b35131c471b
+ DQ 0x28db77f523047d84,0x32caab7b40c72493
+ DQ 0x28db77f523047d84,0x32caab7b40c72493
+ DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+ DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+ DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+ DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+ DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817
+ DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817
+
+ DQ 0x0001020304050607,0x08090a0b0c0d0e0f
+ DQ 0x0001020304050607,0x08090a0b0c0d0e0f
+DB 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+DB 110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB 111,114,103,62,0
+
+ALIGN 64
+sha512_block_data_order_xop:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order_xop:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$xop_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,256
+ lea rdx,[rdx*8+rsi]
+ and rsp,-64
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+ movaps XMMWORD[(128+32)+rsp],xmm6
+ movaps XMMWORD[(128+48)+rsp],xmm7
+ movaps XMMWORD[(128+64)+rsp],xmm8
+ movaps XMMWORD[(128+80)+rsp],xmm9
+ movaps XMMWORD[(128+96)+rsp],xmm10
+ movaps XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_xop:
+
+ vzeroupper
+ mov rax,QWORD[rdi]
+ mov rbx,QWORD[8+rdi]
+ mov rcx,QWORD[16+rdi]
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$loop_xop
+ALIGN 16
+$L$loop_xop:
+ vmovdqa xmm11,XMMWORD[((K512+1280))]
+ vmovdqu xmm0,XMMWORD[rsi]
+ lea rbp,[((K512+128))]
+ vmovdqu xmm1,XMMWORD[16+rsi]
+ vmovdqu xmm2,XMMWORD[32+rsi]
+ vpshufb xmm0,xmm0,xmm11
+ vmovdqu xmm3,XMMWORD[48+rsi]
+ vpshufb xmm1,xmm1,xmm11
+ vmovdqu xmm4,XMMWORD[64+rsi]
+ vpshufb xmm2,xmm2,xmm11
+ vmovdqu xmm5,XMMWORD[80+rsi]
+ vpshufb xmm3,xmm3,xmm11
+ vmovdqu xmm6,XMMWORD[96+rsi]
+ vpshufb xmm4,xmm4,xmm11
+ vmovdqu xmm7,XMMWORD[112+rsi]
+ vpshufb xmm5,xmm5,xmm11
+ vpaddq xmm8,xmm0,XMMWORD[((-128))+rbp]
+ vpshufb xmm6,xmm6,xmm11
+ vpaddq xmm9,xmm1,XMMWORD[((-96))+rbp]
+ vpshufb xmm7,xmm7,xmm11
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ vpaddq xmm11,xmm3,XMMWORD[((-32))+rbp]
+ vmovdqa XMMWORD[rsp],xmm8
+ vpaddq xmm8,xmm4,XMMWORD[rbp]
+ vmovdqa XMMWORD[16+rsp],xmm9
+ vpaddq xmm9,xmm5,XMMWORD[32+rbp]
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ vmovdqa XMMWORD[48+rsp],xmm11
+ vpaddq xmm11,xmm7,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[64+rsp],xmm8
+ mov r14,rax
+ vmovdqa XMMWORD[80+rsp],xmm9
+ mov rdi,rbx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ xor rdi,rcx
+ vmovdqa XMMWORD[112+rsp],xmm11
+ mov r13,r8
+ jmp NEAR $L$xop_00_47
+
+ALIGN 16
+$L$xop_00_47:
+ add rbp,256
+ vpalignr xmm8,xmm1,xmm0,8
+ ror r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm5,xmm4,8
+ mov r12,r9
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r8
+ xor r12,r10
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rax
+ vpaddq xmm0,xmm0,xmm11
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[rsp]
+ mov r15,rax
+DB 143,72,120,195,209,7
+ xor r12,r10
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,223,3
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ ror r14,28
+ vpsrlq xmm10,xmm7,6
+ add rdx,r11
+ add r11,rdi
+ vpaddq xmm0,xmm0,xmm8
+ mov r13,rdx
+ add r14,r11
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r11,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ vpaddq xmm0,xmm0,xmm11
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ vpaddq xmm10,xmm0,XMMWORD[((-128))+rbp]
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[rsp],xmm10
+ vpalignr xmm8,xmm2,xmm1,8
+ ror r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm6,xmm5,8
+ mov r12,rdx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rcx
+ xor r12,r8
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r10
+ vpaddq xmm1,xmm1,xmm11
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+DB 143,72,120,195,209,7
+ xor r12,r8
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,216,3
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ ror r14,28
+ vpsrlq xmm10,xmm0,6
+ add rbx,r9
+ add r9,rdi
+ vpaddq xmm1,xmm1,xmm8
+ mov r13,rbx
+ add r14,r9
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r9,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ vpaddq xmm1,xmm1,xmm11
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ vpaddq xmm10,xmm1,XMMWORD[((-96))+rbp]
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[16+rsp],xmm10
+ vpalignr xmm8,xmm3,xmm2,8
+ ror r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm7,xmm6,8
+ mov r12,rbx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rax
+ xor r12,rcx
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r8
+ vpaddq xmm2,xmm2,xmm11
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+DB 143,72,120,195,209,7
+ xor r12,rcx
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,217,3
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ ror r14,28
+ vpsrlq xmm10,xmm1,6
+ add r11,rdx
+ add rdx,rdi
+ vpaddq xmm2,xmm2,xmm8
+ mov r13,r11
+ add r14,rdx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rdx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ vpaddq xmm2,xmm2,xmm11
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpalignr xmm8,xmm4,xmm3,8
+ ror r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm0,xmm7,8
+ mov r12,r11
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r10
+ xor r12,rax
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rcx
+ vpaddq xmm3,xmm3,xmm11
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+DB 143,72,120,195,209,7
+ xor r12,rax
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,218,3
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ ror r14,28
+ vpsrlq xmm10,xmm2,6
+ add r9,rbx
+ add rbx,rdi
+ vpaddq xmm3,xmm3,xmm8
+ mov r13,r9
+ add r14,rbx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rbx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ vpaddq xmm3,xmm3,xmm11
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ vpaddq xmm10,xmm3,XMMWORD[((-32))+rbp]
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[48+rsp],xmm10
+ vpalignr xmm8,xmm5,xmm4,8
+ ror r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm1,xmm0,8
+ mov r12,r9
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r8
+ xor r12,r10
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rax
+ vpaddq xmm4,xmm4,xmm11
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+DB 143,72,120,195,209,7
+ xor r12,r10
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,219,3
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ ror r14,28
+ vpsrlq xmm10,xmm3,6
+ add rdx,r11
+ add r11,rdi
+ vpaddq xmm4,xmm4,xmm8
+ mov r13,rdx
+ add r14,r11
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r11,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ vpaddq xmm4,xmm4,xmm11
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ vpaddq xmm10,xmm4,XMMWORD[rbp]
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[64+rsp],xmm10
+ vpalignr xmm8,xmm6,xmm5,8
+ ror r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm2,xmm1,8
+ mov r12,rdx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rcx
+ xor r12,r8
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r10
+ vpaddq xmm5,xmm5,xmm11
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+DB 143,72,120,195,209,7
+ xor r12,r8
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,220,3
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ ror r14,28
+ vpsrlq xmm10,xmm4,6
+ add rbx,r9
+ add r9,rdi
+ vpaddq xmm5,xmm5,xmm8
+ mov r13,rbx
+ add r14,r9
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r9,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ vpaddq xmm5,xmm5,xmm11
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ vpaddq xmm10,xmm5,XMMWORD[32+rbp]
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[80+rsp],xmm10
+ vpalignr xmm8,xmm7,xmm6,8
+ ror r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm3,xmm2,8
+ mov r12,rbx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rax
+ xor r12,rcx
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r8
+ vpaddq xmm6,xmm6,xmm11
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+DB 143,72,120,195,209,7
+ xor r12,rcx
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,221,3
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ ror r14,28
+ vpsrlq xmm10,xmm5,6
+ add r11,rdx
+ add rdx,rdi
+ vpaddq xmm6,xmm6,xmm8
+ mov r13,r11
+ add r14,rdx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rdx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ vpaddq xmm6,xmm6,xmm11
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ vpalignr xmm8,xmm0,xmm7,8
+ ror r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm4,xmm3,8
+ mov r12,r11
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r10
+ xor r12,rax
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rcx
+ vpaddq xmm7,xmm7,xmm11
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+DB 143,72,120,195,209,7
+ xor r12,rax
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,222,3
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ ror r14,28
+ vpsrlq xmm10,xmm6,6
+ add r9,rbx
+ add rbx,rdi
+ vpaddq xmm7,xmm7,xmm8
+ mov r13,r9
+ add r14,rbx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rbx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ vpaddq xmm7,xmm7,xmm11
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ vpaddq xmm10,xmm7,XMMWORD[96+rbp]
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[112+rsp],xmm10
+ cmp BYTE[135+rbp],0
+ jne NEAR $L$xop_00_47
+ ror r13,23
+ mov rax,r14
+ mov r12,r9
+ ror r14,5
+ xor r13,r8
+ xor r12,r10
+ ror r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[rsp]
+ mov r15,rax
+ xor r12,r10
+ ror r14,6
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ ror r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ ror r13,23
+ mov r11,r14
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ ror r13,23
+ mov r10,r14
+ mov r12,rdx
+ ror r14,5
+ xor r13,rcx
+ xor r12,r8
+ ror r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+ xor r12,r8
+ ror r14,6
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ ror r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ ror r13,23
+ mov r9,r14
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ ror r13,23
+ mov r8,r14
+ mov r12,rbx
+ ror r14,5
+ xor r13,rax
+ xor r12,rcx
+ ror r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+ xor r12,rcx
+ ror r14,6
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ ror r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ ror r13,23
+ mov rdx,r14
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ ror r13,23
+ mov rcx,r14
+ mov r12,r11
+ ror r14,5
+ xor r13,r10
+ xor r12,rax
+ ror r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+ xor r12,rax
+ ror r14,6
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ ror r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ ror r13,23
+ mov rbx,r14
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ ror r13,23
+ mov rax,r14
+ mov r12,r9
+ ror r14,5
+ xor r13,r8
+ xor r12,r10
+ ror r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+ xor r12,r10
+ ror r14,6
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ ror r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ ror r13,23
+ mov r11,r14
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ ror r13,23
+ mov r10,r14
+ mov r12,rdx
+ ror r14,5
+ xor r13,rcx
+ xor r12,r8
+ ror r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+ xor r12,r8
+ ror r14,6
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ ror r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ ror r13,23
+ mov r9,r14
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ ror r13,23
+ mov r8,r14
+ mov r12,rbx
+ ror r14,5
+ xor r13,rax
+ xor r12,rcx
+ ror r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+ xor r12,rcx
+ ror r14,6
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ ror r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ ror r13,23
+ mov rdx,r14
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ ror r13,23
+ mov rcx,r14
+ mov r12,r11
+ ror r14,5
+ xor r13,r10
+ xor r12,rax
+ ror r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+ xor r12,rax
+ ror r14,6
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ ror r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ ror r13,23
+ mov rbx,r14
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ mov rdi,QWORD[((128+0))+rsp]
+ mov rax,r14
+
+ add rax,QWORD[rdi]
+ lea rsi,[128+rsi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+ jb NEAR $L$loop_xop
+
+ mov rsi,QWORD[152+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((128+32))+rsp]
+ movaps xmm7,XMMWORD[((128+48))+rsp]
+ movaps xmm8,XMMWORD[((128+64))+rsp]
+ movaps xmm9,XMMWORD[((128+80))+rsp]
+ movaps xmm10,XMMWORD[((128+96))+rsp]
+ movaps xmm11,XMMWORD[((128+112))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_xop:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order_xop:
+
+ALIGN 64
+sha512_block_data_order_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,256
+ lea rdx,[rdx*8+rsi]
+ and rsp,-64
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+ movaps XMMWORD[(128+32)+rsp],xmm6
+ movaps XMMWORD[(128+48)+rsp],xmm7
+ movaps XMMWORD[(128+64)+rsp],xmm8
+ movaps XMMWORD[(128+80)+rsp],xmm9
+ movaps XMMWORD[(128+96)+rsp],xmm10
+ movaps XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx:
+
+ vzeroupper
+ mov rax,QWORD[rdi]
+ mov rbx,QWORD[8+rdi]
+ mov rcx,QWORD[16+rdi]
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$loop_avx
+ALIGN 16
+$L$loop_avx:
+ vmovdqa xmm11,XMMWORD[((K512+1280))]
+ vmovdqu xmm0,XMMWORD[rsi]
+ lea rbp,[((K512+128))]
+ vmovdqu xmm1,XMMWORD[16+rsi]
+ vmovdqu xmm2,XMMWORD[32+rsi]
+ vpshufb xmm0,xmm0,xmm11
+ vmovdqu xmm3,XMMWORD[48+rsi]
+ vpshufb xmm1,xmm1,xmm11
+ vmovdqu xmm4,XMMWORD[64+rsi]
+ vpshufb xmm2,xmm2,xmm11
+ vmovdqu xmm5,XMMWORD[80+rsi]
+ vpshufb xmm3,xmm3,xmm11
+ vmovdqu xmm6,XMMWORD[96+rsi]
+ vpshufb xmm4,xmm4,xmm11
+ vmovdqu xmm7,XMMWORD[112+rsi]
+ vpshufb xmm5,xmm5,xmm11
+ vpaddq xmm8,xmm0,XMMWORD[((-128))+rbp]
+ vpshufb xmm6,xmm6,xmm11
+ vpaddq xmm9,xmm1,XMMWORD[((-96))+rbp]
+ vpshufb xmm7,xmm7,xmm11
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ vpaddq xmm11,xmm3,XMMWORD[((-32))+rbp]
+ vmovdqa XMMWORD[rsp],xmm8
+ vpaddq xmm8,xmm4,XMMWORD[rbp]
+ vmovdqa XMMWORD[16+rsp],xmm9
+ vpaddq xmm9,xmm5,XMMWORD[32+rbp]
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ vmovdqa XMMWORD[48+rsp],xmm11
+ vpaddq xmm11,xmm7,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[64+rsp],xmm8
+ mov r14,rax
+ vmovdqa XMMWORD[80+rsp],xmm9
+ mov rdi,rbx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ xor rdi,rcx
+ vmovdqa XMMWORD[112+rsp],xmm11
+ mov r13,r8
+ jmp NEAR $L$avx_00_47
+
+ALIGN 16
+$L$avx_00_47:
+ add rbp,256
+ vpalignr xmm8,xmm1,xmm0,8
+ shrd r13,r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm5,xmm4,8
+ mov r12,r9
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r8
+ xor r12,r10
+ vpaddq xmm0,xmm0,xmm11
+ shrd r13,r13,4
+ xor r14,rax
+ vpsrlq xmm11,xmm8,7
+ and r12,r8
+ xor r13,r8
+ vpsllq xmm9,xmm8,56
+ add r11,QWORD[rsp]
+ mov r15,rax
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r10
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rbx
+ add r11,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm7,6
+ add rdx,r11
+ add r11,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rdx
+ add r14,r11
+ vpsllq xmm10,xmm7,3
+ shrd r13,r13,23
+ mov r11,r14
+ vpaddq xmm0,xmm0,xmm8
+ mov r12,r8
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm7,19
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r11
+ vpsllq xmm10,xmm10,42
+ and r12,rdx
+ xor r13,rdx
+ vpxor xmm11,xmm11,xmm9
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ vpsrlq xmm9,xmm9,42
+ xor r12,r9
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rax
+ add r10,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm0,xmm0,xmm11
+ xor r14,r11
+ add r10,r13
+ vpaddq xmm10,xmm0,XMMWORD[((-128))+rbp]
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[rsp],xmm10
+ vpalignr xmm8,xmm2,xmm1,8
+ shrd r13,r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm6,xmm5,8
+ mov r12,rdx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rcx
+ xor r12,r8
+ vpaddq xmm1,xmm1,xmm11
+ shrd r13,r13,4
+ xor r14,r10
+ vpsrlq xmm11,xmm8,7
+ and r12,rcx
+ xor r13,rcx
+ vpsllq xmm9,xmm8,56
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r8
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r11
+ add r9,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm0,6
+ add rbx,r9
+ add r9,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rbx
+ add r14,r9
+ vpsllq xmm10,xmm0,3
+ shrd r13,r13,23
+ mov r9,r14
+ vpaddq xmm1,xmm1,xmm8
+ mov r12,rcx
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm0,19
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r9
+ vpsllq xmm10,xmm10,42
+ and r12,rbx
+ xor r13,rbx
+ vpxor xmm11,xmm11,xmm9
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ vpsrlq xmm9,xmm9,42
+ xor r12,rdx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r10
+ add r8,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm1,xmm1,xmm11
+ xor r14,r9
+ add r8,r13
+ vpaddq xmm10,xmm1,XMMWORD[((-96))+rbp]
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[16+rsp],xmm10
+ vpalignr xmm8,xmm3,xmm2,8
+ shrd r13,r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm7,xmm6,8
+ mov r12,rbx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rax
+ xor r12,rcx
+ vpaddq xmm2,xmm2,xmm11
+ shrd r13,r13,4
+ xor r14,r8
+ vpsrlq xmm11,xmm8,7
+ and r12,rax
+ xor r13,rax
+ vpsllq xmm9,xmm8,56
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rcx
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r9
+ add rdx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm1,6
+ add r11,rdx
+ add rdx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r11
+ add r14,rdx
+ vpsllq xmm10,xmm1,3
+ shrd r13,r13,23
+ mov rdx,r14
+ vpaddq xmm2,xmm2,xmm8
+ mov r12,rax
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm1,19
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rdx
+ vpsllq xmm10,xmm10,42
+ and r12,r11
+ xor r13,r11
+ vpxor xmm11,xmm11,xmm9
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ vpsrlq xmm9,xmm9,42
+ xor r12,rbx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r8
+ add rcx,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm2,xmm2,xmm11
+ xor r14,rdx
+ add rcx,r13
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpalignr xmm8,xmm4,xmm3,8
+ shrd r13,r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm0,xmm7,8
+ mov r12,r11
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r10
+ xor r12,rax
+ vpaddq xmm3,xmm3,xmm11
+ shrd r13,r13,4
+ xor r14,rcx
+ vpsrlq xmm11,xmm8,7
+ and r12,r10
+ xor r13,r10
+ vpsllq xmm9,xmm8,56
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rax
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rdx
+ add rbx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm2,6
+ add r9,rbx
+ add rbx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r9
+ add r14,rbx
+ vpsllq xmm10,xmm2,3
+ shrd r13,r13,23
+ mov rbx,r14
+ vpaddq xmm3,xmm3,xmm8
+ mov r12,r10
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm2,19
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rbx
+ vpsllq xmm10,xmm10,42
+ and r12,r9
+ xor r13,r9
+ vpxor xmm11,xmm11,xmm9
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ vpsrlq xmm9,xmm9,42
+ xor r12,r11
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rcx
+ add rax,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm3,xmm3,xmm11
+ xor r14,rbx
+ add rax,r13
+ vpaddq xmm10,xmm3,XMMWORD[((-32))+rbp]
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[48+rsp],xmm10
+ vpalignr xmm8,xmm5,xmm4,8
+ shrd r13,r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm1,xmm0,8
+ mov r12,r9
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r8
+ xor r12,r10
+ vpaddq xmm4,xmm4,xmm11
+ shrd r13,r13,4
+ xor r14,rax
+ vpsrlq xmm11,xmm8,7
+ and r12,r8
+ xor r13,r8
+ vpsllq xmm9,xmm8,56
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r10
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rbx
+ add r11,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm3,6
+ add rdx,r11
+ add r11,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rdx
+ add r14,r11
+ vpsllq xmm10,xmm3,3
+ shrd r13,r13,23
+ mov r11,r14
+ vpaddq xmm4,xmm4,xmm8
+ mov r12,r8
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm3,19
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r11
+ vpsllq xmm10,xmm10,42
+ and r12,rdx
+ xor r13,rdx
+ vpxor xmm11,xmm11,xmm9
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ vpsrlq xmm9,xmm9,42
+ xor r12,r9
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rax
+ add r10,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm4,xmm4,xmm11
+ xor r14,r11
+ add r10,r13
+ vpaddq xmm10,xmm4,XMMWORD[rbp]
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[64+rsp],xmm10
+ vpalignr xmm8,xmm6,xmm5,8
+ shrd r13,r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm2,xmm1,8
+ mov r12,rdx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rcx
+ xor r12,r8
+ vpaddq xmm5,xmm5,xmm11
+ shrd r13,r13,4
+ xor r14,r10
+ vpsrlq xmm11,xmm8,7
+ and r12,rcx
+ xor r13,rcx
+ vpsllq xmm9,xmm8,56
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r8
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r11
+ add r9,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm4,6
+ add rbx,r9
+ add r9,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rbx
+ add r14,r9
+ vpsllq xmm10,xmm4,3
+ shrd r13,r13,23
+ mov r9,r14
+ vpaddq xmm5,xmm5,xmm8
+ mov r12,rcx
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm4,19
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r9
+ vpsllq xmm10,xmm10,42
+ and r12,rbx
+ xor r13,rbx
+ vpxor xmm11,xmm11,xmm9
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ vpsrlq xmm9,xmm9,42
+ xor r12,rdx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r10
+ add r8,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm5,xmm5,xmm11
+ xor r14,r9
+ add r8,r13
+ vpaddq xmm10,xmm5,XMMWORD[32+rbp]
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[80+rsp],xmm10
+ vpalignr xmm8,xmm7,xmm6,8
+ shrd r13,r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm3,xmm2,8
+ mov r12,rbx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rax
+ xor r12,rcx
+ vpaddq xmm6,xmm6,xmm11
+ shrd r13,r13,4
+ xor r14,r8
+ vpsrlq xmm11,xmm8,7
+ and r12,rax
+ xor r13,rax
+ vpsllq xmm9,xmm8,56
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rcx
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r9
+ add rdx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm5,6
+ add r11,rdx
+ add rdx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r11
+ add r14,rdx
+ vpsllq xmm10,xmm5,3
+ shrd r13,r13,23
+ mov rdx,r14
+ vpaddq xmm6,xmm6,xmm8
+ mov r12,rax
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm5,19
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rdx
+ vpsllq xmm10,xmm10,42
+ and r12,r11
+ xor r13,r11
+ vpxor xmm11,xmm11,xmm9
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ vpsrlq xmm9,xmm9,42
+ xor r12,rbx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r8
+ add rcx,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm6,xmm6,xmm11
+ xor r14,rdx
+ add rcx,r13
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ vpalignr xmm8,xmm0,xmm7,8
+ shrd r13,r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm4,xmm3,8
+ mov r12,r11
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r10
+ xor r12,rax
+ vpaddq xmm7,xmm7,xmm11
+ shrd r13,r13,4
+ xor r14,rcx
+ vpsrlq xmm11,xmm8,7
+ and r12,r10
+ xor r13,r10
+ vpsllq xmm9,xmm8,56
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rax
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rdx
+ add rbx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm6,6
+ add r9,rbx
+ add rbx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r9
+ add r14,rbx
+ vpsllq xmm10,xmm6,3
+ shrd r13,r13,23
+ mov rbx,r14
+ vpaddq xmm7,xmm7,xmm8
+ mov r12,r10
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm6,19
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rbx
+ vpsllq xmm10,xmm10,42
+ and r12,r9
+ xor r13,r9
+ vpxor xmm11,xmm11,xmm9
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ vpsrlq xmm9,xmm9,42
+ xor r12,r11
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rcx
+ add rax,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm7,xmm7,xmm11
+ xor r14,rbx
+ add rax,r13
+ vpaddq xmm10,xmm7,XMMWORD[96+rbp]
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[112+rsp],xmm10
+ cmp BYTE[135+rbp],0
+ jne NEAR $L$avx_00_47
+ shrd r13,r13,23
+ mov rax,r14
+ mov r12,r9
+ shrd r14,r14,5
+ xor r13,r8
+ xor r12,r10
+ shrd r13,r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[rsp]
+ mov r15,rax
+ xor r12,r10
+ shrd r14,r14,6
+ xor r15,rbx
+ add r11,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ shrd r14,r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ shrd r13,r13,23
+ mov r11,r14
+ mov r12,r8
+ shrd r14,r14,5
+ xor r13,rdx
+ xor r12,r9
+ shrd r13,r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ xor r12,r9
+ shrd r14,r14,6
+ xor rdi,rax
+ add r10,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ shrd r13,r13,23
+ mov r10,r14
+ mov r12,rdx
+ shrd r14,r14,5
+ xor r13,rcx
+ xor r12,r8
+ shrd r13,r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+ xor r12,r8
+ shrd r14,r14,6
+ xor r15,r11
+ add r9,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ shrd r14,r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ shrd r13,r13,23
+ mov r9,r14
+ mov r12,rcx
+ shrd r14,r14,5
+ xor r13,rbx
+ xor r12,rdx
+ shrd r13,r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ shrd r14,r14,6
+ xor rdi,r10
+ add r8,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ shrd r13,r13,23
+ mov r8,r14
+ mov r12,rbx
+ shrd r14,r14,5
+ xor r13,rax
+ xor r12,rcx
+ shrd r13,r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+ xor r12,rcx
+ shrd r14,r14,6
+ xor r15,r9
+ add rdx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ shrd r14,r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ shrd r13,r13,23
+ mov rdx,r14
+ mov r12,rax
+ shrd r14,r14,5
+ xor r13,r11
+ xor r12,rbx
+ shrd r13,r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ shrd r14,r14,6
+ xor rdi,r8
+ add rcx,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ shrd r13,r13,23
+ mov rcx,r14
+ mov r12,r11
+ shrd r14,r14,5
+ xor r13,r10
+ xor r12,rax
+ shrd r13,r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+ xor r12,rax
+ shrd r14,r14,6
+ xor r15,rdx
+ add rbx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ shrd r14,r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ shrd r13,r13,23
+ mov rbx,r14
+ mov r12,r10
+ shrd r14,r14,5
+ xor r13,r9
+ xor r12,r11
+ shrd r13,r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ shrd r14,r14,6
+ xor rdi,rcx
+ add rax,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ shrd r13,r13,23
+ mov rax,r14
+ mov r12,r9
+ shrd r14,r14,5
+ xor r13,r8
+ xor r12,r10
+ shrd r13,r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+ xor r12,r10
+ shrd r14,r14,6
+ xor r15,rbx
+ add r11,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ shrd r14,r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ shrd r13,r13,23
+ mov r11,r14
+ mov r12,r8
+ shrd r14,r14,5
+ xor r13,rdx
+ xor r12,r9
+ shrd r13,r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ xor r12,r9
+ shrd r14,r14,6
+ xor rdi,rax
+ add r10,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ shrd r13,r13,23
+ mov r10,r14
+ mov r12,rdx
+ shrd r14,r14,5
+ xor r13,rcx
+ xor r12,r8
+ shrd r13,r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+ xor r12,r8
+ shrd r14,r14,6
+ xor r15,r11
+ add r9,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ shrd r14,r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ shrd r13,r13,23
+ mov r9,r14
+ mov r12,rcx
+ shrd r14,r14,5
+ xor r13,rbx
+ xor r12,rdx
+ shrd r13,r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ shrd r14,r14,6
+ xor rdi,r10
+ add r8,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ shrd r13,r13,23
+ mov r8,r14
+ mov r12,rbx
+ shrd r14,r14,5
+ xor r13,rax
+ xor r12,rcx
+ shrd r13,r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+ xor r12,rcx
+ shrd r14,r14,6
+ xor r15,r9
+ add rdx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ shrd r14,r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ shrd r13,r13,23
+ mov rdx,r14
+ mov r12,rax
+ shrd r14,r14,5
+ xor r13,r11
+ xor r12,rbx
+ shrd r13,r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ shrd r14,r14,6
+ xor rdi,r8
+ add rcx,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ shrd r13,r13,23
+ mov rcx,r14
+ mov r12,r11
+ shrd r14,r14,5
+ xor r13,r10
+ xor r12,rax
+ shrd r13,r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+ xor r12,rax
+ shrd r14,r14,6
+ xor r15,rdx
+ add rbx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ shrd r14,r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ shrd r13,r13,23
+ mov rbx,r14
+ mov r12,r10
+ shrd r14,r14,5
+ xor r13,r9
+ xor r12,r11
+ shrd r13,r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ shrd r14,r14,6
+ xor rdi,rcx
+ add rax,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ mov rdi,QWORD[((128+0))+rsp]
+ mov rax,r14
+
+ add rax,QWORD[rdi]
+ lea rsi,[128+rsi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+ jb NEAR $L$loop_avx
+
+ mov rsi,QWORD[152+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((128+32))+rsp]
+ movaps xmm7,XMMWORD[((128+48))+rsp]
+ movaps xmm8,XMMWORD[((128+64))+rsp]
+ movaps xmm9,XMMWORD[((128+80))+rsp]
+ movaps xmm10,XMMWORD[((128+96))+rsp]
+ movaps xmm11,XMMWORD[((128+112))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order_avx:
+
+ALIGN 64
+sha512_block_data_order_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,1408
+ shl rdx,4
+ and rsp,-256*8
+ lea rdx,[rdx*8+rsi]
+ add rsp,1152
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+ movaps XMMWORD[(128+32)+rsp],xmm6
+ movaps XMMWORD[(128+48)+rsp],xmm7
+ movaps XMMWORD[(128+64)+rsp],xmm8
+ movaps XMMWORD[(128+80)+rsp],xmm9
+ movaps XMMWORD[(128+96)+rsp],xmm10
+ movaps XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx2:
+
+ vzeroupper
+ sub rsi,-16*8
+ mov rax,QWORD[rdi]
+ mov r12,rsi
+ mov rbx,QWORD[8+rdi]
+ cmp rsi,rdx
+ mov rcx,QWORD[16+rdi]
+ cmove r12,rsp
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$oop_avx2
+ALIGN 16
+$L$oop_avx2:
+ vmovdqu xmm0,XMMWORD[((-128))+rsi]
+ vmovdqu xmm1,XMMWORD[((-128+16))+rsi]
+ vmovdqu xmm2,XMMWORD[((-128+32))+rsi]
+ lea rbp,[((K512+128))]
+ vmovdqu xmm3,XMMWORD[((-128+48))+rsi]
+ vmovdqu xmm4,XMMWORD[((-128+64))+rsi]
+ vmovdqu xmm5,XMMWORD[((-128+80))+rsi]
+ vmovdqu xmm6,XMMWORD[((-128+96))+rsi]
+ vmovdqu xmm7,XMMWORD[((-128+112))+rsi]
+
+ vmovdqa ymm10,YMMWORD[1152+rbp]
+ vinserti128 ymm0,ymm0,XMMWORD[r12],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r12],1
+ vpshufb ymm0,ymm0,ymm10
+ vinserti128 ymm2,ymm2,XMMWORD[32+r12],1
+ vpshufb ymm1,ymm1,ymm10
+ vinserti128 ymm3,ymm3,XMMWORD[48+r12],1
+ vpshufb ymm2,ymm2,ymm10
+ vinserti128 ymm4,ymm4,XMMWORD[64+r12],1
+ vpshufb ymm3,ymm3,ymm10
+ vinserti128 ymm5,ymm5,XMMWORD[80+r12],1
+ vpshufb ymm4,ymm4,ymm10
+ vinserti128 ymm6,ymm6,XMMWORD[96+r12],1
+ vpshufb ymm5,ymm5,ymm10
+ vinserti128 ymm7,ymm7,XMMWORD[112+r12],1
+
+ vpaddq ymm8,ymm0,YMMWORD[((-128))+rbp]
+ vpshufb ymm6,ymm6,ymm10
+ vpaddq ymm9,ymm1,YMMWORD[((-96))+rbp]
+ vpshufb ymm7,ymm7,ymm10
+ vpaddq ymm10,ymm2,YMMWORD[((-64))+rbp]
+ vpaddq ymm11,ymm3,YMMWORD[((-32))+rbp]
+ vmovdqa YMMWORD[rsp],ymm8
+ vpaddq ymm8,ymm4,YMMWORD[rbp]
+ vmovdqa YMMWORD[32+rsp],ymm9
+ vpaddq ymm9,ymm5,YMMWORD[32+rbp]
+ vmovdqa YMMWORD[64+rsp],ymm10
+ vpaddq ymm10,ymm6,YMMWORD[64+rbp]
+ vmovdqa YMMWORD[96+rsp],ymm11
+ lea rsp,[((-128))+rsp]
+ vpaddq ymm11,ymm7,YMMWORD[96+rbp]
+ vmovdqa YMMWORD[rsp],ymm8
+ xor r14,r14
+ vmovdqa YMMWORD[32+rsp],ymm9
+ mov rdi,rbx
+ vmovdqa YMMWORD[64+rsp],ymm10
+ xor rdi,rcx
+ vmovdqa YMMWORD[96+rsp],ymm11
+ mov r12,r9
+ add rbp,16*2*8
+ jmp NEAR $L$avx2_00_47
+
+ALIGN 16
+$L$avx2_00_47:
+ lea rsp,[((-128))+rsp]
+ vpalignr ymm8,ymm1,ymm0,8
+ add r11,QWORD[((0+256))+rsp]
+ and r12,r8
+ rorx r13,r8,41
+ vpalignr ymm11,ymm5,ymm4,8
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ vpaddq ymm0,ymm0,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ vpsrlq ymm11,ymm7,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ vpsllq ymm10,ymm7,3
+ vpaddq ymm0,ymm0,ymm8
+ add r10,QWORD[((8+256))+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ vpsrlq ymm9,ymm7,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ vpaddq ymm0,ymm0,ymm11
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ vpaddq ymm10,ymm0,YMMWORD[((-128))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ vmovdqa YMMWORD[rsp],ymm10
+ vpalignr ymm8,ymm2,ymm1,8
+ add r9,QWORD[((32+256))+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ vpalignr ymm11,ymm6,ymm5,8
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ vpaddq ymm1,ymm1,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ vpsrlq ymm11,ymm0,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ vpsllq ymm10,ymm0,3
+ vpaddq ymm1,ymm1,ymm8
+ add r8,QWORD[((40+256))+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ vpsrlq ymm9,ymm0,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ vpaddq ymm1,ymm1,ymm11
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ vpaddq ymm10,ymm1,YMMWORD[((-96))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ vmovdqa YMMWORD[32+rsp],ymm10
+ vpalignr ymm8,ymm3,ymm2,8
+ add rdx,QWORD[((64+256))+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ vpalignr ymm11,ymm7,ymm6,8
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ vpaddq ymm2,ymm2,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ vpsrlq ymm11,ymm1,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ vpsllq ymm10,ymm1,3
+ vpaddq ymm2,ymm2,ymm8
+ add rcx,QWORD[((72+256))+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ vpsrlq ymm9,ymm1,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ vpaddq ymm2,ymm2,ymm11
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ vpaddq ymm10,ymm2,YMMWORD[((-64))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ vmovdqa YMMWORD[64+rsp],ymm10
+ vpalignr ymm8,ymm4,ymm3,8
+ add rbx,QWORD[((96+256))+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ vpalignr ymm11,ymm0,ymm7,8
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ vpaddq ymm3,ymm3,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ vpsrlq ymm11,ymm2,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ vpsllq ymm10,ymm2,3
+ vpaddq ymm3,ymm3,ymm8
+ add rax,QWORD[((104+256))+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ vpsrlq ymm9,ymm2,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ vpaddq ymm3,ymm3,ymm11
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ vpaddq ymm10,ymm3,YMMWORD[((-32))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ vmovdqa YMMWORD[96+rsp],ymm10
+ lea rsp,[((-128))+rsp]
+ vpalignr ymm8,ymm5,ymm4,8
+ add r11,QWORD[((0+256))+rsp]
+ and r12,r8
+ rorx r13,r8,41
+ vpalignr ymm11,ymm1,ymm0,8
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ vpaddq ymm4,ymm4,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ vpsrlq ymm11,ymm3,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ vpsllq ymm10,ymm3,3
+ vpaddq ymm4,ymm4,ymm8
+ add r10,QWORD[((8+256))+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ vpsrlq ymm9,ymm3,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ vpaddq ymm4,ymm4,ymm11
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ vpaddq ymm10,ymm4,YMMWORD[rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ vmovdqa YMMWORD[rsp],ymm10
+ vpalignr ymm8,ymm6,ymm5,8
+ add r9,QWORD[((32+256))+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ vpalignr ymm11,ymm2,ymm1,8
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ vpaddq ymm5,ymm5,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ vpsrlq ymm11,ymm4,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ vpsllq ymm10,ymm4,3
+ vpaddq ymm5,ymm5,ymm8
+ add r8,QWORD[((40+256))+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ vpsrlq ymm9,ymm4,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ vpaddq ymm5,ymm5,ymm11
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ vpaddq ymm10,ymm5,YMMWORD[32+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ vmovdqa YMMWORD[32+rsp],ymm10
+ vpalignr ymm8,ymm7,ymm6,8
+ add rdx,QWORD[((64+256))+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ vpalignr ymm11,ymm3,ymm2,8
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ vpaddq ymm6,ymm6,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ vpsrlq ymm11,ymm5,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ vpsllq ymm10,ymm5,3
+ vpaddq ymm6,ymm6,ymm8
+ add rcx,QWORD[((72+256))+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ vpsrlq ymm9,ymm5,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ vpaddq ymm6,ymm6,ymm11
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ vpaddq ymm10,ymm6,YMMWORD[64+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ vmovdqa YMMWORD[64+rsp],ymm10
+ vpalignr ymm8,ymm0,ymm7,8
+ add rbx,QWORD[((96+256))+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ vpalignr ymm11,ymm4,ymm3,8
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ vpaddq ymm7,ymm7,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ vpsrlq ymm11,ymm6,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ vpsllq ymm10,ymm6,3
+ vpaddq ymm7,ymm7,ymm8
+ add rax,QWORD[((104+256))+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ vpsrlq ymm9,ymm6,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ vpaddq ymm7,ymm7,ymm11
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ vpaddq ymm10,ymm7,YMMWORD[96+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ vmovdqa YMMWORD[96+rsp],ymm10
+ lea rbp,[256+rbp]
+ cmp BYTE[((-121))+rbp],0
+ jne NEAR $L$avx2_00_47
+ add r11,QWORD[((0+128))+rsp]
+ and r12,r8
+ rorx r13,r8,41
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ add r10,QWORD[((8+128))+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ add r9,QWORD[((32+128))+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ add r8,QWORD[((40+128))+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ add rdx,QWORD[((64+128))+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ add rcx,QWORD[((72+128))+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ add rbx,QWORD[((96+128))+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ add rax,QWORD[((104+128))+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ add r11,QWORD[rsp]
+ and r12,r8
+ rorx r13,r8,41
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ add r10,QWORD[8+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ add r9,QWORD[32+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ add r8,QWORD[40+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ add rdx,QWORD[64+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ add rcx,QWORD[72+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ add rbx,QWORD[96+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ add rax,QWORD[104+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ mov rdi,QWORD[1280+rsp]
+ add rax,r14
+
+ lea rbp,[1152+rsp]
+
+ add rax,QWORD[rdi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+
+ cmp rsi,QWORD[144+rbp]
+ je NEAR $L$done_avx2
+
+ xor r14,r14
+ mov rdi,rbx
+ xor rdi,rcx
+ mov r12,r9
+ jmp NEAR $L$ower_avx2
+ALIGN 16
+$L$ower_avx2:
+ add r11,QWORD[((0+16))+rbp]
+ and r12,r8
+ rorx r13,r8,41
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ add r10,QWORD[((8+16))+rbp]
+ and r12,rdx
+ rorx r13,rdx,41
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ add r9,QWORD[((32+16))+rbp]
+ and r12,rcx
+ rorx r13,rcx,41
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ add r8,QWORD[((40+16))+rbp]
+ and r12,rbx
+ rorx r13,rbx,41
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ add rdx,QWORD[((64+16))+rbp]
+ and r12,rax
+ rorx r13,rax,41
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ add rcx,QWORD[((72+16))+rbp]
+ and r12,r11
+ rorx r13,r11,41
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ add rbx,QWORD[((96+16))+rbp]
+ and r12,r10
+ rorx r13,r10,41
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ add rax,QWORD[((104+16))+rbp]
+ and r12,r9
+ rorx r13,r9,41
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ lea rbp,[((-128))+rbp]
+ cmp rbp,rsp
+ jae NEAR $L$ower_avx2
+
+ mov rdi,QWORD[1280+rsp]
+ add rax,r14
+
+ lea rsp,[1152+rsp]
+
+ add rax,QWORD[rdi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ lea rsi,[256+rsi]
+ add r10,QWORD[48+rdi]
+ mov r12,rsi
+ add r11,QWORD[56+rdi]
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ cmove r12,rsp
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+
+ jbe NEAR $L$oop_avx2
+ lea rbp,[rsp]
+
+$L$done_avx2:
+ lea rsp,[rbp]
+ mov rsi,QWORD[152+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((128+32))+rsp]
+ movaps xmm7,XMMWORD[((128+48))+rsp]
+ movaps xmm8,XMMWORD[((128+64))+rsp]
+ movaps xmm9,XMMWORD[((128+80))+rsp]
+ movaps xmm10,XMMWORD[((128+96))+rsp]
+ movaps xmm11,XMMWORD[((128+112))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order_avx2:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+ lea r10,[$L$avx2_shortcut]
+ cmp rbx,r10
+ jb NEAR $L$not_in_avx2
+
+ and rax,-256*8
+ add rax,1152
+$L$not_in_avx2:
+ mov rsi,rax
+ mov rax,QWORD[((128+24))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ lea rsi,[((128+32))+rsi]
+ lea rdi,[512+r8]
+ mov ecx,12
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha512_block_data_order wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order wrt ..imagebase
+ DD $L$SEH_begin_sha512_block_data_order_xop wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order_xop wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order_xop wrt ..imagebase
+ DD $L$SEH_begin_sha512_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_begin_sha512_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha512_block_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_xop:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
new file mode 100644
index 0000000000..2b64a074c3
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
@@ -0,0 +1,472 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+EXTERN OPENSSL_cpuid_setup
+
+section .CRT$XCU rdata align=8
+ DQ OPENSSL_cpuid_setup
+
+
+common OPENSSL_ia32cap_P 16
+
+section .text code align=64
+
+
+global OPENSSL_atomic_add
+
+ALIGN 16
+OPENSSL_atomic_add:
+ mov eax,DWORD[rcx]
+$L$spin: lea r8,[rax*1+rdx]
+DB 0xf0
+ cmpxchg DWORD[rcx],r8d
+ jne NEAR $L$spin
+ mov eax,r8d
+DB 0x48,0x98
+ DB 0F3h,0C3h ;repret
+
+
+global OPENSSL_rdtsc
+
+ALIGN 16
+OPENSSL_rdtsc:
+ rdtsc
+ shl rdx,32
+ or rax,rdx
+ DB 0F3h,0C3h ;repret
+
+
+global OPENSSL_ia32_cpuid
+
+ALIGN 16
+OPENSSL_ia32_cpuid:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_OPENSSL_ia32_cpuid:
+ mov rdi,rcx
+
+
+
+ mov r8,rbx
+
+
+ xor eax,eax
+ mov QWORD[8+rdi],rax
+ cpuid
+ mov r11d,eax
+
+ xor eax,eax
+ cmp ebx,0x756e6547
+ setne al
+ mov r9d,eax
+ cmp edx,0x49656e69
+ setne al
+ or r9d,eax
+ cmp ecx,0x6c65746e
+ setne al
+ or r9d,eax
+ jz NEAR $L$intel
+
+ cmp ebx,0x68747541
+ setne al
+ mov r10d,eax
+ cmp edx,0x69746E65
+ setne al
+ or r10d,eax
+ cmp ecx,0x444D4163
+ setne al
+ or r10d,eax
+ jnz NEAR $L$intel
+
+
+ mov eax,0x80000000
+ cpuid
+ cmp eax,0x80000001
+ jb NEAR $L$intel
+ mov r10d,eax
+ mov eax,0x80000001
+ cpuid
+ or r9d,ecx
+ and r9d,0x00000801
+
+ cmp r10d,0x80000008
+ jb NEAR $L$intel
+
+ mov eax,0x80000008
+ cpuid
+ movzx r10,cl
+ inc r10
+
+ mov eax,1
+ cpuid
+ bt edx,28
+ jnc NEAR $L$generic
+ shr ebx,16
+ cmp bl,r10b
+ ja NEAR $L$generic
+ and edx,0xefffffff
+ jmp NEAR $L$generic
+
+$L$intel:
+ cmp r11d,4
+ mov r10d,-1
+ jb NEAR $L$nocacheinfo
+
+ mov eax,4
+ mov ecx,0
+ cpuid
+ mov r10d,eax
+ shr r10d,14
+ and r10d,0xfff
+
+$L$nocacheinfo:
+ mov eax,1
+ cpuid
+ movd xmm0,eax
+ and edx,0xbfefffff
+ cmp r9d,0
+ jne NEAR $L$notintel
+ or edx,0x40000000
+ and ah,15
+ cmp ah,15
+ jne NEAR $L$notP4
+ or edx,0x00100000
+$L$notP4:
+ cmp ah,6
+ jne NEAR $L$notintel
+ and eax,0x0fff0ff0
+ cmp eax,0x00050670
+ je NEAR $L$knights
+ cmp eax,0x00080650
+ jne NEAR $L$notintel
+$L$knights:
+ and ecx,0xfbffffff
+
+$L$notintel:
+ bt edx,28
+ jnc NEAR $L$generic
+ and edx,0xefffffff
+ cmp r10d,0
+ je NEAR $L$generic
+
+ or edx,0x10000000
+ shr ebx,16
+ cmp bl,1
+ ja NEAR $L$generic
+ and edx,0xefffffff
+$L$generic:
+ and r9d,0x00000800
+ and ecx,0xfffff7ff
+ or r9d,ecx
+
+ mov r10d,edx
+
+ cmp r11d,7
+ jb NEAR $L$no_extended_info
+ mov eax,7
+ xor ecx,ecx
+ cpuid
+ bt r9d,26
+ jc NEAR $L$notknights
+ and ebx,0xfff7ffff
+$L$notknights:
+ movd eax,xmm0
+ and eax,0x0fff0ff0
+ cmp eax,0x00050650
+ jne NEAR $L$notskylakex
+ and ebx,0xfffeffff
+
+$L$notskylakex:
+ mov DWORD[8+rdi],ebx
+ mov DWORD[12+rdi],ecx
+$L$no_extended_info:
+
+ bt r9d,27
+ jnc NEAR $L$clear_avx
+ xor ecx,ecx
+DB 0x0f,0x01,0xd0
+ and eax,0xe6
+ cmp eax,0xe6
+ je NEAR $L$done
+ and DWORD[8+rdi],0x3fdeffff
+
+
+
+
+ and eax,6
+ cmp eax,6
+ je NEAR $L$done
+$L$clear_avx:
+ mov eax,0xefffe7ff
+ and r9d,eax
+ mov eax,0x3fdeffdf
+ and DWORD[8+rdi],eax
+$L$done:
+ shl r9,32
+ mov eax,r10d
+ mov rbx,r8
+
+ or rax,r9
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_OPENSSL_ia32_cpuid:
+
+global OPENSSL_cleanse
+
+ALIGN 16
+OPENSSL_cleanse:
+ xor rax,rax
+ cmp rdx,15
+ jae NEAR $L$ot
+ cmp rdx,0
+ je NEAR $L$ret
+$L$ittle:
+ mov BYTE[rcx],al
+ sub rdx,1
+ lea rcx,[1+rcx]
+ jnz NEAR $L$ittle
+$L$ret:
+ DB 0F3h,0C3h ;repret
+ALIGN 16
+$L$ot:
+ test rcx,7
+ jz NEAR $L$aligned
+ mov BYTE[rcx],al
+ lea rdx,[((-1))+rdx]
+ lea rcx,[1+rcx]
+ jmp NEAR $L$ot
+$L$aligned:
+ mov QWORD[rcx],rax
+ lea rdx,[((-8))+rdx]
+ test rdx,-8
+ lea rcx,[8+rcx]
+ jnz NEAR $L$aligned
+ cmp rdx,0
+ jne NEAR $L$ittle
+ DB 0F3h,0C3h ;repret
+
+
+global CRYPTO_memcmp
+
+ALIGN 16
+CRYPTO_memcmp:
+ xor rax,rax
+ xor r10,r10
+ cmp r8,0
+ je NEAR $L$no_data
+ cmp r8,16
+ jne NEAR $L$oop_cmp
+ mov r10,QWORD[rcx]
+ mov r11,QWORD[8+rcx]
+ mov r8,1
+ xor r10,QWORD[rdx]
+ xor r11,QWORD[8+rdx]
+ or r10,r11
+ cmovnz rax,r8
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$oop_cmp:
+ mov r10b,BYTE[rcx]
+ lea rcx,[1+rcx]
+ xor r10b,BYTE[rdx]
+ lea rdx,[1+rdx]
+ or al,r10b
+ dec r8
+ jnz NEAR $L$oop_cmp
+ neg rax
+ shr rax,63
+$L$no_data:
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_wipe_cpu
+
+ALIGN 16
+OPENSSL_wipe_cpu:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ xor rcx,rcx
+ xor rdx,rdx
+ xor r8,r8
+ xor r9,r9
+ xor r10,r10
+ xor r11,r11
+ lea rax,[8+rsp]
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_instrument_bus
+
+ALIGN 16
+OPENSSL_instrument_bus:
+ mov r10,rcx
+ mov rcx,rdx
+ mov r11,rdx
+
+ rdtsc
+ mov r8d,eax
+ mov r9d,0
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],r9d
+ jmp NEAR $L$oop
+ALIGN 16
+$L$oop: rdtsc
+ mov edx,eax
+ sub eax,r8d
+ mov r8d,edx
+ mov r9d,eax
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],eax
+ lea r10,[4+r10]
+ sub rcx,1
+ jnz NEAR $L$oop
+
+ mov rax,r11
+ DB 0F3h,0C3h ;repret
+
+
+global OPENSSL_instrument_bus2
+
+ALIGN 16
+OPENSSL_instrument_bus2:
+ mov r10,rcx
+ mov rcx,rdx
+ mov r11,r8
+ mov QWORD[8+rsp],rcx
+
+ rdtsc
+ mov r8d,eax
+ mov r9d,0
+
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],r9d
+
+ rdtsc
+ mov edx,eax
+ sub eax,r8d
+ mov r8d,edx
+ mov r9d,eax
+$L$oop2:
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],eax
+
+ sub r11,1
+ jz NEAR $L$done2
+
+ rdtsc
+ mov edx,eax
+ sub eax,r8d
+ mov r8d,edx
+ cmp eax,r9d
+ mov r9d,eax
+ mov edx,0
+ setne dl
+ sub rcx,rdx
+ lea r10,[rdx*4+r10]
+ jnz NEAR $L$oop2
+
+$L$done2:
+ mov rax,QWORD[8+rsp]
+ sub rax,rcx
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_ia32_rdrand_bytes
+
+ALIGN 16
+OPENSSL_ia32_rdrand_bytes:
+ xor rax,rax
+ cmp rdx,0
+ je NEAR $L$done_rdrand_bytes
+
+ mov r11,8
+$L$oop_rdrand_bytes:
+DB 73,15,199,242
+ jc NEAR $L$break_rdrand_bytes
+ dec r11
+ jnz NEAR $L$oop_rdrand_bytes
+ jmp NEAR $L$done_rdrand_bytes
+
+ALIGN 16
+$L$break_rdrand_bytes:
+ cmp rdx,8
+ jb NEAR $L$tail_rdrand_bytes
+ mov QWORD[rcx],r10
+ lea rcx,[8+rcx]
+ add rax,8
+ sub rdx,8
+ jz NEAR $L$done_rdrand_bytes
+ mov r11,8
+ jmp NEAR $L$oop_rdrand_bytes
+
+ALIGN 16
+$L$tail_rdrand_bytes:
+ mov BYTE[rcx],r10b
+ lea rcx,[1+rcx]
+ inc rax
+ shr r10,8
+ dec rdx
+ jnz NEAR $L$tail_rdrand_bytes
+
+$L$done_rdrand_bytes:
+ xor r10,r10
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_ia32_rdseed_bytes
+
+ALIGN 16
+OPENSSL_ia32_rdseed_bytes:
+ xor rax,rax
+ cmp rdx,0
+ je NEAR $L$done_rdseed_bytes
+
+ mov r11,8
+$L$oop_rdseed_bytes:
+DB 73,15,199,250
+ jc NEAR $L$break_rdseed_bytes
+ dec r11
+ jnz NEAR $L$oop_rdseed_bytes
+ jmp NEAR $L$done_rdseed_bytes
+
+ALIGN 16
+$L$break_rdseed_bytes:
+ cmp rdx,8
+ jb NEAR $L$tail_rdseed_bytes
+ mov QWORD[rcx],r10
+ lea rcx,[8+rcx]
+ add rax,8
+ sub rdx,8
+ jz NEAR $L$done_rdseed_bytes
+ mov r11,8
+ jmp NEAR $L$oop_rdseed_bytes
+
+ALIGN 16
+$L$tail_rdseed_bytes:
+ mov BYTE[rcx],r10b
+ lea rcx,[1+rcx]
+ inc rax
+ shr r10,8
+ dec rdx
+ jnz NEAR $L$tail_rdseed_bytes
+
+$L$done_rdseed_bytes:
+ xor r10,r10
+ DB 0F3h,0C3h ;repret
+
diff --git a/CryptoPkg/Library/OpensslLib/process_files.pl b/CryptoPkg/Library/OpensslLib/process_files.pl
index 4ba25da407..c0a19b99b6 100755
--- a/CryptoPkg/Library/OpensslLib/process_files.pl
+++ b/CryptoPkg/Library/OpensslLib/process_files.pl
@@ -12,6 +12,47 @@
use strict;
use Cwd;
use File::Copy;
+use File::Basename;
+use File::Path qw(make_path remove_tree);
+use Text::Tabs;
+
+#
+# OpenSSL perlasm generator script does not transfer the copyright header
+#
+sub copy_license_header
+{
+ my @args = split / /, shift; #Separate args by spaces
+ my $source = $args[1]; #Source file is second (after "perl")
+ my $target = pop @args; #Target file is always last
+ chop ($target); #Remove newline char
+
+ my $temp_file_name = "license.tmp";
+ open (my $source_file, "<" . $source) || die $source;
+ open (my $target_file, "<" . $target) || die $target;
+ open (my $temp_file, ">" . $temp_file_name) || die $temp_file_name;
+
+ #Copy source file header to temp file
+ while (my $line = <$source_file>) {
+ next if ($line =~ /#!/); #Ignore shebang line
+ $line =~ s/#/;/; #Fix comment character for assembly
+ $line =~ s/\s+$/\r\n/; #Trim trailing whitespace, fixup line endings
+ print ($temp_file $line);
+ last if ($line =~ /http/); #Last line of copyright header contains a web link
+ }
+ print ($temp_file "\r\n"); #Add an empty line after the header
+ #Retrieve generated assembly contents
+ while (my $line = <$target_file>) {
+ $line =~ s/\s+$/\r\n/; #Trim trailing whitespace, fixup line endings
+ print ($temp_file expand ($line)); #expand() replaces tabs with spaces
+ }
+
+ close ($source_file);
+ close ($target_file);
+ close ($temp_file);
+
+ move ($temp_file_name, $target) ||
+ die "Cannot replace \"" . $target . "\"!";
+}
#
# Find the openssl directory name for use lib. We have to do this
@@ -21,10 +62,39 @@ use File::Copy;
#
my $inf_file;
my $OPENSSL_PATH;
+my $uefi_config;
+my $extension;
+my $arch;
my @inf;
BEGIN {
$inf_file = "OpensslLib.inf";
+ $uefi_config = "UEFI";
+ $arch = shift;
+
+ if (defined $arch) {
+ if (lc ($arch) eq lc ("X64")) {
+ $inf_file = "OpensslLibX64.inf";
+ $uefi_config = "UEFI-x86_64";
+ $extension = "nasm";
+ } elsif (lc ($arch) eq lc ("IA32")) {
+ $arch = "Ia32";
+ $inf_file = "OpensslLibIa32.inf";
+ $uefi_config = "UEFI-x86";
+ $extension = "nasm";
+ } else {
+ die "Unsupported architecture \"" . $arch . "\"!";
+ }
+
+ # Prepare assembly folder
+ if (-d $arch) {
+ remove_tree ($arch, {safe => 1}) ||
+ die "Cannot clean assembly folder \"" . $arch . "\"!";
+ } else {
+ mkdir $arch ||
+ die "Cannot create assembly folder \"" . $arch . "\"!";
+ }
+ }
# Read the contents of the inf file
open( FD, "<" . $inf_file ) ||
@@ -47,9 +117,9 @@ BEGIN {
# Configure UEFI
system(
"./Configure",
- "UEFI",
+ "--config=../uefi-asm.conf",
+ "$uefi_config",
"no-afalgeng",
- "no-asm",
"no-async",
"no-autoerrinit",
"no-autoload-config",
@@ -126,22 +196,52 @@ BEGIN {
# Retrieve file lists from OpenSSL configdata
#
use configdata qw/%unified_info/;
+use configdata qw/%config/;
+use configdata qw/%target/;
+
+#
+# Collect build flags from configdata
+#
+my $flags = "";
+foreach my $f (@{$config{lib_defines}}) {
+ $flags .= " -D$f";
+}
my @cryptofilelist = ();
my @sslfilelist = ();
+my @asmfilelist = ();
+my @asmbuild = ();
foreach my $product ((@{$unified_info{libraries}},
@{$unified_info{engines}})) {
foreach my $o (@{$unified_info{sources}->{$product}}) {
foreach my $s (@{$unified_info{sources}->{$o}}) {
- next if ($unified_info{generate}->{$s});
- next if $s =~ "crypto/bio/b_print.c";
-
# No need to add unused files in UEFI.
# So it can reduce porting time, compile time, library size.
+ next if $s =~ "crypto/bio/b_print.c";
next if $s =~ "crypto/rand/randfile.c";
next if $s =~ "crypto/store/";
next if $s =~ "crypto/err/err_all.c";
+ if ($unified_info{generate}->{$s}) {
+ if (defined $arch) {
+ my $buildstring = "perl";
+ foreach my $arg (@{$unified_info{generate}->{$s}}) {
+ if ($arg =~ ".pl") {
+ $buildstring .= " ./openssl/$arg";
+ } elsif ($arg =~ "PERLASM_SCHEME") {
+ $buildstring .= " $target{perlasm_scheme}";
+ } elsif ($arg =~ "LIB_CFLAGS") {
+ $buildstring .= "$flags";
+ }
+ }
+ ($s, my $path, undef) = fileparse($s, qr/\.[^.]*/);
+ $buildstring .= " ./$arch/$path$s.$extension";
+ make_path ("./$arch/$path");
+ push @asmbuild, "$buildstring\n";
+ push @asmfilelist, " $arch/$path$s.$extension\r\n";
+ }
+ next;
+ }
if ($product =~ "libssl") {
push @sslfilelist, ' $(OPENSSL_PATH)/' . $s . "\r\n";
next;
@@ -179,15 +279,31 @@ foreach (@headers){
}
+#
+# Generate assembly files
+#
+if (@asmbuild) {
+ print "\n--> Generating assembly files ... ";
+ foreach my $buildstring (@asmbuild) {
+ system ("$buildstring");
+ copy_license_header ($buildstring);
+ }
+ print "Done!";
+}
+
#
# Update OpensslLib.inf with autogenerated file list
#
my @new_inf = ();
my $subbing = 0;
-print "\n--> Updating OpensslLib.inf ... ";
+print "\n--> Updating $inf_file ... ";
foreach (@inf) {
+ if ($_ =~ "DEFINE OPENSSL_FLAGS_CONFIG") {
+ push @new_inf, " DEFINE OPENSSL_FLAGS_CONFIG =" . $flags . "\r\n";
+ next;
+ }
if ( $_ =~ "# Autogenerated files list starts here" ) {
- push @new_inf, $_, @cryptofilelist, @sslfilelist;
+ push @new_inf, $_, @asmfilelist, @cryptofilelist, @sslfilelist;
$subbing = 1;
next;
}
@@ -212,49 +328,51 @@ rename( $new_inf_file, $inf_file ) ||
die "rename $inf_file";
print "Done!";
-#
-# Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
-#
-$inf_file = "OpensslLibCrypto.inf";
-
-# Read the contents of the inf file
-@inf = ();
-@new_inf = ();
-open( FD, "<" . $inf_file ) ||
- die "Cannot open \"" . $inf_file . "\"!";
-@inf = (<FD>);
-close(FD) ||
- die "Cannot close \"" . $inf_file . "\"!";
+if (!defined $arch) {
+ #
+ # Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
+ #
+ $inf_file = "OpensslLibCrypto.inf";
-$subbing = 0;
-print "\n--> Updating OpensslLibCrypto.inf ... ";
-foreach (@inf) {
- if ( $_ =~ "# Autogenerated files list starts here" ) {
- push @new_inf, $_, @cryptofilelist;
- $subbing = 1;
- next;
- }
- if ( $_ =~ "# Autogenerated files list ends here" ) {
- push @new_inf, $_;
- $subbing = 0;
- next;
+ # Read the contents of the inf file
+ @inf = ();
+ @new_inf = ();
+ open( FD, "<" . $inf_file ) ||
+ die "Cannot open \"" . $inf_file . "\"!";
+ @inf = (<FD>);
+ close(FD) ||
+ die "Cannot close \"" . $inf_file . "\"!";
+
+ $subbing = 0;
+ print "\n--> Updating OpensslLibCrypto.inf ... ";
+ foreach (@inf) {
+ if ( $_ =~ "# Autogenerated files list starts here" ) {
+ push @new_inf, $_, @cryptofilelist;
+ $subbing = 1;
+ next;
+ }
+ if ( $_ =~ "# Autogenerated files list ends here" ) {
+ push @new_inf, $_;
+ $subbing = 0;
+ next;
+ }
+
+ push @new_inf, $_
+ unless ($subbing);
}
- push @new_inf, $_
- unless ($subbing);
+ $new_inf_file = $inf_file . ".new";
+ open( FD, ">" . $new_inf_file ) ||
+ die $new_inf_file;
+ print( FD @new_inf ) ||
+ die $new_inf_file;
+ close(FD) ||
+ die $new_inf_file;
+ rename( $new_inf_file, $inf_file ) ||
+ die "rename $inf_file";
+ print "Done!";
}
-$new_inf_file = $inf_file . ".new";
-open( FD, ">" . $new_inf_file ) ||
- die $new_inf_file;
-print( FD @new_inf ) ||
- die $new_inf_file;
-close(FD) ||
- die $new_inf_file;
-rename( $new_inf_file, $inf_file ) ||
- die "rename $inf_file";
-print "Done!";
-
#
# Copy opensslconf.h and dso_conf.h generated from OpenSSL Configuration
#
diff --git a/CryptoPkg/Library/OpensslLib/uefi-asm.conf b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
new file mode 100644
index 0000000000..4fd52c9cf2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
@@ -0,0 +1,14 @@
+## -*- mode: perl; -*-
+## UEFI assembly openssl configuration targets.
+
+my %targets = (
+#### UEFI
+ "UEFI-x86" => {
+ inherit_from => [ "UEFI", asm("x86_asm") ],
+ perlasm_scheme => "win32n",
+ },
+ "UEFI-x86_64" => {
+ inherit_from => [ "UEFI", asm("x86_64_asm") ],
+ perlasm_scheme => "nasm",
+ },
+);
--
2.16.2.windows.1
Thread overview: 14+ messages (newest: 2020-03-17 10:27 UTC)
2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
2020-03-17 10:26 ` Zurcher, Christopher J [this message]
2020-03-26 1:15 ` [edk2-devel] [PATCH 1/1] " Yao, Jiewen
[not found] ` <15FFB5A5A94CCE31.23217@groups.io>
2020-03-26 1:23 ` Yao, Jiewen
2020-03-26 2:44 ` Zurcher, Christopher J
2020-03-26 3:05 ` Yao, Jiewen
2020-03-26 3:29 ` Zurcher, Christopher J
2020-03-26 3:58 ` Yao, Jiewen
2020-03-26 18:23 ` Michael D Kinney
2020-03-27 0:52 ` Zurcher, Christopher J
2020-03-23 12:59 ` [edk2-devel] [PATCH 0/1] " Laszlo Ersek
2020-03-25 18:40 ` Ard Biesheuvel
2020-03-26 1:04 ` [edk2-devel] " Zurcher, Christopher J
2020-03-26 7:49 ` Ard Biesheuvel