From mboxrd@z Thu Jan 1 00:00:00 1970
From: Amit kumar
To: Andrew Fish
CC: Mike Kinney, "edk2-devel@lists.01.org"
Thread-Topic: [edk2] Accessing AVX/AVX2 instruction in UEFI.
Date: Thu, 4 May 2017 12:22:09 +0000
Subject: Re: Accessing AVX/AVX2 instruction in UEFI.
List-Id: EDK II Development

Here are the compiler flags:

[BuildOptions]
  MSFT:DEBUG_*_*_CC_FLAGS = /Od /FAsc /GL-
  MSFT:RELEASE_*_*_CC_FLAGS = /FAsc /D MDEPKG_NDEBUG
  MSFT:RELEASE_*_*_DLINK_FLAGS = /BASE:0x10000 /ALIGN:4096 /FILEALIGN:4096

________________________________
From: Amit kumar
Sent: Thursday, May 4, 2017 5:48:11 PM
To: Andrew Fish
Cc: Mike Kinney; edk2-devel@lists.01.org
Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.

Yes, I am aligning the data on a 32-byte boundary when allocating memory in both environments.

In Windows, using _aligned_malloc(size, 32);

In UEFI:

  Offset = (UINTN)src & 0xFF;
  src = (CHAR8 *)((UINTN) src - Offset + 0x20);

Thanks
Amit

________________________________
From: afish@apple.com on behalf of Andrew Fish
Sent: Thursday, May 4, 2017 5:02:55 PM
To: Amit kumar
Cc: Mike Kinney; edk2-devel@lists.01.org
Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.

> On May 4, 2017, at 4:13 AM, Amit kumar wrote:
>
> Hi,
>
> Even after using AVX2 instructions, my code shows no performance improvement in UEFI, although there is a substantial improvement when I run similar code in Windows.
>
> Am I missing something?
>

Is the data aligned the same in both environments?

Thanks,

Andrew Fish

> I am using the MSVC compiler and the code is written in ASM.
>
> Thanks And Regards
>
> Amit
>
> ________________________________
> From: edk2-devel on behalf of Amit kumar
> Sent: Wednesday, May 3, 2017 11:18:39 AM
> To: Kinney, Michael D; Andrew Fish
> Cc: edk2-devel@lists.01.org
> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>
> Thank you Michael and Andrew
>
> Regards
>
> Amit
>
> ________________________________
> From: Kinney, Michael D
> Sent: Tuesday, May 2, 2017 10:33:45 PM
> To: Andrew Fish; Amit kumar; Kinney, Michael D
> Cc: edk2-devel@lists.01.org
> Subject: RE: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>
> Amit,
>
> The information from Andrew is correct.
>
> The document that covers this topic is the
> Intel(r) 64 and IA-32 Architectures Software Developer Manuals
>
> https://software.intel.com/en-us/articles/intel-sdm
>
> Volume 1, Section 13.5.3 describes the AVX State. There are
> more details about detecting and enabling different AVX features
> in that document.
>
> If the CPU supports AVX, then the basic assembly instructions
> required to use AVX instructions are the following, which set
> bits 0, 1, and 2 of XCR0.
>
>   mov rcx, 0
>   xgetbv
>   or rax, 0007h
>   xsetbv
>
> One additional item you need to be aware of is that UEFI firmware only
> saves/restores CPU registers required for the UEFI ABI calling convention
> when a timer interrupt or exception is processed.
>
> This means CPU state such as the YMM registers is not saved/restored
> across an interrupt and may be modified if code in interrupt context
> also uses YMM registers.
>
> When you enable the use of extended registers, the interrupt state should be
> saved, interrupts disabled, and the state restored around the extended register usage.
>
> You can use the following functions from MdePkg BaseLib to do this:
>
> /**
>   Disables CPU interrupts and returns the interrupt state prior to the disable
>   operation.
>
>   @retval TRUE  CPU interrupts were enabled on entry to this call.
>   @retval FALSE CPU interrupts were disabled on entry to this call.
>
> **/
> BOOLEAN
> EFIAPI
> SaveAndDisableInterrupts (
>   VOID
>   );
>
> /**
>   Set the current CPU interrupt state.
>
>   Sets the current CPU interrupt state to the state specified by
>   InterruptState. If InterruptState is TRUE, then interrupts are enabled. If
>   InterruptState is FALSE, then interrupts are disabled. InterruptState is
>   returned.
>
>   @param InterruptState TRUE if interrupts should be enabled. FALSE if
>                         interrupts should be disabled.
>
>   @return InterruptState
>
> **/
> BOOLEAN
> EFIAPI
> SetInterruptState (
>   IN BOOLEAN InterruptState
>   );
>
> Algorithm:
> ============
> {
>   BOOLEAN InterruptState;
>
>   InterruptState = SaveAndDisableInterrupts();
>
>   // Enable use of AVX/AVX2 instructions
>
>   // Use AVX/AVX2 instructions
>
>   SetInterruptState (InterruptState);
> }
>
> Best regards,
>
> Mike
>
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Andrew Fish
>> Sent: Tuesday, May 2, 2017 8:12 AM
>> To: Amit kumar
>> Cc: edk2-devel@lists.01.org
>> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>>
>>
>>> On May 2, 2017, at 6:57 AM, Amit kumar wrote:
>>>
>>> Hi,
>>>
>>> I am trying to optimize an application using AVX/AVX2, but my code hangs while trying
>>> to access YMM registers.
>>> The instruction where my code hangs is:
>>>
>>>   vmovups ymm0, YMMWORD PTR [rax]
>>>
>>> I have verified the CPUID in the OS and it supports AVX and AVX2 instructions. Processor:
>>> i7 6th gen.
>>> Can somebody help me out here?
>>> Is there a way to enable YMM registers?
>>>
>>
>> Amit,
>>
>> I think these instructions will generate an illegal instruction fault until you enable
>> AVX. You need to check the CPUID bits in your code, then set BIT18 of CR4. After
>> that, the XGETBV/XSETBV instructions are enabled and you can OR in the lower 2 bits of
>> XCR0. This basic operation is in the Intel docs; it is just hard to find. Usually the
>> OS has done this for the programmer and all the code needs to do is check the CPUID.
>>
>> Thanks,
>>
>> Andrew Fish
>>
>>>
>>> Thanks And Regards
>>> Amit Kumar
>>>
>>> _______________________________________________
>>> edk2-devel mailing list
>>> edk2-devel@lists.01.org
>>> https://lists.01.org/mailman/listinfo/edk2-devel
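
A note on the 32-byte alignment step described in the 5:48:11 PM reply at the top of this thread. The following is a minimal sketch in portable C, not code from the thread: align_up and thread_align are hypothetical helper names, and uintptr_t is an assumption standing in for the UINTN type an EDK II build would use. It contrasts a conventional round-up with the 0xFF-mask computation quoted in that reply. Both give a 32-byte-aligned address, but the quoted form returns an address below the original pointer whenever the low byte of the address is greater than 0x20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
   alignment must be a power of two; an address that is
   already aligned is returned unchanged. */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~(alignment - 1);
}

/* The computation quoted in the thread, reproduced for comparison
   (hypothetical wrapper name): clear the low byte, then add 0x20.
   The result is 32-byte aligned but is not guaranteed to be
   greater than or equal to addr. */
uintptr_t thread_align(uintptr_t addr)
{
    uintptr_t offset = addr & 0xFF;
    return addr - offset + 0x20;
}
```

For example, for a pointer at 0x1080 the round-up leaves the address unchanged, while the quoted computation gives 0x1020, which lies 0x60 bytes before the start of the buffer. With either approach the allocation has to be at least 31 bytes larger than the data so the rounded-up pointer still fits. Whether this has any bearing on the performance difference discussed in the thread is not something the thread establishes.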