From: Amit kumar
To: "Kinney, Michael D", Andrew Fish
CC: "edk2-devel@lists.01.org"
Date: Fri, 5 May 2017 13:08:42 +0000
Subject: Re: Accessing AVX/AVX2 instruction in UEFI.
List-Id: EDK II Development

Mike, Andrew

Thanks for your suggestions, it looks like MMIO is the bottleneck in my application.
I have one more query. Does each core have independent YMM registers or are they shared among the cores?

Thanks And Regards
Amit Kumar

________________________________
From: Kinney, Michael D
Sent: Thursday, May 4, 2017 10:56:44 PM
To: Andrew Fish; Amit kumar; Kinney, Michael D
Cc: edk2-devel@lists.01.org
Subject: RE: [edk2] Accessing AVX/AVX2 instruction in UEFI.

Amit,

I agree with Andrew that establishing a good measurement method is very important and that raising TPL to HIGH_LEVEL (disabling interrupts) during measurement may improve the consistency of the measurement results.

You also likely want to test both large buffer operations as well as a loop on small buffer operations to see if there are differences based on the size of the requested operation.

In order to verify that your measurement method is working, you may want to test some of the existing BaseMemoryLib implementations before testing your new one.

* BaseMemoryLib         C code implementation
* BaseMemoryLibMmx      Uses MMX registers/instructions
* BaseMemoryLibSse2     Uses SSE2 registers/instructions
* BaseMemoryLibRepStr   Uses REP STR instructions
* BaseMemoryLibOptDxe   Blend of above libs with good perf in DXE/UEFI phase
* BaseMemoryLibOptPei   Blend of above libs with good perf in PEI phase

I recommend you try measuring the first 4 to see if your measurements show differences.

Based on my own evaluation in the past, I have found that DXE/UEFI code works well with BaseMemoryLibRepStr. It tends to go as fast as the largest register width access the CPU supports.

One additional element that may be impacting your results is the type of memory that is being tested and that memory range's cache settings. If you are accessing MMIO, FLASH, or some other type of device memory, you may be seeing bandwidth limitations from that device.

Best regards,

Mike

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Andrew Fish
> Sent: Thursday, May 4, 2017 8:21 AM
> To: Amit kumar
> Cc: Kinney, Michael D; edk2-devel@lists.01.org
> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>
> Amit,
>
> In regards to AVX/AVX2 performance how are you doing the measuring?
>
> In EFI it is hard to measure wall clock time for things that take a long time. Basically there is no scheduler in EFI and no threads, but there are events. The events can preempt your App while it is running and the time spent in events would look to you like time spent in your App.
>
> Generally the time spent in events should be constant (hot plugging USB or other changes like that may have a noticeable impact).
> If the goal of the performance measurement is to make the system boot faster you care more about the delta than the absolute time (so the event overhead does not matter).
>
> If you are just doing a computation that does not do any IO then you may be able to raise the TPL to prevent events from being part of your measurement.
>
> Thanks,
>
> Andrew Fish
>
> PS I assume you are measuring the RELEASE code since you are turning off optimization on the DEBUG code.
>
> > On May 4, 2017, at 5:22 AM, Amit kumar wrote:
> >
> > Here are the compiler flags
> > [BuildOptions]
> > MSFT:DEBUG_*_*_CC_FLAGS = /Od /FAsc /GL-
> > MSFT:RELEASE_*_*_CC_FLAGS = /FAsc /D MDEPKG_NDEBUG
> > MSFT:RELEASE_*_*_DLINK_FLAGS = /BASE:0x10000 /ALIGN:4096 /FILEALIGN:4096
> >
> >
> > ________________________________
> > From: Amit kumar
> > Sent: Thursday, May 4, 2017 5:48:11 PM
> > To: Andrew Fish
> > Cc: Mike Kinney; edk2-devel@lists.01.org
> > Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
> >
> >
> > Yes, I am aligning the data at a 32 byte boundary while allocating memory in both environments.
> >
> > in windows using _aligned_malloc(size,32);
> >
> > in UEFI
> >
> > Offset = (UINTN)src & 0x1F;
> >
> > src = (CHAR8 *)((UINTN) src - Offset + 0x20);
> >
> >
> > Thanks
> >
> > Amit
> >
> > ________________________________
> > From: afish@apple.com on behalf of Andrew Fish
> > Sent: Thursday, May 4, 2017 5:02:55 PM
> > To: Amit kumar
> > Cc: Mike Kinney; edk2-devel@lists.01.org
> > Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
> >
> >
> >> On May 4, 2017, at 4:13 AM, Amit kumar wrote:
> >>
> >> Hi,
> >>
> >>
> >> Even after using AVX2 instructions my code shows no performance improvement in UEFI, although there is substantial improvement when I run similar code in Windows.
> >>
> >> Am I missing something?
> >>
> >
> > Is the data aligned the same in both environments?
> >
> > Thanks,
> >
> > Andrew Fish
> >
> >> Using MSVC compiler and the code is written in ASM.
> >>
> >> Thanks And Regards
> >>
> >> Amit
> >>
> >> ________________________________
> >> From: edk2-devel on behalf of Amit kumar
> >> Sent: Wednesday, May 3, 2017 11:18:39 AM
> >> To: Kinney, Michael D; Andrew Fish
> >> Cc: edk2-devel@lists.01.org
> >> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
> >>
> >> Thank you Michael and Andrew
> >>
> >>
> >> Regards
> >>
> >> Amit
> >>
> >> ________________________________
> >> From: Kinney, Michael D
> >> Sent: Tuesday, May 2, 2017 10:33:45 PM
> >> To: Andrew Fish; Amit kumar; Kinney, Michael D
> >> Cc: edk2-devel@lists.01.org
> >> Subject: RE: [edk2] Accessing AVX/AVX2 instruction in UEFI.
> >>
> >> Amit,
> >>
> >> The information from Andrew is correct.
> >>
> >> The document that covers this topic is the
> >> Intel(r) 64 and IA-32 Architectures Software Developer Manuals
> >>
> >> https://software.intel.com/en-us/articles/intel-sdm
> >>
> >> Volume 1, Section 13.5.3 describes the AVX State. There are
> >> more details about detecting and enabling different AVX features
> >> in that document.
> >>
> >> If the CPU supports AVX, then the basic assembly instructions
> >> required to use AVX instructions are the following that sets
> >> bits 0, 1, 2 of XCR0.
> >>
> >>   mov rcx, 0
> >>   xgetbv
> >>   or rax, 0007h
> >>   xsetbv
> >>
> >> One additional item you need to be aware of is that UEFI firmware only
> >> saves/restores CPU registers required for the UEFI ABI calling convention
> >> when a timer interrupt or exception is processed.
> >>
> >> This means CPU state such as the YMM registers are not saved/restored
> >> across an interrupt and may be modified if code in interrupt context
> >> also uses YMM registers.
> >>
> >> When you enable the use of extended registers, interrupts should be
> >> saved/disabled and restored around the extended register usage.
> >>
> >> You can use the following functions from MdePkg BaseLib to do this
> >>
> >> /**
> >>   Disables CPU interrupts and returns the interrupt state prior to the disable
> >>   operation.
> >>
> >>   @retval TRUE  CPU interrupts were enabled on entry to this call.
> >>   @retval FALSE CPU interrupts were disabled on entry to this call.
> >>
> >> **/
> >> BOOLEAN
> >> EFIAPI
> >> SaveAndDisableInterrupts (
> >>   VOID
> >>   );
> >>
> >> /**
> >>   Set the current CPU interrupt state.
> >>
> >>   Sets the current CPU interrupt state to the state specified by
> >>   InterruptState. If InterruptState is TRUE, then interrupts are enabled. If
> >>   InterruptState is FALSE, then interrupts are disabled. InterruptState is
> >>   returned.
> >>
> >>   @param InterruptState TRUE if interrupts should be enabled. FALSE if
> >>                         interrupts should be disabled.
> >>
> >>   @return InterruptState
> >>
> >> **/
> >> BOOLEAN
> >> EFIAPI
> >> SetInterruptState (
> >>   IN BOOLEAN InterruptState
> >>   );
> >>
> >> Algorithm:
> >> ============
> >> {
> >>   BOOLEAN InterruptState;
> >>
> >>   InterruptState = SaveAndDisableInterrupts();
> >>
> >>   // Enable use of AVX/AVX2 instructions
> >>
> >>   // Use AVX/AVX2 instructions
> >>
> >>   SetInterruptState (InterruptState);
> >> }
> >>
> >> Best regards,
> >>
> >> Mike
> >>
> >>> -----Original Message-----
> >>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Andrew Fish
> >>> Sent: Tuesday, May 2, 2017 8:12 AM
> >>> To: Amit kumar
> >>> Cc: edk2-devel@lists.01.org
> >>> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
> >>>
> >>>
> >>>> On May 2, 2017, at 6:57 AM, Amit kumar wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am trying to optimize an application using AVX/AVX2, but my code hangs while trying to access YMM registers.
> >>>> The instruction where my code hangs is:
> >>>>
> >>>>
> >>>>   vmovups ymm0, YMMWORD PTR [rax]
> >>>>
> >>>>
> >>>> I have verified the cpuid in the OS and it supports AVX and AVX2 instructions. Processor is an i7 6th gen.
> >>>> Can somebody help me out here? Is there a way to enable YMM registers?
> >>>>
> >>>
> >>> Amit,
> >>>
> >>> I think these instructions will generate an illegal instruction fault until you enable AVX. You need to check the Cpu ID bits in your code, then write BIT18 of CR4. After that XGETBV/XSETBV instructions are enabled and you can or in the lower 3 bits of XCR0. This basic operation is in the Intel Docs, it is just hard to find. Usually the OS has done this for the programmer and all the code needs to do is check the CPU ID.
> >>>
> >>> Thanks,
> >>>
> >>> Andrew Fish
> >>>
> >>>>
> >>>> Thanks And Regards
> >>>> Amit Kumar
> >>>>
> >>>> _______________________________________________
> >>>> edk2-devel mailing list
> >>>> edk2-devel@lists.01.org
> >>>> https://lists.01.org/mailman/listinfo/edk2-devel
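
The enable sequence discussed in this thread (check CPUID, set bit 18 of CR4, then set bits 0, 1, 2 of XCR0) can be sketched as plain bit-mask logic in C. This is only a sketch under stated assumptions: the macro and function names below are hypothetical and are not part of EDK II, the bit positions are the ones documented in the Intel SDM, and the helpers only operate on register values passed in by the caller. They do not execute CPUID, MOV CR4, or XGETBV/XSETBV themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions as documented in the Intel SDM. */
#define CPUID1_ECX_XSAVE (UINT32_C(1) << 26) /* CPUID.01H:ECX.XSAVE */
#define CPUID1_ECX_AVX   (UINT32_C(1) << 28) /* CPUID.01H:ECX.AVX */
#define CPUID7_EBX_AVX2  (UINT32_C(1) << 5)  /* CPUID.(EAX=07H,ECX=0):EBX.AVX2 */
#define CR4_OSXSAVE      (UINT64_C(1) << 18) /* "BIT18 of CR4" in the thread */
#define XCR0_X87         (UINT64_C(1) << 0)
#define XCR0_SSE         (UINT64_C(1) << 1)
#define XCR0_AVX         (UINT64_C(1) << 2)
#define XCR0_AVX_MASK    (XCR0_X87 | XCR0_SSE | XCR0_AVX) /* the 0007h in the thread */

/* True if the CPU reports both XSAVE and AVX in CPUID leaf 1, ECX. */
bool cpu_supports_avx(uint32_t cpuid1_ecx)
{
    uint32_t need = CPUID1_ECX_XSAVE | CPUID1_ECX_AVX;
    return (cpuid1_ecx & need) == need;
}

/* True if AVX is supported and CPUID leaf 7 (subleaf 0), EBX reports AVX2. */
bool cpu_supports_avx2(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx)
{
    return cpu_supports_avx(cpuid1_ecx) && (cpuid7_ebx & CPUID7_EBX_AVX2) != 0;
}

/* Value to write back to CR4 so that XGETBV/XSETBV become available. */
uint64_t cr4_with_osxsave(uint64_t cr4)
{
    return cr4 | CR4_OSXSAVE;
}

/* Value to write back to XCR0 so that x87, SSE and AVX state are enabled. */
uint64_t xcr0_with_avx(uint64_t xcr0)
{
    return xcr0 | XCR0_AVX_MASK;
}

/* True if the given CR4 and XCR0 values allow YMM instructions to execute. */
bool avx_usable(uint64_t cr4, uint64_t xcr0)
{
    uint64_t need = XCR0_SSE | XCR0_AVX;
    return (cr4 & CR4_OSXSAVE) != 0 && (xcr0 & need) == need;
}
```

In firmware the caller would presumably obtain the real register values first (MdePkg BaseLib provides AsmCpuid, AsmReadCr4 and AsmWriteCr4 for the CPUID and CR4 parts) and perform the writes inside the SaveAndDisableInterrupts / SetInterruptState bracket quoted above. Keeping the mask logic separate from the privileged instructions makes it possible to check the bit arithmetic on any host.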