From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Fish
Sender: afish@apple.com
Date: Thu, 04 May 2017 08:20:52 -0700
To: Amit kumar
Cc: Mike Kinney, "edk2-devel@lists.01.org"
Message-id: <00B544A5-F7E3-4723-BCFD-D2A15138EB21@apple.com>
References: <0E40AA0F-3FDD-420D-9982-43FB8E0DE81A@apple.com>
Subject: Re: Accessing AVX/AVX2 instruction in UEFI.
List-Id: EDK II Development

Amit,

Regarding AVX/AVX2 performance, how are you doing the measuring?

In EFI it is hard to measure wall-clock time for things that take a long time. Basically, there is no scheduler and there are no threads in EFI, but there are events. Events can preempt your App while it is running, and the time spent in events will look to you like time spent in your App. Generally the time spent in events should be constant (hot-plugging USB, or other changes like that, may have a noticeable impact).

If the goal of the performance measurement is to make the system boot faster, you care more about the delta than the absolute time, so the event overhead does not matter. If you are just doing a computation that does not do any I/O, you may be able to raise the TPL to keep events out of your measurement.

Thanks,

Andrew Fish

PS: I assume you are measuring the RELEASE build, since you are turning off optimization (/Od) in the DEBUG build.

> On May 4, 2017, at 5:22 AM, Amit kumar wrote:
>
> Here are the compiler flags:
> [BuildOptions]
>   MSFT:DEBUG_*_*_CC_FLAGS = /Od /FAsc /GL-
>   MSFT:RELEASE_*_*_CC_FLAGS = /FAsc /D MDEPKG_NDEBUG
>   MSFT:RELEASE_*_*_DLINK_FLAGS = /BASE:0x10000 /ALIGN:4096 /FILEALIGN:4096
>
>
> ________________________________
> From: Amit kumar
> Sent: Thursday, May 4, 2017 5:48:11 PM
> To: Andrew Fish
> Cc: Mike Kinney; edk2-devel@lists.01.org
> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>
>
> Yes, I am aligning the data to a 32-byte boundary when allocating memory in both environments.
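A caution about the UEFI alignment arithmetic quoted below: `src - (src & 0xFF) + 0x20` rounds `src` down to a 256-byte boundary and then adds 32 bytes. The result is always 32-byte aligned, but whenever the low byte of `src` is greater than `0x20` it points below `src`, that is, before the start of the allocated buffer. The usual round-up idiom adds `Alignment - 1` and clears the low bits, and it needs the buffer to be over-allocated by `Alignment - 1` bytes. The sketch below is a minimal illustration in standard C, assuming `uintptr_t` in place of EDK2's `UINTN`; `AlignUp` and `QuotedPattern` are hypothetical names. In EDK2 itself, the `ALIGN_POINTER` macro from MdePkg's Base.h or `AllocateAlignedPages ()` from MemoryAllocationLib may be the more natural choice.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round Address up to the next multiple of Alignment.
 * Alignment must be a power of two (for example 32 for YMM data).
 * EDK2 code would use UINTN instead of uintptr_t.
 */
static uintptr_t AlignUp (uintptr_t Address, uintptr_t Alignment)
{
  assert (Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return (Address + (Alignment - 1)) & ~(Alignment - 1);
}

/*
 * The arithmetic from the quoted UEFI snippet, kept only for comparison:
 * round down to a 256-byte boundary, then add 0x20.
 */
static uintptr_t QuotedPattern (uintptr_t Address)
{
  uintptr_t Offset = Address & 0xFF;

  return Address - Offset + 0x20;
}
```

For example, `AlignUp (0x1048, 32)` returns `0x1060`, while `QuotedPattern (0x1048)` returns `0x1020`, 0x28 bytes before the original address. Keep the unaligned pointer so the buffer can still be freed later.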
>
> In Windows, using _aligned_malloc(size, 32);
>
> In UEFI:
>
> Offset = (UINTN)src & 0xFF;
>
> src = (CHAR8 *)((UINTN) src - Offset + 0x20);
>
>
> Thanks
>
> Amit
>
> ________________________________
> From: afish@apple.com on behalf of Andrew Fish
> Sent: Thursday, May 4, 2017 5:02:55 PM
> To: Amit kumar
> Cc: Mike Kinney; edk2-devel@lists.01.org
> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>
>
>> On May 4, 2017, at 4:13 AM, Amit kumar wrote:
>>
>> Hi,
>>
>>
>> Even after using AVX2 instructions, my code shows no performance improvement in UEFI, although there is a substantial improvement when I run similar code in Windows.
>>
>> Am I missing something?
>>
>
> Is the data aligned the same in both environments?
>
> Thanks,
>
> Andrew Fish
>
>> I am using the MSVC compiler, and the code is written in ASM.
>>
>> Thanks and regards,
>>
>> Amit
>>
>> ________________________________
>> From: edk2-devel on behalf of Amit kumar
>> Sent: Wednesday, May 3, 2017 11:18:39 AM
>> To: Kinney, Michael D; Andrew Fish
>> Cc: edk2-devel@lists.01.org
>> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>>
>> Thank you, Michael and Andrew.
>>
>>
>> Regards
>>
>> Amit
>>
>> ________________________________
>> From: Kinney, Michael D
>> Sent: Tuesday, May 2, 2017 10:33:45 PM
>> To: Andrew Fish; Amit kumar; Kinney, Michael D
>> Cc: edk2-devel@lists.01.org
>> Subject: RE: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>>
>> Amit,
>>
>> The information from Andrew is correct.
>>
>> The document that covers this topic is the
>> Intel(r) 64 and IA-32 Architectures Software Developer Manuals
>>
>> https://software.intel.com/en-us/articles/intel-sdm
>>
>> Volume 1, Section 13.5.3 describes the AVX State. There are
>> more details about detecting and enabling different AVX features
>> in that document.
>>
>> If the CPU supports AVX, then the basic assembly instructions
>> required to use AVX instructions are the following, which set
>> bits 0, 1, and 2 of XCR0.
>>
>>   mov    rcx, 0       ; ECX = 0 selects XCR0
>>   xgetbv              ; EDX:EAX = XCR0 (#UD unless CR4.OSXSAVE is set)
>>   or     rax, 0007h   ; enable x87 (bit 0), SSE (bit 1), AVX/YMM (bit 2) state
>>   xsetbv              ; XCR0 = EDX:EAX (CPL 0 only)
>>
>> One additional item you need to be aware of is that UEFI firmware only
>> saves/restores the CPU registers required by the UEFI ABI calling convention
>> when a timer interrupt or exception is processed.
>>
>> This means CPU state such as the YMM registers is not saved/restored
>> across an interrupt and may be modified if code in interrupt context
>> also uses YMM registers.
>>
>> When you enable the use of extended registers, the interrupt state should be
>> saved, interrupts disabled, and the state restored around the extended
>> register usage.
>>
>> You can use the following functions from MdePkg BaseLib to do this:
>>
>> /**
>>   Disables CPU interrupts and returns the interrupt state prior to the disable
>>   operation.
>>
>>   @retval TRUE   CPU interrupts were enabled on entry to this call.
>>   @retval FALSE  CPU interrupts were disabled on entry to this call.
>>
>> **/
>> BOOLEAN
>> EFIAPI
>> SaveAndDisableInterrupts (
>>   VOID
>>   );
>>
>> /**
>>   Set the current CPU interrupt state.
>>
>>   Sets the current CPU interrupt state to the state specified by
>>   InterruptState. If InterruptState is TRUE, then interrupts are enabled. If
>>   InterruptState is FALSE, then interrupts are disabled. InterruptState is
>>   returned.
>>
>>   @param  InterruptState  TRUE if interrupts should be enabled. FALSE if
>>                           interrupts should be disabled.
>>
>>   @return InterruptState
>>
>> **/
>> BOOLEAN
>> EFIAPI
>> SetInterruptState (
>>   IN BOOLEAN  InterruptState
>>   );
>>
>> Algorithm:
>> ============
>> {
>>   BOOLEAN  InterruptState;
>>
>>   InterruptState = SaveAndDisableInterrupts ();
>>
>>   // Enable use of AVX/AVX2 instructions (CR4.OSXSAVE, then XCR0 as above)
>>
>>   // Use AVX/AVX2 instructions
>>
>>   SetInterruptState (InterruptState);
>> }
>>
>> Best regards,
>>
>> Mike
>>
>>> -----Original Message-----
>>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Andrew Fish
>>> Sent: Tuesday, May 2, 2017 8:12 AM
>>> To: Amit kumar
>>> Cc: edk2-devel@lists.01.org
>>> Subject: Re: [edk2] Accessing AVX/AVX2 instruction in UEFI.
>>>
>>>
>>>> On May 2, 2017, at 6:57 AM, Amit kumar wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to optimize an application using AVX/AVX2, but my code hangs while trying to access YMM registers.
>>>> The instruction where my code hangs is:
>>>>
>>>>
>>>>   vmovups ymm0, YMMWORD PTR [rax]
>>>>
>>>>
>>>> I have verified with CPUID in the OS that the processor (6th-generation Core i7) supports the AVX and AVX2 instructions.
>>>> Can somebody help me out here? Is there a way to enable the YMM registers?
>>>>
>>>
>>> Amit,
>>>
>>> I think these instructions will generate an invalid-opcode (#UD) fault until you
>>> enable AVX. You need to check the CPUID bits in your code, then set BIT18
>>> (OSXSAVE) of CR4. After that the XGETBV/XSETBV instructions are enabled, and you
>>> can OR the SSE and AVX state bits (bits 1 and 2) into XCR0. This basic operation
>>> is in the Intel docs; it is just hard to find. Usually the OS has done this for
>>> the programmer, and all the code needs to do is check CPUID.
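The enable sequence described in this thread (check CPUID, set CR4 bit 18, then set the AVX state bits in XCR0 with XSETBV) reduces to a handful of bit tests. The sketch below is a hedged illustration rather than EDK2 code: it takes raw register values as parameters (CPUID leaf 1 ECX, CPUID leaf 7 subleaf 0 EBX, CR4, and an XCR0 value read with XGETBV), so the logic can be checked without ring-0 access, and all macro and function names are hypothetical. The bit positions follow the Intel SDM: CPUID.1:ECX bit 26 is XSAVE, bit 27 is OSXSAVE (it mirrors CR4.OSXSAVE), bit 28 is AVX; CPUID.(EAX=7,ECX=0):EBX bit 5 is AVX2; XCR0 bits 0, 1 and 2 enable x87, SSE and AVX (YMM) state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the architectural bits discussed in the thread. */
#define CPUID1_ECX_XSAVE    (1u << 26)
#define CPUID1_ECX_OSXSAVE  (1u << 27)
#define CPUID1_ECX_AVX      (1u << 28)
#define CPUID7_EBX_AVX2     (1u << 5)
#define CR4_OSXSAVE         (1ull << 18)
#define XCR0_X87            (1ull << 0)
#define XCR0_SSE            (1ull << 1)
#define XCR0_AVX            (1ull << 2)

/* True if the CPU implements XSAVE and AVX, so firmware may enable them. */
static bool CpuSupportsAvx (uint32_t Cpuid1Ecx)
{
  return (Cpuid1Ecx & CPUID1_ECX_XSAVE) != 0 &&
         (Cpuid1Ecx & CPUID1_ECX_AVX) != 0;
}

/* CR4 value with OSXSAVE set; XGETBV/XSETBV raise #UD until this is done. */
static uint64_t Cr4WithOsxsave (uint64_t Cr4)
{
  return Cr4 | CR4_OSXSAVE;
}

/* XCR0 value to write back with XSETBV (the "or rax, 0007h" step). */
static uint64_t Xcr0WithAvx (uint64_t Xcr0)
{
  return Xcr0 | XCR0_X87 | XCR0_SSE | XCR0_AVX;
}

/* AVX2 is usable once OSXSAVE is on, AVX state is enabled, and CPUID reports AVX2. */
static bool Avx2Usable (uint32_t Cpuid1Ecx, uint32_t Cpuid7Ebx, uint64_t Xcr0)
{
  uint64_t Needed = XCR0_SSE | XCR0_AVX;

  return (Cpuid1Ecx & CPUID1_ECX_OSXSAVE) != 0 &&
         CpuSupportsAvx (Cpuid1Ecx) &&
         (Cpuid7Ebx & CPUID7_EBX_AVX2) != 0 &&
         (Xcr0 & Needed) == Needed;
}
```

In firmware, the inputs would come from BaseLib's `AsmCpuid ()` / `AsmCpuidEx ()`, `AsmReadCr4 ()` and XGETBV, and the computed values would be written back with `AsmWriteCr4 ()` and XSETBV, with interrupts disabled as Mike describes.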
>>>
>>> Thanks,
>>>
>>> Andrew Fish
>>>
>>>>
>>>> Thanks and regards,
>>>> Amit Kumar
>>>>
>>>> _______________________________________________
>>>> edk2-devel mailing list
>>>> edk2-devel@lists.01.org
>>>> https://lists.01.org/mailman/listinfo/edk2-devel
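Andrew's reply at the top of this thread suggests comparing deltas and, for a pure computation, raising the TPL while measuring. In EDK2 that typically means bracketing the work with `gBS->RaiseTPL ()` / `gBS->RestoreTPL ()` and reading `GetPerformanceCounter ()` from TimerLib before and after. The portable sketch below covers only the tick arithmetic, which is easy to get wrong: per TimerLib's `GetPerformanceCounterProperties ()`, the counter counts up when its start value is less than its end value and down otherwise, and it rolls over at the end value. The function names are hypothetical, the code assumes at most one rollover between readings, and TimerLib's own `GetTimeInNanoSecond ()` may already cover the conversion step.

```c
#include <stdint.h>

/*
 * Ticks elapsed between two performance-counter readings (Begin, Finish).
 * StartValue/EndValue describe the counter range as reported by
 * GetPerformanceCounterProperties (): the counter counts up if
 * StartValue < EndValue, down otherwise, and rolls over from EndValue
 * back to StartValue. Assumes at most one rollover between the readings.
 */
static uint64_t ElapsedTicks (uint64_t Begin, uint64_t Finish,
                              uint64_t StartValue, uint64_t EndValue)
{
  if (StartValue < EndValue) {
    if (Finish >= Begin) {
      return Finish - Begin;                    /* counting up, no rollover */
    }
    /* Begin -> EndValue, one tick to roll over, StartValue -> Finish. */
    return (EndValue - Begin) + 1 + (Finish - StartValue);
  }
  if (Begin >= Finish) {
    return Begin - Finish;                      /* counting down, no rollover */
  }
  /* Begin -> EndValue, one tick to roll over, StartValue -> Finish. */
  return (Begin - EndValue) + 1 + (StartValue - Finish);
}

/*
 * Convert ticks to nanoseconds for a counter running at FrequencyHz.
 * Splitting into whole seconds and a remainder avoids overflowing
 * Ticks * 10^9; the remainder product stays in range for frequencies
 * below roughly 18 GHz.
 */
static uint64_t TicksToNanoseconds (uint64_t Ticks, uint64_t FrequencyHz)
{
  uint64_t Seconds   = Ticks / FrequencyHz;
  uint64_t Remainder = Ticks % FrequencyHz;

  return Seconds * 1000000000ull + (Remainder * 1000000000ull) / FrequencyHz;
}
```

Measuring the scalar and AVX2 versions the same way and subtracting their nanosecond figures gives the delta Andrew describes; a roughly constant event overhead cancels out of that difference.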