public inbox for devel@edk2.groups.io
* Accessing AVX/AVX2 instruction in UEFI.
@ 2017-05-02 13:57 Amit kumar
  2017-05-02 15:12 ` Andrew Fish
  0 siblings, 1 reply; 11+ messages in thread
From: Amit kumar @ 2017-05-02 13:57 UTC (permalink / raw)
  To: edk2-devel@lists.01.org

Hi,

I am trying to optimize an application using AVX/AVX2, but my code hangs when it tries to access the YMM registers.
The instruction where my code hangs is:


 vmovups ymm0, YMMWORD PTR [rax] 


I have verified with CPUID under the OS that the processor (a 6th-generation Core i7) supports the AVX and AVX2 instructions.
Can somebody help me out here? Is there a way to enable the YMM registers?


Thanks And Regards
Amit Kumar 




* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-02 13:57 Accessing AVX/AVX2 instruction in UEFI Amit kumar
@ 2017-05-02 15:12 ` Andrew Fish
  2017-05-02 17:03   ` Kinney, Michael D
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Fish @ 2017-05-02 15:12 UTC (permalink / raw)
  To: Amit kumar; +Cc: edk2-devel@lists.01.org


Amit,

I think these instructions will generate an illegal instruction fault (#UD) until you enable AVX. You need to check the CPUID bits in your code, then set BIT18 (OSXSAVE) of CR4. After that the XGETBV/XSETBV instructions are enabled and you can OR the x87, SSE, and AVX state bits (bits 0 through 2) into XCR0. This basic operation is in the Intel docs; it is just hard to find. Usually the OS has done this for the programmer, and all the code needs to do is check CPUID.
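
The detection half of that sequence can be sketched in C as below. This is only an illustration under stated assumptions: the helper names are hypothetical, and the functions take raw register values as parameters so the sketch runs on a host. In an EDK II build the values would presumably come from the BaseLib CPUID and CR4 accessors instead. The bit positions are the architectural ones from the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM). */
#define CPUID_1_ECX_XSAVE  (1u << 26)   /* XSAVE/XGETBV/XSETBV supported   */
#define CPUID_1_ECX_AVX    (1u << 28)   /* AVX supported                   */
#define CPUID_7_EBX_AVX2   (1u << 5)    /* AVX2 supported (leaf 7, sub 0)  */
#define CR4_OSXSAVE        (1ull << 18) /* enables XGETBV/XSETBV           */

/* Hypothetical helper: true if CPUID.1:ECX reports both XSAVE and AVX. */
static bool CpuSupportsAvx(uint32_t Leaf1Ecx)
{
  uint32_t Needed = CPUID_1_ECX_XSAVE | CPUID_1_ECX_AVX;
  return (Leaf1Ecx & Needed) == Needed;
}

/* Hypothetical helper: AVX2 additionally needs CPUID.(7,0):EBX bit 5. */
static bool CpuSupportsAvx2(uint32_t Leaf1Ecx, uint32_t Leaf7Ebx)
{
  return CpuSupportsAvx(Leaf1Ecx) && (Leaf7Ebx & CPUID_7_EBX_AVX2) != 0;
}

/* Value to write back to CR4 so that XGETBV/XSETBV stop faulting. */
static uint64_t Cr4WithOsxsave(uint64_t Cr4)
{
  return Cr4 | CR4_OSXSAVE;
}
```

Only after the CR4 write succeeds is it safe to touch XCR0. Writing CR4 and XCR0 requires ring 0, which is where UEFI applications normally run.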

Thanks,

Andrew Fish

> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel




* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-02 15:12 ` Andrew Fish
@ 2017-05-02 17:03   ` Kinney, Michael D
  2017-05-03  5:48     ` Amit kumar
  0 siblings, 1 reply; 11+ messages in thread
From: Kinney, Michael D @ 2017-05-02 17:03 UTC (permalink / raw)
  To: Andrew Fish, Amit kumar, Kinney, Michael D; +Cc: edk2-devel@lists.01.org

Amit,

The information from Andrew is correct.

The document that covers this topic is the 
Intel(r) 64 and IA-32 Architectures Software Developer Manuals

https://software.intel.com/en-us/articles/intel-sdm

Volume 1, Section 13.5.3 describes the AVX state.  There are
more details about detecting and enabling different AVX features
in that document.

If the CPU supports AVX, then the basic assembly instructions
required to enable AVX instructions are the following, which set
bits 0, 1, and 2 of XCR0.

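    ; CR4.OSXSAVE (bit 18) must already be set, or XGETBV/XSETBV raise #UD
    mov     rcx, 0          ; ECX = 0 selects XCR0
    xgetbv                  ; EDX:EAX = current XCR0
    or      rax, 0007h      ; set x87 (bit 0), SSE (bit 1), AVX (bit 2)
    xsetbv                  ; write EDX:EAX back to XCR0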
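
For readers who prefer C, the bit manipulation done by the four instructions above can be sketched as follows. The helper names are hypothetical, and the functions only compute values; the actual XGETBV/XSETBV instructions still have to be issued from assembly or compiler intrinsics.

```c
#include <stdint.h>

#define XCR0_X87  (1ull << 0)  /* x87 state, must always be set  */
#define XCR0_SSE  (1ull << 1)  /* XMM register state             */
#define XCR0_AVX  (1ull << 2)  /* upper halves of YMM registers  */

/* Combine EDX:EAX, as returned by XGETBV, into one 64-bit value. */
static uint64_t Xcr0FromRegs(uint32_t Eax, uint32_t Edx)
{
  return ((uint64_t)Edx << 32) | Eax;
}

/* New XCR0 value with AVX state enabled; all other bits are preserved. */
static uint64_t Xcr0EnableAvx(uint64_t Xcr0)
{
  return Xcr0 | XCR0_X87 | XCR0_SSE | XCR0_AVX;
}

/* Nonzero if both SSE and AVX state are already enabled in XCR0. */
static int Xcr0AvxEnabled(uint64_t Xcr0)
{
  uint64_t Needed = XCR0_SSE | XCR0_AVX;
  return (Xcr0 & Needed) == Needed;
}
```

Checking `Xcr0AvxEnabled` first lets code skip the XSETBV when the firmware has already enabled AVX state.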

One additional item you need to be aware of is that UEFI firmware only
saves/restores the CPU registers required for the UEFI ABI calling convention
when a timer interrupt or exception is processed.

This means CPU state such as the YMM registers is not saved/restored
across an interrupt and may be modified if code in interrupt context
also uses the YMM registers.

When you enable the use of extended registers, the interrupt state should be
saved, interrupts disabled, and the state restored around the extended
register usage.

You can use the following functions from MdePkg BaseLib to do this:

/**
  Disables CPU interrupts and returns the interrupt state prior to the disable
  operation.

  @retval TRUE  CPU interrupts were enabled on entry to this call.
  @retval FALSE CPU interrupts were disabled on entry to this call.

**/
BOOLEAN
EFIAPI
SaveAndDisableInterrupts (
  VOID
  );

/**
  Set the current CPU interrupt state.

  Sets the current CPU interrupt state to the state specified by
  InterruptState. If InterruptState is TRUE, then interrupts are enabled. If
  InterruptState is FALSE, then interrupts are disabled. InterruptState is
  returned.

  @param  InterruptState  TRUE if interrupts should be enabled. FALSE if
                          interrupts should be disabled.

  @return InterruptState

**/
BOOLEAN
EFIAPI
SetInterruptState (
  IN      BOOLEAN                   InterruptState
  );

Algorithm:
============
{
  BOOLEAN  InterruptState;

  InterruptState = SaveAndDisableInterrupts();

  // Enable use of AVX/AVX2 instructions

  // Use AVX/AVX2 instructions

  SetInterruptState (InterruptState);
}
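
The algorithm above can be tried on a host machine with the sketch below. It is not EDK II code: the two interrupt functions are host-side stubs that only mimic the shape of the BaseLib functions, `mInterruptsEnabled` stands in for the CPU interrupt flag, and `AVX_WORKER`, `RunWithInterruptsDisabled`, and `RecordInterruptState` are hypothetical names introduced for illustration.

```c
typedef unsigned char BOOLEAN;   /* stand-in for the EDK II type */
#define TRUE  ((BOOLEAN)1)
#define FALSE ((BOOLEAN)0)

/* Stand-in for the CPU interrupt flag so the sketch runs on a host. */
static BOOLEAN mInterruptsEnabled = TRUE;

/* Host-side stub with the same shape as the BaseLib function. */
static BOOLEAN SaveAndDisableInterrupts(void)
{
  BOOLEAN Old = mInterruptsEnabled;
  mInterruptsEnabled = FALSE;
  return Old;
}

/* Host-side stub with the same shape as the BaseLib function. */
static BOOLEAN SetInterruptState(BOOLEAN InterruptState)
{
  mInterruptsEnabled = InterruptState;
  return InterruptState;
}

typedef void (*AVX_WORKER)(void *Context);  /* hypothetical callback type */

/* Run Worker with interrupts off, then restore the caller's state. */
static void RunWithInterruptsDisabled(AVX_WORKER Worker, void *Context)
{
  BOOLEAN InterruptState = SaveAndDisableInterrupts();

  Worker(Context);   /* the AVX/AVX2 enable and use would go here */

  SetInterruptState(InterruptState);
}

/* Example worker: records whether interrupts were off while it ran. */
static void RecordInterruptState(void *Context)
{
  *(BOOLEAN *)Context = mInterruptsEnabled;
}
```

Restoring the saved state, rather than unconditionally enabling interrupts, keeps the wrapper safe to call from code that already runs with interrupts disabled.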

Best regards,

Mike


* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-02 17:03   ` Kinney, Michael D
@ 2017-05-03  5:48     ` Amit kumar
  2017-05-04 11:13       ` Amit kumar
  0 siblings, 1 reply; 11+ messages in thread
From: Amit kumar @ 2017-05-03  5:48 UTC (permalink / raw)
  To: Kinney, Michael D, Andrew Fish; +Cc: edk2-devel@lists.01.org

Thank you Michael and Andrew


Regards

Amit


* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-03  5:48     ` Amit kumar
@ 2017-05-04 11:13       ` Amit kumar
  2017-05-04 11:32         ` Andrew Fish
  0 siblings, 1 reply; 11+ messages in thread
From: Amit kumar @ 2017-05-04 11:13 UTC (permalink / raw)
  To: Kinney, Michael D, Andrew Fish; +Cc: edk2-devel@lists.01.org

Hi,


Even after using AVX2 instructions, my code shows no performance improvement in UEFI, although there is a substantial improvement when I run similar code in Windows.

Am I missing something?

I am using the MSVC compiler, and the code is written in assembly.

Thanks And Regards

Amit


* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-04 11:13       ` Amit kumar
@ 2017-05-04 11:32         ` Andrew Fish
  2017-05-04 12:18           ` Amit kumar
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Fish @ 2017-05-04 11:32 UTC (permalink / raw)
  To: Amit kumar; +Cc: Mike Kinney, edk2-devel@lists.01.org



Is the data aligned the same in both environments?

Thanks,

Andrew Fish


* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-04 11:32         ` Andrew Fish
@ 2017-05-04 12:18           ` Amit kumar
  2017-05-04 12:22             ` Amit kumar
  0 siblings, 1 reply; 11+ messages in thread
From: Amit kumar @ 2017-05-04 12:18 UTC (permalink / raw)
  To: Andrew Fish; +Cc: Mike Kinney, edk2-devel@lists.01.org

Yes, I am aligning the data on a 32-byte boundary when allocating memory in both environments.

In Windows, using _aligned_malloc(size, 32);

In UEFI:

Offset = (UINTN)src & 0xFF;

src = (CHAR8 *)((UINTN) src - Offset + 0x20);
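
One thing to note about the two lines above: the result is always 32-byte aligned, but because the mask is 0xFF it can land below the original pointer whenever the low byte of the address is greater than 0x20. A more conventional round-up is sketched below for comparison. `AlignUp` and `AlignAsPosted` are hypothetical names, and the sketch assumes the buffer was over-allocated by at least 31 bytes so that the rounded-up pointer stays inside it.

```c
#include <stdint.h>

/* Round Address up to the next multiple of Alignment (a power of two). */
static uintptr_t AlignUp(uintptr_t Address, uintptr_t Alignment)
{
  return (Address + Alignment - 1) & ~(Alignment - 1);
}

/* The expression from the message above, kept only for comparison. */
static uintptr_t AlignAsPosted(uintptr_t Address)
{
  uintptr_t Offset = Address & 0xFF;
  return Address - Offset + 0x20;
}
```

For an address such as 0x1081, `AlignUp(0x1081, 32)` gives 0x10A0, while the posted expression gives 0x1020, which is before the start of the allocation.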


Thanks

Amit


* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-04 12:18           ` Amit kumar
@ 2017-05-04 12:22             ` Amit kumar
  2017-05-04 15:20               ` Andrew Fish
  0 siblings, 1 reply; 11+ messages in thread
From: Amit kumar @ 2017-05-04 12:22 UTC (permalink / raw)
  To: Andrew Fish; +Cc: Mike Kinney, edk2-devel@lists.01.org

Here are the compiler flags:
[BuildOptions]
  MSFT:DEBUG_*_*_CC_FLAGS = /Od /FAsc /GL-
  MSFT:RELEASE_*_*_CC_FLAGS = /FAsc /D MDEPKG_NDEBUG
  MSFT:RELEASE_*_*_DLINK_FLAGS = /BASE:0x10000  /ALIGN:4096 /FILEALIGN:4096


>> OS has done this for the programmer and all the code needs to do is check the CPU ID.
>>
>> Thanks,
>>
>> Andrew Fish
>>
>>>
>>> Thanks And Regards
>>> Amit Kumar
>>>
>>> _______________________________________________
>>> edk2-devel mailing list
>>> edk2-devel@lists.01.org
>>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-04 12:22             ` Amit kumar
@ 2017-05-04 15:20               ` Andrew Fish
  2017-05-04 17:26                 ` Kinney, Michael D
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Fish @ 2017-05-04 15:20 UTC (permalink / raw)
  To: Amit kumar; +Cc: Mike Kinney, edk2-devel@lists.01.org

Amit,

Regarding AVX/AVX2 performance, how are you measuring it?

In EFI it is hard to measure wall-clock time for things that take a long time. There is no scheduler in EFI and there are no threads, but there are events. Events can preempt your app while it is running, and the time spent in events will look to you like time spent in your app.

Generally the time spent in events should be constant (hot-plugging USB or other changes like that may have a noticeable impact). If the goal of the performance measurement is to make the system boot faster, you care more about the delta than the absolute time, so the event overhead does not matter.

If you are just doing a computation that does not do any I/O, then you may be able to raise the TPL to prevent events from being part of your measurement.
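
A minimal sketch of that bracketing order, written as host-compilable C rather than real EDK2 code. The measure_hooks type, measure_ticks and every demo_* name are hypothetical, introduced only for illustration. In a UEFI application the three hooks might wrap gBS->RaiseTPL (TPL_HIGH_LEVEL), gBS->RestoreTPL (OldTpl) and TimerLib's GetPerformanceCounter (); that mapping is an assumption, not something spelled out in this thread.

```c
#include <stdint.h>

/* Caller-supplied hooks (hypothetical names). In a UEFI application they
   might wrap gBS->RaiseTPL (TPL_HIGH_LEVEL), gBS->RestoreTPL (OldTpl) and
   TimerLib's GetPerformanceCounter (). */
typedef struct {
  void     (*block_events)(void);
  void     (*unblock_events)(void);
  uint64_t (*read_ticks)(void);
} measure_hooks;

/* Time one call of work() with events blocked for the whole interval. */
uint64_t measure_ticks(const measure_hooks *hooks, void (*work)(void))
{
  uint64_t start, end;

  hooks->block_events();       /* events cannot preempt us from here on   */
  start = hooks->read_ticks();
  work();                      /* pure computation, no I/O                */
  end = hooks->read_ticks();
  hooks->unblock_events();     /* restore right after the second reading  */

  return end - start;          /* assumes an up-counting timer, no wrap   */
}

/* Stand-in hooks so the sketch can be exercised on a host machine. */
int      demo_blocked;         /* nesting depth of block_events calls     */
uint64_t demo_now;             /* fake tick counter                       */

static void     demo_block(void)   { demo_blocked++; }
static void     demo_unblock(void) { demo_blocked--; }
static uint64_t demo_ticks(void)   { return demo_now; }

/* Pretend the workload takes 40 ticks. */
void demo_work(void) { demo_now += 40; }

const measure_hooks demo_hooks = { demo_block, demo_unblock, demo_ticks };
```

Both counter reads sit inside the blocked region, so time spent in events cannot land between them. The blocked region should be kept short, since timer events are not serviced while it lasts.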

Thanks,

Andrew Fish

PS: I assume you are measuring the RELEASE build, since you are turning off optimization in the DEBUG build.


* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-04 15:20               ` Andrew Fish
@ 2017-05-04 17:26                 ` Kinney, Michael D
  2017-05-05 13:08                   ` Amit kumar
  0 siblings, 1 reply; 11+ messages in thread
From: Kinney, Michael D @ 2017-05-04 17:26 UTC (permalink / raw)
  To: Andrew Fish, Amit kumar, Kinney, Michael D; +Cc: edk2-devel@lists.01.org

Amit,

I agree with Andrew that establishing a good measurement method is very
important, and that raising the TPL to TPL_HIGH_LEVEL (which disables
interrupts) during measurement may improve the consistency of the results.

You will also likely want to test both large-buffer operations and a
loop of small-buffer operations to see whether the results differ with
the size of the requested operation.
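
A rough host-side sketch of such a size sweep, assuming standard C and using memcpy as a stand-in for the routine under test. The function name time_copy_case is hypothetical. In a UEFI build the copy would be CopyMem from the BaseMemoryLib instance being evaluated, and the timing would come from a platform counter rather than clock().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy a buf_size-byte buffer `iterations` times.
   Returns the total number of bytes copied, or 0 if allocation fails or the
   destination does not match the source afterwards.
   *seconds receives the elapsed CPU time of the copy loop. */
uint64_t time_copy_case(size_t buf_size, size_t iterations, double *seconds)
{
  unsigned char *src = malloc(buf_size);
  unsigned char *dst = malloc(buf_size);
  uint64_t total = 0;
  clock_t start, end;
  size_t i;

  if (src == NULL || dst == NULL) {
    free(src);
    free(dst);
    return 0;
  }
  memset(src, 0xA5, buf_size);

  start = clock();
  for (i = 0; i < iterations; i++) {
    memcpy(dst, src, buf_size);      /* routine under test */
    total += buf_size;
  }
  end = clock();

  /* Read the destination so the copies have an observable effect. */
  if (memcmp(dst, src, buf_size) != 0) {
    total = 0;
  }

  *seconds = (double)(end - start) / CLOCKS_PER_SEC;
  free(src);
  free(dst);
  return total;
}
```

Calling it once as time_copy_case(1 << 20, 64, &t) and once as time_copy_case(64, 1 << 20, &t) moves the same 64 MiB in both cases, so any difference in t reflects per-call overhead rather than the amount of data. An optimizing compiler may collapse repeated identical copies, so a real benchmark would call the routine through a non-inlined function.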

In order to verify that your measurement method is working, you may want
to test some of the existing BaseMemoryLib implementations before testing
your new one.

* BaseMemoryLib        C code implementation
* BaseMemoryLibMmx     Uses MMX registers/instructions
* BaseMemoryLibSse2    Uses SSE2 registers/instructions
* BaseMemoryLibRepStr  Uses REP string instructions (e.g. REP MOVS, REP STOS)

* BaseMemoryLibOptDxe  Blend of above libs with good perf in DXE/UEFI phase
* BaseMemoryLibOptPei  Blend of above libs with good perf in PEI phase


I recommend measuring the first four to see whether your results show
differences.

Based on my own evaluation in the past, I have found that DXE/UEFI code works
well with BaseMemoryLibRepStr.  It tends to run as fast as accesses at the
widest register width the CPU supports.

One additional element that may be impacting your results is the type of
memory being tested and that memory range's cache settings.  If
you are accessing MMIO, flash, or some other type of device memory, you
may be seeing bandwidth limitations from that device.
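
One way to check for this is to turn each measurement into a throughput figure and compare it across memory types. The helper below is only an illustration, and its name bytes_per_second is hypothetical. The tick count and tick frequency are assumed to come from whatever counter is used for timing; in EDK2 that could be TimerLib's GetPerformanceCounter () and the frequency returned by GetPerformanceCounterProperties ().

```c
#include <stdint.h>

/* Convert a measurement into bytes per second.
   bytes            - number of bytes moved during the measured interval
   ticks            - elapsed counter ticks for that interval
   ticks_per_second - counter frequency in Hz
   Returns 0.0 when no ticks elapsed, to avoid dividing by zero. */
double bytes_per_second(uint64_t bytes, uint64_t ticks, uint64_t ticks_per_second)
{
  if (ticks == 0) {
    return 0.0;
  }
  return (double)bytes * (double)ticks_per_second / (double)ticks;
}
```

If the AVX2 routine and the existing routine report roughly the same throughput against a device range but differ clearly against ordinary cached system memory, the device is the more likely limit.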

Best regards,

Mike




* Re: Accessing AVX/AVX2 instruction in UEFI.
  2017-05-04 17:26                 ` Kinney, Michael D
@ 2017-05-05 13:08                   ` Amit kumar
  0 siblings, 0 replies; 11+ messages in thread
From: Amit kumar @ 2017-05-05 13:08 UTC (permalink / raw)
  To: Kinney, Michael D, Andrew Fish; +Cc: edk2-devel@lists.01.org

Mike, Andrew


Thanks for your suggestions. It looks like MMIO is the bottleneck in my application.

I have one more question: does each core have its own YMM registers, or are they shared among the cores?


Thanks And Regards

Amit Kumar


Thread overview: 11+ messages
2017-05-02 13:57 Accessing AVX/AVX2 instruction in UEFI Amit kumar
2017-05-02 15:12 ` Andrew Fish
2017-05-02 17:03   ` Kinney, Michael D
2017-05-03  5:48     ` Amit kumar
2017-05-04 11:13       ` Amit kumar
2017-05-04 11:32         ` Andrew Fish
2017-05-04 12:18           ` Amit kumar
2017-05-04 12:22             ` Amit kumar
2017-05-04 15:20               ` Andrew Fish
2017-05-04 17:26                 ` Kinney, Michael D
2017-05-05 13:08                   ` Amit kumar
