public inbox for devel@edk2.groups.io
* [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
From: Laszlo Ersek @ 2020-07-29 18:52 UTC
  To: edk2-devel-groups-io
  Cc: Eric Dong, Philippe Mathieu-Daudé, Rahul Kumar, Ray Ni

Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c"
already call CpuPause() in their loop bodies; see SmmWaitForApArrival(),
APHandler(), and SmiRendezvous(). However, the "main wait" within
APHandler():

>     //
>     // Wait for something to happen
>     //
>     WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);

doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore
without pausing.

The performance impact is especially notable in QEMU/KVM + OVMF
virtualization with CPU overcommit (that is, when the guest has
significantly more VCPUs than the host has physical CPUs). The guest BSP
is working heavily in:

  BSPHandler()                  [MpService.c]
    PerformRemainingTasks()     [PiSmmCpuDxeSmm.c]
      SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c]

while the many guest APs are spinning in the "Wait for something to
happen" semaphore acquisition, in APHandler(). The guest APs are
generating useless memory traffic and saturating host CPUs, hindering the
guest BSP's progress in SetUefiMemMapAttributes().

Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration
after the first check fails. Due to Pause Loop Exiting (known as Pause
Filter on AMD), the host scheduler can favor the guest BSP over the guest
APs.
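
For illustration only (not part of the patch): the reworked loop can be
sketched outside of edk2 with C11 atomics. The names cpu_pause() and
wait_for_semaphore() below are hypothetical stand-ins for CpuPause() and
WaitForSemaphore(), and atomic_compare_exchange_strong() stands in for
InterlockedCompareExchange32(); the sketch assumes a GCC- or
clang-compatible compiler for the PAUSE builtin.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for edk2's CpuPause(): emits PAUSE on x86 when
 * built with GCC/clang, and does nothing elsewhere. */
static inline void cpu_pause(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#endif
}

/* Sketch of the reworked wait: decrement the semaphore once it is seen
 * to be nonzero, pausing after every failed attempt before re-fetching.
 * Returns the semaphore value after the decrement. */
static uint32_t wait_for_semaphore(_Atomic uint32_t *sem)
{
  uint32_t value;

  for (;;) {
    value = atomic_load(sem);
    /* On success the compare-exchange leaves `value` untouched, so it
     * still holds the pre-decrement count. */
    if (value != 0 &&
        atomic_compare_exchange_strong(sem, &value, value - 1)) {
      break;
    }
    /* Either the semaphore was 0 or another CPU won the race. */
    cpu_pause();
  }
  return value - 1;
}
```

As in the patch, the pause sits on the failure path only, so an
uncontended acquisition that succeeds on the first check does not pay
for it.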

Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch
reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to
less than 4 minutes.

The patch should benefit physical machines as well -- according to the
Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding
PAUSE to the generic WaitForSemaphore() function is considered a general
improvement.

Cc: Eric Dong <eric.dong@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---

Notes:
    Repo:   https://pagure.io/lersek/edk2.git
    Branch: sem_wait_pause_rhbz1861718

 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 57e788c01b1f..4bcd217917d7 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -40,14 +40,18 @@ WaitForSemaphore (
 {
   UINT32                            Value;
 
-  do {
+  for (;;) {
     Value = *Sem;
-  } while (Value == 0 ||
-           InterlockedCompareExchange32 (
-             (UINT32*)Sem,
-             Value,
-             Value - 1
-             ) != Value);
+    if (Value != 0 &&
+        InterlockedCompareExchange32 (
+          (UINT32*)Sem,
+          Value,
+          Value - 1
+          ) == Value) {
+      break;
+    }
+    CpuPause ();
+  }
   return Value - 1;
 }
 
-- 
2.19.1.3.g30247aa5d201



* Re: [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
From: Dong, Eric @ 2020-07-31  1:10 UTC
  To: Laszlo Ersek, edk2-devel-groups-io
  Cc: Philippe Mathieu-Daudé, Kumar, Rahul1, Ni, Ray

Reviewed-by: Eric Dong <eric.dong@intel.com>



* Re: [edk2-devel] [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
From: Laszlo Ersek @ 2020-07-31 13:30 UTC
  To: devel, eric.dong; +Cc: Philippe Mathieu-Daudé, Kumar, Rahul1, Ni, Ray

On 07/31/20 03:10, Dong, Eric wrote:
> Reviewed-by: Eric Dong <eric.dong@intel.com>

Thank you, merged as commit 9001b750df64, via
<https://github.com/tianocore/edk2/pull/843>.

Laszlo

