public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [Patch 0/3] Optimize load uCode performance
@ 2018-07-11 11:07 Eric Dong
  2018-07-11 11:07 ` [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Eric Dong @ 2018-07-11 11:07 UTC (permalink / raw)
  To: edk2-devel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 929 bytes --]

Use below three rules to optimize load uCode performance:
1. Let BSP relocate uCode from flash to memory for better performance.
2. BSP caches the CPU ID and address of uCode so AP doesn’t need to look 
   for the uCode again if the CPU ID is same as BSP’s.
3. Only apply uCode in one thread of a core when hyper threading is enabled.

Test:
Use an sample platform which has 1 socket, 4 core, 8 threads, the CpuMpPei
driver cost time reduce from 108.4ms to 27.2ms

Eric Dong (3):
  UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  UefiCpuPkg/MpInitLib: Load uCode once for one core.

 UefiCpuPkg/Library/MpInitLib/Microcode.c | 43 +++++++++++++++++++++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.c     | 17 ++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 ++++++--
 3 files changed, 63 insertions(+), 8 deletions(-)

-- 
2.15.0.windows.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  2018-07-11 11:07 [Patch 0/3] Optimize load uCode performance Eric Dong
@ 2018-07-11 11:07 ` Eric Dong
  2018-07-12  9:26   ` Laszlo Ersek
  2018-07-11 11:07 ` [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Dong @ 2018-07-11 11:07 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Ruiyu Ni

Read uCode from memory has better performance than from flash.
But it needs extra effort to let BSP copy uCode from flash to
memory. Also BSP already enable cache in SEC phase, so it use
less time to relocate uCode from flash to memory. After
verification, if system has more than one processor, it will
reduce some time if load uCode from memory.

This change enable this optimization.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/MpLib.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index eec178b419..8b458a4a3a 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1560,8 +1560,19 @@ MpInitLibInitialize (
   CpuMpData->SwitchBspFlag    = FALSE;
   CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
   CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
-  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
   CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
+  //
+  // if platform has more than one CPU, relocate microcode to memory to reduce loading microcode time.
+  //
+  if (MaxLogicalProcessorNumber > 1) {
+    CpuMpData->MicrocodePatchAddress = (UINT64) (UINTN) AllocatePages (EFI_SIZE_TO_PAGES ((UINTN)CpuMpData->MicrocodePatchRegionSize));
+    if (CpuMpData->MicrocodePatchAddress != 0) {
+      CopyMem ((VOID *) (UINTN)CpuMpData->MicrocodePatchAddress, (VOID *)(UINTN)(PcdGet64 (PcdCpuMicrocodePatchAddress)), (UINTN)CpuMpData->MicrocodePatchRegionSize);
+    }
+  } else {
+    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
+  }
+
   InitializeSpinLock(&CpuMpData->MpLock);
   //
   // Save BSP's Control registers to APs
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  2018-07-11 11:07 [Patch 0/3] Optimize load uCode performance Eric Dong
  2018-07-11 11:07 ` [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
@ 2018-07-11 11:07 ` Eric Dong
  2018-07-12  9:42   ` Laszlo Ersek
  2018-07-11 11:07 ` [Patch 3/3] UefiCpuPkg/MpInitLib: Load uCode once for one core Eric Dong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Dong @ 2018-07-11 11:07 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Ruiyu Ni

Search uCode costs much time, if AP has same processor type
with BSP, AP can use BSP saved uCode info to get better performance.

This change enables this solution.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/Microcode.c | 34 +++++++++++++++++++++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.c     |  4 ++--
 UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 +++++++++--
 3 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
index e47f9f4f8f..351975e2a2 100644
--- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
+++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
@@ -35,11 +35,13 @@ GetCurrentMicrocodeSignature (
 /**
   Detect whether specified processor can find matching microcode patch and load it.
 
-  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
+  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
+  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
 **/
 VOID
 MicrocodeDetect (
-  IN CPU_MP_DATA             *CpuMpData
+  IN CPU_MP_DATA             *CpuMpData,
+  IN BOOLEAN                 IsBspCallIn
   )
 {
   UINT32                                  ExtendedTableLength;
@@ -58,6 +60,7 @@ MicrocodeDetect (
   BOOLEAN                                 CorrectMicrocode;
   VOID                                    *MicrocodeData;
   MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
+  UINT32                                  ProcessorFlags;
 
   if (CpuMpData->MicrocodePatchRegionSize == 0) {
     //
@@ -67,7 +70,7 @@ MicrocodeDetect (
   }
 
   CurrentRevision = GetCurrentMicrocodeSignature ();
-  if (CurrentRevision != 0) {
+  if (CurrentRevision != 0 && !IsBspCallIn) {
     //
     // Skip loading microcode if it has been loaded successfully
     //
@@ -87,6 +90,19 @@ MicrocodeDetect (
   PlatformIdMsr.Uint64 = AsmReadMsr64 (MSR_IA32_PLATFORM_ID);
   PlatformId = (UINT8) PlatformIdMsr.Bits.PlatformId;
 
+  //
+  // Check whether AP has same processor with BSP.
+  // If yes, direct use microcode info saved by BSP.
+  //
+  if (!IsBspCallIn) {
+    if ((CpuMpData->ProcessorSignature == Eax.Uint32) &&
+        (CpuMpData->ProcessorFlags & (1 << PlatformId)) != 0) {
+        MicrocodeData = (VOID *)(UINTN) CpuMpData->MicrocodeDataAddress;
+        LatestRevision = CpuMpData->MicrocodeRevision;
+        goto Done;
+    }
+  }
+
   LatestRevision = 0;
   MicrocodeData  = NULL;
   MicrocodeEnd = (UINTN) (CpuMpData->MicrocodePatchAddress + CpuMpData->MicrocodePatchRegionSize);
@@ -117,6 +133,7 @@ MicrocodeDetect (
         }
         if (CheckSum32 == 0) {
           CorrectMicrocode = TRUE;
+          ProcessorFlags = MicrocodeEntryPoint->ProcessorFlags;
         }
       } else if ((MicrocodeEntryPoint->DataSize != 0) &&
                  (MicrocodeEntryPoint->UpdateRevision > LatestRevision)) {
@@ -151,6 +168,7 @@ MicrocodeDetect (
                     // Find one
                     //
                     CorrectMicrocode = TRUE;
+                    ProcessorFlags = ExtendedTable->ProcessorFlag;
                     break;
                   }
                 }
@@ -188,6 +206,7 @@ MicrocodeDetect (
     MicrocodeEntryPoint = (CPU_MICROCODE_HEADER *) (((UINTN) MicrocodeEntryPoint) + TotalSize);
   } while (((UINTN) MicrocodeEntryPoint < MicrocodeEnd));
 
+Done:
   if (LatestRevision > CurrentRevision) {
     //
     // BIOS only authenticate updates that contain a numerically larger revision
@@ -211,4 +230,13 @@ MicrocodeDetect (
       ReleaseSpinLock(&CpuMpData->MpLock);
     }
   }
+
+  if (IsBspCallIn && (LatestRevision != 0)) {
+    CpuMpData->ProcessorSignature = Eax.Uint32;
+    CpuMpData->ProcessorFlags = ProcessorFlags;
+    CpuMpData->MicrocodeDataAddress = (UINTN) MicrocodeData;
+    CpuMpData->MicrocodeRevision = LatestRevision;
+    DEBUG ((DEBUG_INFO, "BSP Microcode:: signature [0x%08x], ProcessorFlags [0x%08x], \
+       MicroData [0x%08x], Revision [0x%08x]\n", Eax.Uint32, ProcessorFlags, (UINTN) MicrocodeData, LatestRevision));
+  }
 }
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 8b458a4a3a..9179f9ae6d 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -410,7 +410,7 @@ ApInitializeSync (
   //
   // Load microcode on AP
   //
-  MicrocodeDetect (CpuMpData);
+  MicrocodeDetect (CpuMpData, FALSE);
   //
   // Sync BSP's MTRR table to AP
   //
@@ -1601,7 +1601,7 @@ MpInitLibInitialize (
   //
   // Load Microcode on BSP
   //
-  MicrocodeDetect (CpuMpData);
+  MicrocodeDetect (CpuMpData, TRUE);
   //
   // Store BSP's MTRR setting
   //
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
index 73e689d969..d897497b77 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
@@ -246,6 +246,11 @@ struct _CPU_MP_DATA {
   BOOLEAN                        TimerInterruptState;
   UINT64                         MicrocodePatchAddress;
   UINT64                         MicrocodePatchRegionSize;
+
+  UINT32                         ProcessorSignature;
+  UINT32                         ProcessorFlags;
+  UINT64                         MicrocodeDataAddress;
+  UINT32                         MicrocodeRevision;
 };
 
 extern EFI_GUID mCpuInitMpLibHobGuid;
@@ -547,11 +552,13 @@ CheckAndUpdateApsStatus (
 /**
   Detect whether specified processor can find matching microcode patch and load it.
 
-  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
+  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
+  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
 **/
 VOID
 MicrocodeDetect (
-  IN CPU_MP_DATA             *CpuMpData
+  IN CPU_MP_DATA             *CpuMpData,
+  IN BOOLEAN                 IsBspCallIn
   );
 
 /**
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Patch 3/3] UefiCpuPkg/MpInitLib: Load uCode once for one core.
  2018-07-11 11:07 [Patch 0/3] Optimize load uCode performance Eric Dong
  2018-07-11 11:07 ` [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
  2018-07-11 11:07 ` [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
@ 2018-07-11 11:07 ` Eric Dong
  2018-07-12  9:49   ` Laszlo Ersek
  2018-07-11 16:08 ` [Patch 0/3] Optimize load uCode performance Laszlo Ersek
  2018-07-12  9:58 ` Laszlo Ersek
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Dong @ 2018-07-11 11:07 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Ruiyu Ni

SDM requires one core only needs to load uCode once.
Also load uCode once can save some time.

This change enables this solution.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/Microcode.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
index 351975e2a2..4586e037d2 100644
--- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
+++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
@@ -61,6 +61,7 @@ MicrocodeDetect (
   VOID                                    *MicrocodeData;
   MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
   UINT32                                  ProcessorFlags;
+  UINT32                                  ThreadId;
 
   if (CpuMpData->MicrocodePatchRegionSize == 0) {
     //
@@ -77,6 +78,14 @@ MicrocodeDetect (
     return;
   }
 
+  GetProcessorLocationByApicId (GetInitialApicId(), NULL, NULL, &ThreadId);
+  if (ThreadId != 0) {
+    //
+    // Skip loading microcode if it is not the first thread in one core.
+    //
+    return;
+  }
+
   ExtendedTableLength = 0;
   //
   // Here data of CPUID leafs have not been collected into context buffer, so
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Patch 0/3] Optimize load uCode performance
  2018-07-11 11:07 [Patch 0/3] Optimize load uCode performance Eric Dong
                   ` (2 preceding siblings ...)
  2018-07-11 11:07 ` [Patch 3/3] UefiCpuPkg/MpInitLib: Load uCode once for one core Eric Dong
@ 2018-07-11 16:08 ` Laszlo Ersek
  2018-07-12  9:58 ` Laszlo Ersek
  4 siblings, 0 replies; 12+ messages in thread
From: Laszlo Ersek @ 2018-07-11 16:08 UTC (permalink / raw)
  To: Eric Dong, edk2-devel

Hi Eric,

On 07/11/18 13:07, Eric Dong wrote:
> Use below three rules to optimize load uCode performance:
> 1. Let BSP relocate uCode from flash to memory for better performance.
> 2. BSP caches the CPU ID and address of uCode so AP doesn’t need to look 
>    for the uCode again if the CPU ID is same as BSP’s.
> 3. Only apply uCode in one thread of a core when hyper threading is enabled.
> 
> Test:
> Use an sample platform which has 1 socket, 4 core, 8 threads, the CpuMpPei
> driver cost time reduce from 108.4ms to 27.2ms
> 
> Eric Dong (3):
>   UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
>   UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
>   UefiCpuPkg/MpInitLib: Load uCode once for one core.
> 
>  UefiCpuPkg/Library/MpInitLib/Microcode.c | 43 +++++++++++++++++++++++++++++---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c     | 17 ++++++++++---
>  UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 ++++++--
>  3 files changed, 63 insertions(+), 8 deletions(-)

I'm putting off the regression-testing of this series until we come to a
conclusion on pushing Ray's patch first (see my message at
<http://mid.mail-archive.com/26d8afe6-f89b-ddfc-0e8b-6acdc311f121@redhat.com>).

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  2018-07-11 11:07 ` [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
@ 2018-07-12  9:26   ` Laszlo Ersek
  2018-07-12 10:54     ` Dong, Eric
  0 siblings, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2018-07-12  9:26 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

Hi Eric,

On 07/11/18 13:07, Eric Dong wrote:
> Read uCode from memory has better performance than from flash.
> But it needs extra effort to let BSP copy uCode from flash to
> memory. Also BSP already enable cache in SEC phase, so it use
> less time to relocate uCode from flash to memory. After
> verification, if system has more than one processor, it will
> reduce some time if load uCode from memory.
>
> This change enable this optimization.
>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index eec178b419..8b458a4a3a 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -1560,8 +1560,19 @@ MpInitLibInitialize (
>    CpuMpData->SwitchBspFlag    = FALSE;
>    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
>    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
> -  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
>    CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
> +  //
> +  // if platform has more than one CPU, relocate microcode to memory to reduce loading microcode time.
> +  //
> +  if (MaxLogicalProcessorNumber > 1) {
> +    CpuMpData->MicrocodePatchAddress = (UINT64) (UINTN) AllocatePages (EFI_SIZE_TO_PAGES ((UINTN)CpuMpData->MicrocodePatchRegionSize));
> +    if (CpuMpData->MicrocodePatchAddress != 0) {
> +      CopyMem ((VOID *) (UINTN)CpuMpData->MicrocodePatchAddress, (VOID *)(UINTN)(PcdGet64 (PcdCpuMicrocodePatchAddress)), (UINTN)CpuMpData->MicrocodePatchRegionSize);
> +    }
> +  } else {
> +    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
> +  }
> +
>    InitializeSpinLock(&CpuMpData->MpLock);
>    //
>    // Save BSP's Control registers to APs
>

(1) Can you please break up the added lines to multiple lines? They are
extremely long, and difficult for me to handle. It should be possible to
break up both AllocatePages() and CopyMem(), for example.

(2) The code appears to handle the case well when
PcdCpuMicrocodePatchRegionSize is zero. In that case,
EFI_SIZE_TO_PAGES(...) evaluates to zero, and -- I checked --
AllocatePages() returns NULL. In this case, allocation and copying will
not take place, and that's fine -- there is nothing to copy and no
microcode to install. So, OK.

(3) However, if there is a microcode update available, but we can't
allocate memory, things will go wrong. The region size is nonzero, thus
MicrocodeDetect() will not exit early. But MicrocodePatchAddress will be
set to 0.

So, I suggest the following instead:

---------
  VOID *MicrocodePatchInRam;

  //
  // If platform has more than one CPU, relocate microcode to memory to reduce
  // loading microcode time.
  //
  MicrocodePatchInRam = NULL;
  if (MaxLogicalProcessorNumber > 1) {
    MicrocodePatchInRam = AllocatePages (
                            EFI_SIZE_TO_PAGES (
                              (UINTN)CpuMpData->MicrocodePatchRegionSize
                              )
                            );
  }
  if (MicrocodePatchInRam == NULL) {
    //
    // there is only one processor, or no microcode patch is available, or
    // memory allocation failed
    //
    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
  } else {
    //
    // there are multiple processors, and a microcode patch is available, and
    // memory allocation succeeded
    //
    CopyMem (
      MicrocodePatchInRam,
      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
      (UINTN)CpuMpData->MicrocodePatchRegionSize
      );
    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
  }
---------

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  2018-07-11 11:07 ` [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
@ 2018-07-12  9:42   ` Laszlo Ersek
  2018-07-12 10:30     ` Dong, Eric
  0 siblings, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2018-07-12  9:42 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

On 07/11/18 13:07, Eric Dong wrote:
> Search uCode costs much time, if AP has same processor type
> with BSP, AP can use BSP saved uCode info to get better performance.
> 
> This change enables this solution.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/Microcode.c | 34 +++++++++++++++++++++++++++++---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c     |  4 ++--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 +++++++++--
>  3 files changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> index e47f9f4f8f..351975e2a2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
> +++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> @@ -35,11 +35,13 @@ GetCurrentMicrocodeSignature (
>  /**
>    Detect whether specified processor can find matching microcode patch and load it.
>  
> -  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
> +  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
> +  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
>  **/
>  VOID
>  MicrocodeDetect (
> -  IN CPU_MP_DATA             *CpuMpData
> +  IN CPU_MP_DATA             *CpuMpData,
> +  IN BOOLEAN                 IsBspCallIn
>    )
>  {
>    UINT32                                  ExtendedTableLength;
> @@ -58,6 +60,7 @@ MicrocodeDetect (
>    BOOLEAN                                 CorrectMicrocode;
>    VOID                                    *MicrocodeData;
>    MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
> +  UINT32                                  ProcessorFlags;
>  
>    if (CpuMpData->MicrocodePatchRegionSize == 0) {
>      //
> @@ -67,7 +70,7 @@ MicrocodeDetect (
>    }
>  
>    CurrentRevision = GetCurrentMicrocodeSignature ();
> -  if (CurrentRevision != 0) {
> +  if (CurrentRevision != 0 && !IsBspCallIn) {
>      //
>      // Skip loading microcode if it has been loaded successfully
>      //
> @@ -87,6 +90,19 @@ MicrocodeDetect (
>    PlatformIdMsr.Uint64 = AsmReadMsr64 (MSR_IA32_PLATFORM_ID);
>    PlatformId = (UINT8) PlatformIdMsr.Bits.PlatformId;
>  
> +  //
> +  // Check whether AP has same processor with BSP.
> +  // If yes, direct use microcode info saved by BSP.
> +  //
> +  if (!IsBspCallIn) {
> +    if ((CpuMpData->ProcessorSignature == Eax.Uint32) &&
> +        (CpuMpData->ProcessorFlags & (1 << PlatformId)) != 0) {

Here I have only one comment. (The reason for that is that, on OVMF,
MicrocodePatchRegionSize is zero, so MicrocodeDetect() will exit
immediately, on both the APs and the BSP.)

My comment is that the expression

  (1 << PlatformId)

may invoke undefined behavior (and rightfully trigger build breakage
with e.g. clang) if PlatformId is larger than 31.

Now, I do see the comment

  //
  // The index of platform information resides in bits 50:52 of MSR IA32_PLATFORM_ID
  //

so I wanted to suggest adding:

  ASSERT (PlatformId < 8)?

but then I saw that the same left-shift was already used in two other
places.

So, with or without the ASSERT:

Acked-by: Laszlo Ersek <lersek@redhat.com>

Thanks
Laszlo



> +        MicrocodeData = (VOID *)(UINTN) CpuMpData->MicrocodeDataAddress;
> +        LatestRevision = CpuMpData->MicrocodeRevision;
> +        goto Done;
> +    }
> +  }
> +
>    LatestRevision = 0;
>    MicrocodeData  = NULL;
>    MicrocodeEnd = (UINTN) (CpuMpData->MicrocodePatchAddress + CpuMpData->MicrocodePatchRegionSize);
> @@ -117,6 +133,7 @@ MicrocodeDetect (
>          }
>          if (CheckSum32 == 0) {
>            CorrectMicrocode = TRUE;
> +          ProcessorFlags = MicrocodeEntryPoint->ProcessorFlags;
>          }
>        } else if ((MicrocodeEntryPoint->DataSize != 0) &&
>                   (MicrocodeEntryPoint->UpdateRevision > LatestRevision)) {
> @@ -151,6 +168,7 @@ MicrocodeDetect (
>                      // Find one
>                      //
>                      CorrectMicrocode = TRUE;
> +                    ProcessorFlags = ExtendedTable->ProcessorFlag;
>                      break;
>                    }
>                  }
> @@ -188,6 +206,7 @@ MicrocodeDetect (
>      MicrocodeEntryPoint = (CPU_MICROCODE_HEADER *) (((UINTN) MicrocodeEntryPoint) + TotalSize);
>    } while (((UINTN) MicrocodeEntryPoint < MicrocodeEnd));
>  
> +Done:
>    if (LatestRevision > CurrentRevision) {
>      //
>      // BIOS only authenticate updates that contain a numerically larger revision
> @@ -211,4 +230,13 @@ MicrocodeDetect (
>        ReleaseSpinLock(&CpuMpData->MpLock);
>      }
>    }
> +
> +  if (IsBspCallIn && (LatestRevision != 0)) {
> +    CpuMpData->ProcessorSignature = Eax.Uint32;
> +    CpuMpData->ProcessorFlags = ProcessorFlags;
> +    CpuMpData->MicrocodeDataAddress = (UINTN) MicrocodeData;
> +    CpuMpData->MicrocodeRevision = LatestRevision;
> +    DEBUG ((DEBUG_INFO, "BSP Microcode:: signature [0x%08x], ProcessorFlags [0x%08x], \
> +       MicroData [0x%08x], Revision [0x%08x]\n", Eax.Uint32, ProcessorFlags, (UINTN) MicrocodeData, LatestRevision));
> +  }
>  }
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 8b458a4a3a..9179f9ae6d 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -410,7 +410,7 @@ ApInitializeSync (
>    //
>    // Load microcode on AP
>    //
> -  MicrocodeDetect (CpuMpData);
> +  MicrocodeDetect (CpuMpData, FALSE);
>    //
>    // Sync BSP's MTRR table to AP
>    //
> @@ -1601,7 +1601,7 @@ MpInitLibInitialize (
>    //
>    // Load Microcode on BSP
>    //
> -  MicrocodeDetect (CpuMpData);
> +  MicrocodeDetect (CpuMpData, TRUE);
>    //
>    // Store BSP's MTRR setting
>    //
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> index 73e689d969..d897497b77 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> @@ -246,6 +246,11 @@ struct _CPU_MP_DATA {
>    BOOLEAN                        TimerInterruptState;
>    UINT64                         MicrocodePatchAddress;
>    UINT64                         MicrocodePatchRegionSize;
> +
> +  UINT32                         ProcessorSignature;
> +  UINT32                         ProcessorFlags;
> +  UINT64                         MicrocodeDataAddress;
> +  UINT32                         MicrocodeRevision;
>  };
>  
>  extern EFI_GUID mCpuInitMpLibHobGuid;
> @@ -547,11 +552,13 @@ CheckAndUpdateApsStatus (
>  /**
>    Detect whether specified processor can find matching microcode patch and load it.
>  
> -  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
> +  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
> +  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
>  **/
>  VOID
>  MicrocodeDetect (
> -  IN CPU_MP_DATA             *CpuMpData
> +  IN CPU_MP_DATA             *CpuMpData,
> +  IN BOOLEAN                 IsBspCallIn
>    );
>  
>  /**
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 3/3] UefiCpuPkg/MpInitLib: Load uCode once for one core.
  2018-07-11 11:07 ` [Patch 3/3] UefiCpuPkg/MpInitLib: Load uCode once for one core Eric Dong
@ 2018-07-12  9:49   ` Laszlo Ersek
  0 siblings, 0 replies; 12+ messages in thread
From: Laszlo Ersek @ 2018-07-12  9:49 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

I've got two comments:

On 07/11/18 13:07, Eric Dong wrote:
> SDM requires one core only needs to load uCode once.

(1) This is a very confusing typo ("one core only").

I totally missed the point until I re-read the cover letter of the
patch. In the cover letter, you say:

> 3. Only apply uCode in one thread of a core when hyper threading is
> enabled.

So please fix the commit message to say, "The SDM requires only one
*thread per core* to load the microcode", or something similar.

(You don't need to add the emphasis, I'm only adding it here to make
myself clear.)

> Also load uCode once can save some time.
> 
> This change enables this solution.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/Microcode.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> index 351975e2a2..4586e037d2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
> +++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> @@ -61,6 +61,7 @@ MicrocodeDetect (
>    VOID                                    *MicrocodeData;
>    MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
>    UINT32                                  ProcessorFlags;
> +  UINT32                                  ThreadId;
>  
>    if (CpuMpData->MicrocodePatchRegionSize == 0) {
>      //
> @@ -77,6 +78,14 @@ MicrocodeDetect (
>      return;
>    }
>  
> +  GetProcessorLocationByApicId (GetInitialApicId(), NULL, NULL, &ThreadId);

(2) A space character should be inserted after "GetInitialApicId".

With those fixed:

Acked-by: Laszlo Ersek <lersek@redhat.com>

Thanks
Laszlo

> +  if (ThreadId != 0) {
> +    //
> +    // Skip loading microcode if it is not the first thread in one core.
> +    //
> +    return;
> +  }
> +
>    ExtendedTableLength = 0;
>    //
>    // Here data of CPUID leafs have not been collected into context buffer, so
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 0/3] Optimize load uCode performance
  2018-07-11 11:07 [Patch 0/3] Optimize load uCode performance Eric Dong
                   ` (3 preceding siblings ...)
  2018-07-11 16:08 ` [Patch 0/3] Optimize load uCode performance Laszlo Ersek
@ 2018-07-12  9:58 ` Laszlo Ersek
  2018-07-12 10:55   ` Dong, Eric
  4 siblings, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2018-07-12  9:58 UTC (permalink / raw)
  To: Eric Dong, edk2-devel

On 07/11/18 13:07, Eric Dong wrote:
> Use below three rules to optimize load uCode performance:
> 1. Let BSP relocate uCode from flash to memory for better performance.
> 2. BSP caches the CPU ID and address of uCode so AP doesn’t need to look 
>    for the uCode again if the CPU ID is same as BSP’s.
> 3. Only apply uCode in one thread of a core when hyper threading is enabled.
> 
> Test:
> Use an sample platform which has 1 socket, 4 core, 8 threads, the CpuMpPei
> driver cost time reduce from 108.4ms to 27.2ms
> 
> Eric Dong (3):
>   UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
>   UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
>   UefiCpuPkg/MpInitLib: Load uCode once for one core.
> 
>  UefiCpuPkg/Library/MpInitLib/Microcode.c | 43 +++++++++++++++++++++++++++++---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c     | 17 ++++++++++---
>  UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 ++++++--
>  3 files changed, 63 insertions(+), 8 deletions(-)
> 
> 
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 

I also tried to regression test this with OVMF. (I know I requested some
changes for patch #1, however those would only affect behavior in
practice when memory allocation fails, so it makes sense for me to test
this version of the series as well.)

Unfortunately, the patch set doesn't apply; it fails with the first
patch already. I tried on top of current master, commit 0a563f3fecfd
("BaseTool: Fixed the incorrect cache key.", 2018-07-12).

Can you please post a v2 that is also rebased on current master?

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  2018-07-12  9:42   ` Laszlo Ersek
@ 2018-07-12 10:30     ` Dong, Eric
  0 siblings, 0 replies; 12+ messages in thread
From: Dong, Eric @ 2018-07-12 10:30 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu

Hi Laszlo,

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Laszlo Ersek
> Sent: Thursday, July 12, 2018 5:42 PM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [edk2] [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs
> if possible.
> 
> On 07/11/18 13:07, Eric Dong wrote:
> > Search uCode costs much time, if AP has same processor type with BSP,
> > AP can use BSP saved uCode info to get better performance.
> >
> > This change enables this solution.
> >
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> > ---
> >  UefiCpuPkg/Library/MpInitLib/Microcode.c | 34
> +++++++++++++++++++++++++++++---
> >  UefiCpuPkg/Library/MpInitLib/MpLib.c     |  4 ++--
> >  UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 +++++++++--
> >  3 files changed, 42 insertions(+), 7 deletions(-)
> >
> > diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c
> > b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> > index e47f9f4f8f..351975e2a2 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
> > +++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> > @@ -35,11 +35,13 @@ GetCurrentMicrocodeSignature (
> >  /**
> >    Detect whether specified processor can find matching microcode patch
> and load it.
> >
> > -  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
> > +  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
> > +  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
> >  **/
> >  VOID
> >  MicrocodeDetect (
> > -  IN CPU_MP_DATA             *CpuMpData
> > +  IN CPU_MP_DATA             *CpuMpData,
> > +  IN BOOLEAN                 IsBspCallIn
> >    )
> >  {
> >    UINT32                                  ExtendedTableLength;
> > @@ -58,6 +60,7 @@ MicrocodeDetect (
> >    BOOLEAN                                 CorrectMicrocode;
> >    VOID                                    *MicrocodeData;
> >    MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
> > +  UINT32                                  ProcessorFlags;
> >
> >    if (CpuMpData->MicrocodePatchRegionSize == 0) {
> >      //
> > @@ -67,7 +70,7 @@ MicrocodeDetect (
> >    }
> >
> >    CurrentRevision = GetCurrentMicrocodeSignature ();
> > -  if (CurrentRevision != 0) {
> > +  if (CurrentRevision != 0 && !IsBspCallIn) {
> >      //
> >      // Skip loading microcode if it has been loaded successfully
> >      //
> > @@ -87,6 +90,19 @@ MicrocodeDetect (
> >    PlatformIdMsr.Uint64 = AsmReadMsr64 (MSR_IA32_PLATFORM_ID);
> >    PlatformId = (UINT8) PlatformIdMsr.Bits.PlatformId;
> >
> > +  //
> > +  // Check whether AP has same processor with BSP.
> > +  // If yes, direct use microcode info saved by BSP.
> > +  //
> > +  if (!IsBspCallIn) {
> > +    if ((CpuMpData->ProcessorSignature == Eax.Uint32) &&
> > +        (CpuMpData->ProcessorFlags & (1 << PlatformId)) != 0) {
> 
> Here I have only one comment. (The reason for that is that, on OVMF,
> MicrocodePatchRegionSize is zero, so MicrocodeDetect() will exit
> immediately, on both the APs and the BSP.)
> 
> My comment is that the expression
> 
>   (1 << PlatformId)
> 
> may invoke undefined behavior (and rightfully trigger build breakage with e.g.
> clang) if PlatformId is larger than 31.
> 
> Now, I do see the comment
> 
>   //
>   // The index of platform information resides in bits 50:52 of MSR
> IA32_PLATFORM_ID
>   //
> 
> so I wanted to suggest adding:
> 
>   ASSERT (PlatformId < 8)?
> 
> but then I saw that the same left-shift was already used in two other places.
> 
> So, with or without the ASSERT:
> 
> Acked-by: Laszlo Ersek <lersek@redhat.com>

PlatformId get from below code: 
  PlatformId = (UINT8) PlatformIdMsr.Bits.PlatformId;

PlatformIdMsr.Bits.PlatformId is a three bits value, so it has limit the PlatformId value < 8. So I will not add the ASSERT code.

> 
> Thanks
> Laszlo
> 
> 
> 
> > +        MicrocodeData = (VOID *)(UINTN) CpuMpData-
> >MicrocodeDataAddress;
> > +        LatestRevision = CpuMpData->MicrocodeRevision;
> > +        goto Done;
> > +    }
> > +  }
> > +
> >    LatestRevision = 0;
> >    MicrocodeData  = NULL;
> >    MicrocodeEnd = (UINTN) (CpuMpData->MicrocodePatchAddress +
> > CpuMpData->MicrocodePatchRegionSize);
> > @@ -117,6 +133,7 @@ MicrocodeDetect (
> >          }
> >          if (CheckSum32 == 0) {
> >            CorrectMicrocode = TRUE;
> > +          ProcessorFlags = MicrocodeEntryPoint->ProcessorFlags;
> >          }
> >        } else if ((MicrocodeEntryPoint->DataSize != 0) &&
> >                   (MicrocodeEntryPoint->UpdateRevision >
> > LatestRevision)) { @@ -151,6 +168,7 @@ MicrocodeDetect (
> >                      // Find one
> >                      //
> >                      CorrectMicrocode = TRUE;
> > +                    ProcessorFlags = ExtendedTable->ProcessorFlag;
> >                      break;
> >                    }
> >                  }
> > @@ -188,6 +206,7 @@ MicrocodeDetect (
> >      MicrocodeEntryPoint = (CPU_MICROCODE_HEADER *) (((UINTN)
> MicrocodeEntryPoint) + TotalSize);
> >    } while (((UINTN) MicrocodeEntryPoint < MicrocodeEnd));
> >
> > +Done:
> >    if (LatestRevision > CurrentRevision) {
> >      //
> >      // BIOS only authenticate updates that contain a numerically
> > larger revision @@ -211,4 +230,13 @@ MicrocodeDetect (
> >        ReleaseSpinLock(&CpuMpData->MpLock);
> >      }
> >    }
> > +
> > +  if (IsBspCallIn && (LatestRevision != 0)) {
> > +    CpuMpData->ProcessorSignature = Eax.Uint32;
> > +    CpuMpData->ProcessorFlags = ProcessorFlags;
> > +    CpuMpData->MicrocodeDataAddress = (UINTN) MicrocodeData;
> > +    CpuMpData->MicrocodeRevision = LatestRevision;
> > +    DEBUG ((DEBUG_INFO, "BSP Microcode:: signature [0x%08x],
> ProcessorFlags [0x%08x], \
> > +       MicroData [0x%08x], Revision [0x%08x]\n", Eax.Uint32,
> > + ProcessorFlags, (UINTN) MicrocodeData, LatestRevision));  }
> >  }
> > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > index 8b458a4a3a..9179f9ae6d 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > @@ -410,7 +410,7 @@ ApInitializeSync (
> >    //
> >    // Load microcode on AP
> >    //
> > -  MicrocodeDetect (CpuMpData);
> > +  MicrocodeDetect (CpuMpData, FALSE);
> >    //
> >    // Sync BSP's MTRR table to AP
> >    //
> > @@ -1601,7 +1601,7 @@ MpInitLibInitialize (
> >    //
> >    // Load Microcode on BSP
> >    //
> > -  MicrocodeDetect (CpuMpData);
> > +  MicrocodeDetect (CpuMpData, TRUE);
> >    //
> >    // Store BSP's MTRR setting
> >    //
> > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> > b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> > index 73e689d969..d897497b77 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> > @@ -246,6 +246,11 @@ struct _CPU_MP_DATA {
> >    BOOLEAN                        TimerInterruptState;
> >    UINT64                         MicrocodePatchAddress;
> >    UINT64                         MicrocodePatchRegionSize;
> > +
> > +  UINT32                         ProcessorSignature;
> > +  UINT32                         ProcessorFlags;
> > +  UINT64                         MicrocodeDataAddress;
> > +  UINT32                         MicrocodeRevision;
> >  };
> >
> >  extern EFI_GUID mCpuInitMpLibHobGuid; @@ -547,11 +552,13 @@
> > CheckAndUpdateApsStatus (
> >  /**
> >    Detect whether specified processor can find matching microcode patch
> and load it.
> >
> > -  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
> > +  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
> > +  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
> >  **/
> >  VOID
> >  MicrocodeDetect (
> > -  IN CPU_MP_DATA             *CpuMpData
> > +  IN CPU_MP_DATA             *CpuMpData,
> > +  IN BOOLEAN                 IsBspCallIn
> >    );
> >
> >  /**
> >
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  2018-07-12  9:26   ` Laszlo Ersek
@ 2018-07-12 10:54     ` Dong, Eric
  0 siblings, 0 replies; 12+ messages in thread
From: Dong, Eric @ 2018-07-12 10:54 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu

Hi Laszlo,

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, July 12, 2018 5:26 PM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [edk2] [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to
> memory to save time.
> 
> Hi Eric,
> 
> On 07/11/18 13:07, Eric Dong wrote:
> > Read uCode from memory has better performance than from flash.
> > But it needs extra effort to let BSP copy uCode from flash to memory.
> > Also BSP already enable cache in SEC phase, so it use less time to
> > relocate uCode from flash to memory. After verification, if system has
> > more than one processor, it will reduce some time if load uCode from
> > memory.
> >
> > This change enable this optimization.
> >
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> > ---
> >  UefiCpuPkg/Library/MpInitLib/MpLib.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > index eec178b419..8b458a4a3a 100644
> > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> > @@ -1560,8 +1560,19 @@ MpInitLibInitialize (
> >    CpuMpData->SwitchBspFlag    = FALSE;
> >    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
> >    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData
> + MaxLogicalProcessorNumber);
> > -  CpuMpData->MicrocodePatchAddress    = PcdGet64
> (PcdCpuMicrocodePatchAddress);
> >    CpuMpData->MicrocodePatchRegionSize = PcdGet64
> > (PcdCpuMicrocodePatchRegionSize);
> > +  //
> > +  // if platform has more than one CPU, relocate microcode to memory to
> reduce loading microcode time.
> > +  //
> > +  if (MaxLogicalProcessorNumber > 1) {
> > +    CpuMpData->MicrocodePatchAddress = (UINT64) (UINTN) AllocatePages
> (EFI_SIZE_TO_PAGES ((UINTN)CpuMpData->MicrocodePatchRegionSize));
> > +    if (CpuMpData->MicrocodePatchAddress != 0) {
> > +      CopyMem ((VOID *) (UINTN)CpuMpData->MicrocodePatchAddress,
> (VOID *)(UINTN)(PcdGet64 (PcdCpuMicrocodePatchAddress)),
> (UINTN)CpuMpData->MicrocodePatchRegionSize);
> > +    }
> > +  } else {
> > +    CpuMpData->MicrocodePatchAddress = PcdGet64
> > + (PcdCpuMicrocodePatchAddress);  }
> > +
> >    InitializeSpinLock(&CpuMpData->MpLock);
> >    //
> >    // Save BSP's Control registers to APs
> >
> 
> (1) Can you please break up the added lines to multiple lines? They are
> extremely long, and difficult for me to handle. It should be possible to break
> up both AllocatePages() and CopyMem(), for example.
> 
> (2) The code appears to handle the case well when
> PcdCpuMicrocodePatchRegionSize is zero. In that case,
> EFI_SIZE_TO_PAGES(...) evaluates to zero, and -- I checked --
> AllocatePages() returns NULL. In this case, allocation and copying will not take
> place, and that's fine -- there is nothing to copy and no microcode to install.
> So, OK.
> 

Yes, my original patch has PcdCpuMicrocodePatchRegionSize check, but after check I found if it is zero, all the allocate action will do nothing, so I removed it.

> (3) However, if there is a microcode update available, but we can't allocate
> memory, things will go wrong. The region size is nonzero, thus
> MicrocodeDetect() will not exit early. But MicrocodePatchAddress will be set
> to 0.
> 

Agree.

> So, I suggest the following instead:
> 
> ---------
>   VOID *MicrocodePatchInRam;
> 
>   //
>   // If platform has more than one CPU, relocate microcode to memory to
> reduce
>   // loading microcode time.
>   //
>   MicrocodePatchInRam = NULL;
>   if (MaxLogicalProcessorNumber > 1) {
>     MicrocodePatchInRam = AllocatePages (
>                             EFI_SIZE_TO_PAGES (
>                               (UINTN)CpuMpData->MicrocodePatchRegionSize
>                               )
>                             );
>   }
>   if (MicrocodePatchInRam == NULL) {
>     //
>     // there is only one processor, or no microcode patch is available, or
>     // memory allocation failed
>     //
>     CpuMpData->MicrocodePatchAddress = PcdGet64
> (PcdCpuMicrocodePatchAddress);
>   } else {
>     //
>     // there are multiple processors, and a microcode patch is available, and
>     // memory allocation succeeded
>     //
>     CopyMem (
>       MicrocodePatchInRam,
>       (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
>       (UINTN)CpuMpData->MicrocodePatchRegionSize
>       );
>     CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
>   }
> ---------
> 

Thanks for your sample code, I directly use it.  Also I think allocate memory failed is an abnormal case, so I add an ASSERT in the new patch.

> Thanks
> Laszlo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 0/3] Optimize load uCode performance
  2018-07-12  9:58 ` Laszlo Ersek
@ 2018-07-12 10:55   ` Dong, Eric
  0 siblings, 0 replies; 12+ messages in thread
From: Dong, Eric @ 2018-07-12 10:55 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@lists.01.org

Hi Laszlo,

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, July 12, 2018 5:58 PM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Subject: Re: [edk2] [Patch 0/3] Optimize load uCode performance
> 
> On 07/11/18 13:07, Eric Dong wrote:
> > Use below three rules to optimize load uCode performance:
> > 1. Let BSP relocate uCode from flash to memory for better performance.
> > 2. BSP caches the CPU ID and address of uCode so AP doesn’t need to look
> >    for the uCode again if the CPU ID is same as BSP’s.
> > 3. Only apply uCode in one thread of a core when hyper threading is
> enabled.
> >
> > Test:
> > Use an sample platform which has 1 socket, 4 core, 8 threads, the
> > CpuMpPei driver cost time reduce from 108.4ms to 27.2ms
> >
> > Eric Dong (3):
> >   UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
> >   UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
> >   UefiCpuPkg/MpInitLib: Load uCode once for one core.
> >
> >  UefiCpuPkg/Library/MpInitLib/Microcode.c | 43
> +++++++++++++++++++++++++++++---
> >  UefiCpuPkg/Library/MpInitLib/MpLib.c     | 17 ++++++++++---
> >  UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 ++++++--
> >  3 files changed, 63 insertions(+), 8 deletions(-)
> >
> >
> >
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
> >
> 
> I also tried to regression test this with OVMF. (I know I requested some
> changes for patch #1, however those would only affect behavior in practice
> when memory allocation fails, so it makes sense for me to test this version of
> the series as well.)
> 
> Unfortunately, the patch set doesn't apply; it fails with the first patch already.
> I tried on top of current master, commit 0a563f3fecfd
> ("BaseTool: Fixed the incorrect cache key.", 2018-07-12).
> 
> Can you please post a v2 that is also rebased on current master?
> 

I just send v2 patches base on the latest codebase, please verify with the newer one.

> Thanks
> Laszlo

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-07-12 10:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-11 11:07 [Patch 0/3] Optimize load uCode performance Eric Dong
2018-07-11 11:07 ` [Patch 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
2018-07-12  9:26   ` Laszlo Ersek
2018-07-12 10:54     ` Dong, Eric
2018-07-11 11:07 ` [Patch 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
2018-07-12  9:42   ` Laszlo Ersek
2018-07-12 10:30     ` Dong, Eric
2018-07-11 11:07 ` [Patch 3/3] UefiCpuPkg/MpInitLib: Load uCode once for one core Eric Dong
2018-07-12  9:49   ` Laszlo Ersek
2018-07-11 16:08 ` [Patch 0/3] Optimize load uCode performance Laszlo Ersek
2018-07-12  9:58 ` Laszlo Ersek
2018-07-12 10:55   ` Dong, Eric

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox