public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [Patch v2 0/3] Optimize load uCode performance
@ 2018-07-12 10:49 Eric Dong
  2018-07-12 10:49 ` [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Eric Dong @ 2018-07-12 10:49 UTC (permalink / raw)
  To: edk2-devel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1009 bytes --]

Use below three rules to optimize load uCode performance:
1. Let BSP relocate uCode from flash to memory for better performance.
2. BSP caches the CPU ID and address of uCode so AP doesn’t need to look 
   for the uCode again if the CPU ID is same as BSP’s.
3. Only apply uCode in one thread of a core when hyper threading is enabled.

v2 changes:
  Fix potential issue if allocate memory failed.

Test:
Use an sample platform which has 1 socket, 4 core, 8 threads, the 
CpuMpPei driver cost time reduce from 108.4ms to 27.2ms


Eric Dong (3):
  UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  UefiCpuPkg/MpInitLib: Load uCode once for each core.
  UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.

 UefiCpuPkg/Library/MpInitLib/Microcode.c | 43 +++++++++++++++++++++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.c     | 38 +++++++++++++++++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 ++++++--
 3 files changed, 84 insertions(+), 8 deletions(-)

-- 
2.15.0.windows.1



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  2018-07-12 10:49 [Patch v2 0/3] Optimize load uCode performance Eric Dong
@ 2018-07-12 10:49 ` Eric Dong
  2018-07-12 21:59   ` Laszlo Ersek
  2018-07-12 10:49 ` [Patch v2 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
  2018-07-12 10:49 ` [Patch v2 3/3] UefiCpuPkg/MpInitLib: Load uCode once for each core Eric Dong
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Dong @ 2018-07-12 10:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Ruiyu Ni

Read uCode from memory has better performance than from flash.
But it needs extra effort to let BSP copy uCode from flash to
memory. Also BSP already enable cache in SEC phase, so it use
less time to relocate uCode from flash to memory. After
verification, if system has more than one processor, it will
reduce some time if load uCode from memory.

This change enable this optimization.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/MpLib.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 108eea0a6f..c3cd6d7d51 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1520,6 +1520,7 @@ MpInitLibInitialize (
   UINTN                    ApResetVectorSize;
   UINTN                    BackupBufferAddr;
   UINTN                    ApIdtBase;
+  VOID                     *MicrocodePatchInRam;
 
   OldCpuMpData = GetCpuMpDataFromGuidedHob ();
   if (OldCpuMpData == NULL) {
@@ -1587,8 +1588,39 @@ MpInitLibInitialize (
   CpuMpData->SwitchBspFlag    = FALSE;
   CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
   CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
-  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
   CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
+  //
+  // If platform has more than one CPU, relocate microcode to memory to reduce
+  // loading microcode time.
+  //
+  MicrocodePatchInRam = NULL;
+  if (MaxLogicalProcessorNumber > 1) {
+    MicrocodePatchInRam = AllocatePages (
+                            EFI_SIZE_TO_PAGES (
+                              (UINTN)CpuMpData->MicrocodePatchRegionSize
+                              )
+                            );
+    ASSERT (MicrocodePatchInRam != NULL);
+  }
+  if (MicrocodePatchInRam == NULL) {
+    //
+    // there is only one processor, or no microcode patch is available, or
+    // memory allocation failed
+    //
+    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
+  } else {
+    //
+    // there are multiple processors, and a microcode patch is available, and
+    // memory allocation succeeded
+    //
+    CopyMem (
+      MicrocodePatchInRam,
+      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
+      (UINTN)CpuMpData->MicrocodePatchRegionSize
+      );
+    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
+  }
+
   InitializeSpinLock(&CpuMpData->MpLock);
 
   //
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch v2 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  2018-07-12 10:49 [Patch v2 0/3] Optimize load uCode performance Eric Dong
  2018-07-12 10:49 ` [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
@ 2018-07-12 10:49 ` Eric Dong
  2018-07-12 22:00   ` Laszlo Ersek
  2018-07-12 10:49 ` [Patch v2 3/3] UefiCpuPkg/MpInitLib: Load uCode once for each core Eric Dong
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Dong @ 2018-07-12 10:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Ruiyu Ni

Search uCode costs much time, if AP has same processor type
with BSP, AP can use BSP saved uCode info to get better performance.

This change enables this solution.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/Microcode.c | 34 +++++++++++++++++++++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.c     |  4 ++--
 UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 +++++++++--
 3 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
index e47f9f4f8f..351975e2a2 100644
--- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
+++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
@@ -35,11 +35,13 @@ GetCurrentMicrocodeSignature (
 /**
   Detect whether specified processor can find matching microcode patch and load it.
 
-  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
+  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
+  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
 **/
 VOID
 MicrocodeDetect (
-  IN CPU_MP_DATA             *CpuMpData
+  IN CPU_MP_DATA             *CpuMpData,
+  IN BOOLEAN                 IsBspCallIn
   )
 {
   UINT32                                  ExtendedTableLength;
@@ -58,6 +60,7 @@ MicrocodeDetect (
   BOOLEAN                                 CorrectMicrocode;
   VOID                                    *MicrocodeData;
   MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
+  UINT32                                  ProcessorFlags;
 
   if (CpuMpData->MicrocodePatchRegionSize == 0) {
     //
@@ -67,7 +70,7 @@ MicrocodeDetect (
   }
 
   CurrentRevision = GetCurrentMicrocodeSignature ();
-  if (CurrentRevision != 0) {
+  if (CurrentRevision != 0 && !IsBspCallIn) {
     //
     // Skip loading microcode if it has been loaded successfully
     //
@@ -87,6 +90,19 @@ MicrocodeDetect (
   PlatformIdMsr.Uint64 = AsmReadMsr64 (MSR_IA32_PLATFORM_ID);
   PlatformId = (UINT8) PlatformIdMsr.Bits.PlatformId;
 
+  //
+  // Check whether AP has same processor with BSP.
+  // If yes, direct use microcode info saved by BSP.
+  //
+  if (!IsBspCallIn) {
+    if ((CpuMpData->ProcessorSignature == Eax.Uint32) &&
+        (CpuMpData->ProcessorFlags & (1 << PlatformId)) != 0) {
+        MicrocodeData = (VOID *)(UINTN) CpuMpData->MicrocodeDataAddress;
+        LatestRevision = CpuMpData->MicrocodeRevision;
+        goto Done;
+    }
+  }
+
   LatestRevision = 0;
   MicrocodeData  = NULL;
   MicrocodeEnd = (UINTN) (CpuMpData->MicrocodePatchAddress + CpuMpData->MicrocodePatchRegionSize);
@@ -117,6 +133,7 @@ MicrocodeDetect (
         }
         if (CheckSum32 == 0) {
           CorrectMicrocode = TRUE;
+          ProcessorFlags = MicrocodeEntryPoint->ProcessorFlags;
         }
       } else if ((MicrocodeEntryPoint->DataSize != 0) &&
                  (MicrocodeEntryPoint->UpdateRevision > LatestRevision)) {
@@ -151,6 +168,7 @@ MicrocodeDetect (
                     // Find one
                     //
                     CorrectMicrocode = TRUE;
+                    ProcessorFlags = ExtendedTable->ProcessorFlag;
                     break;
                   }
                 }
@@ -188,6 +206,7 @@ MicrocodeDetect (
     MicrocodeEntryPoint = (CPU_MICROCODE_HEADER *) (((UINTN) MicrocodeEntryPoint) + TotalSize);
   } while (((UINTN) MicrocodeEntryPoint < MicrocodeEnd));
 
+Done:
   if (LatestRevision > CurrentRevision) {
     //
     // BIOS only authenticate updates that contain a numerically larger revision
@@ -211,4 +230,13 @@ MicrocodeDetect (
       ReleaseSpinLock(&CpuMpData->MpLock);
     }
   }
+
+  if (IsBspCallIn && (LatestRevision != 0)) {
+    CpuMpData->ProcessorSignature = Eax.Uint32;
+    CpuMpData->ProcessorFlags = ProcessorFlags;
+    CpuMpData->MicrocodeDataAddress = (UINTN) MicrocodeData;
+    CpuMpData->MicrocodeRevision = LatestRevision;
+    DEBUG ((DEBUG_INFO, "BSP Microcode:: signature [0x%08x], ProcessorFlags [0x%08x], \
+       MicroData [0x%08x], Revision [0x%08x]\n", Eax.Uint32, ProcessorFlags, (UINTN) MicrocodeData, LatestRevision));
+  }
 }
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index c3cd6d7d51..3a7bdfa00b 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -410,7 +410,7 @@ ApInitializeSync (
   //
   // Load microcode on AP
   //
-  MicrocodeDetect (CpuMpData);
+  MicrocodeDetect (CpuMpData, FALSE);
   //
   // Sync BSP's MTRR table to AP
   //
@@ -1659,7 +1659,7 @@ MpInitLibInitialize (
   //
   // Load Microcode on BSP
   //
-  MicrocodeDetect (CpuMpData);
+  MicrocodeDetect (CpuMpData, TRUE);
   //
   // Store BSP's MTRR setting
   //
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
index 9aedb52636..6958080ac1 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
@@ -245,6 +245,11 @@ struct _CPU_MP_DATA {
   BOOLEAN                        TimerInterruptState;
   UINT64                         MicrocodePatchAddress;
   UINT64                         MicrocodePatchRegionSize;
+
+  UINT32                         ProcessorSignature;
+  UINT32                         ProcessorFlags;
+  UINT64                         MicrocodeDataAddress;
+  UINT32                         MicrocodeRevision;
 };
 
 extern EFI_GUID mCpuInitMpLibHobGuid;
@@ -546,11 +551,13 @@ CheckAndUpdateApsStatus (
 /**
   Detect whether specified processor can find matching microcode patch and load it.
 
-  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
+  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
+  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
 **/
 VOID
 MicrocodeDetect (
-  IN CPU_MP_DATA             *CpuMpData
+  IN CPU_MP_DATA             *CpuMpData,
+  IN BOOLEAN                 IsBspCallIn
   );
 
 /**
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Patch v2 3/3] UefiCpuPkg/MpInitLib: Load uCode once for each core.
  2018-07-12 10:49 [Patch v2 0/3] Optimize load uCode performance Eric Dong
  2018-07-12 10:49 ` [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
  2018-07-12 10:49 ` [Patch v2 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
@ 2018-07-12 10:49 ` Eric Dong
  2018-07-12 22:02   ` Laszlo Ersek
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Dong @ 2018-07-12 10:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Ruiyu Ni

The SDM requires only one thread per core to load the
microcode.

This change enables this solution.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/Microcode.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
index 351975e2a2..122c23469d 100644
--- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
+++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
@@ -61,6 +61,7 @@ MicrocodeDetect (
   VOID                                    *MicrocodeData;
   MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
   UINT32                                  ProcessorFlags;
+  UINT32                                  ThreadId;
 
   if (CpuMpData->MicrocodePatchRegionSize == 0) {
     //
@@ -77,6 +78,14 @@ MicrocodeDetect (
     return;
   }
 
+  GetProcessorLocationByApicId (GetInitialApicId (), NULL, NULL, &ThreadId);
+  if (ThreadId != 0) {
+    //
+    // Skip loading microcode if it is not the first thread in one core.
+    //
+    return;
+  }
+
   ExtendedTableLength = 0;
   //
   // Here data of CPUID leafs have not been collected into context buffer, so
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  2018-07-12 10:49 ` [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
@ 2018-07-12 21:59   ` Laszlo Ersek
  2018-07-12 22:13     ` Laszlo Ersek
  0 siblings, 1 reply; 9+ messages in thread
From: Laszlo Ersek @ 2018-07-12 21:59 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

On 07/12/18 12:49, Eric Dong wrote:
> Read uCode from memory has better performance than from flash.
> But it needs extra effort to let BSP copy uCode from flash to
> memory. Also BSP already enable cache in SEC phase, so it use
> less time to relocate uCode from flash to memory. After
> verification, if system has more than one processor, it will
> reduce some time if load uCode from memory.
> 
> This change enable this optimization.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 108eea0a6f..c3cd6d7d51 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -1520,6 +1520,7 @@ MpInitLibInitialize (
>    UINTN                    ApResetVectorSize;
>    UINTN                    BackupBufferAddr;
>    UINTN                    ApIdtBase;
> +  VOID                     *MicrocodePatchInRam;
>  
>    OldCpuMpData = GetCpuMpDataFromGuidedHob ();
>    if (OldCpuMpData == NULL) {
> @@ -1587,8 +1588,39 @@ MpInitLibInitialize (
>    CpuMpData->SwitchBspFlag    = FALSE;
>    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
>    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
> -  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
>    CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
> +  //
> +  // If platform has more than one CPU, relocate microcode to memory to reduce
> +  // loading microcode time.
> +  //
> +  MicrocodePatchInRam = NULL;
> +  if (MaxLogicalProcessorNumber > 1) {
> +    MicrocodePatchInRam = AllocatePages (
> +                            EFI_SIZE_TO_PAGES (
> +                              (UINTN)CpuMpData->MicrocodePatchRegionSize
> +                              )
> +                            );
> +    ASSERT (MicrocodePatchInRam != NULL);
> +  }
> +  if (MicrocodePatchInRam == NULL) {
> +    //
> +    // there is only one processor, or no microcode patch is available, or
> +    // memory allocation failed
> +    //
> +    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
> +  } else {
> +    //
> +    // there are multiple processors, and a microcode patch is available, and
> +    // memory allocation succeeded
> +    //
> +    CopyMem (
> +      MicrocodePatchInRam,
> +      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
> +      (UINTN)CpuMpData->MicrocodePatchRegionSize
> +      );
> +    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
> +  }
> +
>    InitializeSpinLock(&CpuMpData->MpLock);
>  
>    //
> 

Reviewed-by: Laszlo Ersek <lersek@redhat.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch v2 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  2018-07-12 10:49 ` [Patch v2 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
@ 2018-07-12 22:00   ` Laszlo Ersek
  0 siblings, 0 replies; 9+ messages in thread
From: Laszlo Ersek @ 2018-07-12 22:00 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

On 07/12/18 12:49, Eric Dong wrote:
> Search uCode costs much time, if AP has same processor type
> with BSP, AP can use BSP saved uCode info to get better performance.
> 
> This change enables this solution.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/Microcode.c | 34 +++++++++++++++++++++++++++++---
>  UefiCpuPkg/Library/MpInitLib/MpLib.c     |  4 ++--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 +++++++++--
>  3 files changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> index e47f9f4f8f..351975e2a2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
> +++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> @@ -35,11 +35,13 @@ GetCurrentMicrocodeSignature (
>  /**
>    Detect whether specified processor can find matching microcode patch and load it.
>  
> -  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
> +  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
> +  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
>  **/
>  VOID
>  MicrocodeDetect (
> -  IN CPU_MP_DATA             *CpuMpData
> +  IN CPU_MP_DATA             *CpuMpData,
> +  IN BOOLEAN                 IsBspCallIn
>    )
>  {
>    UINT32                                  ExtendedTableLength;
> @@ -58,6 +60,7 @@ MicrocodeDetect (
>    BOOLEAN                                 CorrectMicrocode;
>    VOID                                    *MicrocodeData;
>    MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
> +  UINT32                                  ProcessorFlags;
>  
>    if (CpuMpData->MicrocodePatchRegionSize == 0) {
>      //
> @@ -67,7 +70,7 @@ MicrocodeDetect (
>    }
>  
>    CurrentRevision = GetCurrentMicrocodeSignature ();
> -  if (CurrentRevision != 0) {
> +  if (CurrentRevision != 0 && !IsBspCallIn) {
>      //
>      // Skip loading microcode if it has been loaded successfully
>      //
> @@ -87,6 +90,19 @@ MicrocodeDetect (
>    PlatformIdMsr.Uint64 = AsmReadMsr64 (MSR_IA32_PLATFORM_ID);
>    PlatformId = (UINT8) PlatformIdMsr.Bits.PlatformId;
>  
> +  //
> +  // Check whether AP has same processor with BSP.
> +  // If yes, direct use microcode info saved by BSP.
> +  //
> +  if (!IsBspCallIn) {
> +    if ((CpuMpData->ProcessorSignature == Eax.Uint32) &&
> +        (CpuMpData->ProcessorFlags & (1 << PlatformId)) != 0) {
> +        MicrocodeData = (VOID *)(UINTN) CpuMpData->MicrocodeDataAddress;
> +        LatestRevision = CpuMpData->MicrocodeRevision;
> +        goto Done;
> +    }
> +  }
> +
>    LatestRevision = 0;
>    MicrocodeData  = NULL;
>    MicrocodeEnd = (UINTN) (CpuMpData->MicrocodePatchAddress + CpuMpData->MicrocodePatchRegionSize);
> @@ -117,6 +133,7 @@ MicrocodeDetect (
>          }
>          if (CheckSum32 == 0) {
>            CorrectMicrocode = TRUE;
> +          ProcessorFlags = MicrocodeEntryPoint->ProcessorFlags;
>          }
>        } else if ((MicrocodeEntryPoint->DataSize != 0) &&
>                   (MicrocodeEntryPoint->UpdateRevision > LatestRevision)) {
> @@ -151,6 +168,7 @@ MicrocodeDetect (
>                      // Find one
>                      //
>                      CorrectMicrocode = TRUE;
> +                    ProcessorFlags = ExtendedTable->ProcessorFlag;
>                      break;
>                    }
>                  }
> @@ -188,6 +206,7 @@ MicrocodeDetect (
>      MicrocodeEntryPoint = (CPU_MICROCODE_HEADER *) (((UINTN) MicrocodeEntryPoint) + TotalSize);
>    } while (((UINTN) MicrocodeEntryPoint < MicrocodeEnd));
>  
> +Done:
>    if (LatestRevision > CurrentRevision) {
>      //
>      // BIOS only authenticate updates that contain a numerically larger revision
> @@ -211,4 +230,13 @@ MicrocodeDetect (
>        ReleaseSpinLock(&CpuMpData->MpLock);
>      }
>    }
> +
> +  if (IsBspCallIn && (LatestRevision != 0)) {
> +    CpuMpData->ProcessorSignature = Eax.Uint32;
> +    CpuMpData->ProcessorFlags = ProcessorFlags;
> +    CpuMpData->MicrocodeDataAddress = (UINTN) MicrocodeData;
> +    CpuMpData->MicrocodeRevision = LatestRevision;
> +    DEBUG ((DEBUG_INFO, "BSP Microcode:: signature [0x%08x], ProcessorFlags [0x%08x], \
> +       MicroData [0x%08x], Revision [0x%08x]\n", Eax.Uint32, ProcessorFlags, (UINTN) MicrocodeData, LatestRevision));
> +  }
>  }
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index c3cd6d7d51..3a7bdfa00b 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -410,7 +410,7 @@ ApInitializeSync (
>    //
>    // Load microcode on AP
>    //
> -  MicrocodeDetect (CpuMpData);
> +  MicrocodeDetect (CpuMpData, FALSE);
>    //
>    // Sync BSP's MTRR table to AP
>    //
> @@ -1659,7 +1659,7 @@ MpInitLibInitialize (
>    //
>    // Load Microcode on BSP
>    //
> -  MicrocodeDetect (CpuMpData);
> +  MicrocodeDetect (CpuMpData, TRUE);
>    //
>    // Store BSP's MTRR setting
>    //
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> index 9aedb52636..6958080ac1 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> @@ -245,6 +245,11 @@ struct _CPU_MP_DATA {
>    BOOLEAN                        TimerInterruptState;
>    UINT64                         MicrocodePatchAddress;
>    UINT64                         MicrocodePatchRegionSize;
> +
> +  UINT32                         ProcessorSignature;
> +  UINT32                         ProcessorFlags;
> +  UINT64                         MicrocodeDataAddress;
> +  UINT32                         MicrocodeRevision;
>  };
>  
>  extern EFI_GUID mCpuInitMpLibHobGuid;
> @@ -546,11 +551,13 @@ CheckAndUpdateApsStatus (
>  /**
>    Detect whether specified processor can find matching microcode patch and load it.
>  
> -  @param[in]  CpuMpData  The pointer to CPU MP Data structure.
> +  @param[in]  CpuMpData    The pointer to CPU MP Data structure.
> +  @param[in]  IsBspCallIn  Indicate whether the caller is BSP or not.
>  **/
>  VOID
>  MicrocodeDetect (
> -  IN CPU_MP_DATA             *CpuMpData
> +  IN CPU_MP_DATA             *CpuMpData,
> +  IN BOOLEAN                 IsBspCallIn
>    );
>  
>  /**
> 

Acked-by: Laszlo Ersek <lersek@redhat.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch v2 3/3] UefiCpuPkg/MpInitLib: Load uCode once for each core.
  2018-07-12 10:49 ` [Patch v2 3/3] UefiCpuPkg/MpInitLib: Load uCode once for each core Eric Dong
@ 2018-07-12 22:02   ` Laszlo Ersek
  0 siblings, 0 replies; 9+ messages in thread
From: Laszlo Ersek @ 2018-07-12 22:02 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

On 07/12/18 12:49, Eric Dong wrote:
> The SDM requires only one thread per core to load the
> microcode.
> 
> This change enables this solution.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/Microcode.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Microcode.c b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> index 351975e2a2..122c23469d 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Microcode.c
> +++ b/UefiCpuPkg/Library/MpInitLib/Microcode.c
> @@ -61,6 +61,7 @@ MicrocodeDetect (
>    VOID                                    *MicrocodeData;
>    MSR_IA32_PLATFORM_ID_REGISTER           PlatformIdMsr;
>    UINT32                                  ProcessorFlags;
> +  UINT32                                  ThreadId;
>  
>    if (CpuMpData->MicrocodePatchRegionSize == 0) {
>      //
> @@ -77,6 +78,14 @@ MicrocodeDetect (
>      return;
>    }
>  
> +  GetProcessorLocationByApicId (GetInitialApicId (), NULL, NULL, &ThreadId);
> +  if (ThreadId != 0) {
> +    //
> +    // Skip loading microcode if it is not the first thread in one core.
> +    //
> +    return;
> +  }
> +
>    ExtendedTableLength = 0;
>    //
>    // Here data of CPUID leafs have not been collected into context buffer, so
> 

Acked-by: Laszlo Ersek <lersek@redhat.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  2018-07-12 21:59   ` Laszlo Ersek
@ 2018-07-12 22:13     ` Laszlo Ersek
  2018-07-13  0:45       ` Dong, Eric
  0 siblings, 1 reply; 9+ messages in thread
From: Laszlo Ersek @ 2018-07-12 22:13 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

On 07/12/18 23:59, Laszlo Ersek wrote:
> On 07/12/18 12:49, Eric Dong wrote:
>> Read uCode from memory has better performance than from flash.
>> But it needs extra effort to let BSP copy uCode from flash to
>> memory. Also BSP already enable cache in SEC phase, so it use
>> less time to relocate uCode from flash to memory. After
>> verification, if system has more than one processor, it will
>> reduce some time if load uCode from memory.
>>
>> This change enable this optimization.
>>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Eric Dong <eric.dong@intel.com>
>> ---
>>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 34 +++++++++++++++++++++++++++++++++-
>>  1 file changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> index 108eea0a6f..c3cd6d7d51 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> @@ -1520,6 +1520,7 @@ MpInitLibInitialize (
>>    UINTN                    ApResetVectorSize;
>>    UINTN                    BackupBufferAddr;
>>    UINTN                    ApIdtBase;
>> +  VOID                     *MicrocodePatchInRam;
>>  
>>    OldCpuMpData = GetCpuMpDataFromGuidedHob ();
>>    if (OldCpuMpData == NULL) {
>> @@ -1587,8 +1588,39 @@ MpInitLibInitialize (
>>    CpuMpData->SwitchBspFlag    = FALSE;
>>    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
>>    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData->CpuData + MaxLogicalProcessorNumber);
>> -  CpuMpData->MicrocodePatchAddress    = PcdGet64 (PcdCpuMicrocodePatchAddress);
>>    CpuMpData->MicrocodePatchRegionSize = PcdGet64 (PcdCpuMicrocodePatchRegionSize);
>> +  //
>> +  // If platform has more than one CPU, relocate microcode to memory to reduce
>> +  // loading microcode time.
>> +  //
>> +  MicrocodePatchInRam = NULL;
>> +  if (MaxLogicalProcessorNumber > 1) {
>> +    MicrocodePatchInRam = AllocatePages (
>> +                            EFI_SIZE_TO_PAGES (
>> +                              (UINTN)CpuMpData->MicrocodePatchRegionSize
>> +                              )
>> +                            );
>> +    ASSERT (MicrocodePatchInRam != NULL);
>> +  }
>> +  if (MicrocodePatchInRam == NULL) {
>> +    //
>> +    // there is only one processor, or no microcode patch is available, or
>> +    // memory allocation failed
>> +    //
>> +    CpuMpData->MicrocodePatchAddress = PcdGet64 (PcdCpuMicrocodePatchAddress);
>> +  } else {
>> +    //
>> +    // there are multiple processors, and a microcode patch is available, and
>> +    // memory allocation succeeded
>> +    //
>> +    CopyMem (
>> +      MicrocodePatchInRam,
>> +      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
>> +      (UINTN)CpuMpData->MicrocodePatchRegionSize
>> +      );
>> +    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
>> +  }
>> +
>>    InitializeSpinLock(&CpuMpData->MpLock);
>>  
>>    //
>>
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> 

Sorry, I have to take that back -- please do not commit this patch.

For this version of the patch:

Nacked-by: Laszlo Ersek <lersek@redhat.com>

The ASSERT() is wrong. Again, in the code above, AllocatePages() can
return NULL not only because of memory allocation failure, but also
because the number of pages to allocate can be zero! If the platform has
no microcode patch to apply.

I knew this full well when I suggested the code, but then I forgot about
it when you mentioned the ASSERT(). I think you also knew about it, and
forgot about it too. :)

In particular, this patch would make it *impossible* to boot OVMF with
multiple processors, because OVMF *never* provides a microcode update.

So, please remove the ASSERT.

Alternatively, you could modify the ASSERT() like this:

  //
  // if we attempt actual memory allocation, we expect it to succeed
  //
  ASSERT (
    (CpuMpData->MicrocodePatchRegionSize == 0) ||
    (MicrocodePatchInRam != NULL)
    );

Thanks,
Laszlo


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.
  2018-07-12 22:13     ` Laszlo Ersek
@ 2018-07-13  0:45       ` Dong, Eric
  0 siblings, 0 replies; 9+ messages in thread
From: Dong, Eric @ 2018-07-13  0:45 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu

Hi Laszlo,

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Laszlo Ersek
> Sent: Friday, July 13, 2018 6:14 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [edk2] [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to
> memory to save time.
> 
> On 07/12/18 23:59, Laszlo Ersek wrote:
> > On 07/12/18 12:49, Eric Dong wrote:
> >> Read uCode from memory has better performance than from flash.
> >> But it needs extra effort to let BSP copy uCode from flash to memory.
> >> Also BSP already enable cache in SEC phase, so it use less time to
> >> relocate uCode from flash to memory. After verification, if system
> >> has more than one processor, it will reduce some time if load uCode
> >> from memory.
> >>
> >> This change enable this optimization.
> >>
> >> Cc: Laszlo Ersek <lersek@redhat.com>
> >> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> >> Contributed-under: TianoCore Contribution Agreement 1.1
> >> Signed-off-by: Eric Dong <eric.dong@intel.com>
> >> ---
> >>  UefiCpuPkg/Library/MpInitLib/MpLib.c | 34
> >> +++++++++++++++++++++++++++++++++-
> >>  1 file changed, 33 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> index 108eea0a6f..c3cd6d7d51 100644
> >> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> >> @@ -1520,6 +1520,7 @@ MpInitLibInitialize (
> >>    UINTN                    ApResetVectorSize;
> >>    UINTN                    BackupBufferAddr;
> >>    UINTN                    ApIdtBase;
> >> +  VOID                     *MicrocodePatchInRam;
> >>
> >>    OldCpuMpData = GetCpuMpDataFromGuidedHob ();
> >>    if (OldCpuMpData == NULL) {
> >> @@ -1587,8 +1588,39 @@ MpInitLibInitialize (
> >>    CpuMpData->SwitchBspFlag    = FALSE;
> >>    CpuMpData->CpuData          = (CPU_AP_DATA *) (CpuMpData + 1);
> >>    CpuMpData->CpuInfoInHob     = (UINT64) (UINTN) (CpuMpData-
> >CpuData + MaxLogicalProcessorNumber);
> >> -  CpuMpData->MicrocodePatchAddress    = PcdGet64
> (PcdCpuMicrocodePatchAddress);
> >>    CpuMpData->MicrocodePatchRegionSize = PcdGet64
> >> (PcdCpuMicrocodePatchRegionSize);
> >> +  //
> >> +  // If platform has more than one CPU, relocate microcode to memory
> >> + to reduce  // loading microcode time.
> >> +  //
> >> +  MicrocodePatchInRam = NULL;
> >> +  if (MaxLogicalProcessorNumber > 1) {
> >> +    MicrocodePatchInRam = AllocatePages (
> >> +                            EFI_SIZE_TO_PAGES (
> >> +                              (UINTN)CpuMpData->MicrocodePatchRegionSize
> >> +                              )
> >> +                            );
> >> +    ASSERT (MicrocodePatchInRam != NULL);  }  if
> >> + (MicrocodePatchInRam == NULL) {
> >> +    //
> >> +    // there is only one processor, or no microcode patch is available, or
> >> +    // memory allocation failed
> >> +    //
> >> +    CpuMpData->MicrocodePatchAddress = PcdGet64
> >> + (PcdCpuMicrocodePatchAddress);  } else {
> >> +    //
> >> +    // there are multiple processors, and a microcode patch is available,
> and
> >> +    // memory allocation succeeded
> >> +    //
> >> +    CopyMem (
> >> +      MicrocodePatchInRam,
> >> +      (VOID *)(UINTN)PcdGet64 (PcdCpuMicrocodePatchAddress),
> >> +      (UINTN)CpuMpData->MicrocodePatchRegionSize
> >> +      );
> >> +    CpuMpData->MicrocodePatchAddress = (UINTN)MicrocodePatchInRam;
> >> + }
> >> +
> >>    InitializeSpinLock(&CpuMpData->MpLock);
> >>
> >>    //
> >>
> >
> > Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> >
> 
> Sorry, I have to take that back -- please do not commit this patch.
> 
> For this version of the patch:
> 
> Nacked-by: Laszlo Ersek <lersek@redhat.com>
> 
> The ASSERT() is wrong. Again, in the code above, AllocatePages() can return
> NULL not only because of memory allocation failure, but also because the
> number of pages to allocate can be zero! If the platform has no microcode
> patch to apply.
> 
> I knew this full well when I suggested the code, but then I forgot about it when
> you mentioned the ASSERT(). I think you also knew about it, and forgot about
> it too. :)
> 
> In particular, this patch would make it *impossible* to boot OVMF with
> multiple processors, because OVMF *never* provides a microcode update.
> 
> So, please remove the ASSERT.
> 

Agree, I removed ASSERT code and send V3 patches.

> Alternatively, you could modify the ASSERT() like this:
> 
>   //
>   // if we attempt actual memory allocation, we expect it to succeed
>   //
>   ASSERT (
>     (CpuMpData->MicrocodePatchRegionSize == 0) ||
>     (MicrocodePatchInRam != NULL)
>     );
> 
> Thanks,
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-07-13  0:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-12 10:49 [Patch v2 0/3] Optimize load uCode performance Eric Dong
2018-07-12 10:49 ` [Patch v2 1/3] UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time Eric Dong
2018-07-12 21:59   ` Laszlo Ersek
2018-07-12 22:13     ` Laszlo Ersek
2018-07-13  0:45       ` Dong, Eric
2018-07-12 10:49 ` [Patch v2 2/3] UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible Eric Dong
2018-07-12 22:00   ` Laszlo Ersek
2018-07-12 10:49 ` [Patch v2 3/3] UefiCpuPkg/MpInitLib: Load uCode once for each core Eric Dong
2018-07-12 22:02   ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox