public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [Patch 0/4] Fix performance issue caused by Set MSR task.
@ 2018-10-15  2:49 Eric Dong
  2018-10-15  2:49 ` [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Eric Dong @ 2018-10-15  2:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

In a system which has multiple cores, current set register value task costs huge times.
After investigation, current set MSR task costs most of the times. Current logic uses
SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
scope attribute which may cause GP fault if multiple APs set MSR at the same time,
current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
cost huge times.

In order to fix this performance issue, new solution will set MSRs base on their scope
attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
and thread 2 like below:

            Thread 1                 Thread 2
MSR B          N                        Y
MSR A          Y                        Y

If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
at this time, MSR B not been executed yet by thread 2. system may trig exception at this
time.

In order to fix the above issue, driver introduces semaphore logic to control the MSR
execute sequence. For the above case, a semaphore will be add between MSR A and B for
all threads. Semaphore has scope info for it. The possible scope value is core or package.
For each thread, when it meets a semaphore during it set registers, it will 1) release
semaphore (+1) for each threads in this core or package(based on the scope info for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
on the scope info for this semaphore). With these two steps, driver can control MSR
sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
    LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
    LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
   PEI instance not works after this change. We plan to support async mode for PEI in phase
   2 for this task.
2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver and 
   RegisterCpuFeaturesLib library because the schedule limitation. Will merge the code to 
   RegisterCpuFeaturesLib and export as an API in phase 2 for this task.

Extra Notes:
  I will send the other patch to set MSR base on scope info and check in it before check in
  this serial.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>

Eric Dong (4):
  UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
  UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
    type.
  UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.

 UefiCpuPkg/Include/AcpiCpuData.h                   |  23 +-
 .../Include/Library/RegisterCpuFeaturesLib.h       |  25 +-
 .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++---
 .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
 .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
 .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
 .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
 .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
 .../RegisterCpuFeaturesLib.c                       | 452 ++++++++++++++++++---
 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c                  | 316 +++++++-------
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c              |   3 -
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h         |   3 +-
 12 files changed, 1063 insertions(+), 264 deletions(-)

-- 
2.15.0.windows.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-15  2:49 [Patch 0/4] Fix performance issue caused by Set MSR task Eric Dong
@ 2018-10-15  2:49 ` Eric Dong
  2018-10-15 16:02   ` Laszlo Ersek
  2018-10-16  2:27   ` Ni, Ruiyu
  2018-10-15  2:49 ` [Patch 2/4] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Eric Dong @ 2018-10-15  2:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

In order to support semaphore related logic, add new definition for it.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Include/AcpiCpuData.h | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Include/AcpiCpuData.h b/UefiCpuPkg/Include/AcpiCpuData.h
index 9e51145c08..b3cf2f664a 100644
--- a/UefiCpuPkg/Include/AcpiCpuData.h
+++ b/UefiCpuPkg/Include/AcpiCpuData.h
@@ -15,6 +15,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 #ifndef _ACPI_CPU_DATA_H_
 #define _ACPI_CPU_DATA_H_
 
+#include <Protocol/MpService.h>
+
 //
 // Register types in register table
 //
@@ -22,9 +24,20 @@ typedef enum {
   Msr,
   ControlRegister,
   MemoryMapped,
-  CacheControl
+  CacheControl,
+  Semaphore
 } REGISTER_TYPE;
 
+//
+// CPU information.
+//
+typedef struct {
+  UINT32        PackageCount;             // Packages in this CPU.
+  UINT32        CoreCount;                // Max Core count in the packages.
+  UINT32        ThreadCount;              // MAx thread count in the cores.
+  UINT32        *ValidCoresInPackages;    // Valid cores in each package.
+} CPU_STATUS_INFORMATION;
+
 //
 // Element of register table entry
 //
@@ -147,6 +160,14 @@ typedef struct {
   // provided.
   //
   UINT32                ApMachineCheckHandlerSize;
+  //
+  // CPU information which is required when set the register table.
+  //
+  CPU_STATUS_INFORMATION     CpuStatus;
+  //
+  // Location info for each ap.
+  //
+  EFI_CPU_PHYSICAL_LOCATION  *ApLocation;
 } ACPI_CPU_DATA;
 
 #endif
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch 2/4] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
  2018-10-15  2:49 [Patch 0/4] Fix performance issue caused by Set MSR task Eric Dong
  2018-10-15  2:49 ` [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
@ 2018-10-15  2:49 ` Eric Dong
  2018-10-15  2:49 ` [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Eric Dong @ 2018-10-15  2:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

Add new core/package dependence types which consumed by different MSRs.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 .../Include/Library/RegisterCpuFeaturesLib.h       | 25 ++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h b/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
index 9331e49d13..e6f0ebe4bc 100644
--- a/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
+++ b/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
@@ -73,10 +73,17 @@
 #define CPU_FEATURE_PPIN                            (32+11)
 #define CPU_FEATURE_PROC_TRACE                      (32+12)
 
-#define CPU_FEATURE_BEFORE_ALL                      BIT27
-#define CPU_FEATURE_AFTER_ALL                       BIT28
-#define CPU_FEATURE_BEFORE                          BIT29
-#define CPU_FEATURE_AFTER                           BIT30
+#define CPU_FEATURE_BEFORE_ALL                      BIT23
+#define CPU_FEATURE_AFTER_ALL                       BIT24
+#define CPU_FEATURE_BEFORE                          BIT25
+#define CPU_FEATURE_AFTER                           BIT26
+
+#define CPU_FEATURE_THREAD_BEFORE                   CPU_FEATURE_BEFORE
+#define CPU_FEATURE_THREAD_AFTER                    CPU_FEATURE_AFTER
+#define CPU_FEATURE_CORE_BEFORE                     BIT27
+#define CPU_FEATURE_CORE_AFTER                      BIT28
+#define CPU_FEATURE_PACKAGE_BEFORE                  BIT29
+#define CPU_FEATURE_PACKAGE_AFTER                   BIT30
 #define CPU_FEATURE_END                             MAX_UINT32
 /// @}
 
@@ -116,6 +123,16 @@ typedef struct {
   CPUID_VERSION_INFO_EDX               CpuIdVersionInfoEdx;
 } REGISTER_CPU_FEATURE_INFORMATION;
 
+//
+// Describe the dependency type for different features.
+//
+typedef enum {
+  NoneDepType,
+  ThreadDepType,
+  CoreDepType,
+  PackageDepType
+} CPU_FEATURE_DEPENDENCE_TYPE;
+
 /**
   Determines if a CPU feature is enabled in PcdCpuFeaturesSupport bit mask.
   If a CPU feature is disabled in PcdCpuFeaturesSupport then all the code/data
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
  2018-10-15  2:49 [Patch 0/4] Fix performance issue caused by Set MSR task Eric Dong
  2018-10-15  2:49 ` [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
  2018-10-15  2:49 ` [Patch 2/4] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
@ 2018-10-15  2:49 ` Eric Dong
  2018-10-16  3:05   ` Ni, Ruiyu
  2018-10-15  2:49 ` [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
  2018-10-15 15:51 ` [Patch 0/4] Fix performance issue caused by Set MSR task Laszlo Ersek
  4 siblings, 1 reply; 18+ messages in thread
From: Eric Dong @ 2018-10-15  2:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

In a system which has multiple cores, current set register value task costs huge times.
After investigation, current set MSR task costs most of the times. Current logic uses
SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
scope attribute which may cause GP fault if multiple APs set MSR at the same time,
current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
cost huge times.

In order to fix this performance issue, new solution will set MSRs base on their scope
attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
and thread 2 like below:

            Thread 1                 Thread 2
MSR B          N                        Y
MSR A          Y                        Y

If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
at this time, MSR B not been executed yet by thread 2. system may trig exception at this
time.

In order to fix the above issue, driver introduces semaphore logic to control the MSR
execute sequence. For the above case, a semaphore will be add between MSR A and B for
all threads. Semaphore has scope info for it. The possible scope value is core or package.
For each thread, when it meets a semaphore during it set registers, it will 1) release
semaphore (+1) for each threads in this core or package(based on the scope info for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
on the scope info for this semaphore). With these two steps, driver can control MSR
sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
    LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
    LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
   PEI instance not works after this change. We plan to support async mode for PEI in phase
   2 for this task.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++---
 .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
 .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
 .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
 .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
 .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
 .../RegisterCpuFeaturesLib.c                       | 452 ++++++++++++++++++---
 7 files changed, 840 insertions(+), 117 deletions(-)

diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
index ba3fb3250f..f820b4fed7 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
@@ -145,6 +145,20 @@ CpuInitDataInitialize (
   CPU_FEATURES_INIT_ORDER              *InitOrder;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
   LIST_ENTRY                           *Entry;
+  UINT32                               Core;
+  UINT32                               Package;
+  UINT32                               Thread;
+  EFI_CPU_PHYSICAL_LOCATION            *Location;
+  UINT32                               *CoreArray;
+  UINTN                                Index;
+  UINT32                               ValidCount;
+  UINTN                                CoreIndex;
+  ACPI_CPU_DATA                        *AcpiCpuData;
+  CPU_STATUS_INFORMATION               *CpuStatus;
+
+  Core    = 0;
+  Package = 0;
+  Thread  = 0;
 
   CpuFeaturesData = GetCpuFeaturesData ();
   CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof (CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
@@ -163,6 +177,16 @@ CpuInitDataInitialize (
     Entry = Entry->ForwardLink;
   }
 
+  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
+
+  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
+  ASSERT (AcpiCpuData != NULL);
+  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
+
+  CpuStatus = &AcpiCpuData->CpuStatus;
+  AcpiCpuData->ApLocation = AllocateZeroPool (sizeof (EFI_CPU_PHYSICAL_LOCATION) * NumberOfCpus);
+  ASSERT (AcpiCpuData->ApLocation != NULL);
+
   for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
     InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
     InitOrder->FeaturesSupportedMask = AllocateZeroPool (CpuFeaturesData->BitMaskSize);
@@ -175,7 +199,59 @@ CpuInitDataInitialize (
       &ProcessorInfoBuffer,
       sizeof (EFI_PROCESSOR_INFORMATION)
       );
+    CopyMem (
+      AcpiCpuData->ApLocation + ProcessorNumber,
+      &ProcessorInfoBuffer.Location,
+      sizeof (EFI_CPU_PHYSICAL_LOCATION)
+      );
+
+    if (Package < ProcessorInfoBuffer.Location.Package) {
+      Package = ProcessorInfoBuffer.Location.Package;
+    }
+    if (Core < ProcessorInfoBuffer.Location.Core) {
+      Core = ProcessorInfoBuffer.Location.Core;
+    }
+    if (Thread < ProcessorInfoBuffer.Location.Thread) {
+      Thread = ProcessorInfoBuffer.Location.Thread;
+    }
+  }
+  CpuStatus->PackageCount = Package + 1;
+  CpuStatus->CoreCount    = Core + 1;
+  CpuStatus->ThreadCount  = Thread + 1;
+  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, Core : %d, Thread: %d\n",
+         CpuStatus->PackageCount,
+         CpuStatus->CoreCount,
+         CpuStatus->ThreadCount));
+
+  //
+  // Collect valid core count in each package because not all cores are valid.
+  //
+  CpuStatus->ValidCoresInPackages = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount);
+  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
+  CoreArray = AllocatePool (sizeof (UINT32) * CpuStatus->CoreCount);
+  ASSERT (CoreArray != NULL);
+
+  for (Index = 0; Index <= Package; Index ++ ) {
+    ZeroMem (CoreArray, sizeof (UINT32) * (Core + 1));
+    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
+      Location = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
+      if (Location->Package == Index) {
+        CoreArray[Location->Core] = 1;
+      }
+    }
+    for (CoreIndex = 0, ValidCount = 0; CoreIndex <= Core; CoreIndex ++) {
+      ValidCount += CoreArray[CoreIndex];
+    }
+    CpuStatus->ValidCoresInPackages[Index] = ValidCount;
   }
+  FreePool (CoreArray);
+  for (Index = 0; Index <= Package; Index++) {
+    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index, CpuStatus->ValidCoresInPackages[Index]));
+  }
+
+  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount* CpuStatus->ThreadCount);
+  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
+
   //
   // Get support and configuration PCDs
   //
@@ -310,7 +386,7 @@ CollectProcessorData (
   LIST_ENTRY                           *Entry;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
 
-  CpuFeaturesData = GetCpuFeaturesData ();
+  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;
   ProcessorNumber = GetProcessorIndex ();
   CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
   //
@@ -416,6 +492,15 @@ DumpRegisterTableOnProcessor (
         RegisterTableEntry->Value
         ));
       break;
+    case Semaphore:
+      DEBUG ((
+        DebugPrintErrorLevel,
+        "Processor: %d: Semaphore: Scope Value: %d\r\n",
+        ProcessorNumber,
+        RegisterTableEntry->Value
+        ));
+      break;
+
     default:
       break;
     }
@@ -441,6 +526,11 @@ AnalysisProcessorFeatures (
   REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
   LIST_ENTRY                           *Entry;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
+  LIST_ENTRY                           *NextEntry;
+  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
+  BOOLEAN                              Success;
+  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
+  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
 
   CpuFeaturesData = GetCpuFeaturesData ();
   CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData->BitMaskSize);
@@ -517,8 +607,14 @@ AnalysisProcessorFeatures (
     //
     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
     Entry = GetFirstNode (&CpuInitOrder->OrderList);
+    NextEntry = Entry->ForwardLink;
     while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
       CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
+      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
+        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (NextEntry);
+      } else {
+        NextCpuFeatureInOrder = NULL;
+      }
       if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask, CpuFeaturesData->SettingPcd)) {
         Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, TRUE);
         if (EFI_ERROR (Status)) {
@@ -532,6 +628,8 @@ AnalysisProcessorFeatures (
             DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask = "));
             DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
           }
+        } else {
+          Success = TRUE;
         }
       } else {
         Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, FALSE);
@@ -542,9 +640,36 @@ AnalysisProcessorFeatures (
             DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: Mask = "));
             DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
           }
+        } else {
+          Success = TRUE;
         }
       }
-      Entry = Entry->ForwardLink;
+
+      if (Success) {
+        //
+        // If feature has dependence with the next feature (ONLY care core/package dependency).
+        // and feature initialize succeed, add sync semaphere here.
+        //
+        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
+        if (NextCpuFeatureInOrder != NULL) {
+          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
+        } else {
+          AfterDep = NoneDepType;
+        }
+        //
+        // Assume only one of the depend is valid.
+        //
+        ASSERT (!(BeforeDep > ThreadDepType && AfterDep > ThreadDepType));
+        if (BeforeDep > ThreadDepType) {
+          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, BeforeDep);
+        }
+        if (AfterDep > ThreadDepType) {
+          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, AfterDep);
+        }
+      }
+
+      Entry     = Entry->ForwardLink;
+      NextEntry = Entry->ForwardLink;
     }
 
     //
@@ -561,27 +686,79 @@ AnalysisProcessorFeatures (
   }
 }
 
+/**
+  Increment semaphore by 1.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+LibReleaseSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  InterlockedIncrement (Sem);
+}
+
+/**
+  Decrement the semaphore by 1 if it is not zero.
+
+  Performs an atomic decrement operation for semaphore.
+  The compare exchange operation must be performed using
+  MP safe mechanisms.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+LibWaitForSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  UINT32  Value;
+
+  do {
+    Value = *Sem;
+  } while (Value == 0);
+
+  InterlockedDecrement (Sem);
+}
+
 /**
   Initialize the CPU registers from a register table.
 
-  @param[in]  ProcessorNumber  The index of the CPU executing this function.
+  @param[in]  RegisterTable         The register table for this AP.
+  @param[in]  ApLocation            AP location info for this ap.
+  @param[in]  CpuStatus             CPU status info for this CPU.
+  @param[in]  CpuFlags              Flags data structure used when program the register.
 
   @note This service could be called by BSP/APs.
 **/
 VOID
+EFIAPI
 ProgramProcessorRegister (
-  IN UINTN  ProcessorNumber
+  IN CPU_REGISTER_TABLE           *RegisterTable,
+  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
+  IN CPU_STATUS_INFORMATION       *CpuStatus,
+  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
   )
 {
-  CPU_FEATURES_DATA         *CpuFeaturesData;
-  CPU_REGISTER_TABLE        *RegisterTable;
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
   UINTN                     Index;
   UINTN                     Value;
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
-
-  CpuFeaturesData = GetCpuFeaturesData ();
-  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
+  volatile UINT32           *SemaphorePtr;
+  UINT32                    CoreOffset;
+  UINT32                    PackageOffset;
+  UINT32                    PackageThreadsCount;
+  UINT32                    ApOffset;
+  UINTN                     ProcessorIndex;
+  UINTN                     ApIndex;
+  UINTN                     ValidApCount;
+
+  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount \
+            + ApLocation->Core * CpuStatus->ThreadCount \
+            + ApLocation->Thread;
 
   //
   // Traverse Register Table of this logical processor
@@ -591,6 +768,7 @@ ProgramProcessorRegister (
   for (Index = 0; Index < RegisterTable->TableLength; Index++) {
 
     RegisterTableEntry = &RegisterTableEntryHead[Index];
+    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
 
     //
     // Check the type of specified register
@@ -654,10 +832,6 @@ ProgramProcessorRegister (
     // The specified register is Model Specific Register
     //
     case Msr:
-      //
-      // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
-      //
-      AcquireSpinLock (&CpuFeaturesData->MsrLock);
       if (RegisterTableEntry->ValidBitLength >= 64) {
         //
         // If length is not less than 64 bits, then directly write without reading
@@ -677,20 +851,19 @@ ProgramProcessorRegister (
           RegisterTableEntry->Value
           );
       }
-      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
       break;
     //
     // MemoryMapped operations
     //
     case MemoryMapped:
-      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
+      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
       MmioBitFieldWrite32 (
         (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
         RegisterTableEntry->ValidBitStart,
         RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
         (UINT32)RegisterTableEntry->Value
         );
-      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
+      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
       break;
     //
     // Enable or disable cache
@@ -706,6 +879,50 @@ ProgramProcessorRegister (
       }
       break;
 
+    case Semaphore:
+      SemaphorePtr = CpuFlags->SemaphoreCount;
+      switch (RegisterTableEntry->Value) {
+      case CoreDepType:
+        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount + ApLocation->Core) * CpuStatus->ThreadCount;
+        ApOffset = CoreOffset + ApLocation->Thread;
+        //
+        // First increase semaphore count by 1 for processors in this core.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
+          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset + ProcessorIndex]);
+        }
+        //
+        // Second, check whether the count has reach the check number.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
+          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
+        }
+        break;
+
+      case PackageDepType:
+        PackageOffset = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount;
+        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus->CoreCount;
+        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation->Core + ApLocation->Thread;
+        ValidApCount = CpuStatus->ThreadCount * CpuStatus->ValidCoresInPackages[ApLocation->Package];
+        //
+        // First increase semaphore count by 1 for processors in this package.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
+          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
+        }
+        //
+        // Second, check whether the count has reach the check number.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
+          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
+        }
+        break;
+
+      default:
+        break;
+      }
+      break;
+
     default:
       break;
     }
@@ -724,10 +941,36 @@ SetProcessorRegister (
   IN OUT VOID            *Buffer
   )
 {
-  UINTN                  ProcessorNumber;
+  CPU_FEATURES_DATA         *CpuFeaturesData;
+  CPU_REGISTER_TABLE        *RegisterTable;
+  CPU_REGISTER_TABLE        *RegisterTables;
+  UINT32                    InitApicId;
+  UINTN                     ProcIndex;
+  UINTN                     Index;
+  ACPI_CPU_DATA             *AcpiCpuData;
 
-  ProcessorNumber = GetProcessorIndex ();
-  ProgramProcessorRegister (ProcessorNumber);
+  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
+  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
+
+  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData->RegisterTable;
+
+  InitApicId = GetInitialApicId ();
+  RegisterTable = NULL;
+  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
+    if (RegisterTables[Index].InitialApicId == InitApicId) {
+      RegisterTable =  &RegisterTables[Index];
+      ProcIndex = Index;
+      break;
+    }
+  }
+  ASSERT (RegisterTable != NULL);
+
+  ProgramProcessorRegister (
+    RegisterTable,
+    AcpiCpuData->ApLocation + ProcIndex,
+    &AcpiCpuData->CpuStatus,
+    &CpuFeaturesData->CpuFlags
+    );
 }
 
 /**
@@ -746,6 +989,9 @@ CpuFeaturesDetect (
 {
   UINTN                  NumberOfCpus;
   UINTN                  NumberOfEnabledProcessors;
+  CPU_FEATURES_DATA      *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData();
 
   GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
 
@@ -754,49 +1000,13 @@ CpuFeaturesDetect (
   //
   // Wakeup all APs for data collection.
   //
-  StartupAPsWorker (CollectProcessorData);
+  StartupAPsWorker (CollectProcessorData, NULL);
 
   //
   // Collect data on BSP
   //
-  CollectProcessorData (NULL);
+  CollectProcessorData (CpuFeaturesData);
 
   AnalysisProcessorFeatures (NumberOfCpus);
 }
 
-/**
-  Performs CPU features Initialization.
-
-  This service will invoke MP service to perform CPU features
-  initialization on BSP/APs per user configuration.
-
-  @note This service could be called by BSP only.
-**/
-VOID
-EFIAPI
-CpuFeaturesInitialize (
-  VOID
-  )
-{
-  CPU_FEATURES_DATA      *CpuFeaturesData;
-  UINTN                  OldBspNumber;
-
-  CpuFeaturesData = GetCpuFeaturesData ();
-
-  OldBspNumber = GetProcessorIndex();
-  CpuFeaturesData->BspNumber = OldBspNumber;
-  //
-  // Wakeup all APs for programming.
-  //
-  StartupAPsWorker (SetProcessorRegister);
-  //
-  // Programming BSP
-  //
-  SetProcessorRegister (NULL);
-  //
-  // Switch to new BSP if required
-  //
-  if (CpuFeaturesData->BspNumber != OldBspNumber) {
-    SwitchNewBsp (CpuFeaturesData->BspNumber);
-  }
-}
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
index 1f34a3f489..8346f7004f 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
@@ -15,6 +15,7 @@
 #include <PiDxe.h>
 
 #include <Library/UefiBootServicesTableLib.h>
+#include <Library/UefiLib.h>
 
 #include "RegisterCpuFeatures.h"
 
@@ -115,14 +116,20 @@ GetProcessorInformation (
 
   @param[in]  Procedure               A pointer to the function to be run on
                                       enabled APs of the system.
+  @param[in]  MpEvent                 A pointer to the event to be used later
+                                      to check whether procedure has done.
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   )
 {
   EFI_STATUS                           Status;
   EFI_MP_SERVICES_PROTOCOL             *MpServices;
+  CPU_FEATURES_DATA                    *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
 
   MpServices = GetMpProtocol ();
   //
@@ -132,9 +139,9 @@ StartupAPsWorker (
                  MpServices,
                  Procedure,
                  FALSE,
-                 NULL,
+                 (EFI_EVENT)MpEvent,
                  0,
-                 NULL,
+                 CpuFeaturesData,
                  NULL
                  );
   ASSERT_EFI_ERROR (Status);
@@ -197,3 +204,61 @@ GetNumberOfProcessor (
   ASSERT_EFI_ERROR (Status);
 }
 
+/**
+  Performs CPU features Initialization.
+
+  This service will invoke MP service to perform CPU features
+  initialization on BSP/APs per user configuration.
+
+  @note This service could be called by BSP only.
+**/
+VOID
+EFIAPI
+CpuFeaturesInitialize (
+  VOID
+  )
+{
+  CPU_FEATURES_DATA          *CpuFeaturesData;
+  UINTN                      OldBspNumber;
+  EFI_EVENT                  MpEvent;
+  EFI_STATUS                 Status;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
+
+  OldBspNumber = GetProcessorIndex();
+  CpuFeaturesData->BspNumber = OldBspNumber;
+
+  Status = gBS->CreateEvent (
+                  EVT_NOTIFY_WAIT,
+                  TPL_CALLBACK,
+                  EfiEventEmptyFunction,
+                  NULL,
+                  &MpEvent
+                  );
+  ASSERT_EFI_ERROR (Status);
+
+  //
+  // Wakeup all APs for programming.
+  //
+  StartupAPsWorker (SetProcessorRegister, MpEvent);
+  //
+  // Programming BSP
+  //
+  SetProcessorRegister (CpuFeaturesData);
+
+  //
+  // Wait all processors to finish the task.
+  //
+  do {
+    Status = gBS->CheckEvent (MpEvent);
+  } while (Status == EFI_NOT_READY);
+  ASSERT_EFI_ERROR (Status);
+
+  //
+  // Switch to new BSP if required
+  //
+  if (CpuFeaturesData->BspNumber != OldBspNumber) {
+    SwitchNewBsp (CpuFeaturesData->BspNumber);
+  }
+}
+
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
index f0f317c945..6693bae575 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
@@ -47,6 +47,9 @@
   SynchronizationLib
   UefiBootServicesTableLib
   IoLib
+  UefiBootServicesTableLib
+  UefiLib
+  LocalApicLib
 
 [Protocols]
   gEfiMpServiceProtocolGuid                                            ## CONSUMES
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
index 82fe268812..799864a136 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
@@ -149,11 +149,15 @@ GetProcessorInformation (
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   )
 {
   EFI_STATUS                           Status;
   EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
+  CPU_FEATURES_DATA                    *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
 
   //
   // Get MP Services Protocol
@@ -175,7 +179,7 @@ StartupAPsWorker (
                  Procedure,
                  FALSE,
                  0,
-                 NULL
+                 CpuFeaturesData
                  );
   ASSERT_EFI_ERROR (Status);
 }
@@ -257,3 +261,50 @@ GetNumberOfProcessor (
                          );
   ASSERT_EFI_ERROR (Status);
 }
+
+/**
+  Performs CPU features Initialization.
+
+  This service will invoke MP service to perform CPU features
+  initialization on BSP/APs per user configuration.
+
+  @note This service could be called by BSP only.
+**/
+VOID
+EFIAPI
+CpuFeaturesInitialize (
+  VOID
+  )
+{
+  CPU_FEATURES_DATA          *CpuFeaturesData;
+  UINTN                      OldBspNumber;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
+
+  OldBspNumber = GetProcessorIndex();
+  CpuFeaturesData->BspNumber = OldBspNumber;
+
+  //
+  // Known limitation: In PEI phase, CpuFeatures driver not
+  // support async mode execute tasks. So semaphore type
+  // register can't been used for this instance, must use
+  // DXE type instance.
+  //
+
+  //
+  // Wakeup all APs for programming.
+  //
+  StartupAPsWorker (SetProcessorRegister, NULL);
+  //
+  // Programming BSP
+  //
+  SetProcessorRegister (CpuFeaturesData);
+
+  //
+  // Switch to new BSP if required
+  //
+  if (CpuFeaturesData->BspNumber != OldBspNumber) {
+    SwitchNewBsp (CpuFeaturesData->BspNumber);
+  }
+}
+
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
index fdfef98293..e95f01df0b 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
@@ -49,6 +49,7 @@
   PeiServicesLib
   PeiServicesTablePointerLib
   IoLib
+  LocalApicLib
 
 [Ppis]
   gEfiPeiMpServicesPpiGuid                                             ## CONSUMES
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
index edd266934f..39457e9730 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
@@ -23,6 +23,7 @@
 #include <Library/MemoryAllocationLib.h>
 #include <Library/SynchronizationLib.h>
 #include <Library/IoLib.h>
+#include <Library/LocalApicLib.h>
 
 #include <AcpiCpuData.h>
 
@@ -46,16 +47,26 @@ typedef struct {
   CPU_FEATURE_INITIALIZE       InitializeFunc;
   UINT8                        *BeforeFeatureBitMask;
   UINT8                        *AfterFeatureBitMask;
+  UINT8                        *CoreBeforeFeatureBitMask;
+  UINT8                        *CoreAfterFeatureBitMask;
+  UINT8                        *PackageBeforeFeatureBitMask;
+  UINT8                        *PackageAfterFeatureBitMask;
   VOID                         *ConfigData;
   BOOLEAN                      BeforeAll;
   BOOLEAN                      AfterAll;
 } CPU_FEATURES_ENTRY;
 
+//
+// Flags used when program the register.
+//
+typedef struct {
+  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
+  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
+} PROGRAM_CPU_REGISTER_FLAGS;
+
 typedef struct {
   UINTN                    FeaturesCount;
   UINT32                   BitMaskSize;
-  SPIN_LOCK                MsrLock;
-  SPIN_LOCK                MemoryMappedLock;
   LIST_ENTRY               FeatureList;
 
   CPU_FEATURES_INIT_ORDER  *InitOrder;
@@ -64,9 +75,14 @@ typedef struct {
   UINT8                    *ConfigurationPcd;
   UINT8                    *SettingPcd;
 
+  UINT32                   NumberOfCpus;
+  ACPI_CPU_DATA            *AcpiCpuData;
+
   CPU_REGISTER_TABLE       *RegisterTable;
   CPU_REGISTER_TABLE       *PreSmmRegisterTable;
   UINTN                    BspNumber;
+
+  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
 } CPU_FEATURES_DATA;
 
 #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
@@ -118,10 +134,13 @@ GetProcessorInformation (
 
   @param[in]  Procedure               A pointer to the function to be run on
                                       enabled APs of the system.
+  @param[in]  MpEvent                 A pointer to the event to be used later
+                                      to check whether procedure has done.
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   );
 
 /**
@@ -170,4 +189,30 @@ DumpCpuFeature (
   IN CPU_FEATURES_ENTRY  *CpuFeature
   );
 
+/**
+  Return feature dependence result.
+
+  @param[in]  CpuFeature        Pointer to CPU feature.
+  @param[in]  Before            Check before dependence or after.
+
+  @retval     return the dependence result.
+**/
+CPU_FEATURE_DEPENDENCE_TYPE
+DetectFeatureScope (
+  IN CPU_FEATURES_ENTRY         *CpuFeature,
+  IN BOOLEAN                    Before
+  );
+
+/**
+  Programs registers for the calling processor.
+
+  @param[in,out] Buffer  The pointer to private data buffer.
+
+**/
+VOID
+EFIAPI
+SetProcessorRegister (
+  IN OUT VOID            *Buffer
+  );
+
 #endif
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
index fa7e107e39..f9e3178dc1 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
@@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
   return FALSE;
 }
 
+/**
+  Return feature dependence result.
+
+  @param[in]  CpuFeature        Pointer to CPU feature.
+  @param[in]  Before            Check before dependence or after.
+
+  @retval     return the dependence result.
+**/
+CPU_FEATURE_DEPENDENCE_TYPE
+DetectFeatureScope (
+  IN CPU_FEATURES_ENTRY         *CpuFeature,
+  IN BOOLEAN                    Before
+  )
+{
+  if (Before) {
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      return PackageDepType;
+    }
+
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      return CoreDepType;
+    }
+
+    if (CpuFeature->BeforeFeatureBitMask != NULL) {
+      return ThreadDepType;
+    }
+
+    return NoneDepType;
+  }
+
+  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+    return PackageDepType;
+  }
+
+  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+    return CoreDepType;
+  }
+
+  if (CpuFeature->AfterFeatureBitMask != NULL) {
+    return ThreadDepType;
+  }
+
+  return NoneDepType;
+}
+
+/**
+  Clear dependence for the specified type.
+
+  @param[in]  CurrentFeature     Cpu feature need to clear.
+  @param[in]  Before             Before or after dependence relationship.
+
+**/
+VOID
+ClearFeatureScope (
+  IN CPU_FEATURES_ENTRY           *CpuFeature,
+  IN BOOLEAN                      Before
+  )
+{
+  if (Before) {
+    if (CpuFeature->BeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->BeforeFeatureBitMask);
+      CpuFeature->BeforeFeatureBitMask = NULL;
+    }
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
+      CpuFeature->CoreBeforeFeatureBitMask = NULL;
+    }
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
+      CpuFeature->PackageBeforeFeatureBitMask = NULL;
+    }
+  } else {
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->PackageAfterFeatureBitMask);
+      CpuFeature->PackageAfterFeatureBitMask = NULL;
+    }
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->CoreAfterFeatureBitMask);
+      CpuFeature->CoreAfterFeatureBitMask = NULL;
+    }
+    if (CpuFeature->AfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->AfterFeatureBitMask);
+      CpuFeature->AfterFeatureBitMask = NULL;
+    }
+  }
+}
+
+/**
+  Base on dependence relationship to asjust feature dependence.
+
+  ONLY when the feature before(or after) the find feature also has 
+  dependence with the find feature. In this case, driver need to base
+  on dependce relationship to decide how to insert current feature and
+  adjust the feature dependence.
+
+  @param[in]  PreviousFeature    CPU feature current before the find one.
+  @param[in]  CurrentFeature     Cpu feature need to adjust.
+  @param[in]  Before             Before or after dependence relationship.
+
+  @retval   TRUE   means the current feature dependence has been adjusted.
+
+  @retval   FALSE  means the previous feature dependence has been adjusted.
+                   or previous feature has no dependence with the find one.
+
+**/
+BOOLEAN
+AdjustFeaturesDependence (
+  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
+  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
+  IN     BOOLEAN                    Before
+  )
+{
+  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
+  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
+
+  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
+  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
+
+  //
+  // If previous feature has no dependence with the find featue.
+  // return FALSE.
+  //
+  if (PreDependType == NoneDepType) {
+    return FALSE;
+  }
+
+  //
+  // If both feature have dependence, keep the one which needs use more 
+  // processors and clear the dependence for the other one.
+  //
+  if (PreDependType >= CurrentDependType) {
+    ClearFeatureScope (CurrentFeature, Before);
+    return TRUE;
+  } else {
+    ClearFeatureScope (PreviousFeature, Before);
+    return FALSE;
+  }
+}
+
+/**
+  Base on dependence relationship to asjust feature order.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  FindEntry          The entry this feature depend on.
+  @param[in]  CurrentEntry       The entry for this feature.
+  @param[in]  Before             Before or after dependence relationship.
+
+**/
+VOID
+AdjustEntry (
+  IN      LIST_ENTRY                *FeatureList,
+  IN OUT  LIST_ENTRY                *FindEntry,
+  IN OUT  LIST_ENTRY                *CurrentEntry,
+  IN      BOOLEAN                   Before
+  )
+{
+  LIST_ENTRY                *PreviousEntry;
+  CPU_FEATURES_ENTRY        *PreviousFeature;
+  CPU_FEATURES_ENTRY        *CurrentFeature;
+
+  //
+  // For CPU feature which has core or package type dependence, later code need to insert
+  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
+  // So if driver finds both feature A and B need to execute before feature C, driver will
+  // base on dependence type of feature A and B to update the logic here.
+  // For example, feature A has package type dependence and feature B has core type dependence,
+  // because package type dependence need to wait for more processors which has strong dependence
+  // than core type dependence. So driver will adjust the feature order to B -> A -> C. and driver 
+  // will remove the feature dependence in feature B. 
+  // Driver just needs to make sure before feature C been executed, feature A has finished its task
+  // in all all thread. Feature A finished in all threads also means feature B have finshed in all
+  // threads.
+  //
+  if (Before) {
+    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
+  } else {

+    PreviousEntry = GetNextNode (FeatureList, FindEntry);
+  }
+
+  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
+  RemoveEntryList (CurrentEntry);
+
+  if (IsNull (FeatureList, PreviousEntry)) {
+    //
+    // If not exist the previous or next entry, just insert the current entry.
+    //
+    if (Before) {
+      InsertTailList (FindEntry, CurrentEntry);
+    } else {
+      InsertHeadList (FindEntry, CurrentEntry);
+    }
+  } else {
+    //
+    // If exist the previous or next entry, need to check it before insert curent entry.
+    //
+    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
+
+    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature, Before)) {
+      //
+      // Return TRUE means current feature dependence has been cleared and the previous
+      // feature dependence has been kept and used. So insert current feature before (or after)
+      // the previous feature.
+      //
+      if (Before) {
+        InsertTailList (PreviousEntry, CurrentEntry);
+      } else {
+        InsertHeadList (PreviousEntry, CurrentEntry);
+      }
+    } else {
+      if (Before) {
+        InsertTailList (FindEntry, CurrentEntry);
+      } else {
+        InsertHeadList (FindEntry, CurrentEntry);
+      }
+    }
+  }
+}

+
+/**
+  Checks and adjusts current CPU features per dependency relationship.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  CurrentEntry       Pointer to current checked CPU feature
+  @param[in]  FeatureMask        The feature bit mask.
+
+  @retval     return Swapped info.
+**/
+BOOLEAN
+InsertToBeforeEntry (
+  IN LIST_ENTRY              *FeatureList,
+  IN LIST_ENTRY              *CurrentEntry,
+  IN UINT8                   *FeatureMask
+  )
+{
+  LIST_ENTRY                 *CheckEntry;
+  CPU_FEATURES_ENTRY         *CheckFeature;
+  BOOLEAN                    Swapped;
+
+  Swapped = FALSE;
+
+  //
+  // Check all features dispatched before this entry
+  //
+  CheckEntry = GetFirstNode (FeatureList);
+  while (CheckEntry != CurrentEntry) {
+    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
+    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
+      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
+      Swapped = TRUE;
+      break;
+    }
+    CheckEntry = CheckEntry->ForwardLink;
+  }
+
+  return Swapped;
+}
+
+/**
+  Checks and adjusts current CPU features per dependency relationship.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  CurrentEntry       Pointer to current checked CPU feature
+  @param[in]  FeatureMask        The feature bit mask.
+
+  @retval     return Swapped info.
+**/
+BOOLEAN
+InsertToAfterEntry (
+  IN LIST_ENTRY              *FeatureList,
+  IN LIST_ENTRY              *CurrentEntry,
+  IN UINT8                   *FeatureMask
+  )
+{
+  LIST_ENTRY                 *CheckEntry;
+  CPU_FEATURES_ENTRY         *CheckFeature;
+  BOOLEAN                    Swapped;
+
+  Swapped = FALSE;
+
+  //
+  // Check all features dispatched after this entry
+  //
+  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
+  while (!IsNull (FeatureList, CheckEntry)) {
+    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
+    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
+      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
+      Swapped = TRUE;
+      break;
+    }
+    CheckEntry = CheckEntry->ForwardLink;
+  }
+
+  return Swapped;
+}
+
 /**
   Checks and adjusts CPU features order per dependency relationship.
 
@@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
   CPU_FEATURES_ENTRY         *CheckFeature;
   BOOLEAN                    Swapped;
   LIST_ENTRY                 *TempEntry;
+  LIST_ENTRY                 *NextEntry;
 
   CurrentEntry = GetFirstNode (FeatureList);
   while (!IsNull (FeatureList, CurrentEntry)) {
     Swapped = FALSE;
     CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
+    NextEntry = CurrentEntry->ForwardLink;
     if (CpuFeature->BeforeAll) {
       //
       // Check all features dispatched before this entry
@@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
         CheckEntry = CheckEntry->ForwardLink;
       }
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
@@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
         CheckEntry = CheckEntry->ForwardLink;
       }
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
 
     if (CpuFeature->BeforeFeatureBitMask != NULL) {
-      //
-      // Check all features dispatched before this entry
-      //
-      CheckEntry = GetFirstNode (FeatureList);
-      while (CheckEntry != CurrentEntry) {
-        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
-        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->BeforeFeatureBitMask)) {
-          //
-          // If there is dependency, swap them
-          //
-          RemoveEntryList (CurrentEntry);
-          InsertTailList (CheckEntry, CurrentEntry);
-          Swapped = TRUE;
-          break;
-        }
-        CheckEntry = CheckEntry->ForwardLink;
-      }
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->BeforeFeatureBitMask);
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
 
     if (CpuFeature->AfterFeatureBitMask != NULL) {
-      //
-      // Check all features dispatched after this entry
-      //
-      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
-      while (!IsNull (FeatureList, CheckEntry)) {
-        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
-        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->AfterFeatureBitMask)) {
-          //
-          // If there is dependency, swap them
-          //
-          TempEntry = GetNextNode (FeatureList, CurrentEntry);
-          RemoveEntryList (CurrentEntry);
-          InsertHeadList (CheckEntry, CurrentEntry);
-          CurrentEntry = TempEntry;
-          Swapped = TRUE;
-          break;
-        }
-        CheckEntry = CheckEntry->ForwardLink;
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->AfterFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
       }
+    }
+
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->CoreBeforeFeatureBitMask);
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
-    //
-    // No swap happened, check the next feature
-    //
+
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->CoreAfterFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->PackageBeforeFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->PackageAfterFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
     CurrentEntry = CurrentEntry->ForwardLink;
   }
 }
@@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
   CpuFeaturesData = GetCpuFeaturesData ();
   if (CpuFeaturesData->FeaturesCount == 0) {
     InitializeListHead (&CpuFeaturesData->FeatureList);
-    InitializeSpinLock (&CpuFeaturesData->MsrLock);
-    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
+    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
     CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
   }
   ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
@@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
       }
       CpuFeatureEntry->AfterFeatureBitMask = CpuFeature->AfterFeatureBitMask;
     }
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
+      }
+      CpuFeatureEntry->CoreBeforeFeatureBitMask = CpuFeature->CoreBeforeFeatureBitMask;
+    }
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
+      }
+      CpuFeatureEntry->CoreAfterFeatureBitMask = CpuFeature->CoreAfterFeatureBitMask;
+    }
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
+      }
+      CpuFeatureEntry->PackageBeforeFeatureBitMask = CpuFeature->PackageBeforeFeatureBitMask;
+    }
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
+      }
+      CpuFeatureEntry->PackageAfterFeatureBitMask = CpuFeature->PackageAfterFeatureBitMask;
+    }
+
     CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
     CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
 
@@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
   @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is not
                                     supported due to a circular dependency between
                                     BEFORE and AFTER features.
+  @retval  RETURN_NOT_READY         CPU feature PCD PcdCpuFeaturesUserConfiguration
+                                    not updated by Platform driver yet.
 
   @note This service could be called by BSP only.
 **/
@@ -431,12 +755,20 @@ RegisterCpuFeature (
   UINT8                      *FeatureMask;
   UINT8                      *BeforeFeatureBitMask;
   UINT8                      *AfterFeatureBitMask;
+  UINT8                      *CoreBeforeFeatureBitMask;
+  UINT8                      *CoreAfterFeatureBitMask;
+  UINT8                      *PackageBeforeFeatureBitMask;
+  UINT8                      *PackageAfterFeatureBitMask;
   BOOLEAN                    BeforeAll;
   BOOLEAN                    AfterAll;
 
-  FeatureMask          = NULL;
-  BeforeFeatureBitMask = NULL;
-  AfterFeatureBitMask  = NULL;
+  FeatureMask                 = NULL;
+  BeforeFeatureBitMask        = NULL;
+  AfterFeatureBitMask         = NULL;
+  CoreBeforeFeatureBitMask    = NULL;
+  CoreAfterFeatureBitMask     = NULL;
+  PackageBeforeFeatureBitMask  = NULL;
+  PackageAfterFeatureBitMask   = NULL;
   BeforeAll            = FALSE;
   AfterAll             = FALSE;
 
@@ -449,6 +781,10 @@ RegisterCpuFeature (
                     != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
     ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL))
                     != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));
+    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER))
+                    != (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER));
+    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER))
+                    != (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER));
     if (Feature < CPU_FEATURE_BEFORE) {
       BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE : FALSE;
       AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
@@ -459,6 +795,14 @@ RegisterCpuFeature (
       SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature & ~CPU_FEATURE_BEFORE, BitMaskSize);
     } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
       SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature & ~CPU_FEATURE_AFTER, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
+      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature & ~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
+      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature & ~CPU_FEATURE_CORE_AFTER, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
+      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
+      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
     }
     Feature = VA_ARG (Marker, UINT32);
   }
@@ -466,15 +810,19 @@ RegisterCpuFeature (
 
   CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
   ASSERT (CpuFeature != NULL);
-  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
-  CpuFeature->FeatureMask          = FeatureMask;
-  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
-  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
-  CpuFeature->BeforeAll            = BeforeAll;
-  CpuFeature->AfterAll             = AfterAll;
-  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
-  CpuFeature->SupportFunc          = SupportFunc;
-  CpuFeature->InitializeFunc       = InitializeFunc;
+  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
+  CpuFeature->FeatureMask                 = FeatureMask;
+  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
+  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
+  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
+  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
+  CpuFeature->PackageBeforeFeatureBitMask = PackageBeforeFeatureBitMask;
+  CpuFeature->PackageAfterFeatureBitMask  = PackageAfterFeatureBitMask;
+  CpuFeature->BeforeAll                   = BeforeAll;
+  CpuFeature->AfterAll                    = AfterAll;
+  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
+  CpuFeature->SupportFunc                 = SupportFunc;
+  CpuFeature->InitializeFunc              = InitializeFunc;
   if (FeatureName != NULL) {
     CpuFeature->FeatureName          = AllocatePool (CPU_FEATURE_NAME_SIZE);
     ASSERT (CpuFeature->FeatureName != NULL);
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  2018-10-15  2:49 [Patch 0/4] Fix performance issue caused by Set MSR task Eric Dong
                   ` (2 preceding siblings ...)
  2018-10-15  2:49 ` [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
@ 2018-10-15  2:49 ` Eric Dong
  2018-10-15 17:13   ` Laszlo Ersek
  2018-10-16  3:16   ` Ni, Ruiyu
  2018-10-15 15:51 ` [Patch 0/4] Fix performance issue caused by Set MSR task Laszlo Ersek
  4 siblings, 2 replies; 18+ messages in thread
From: Eric Dong @ 2018-10-15  2:49 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

Because this driver needs to set MSRs saved in normal boot phase, sync semaphore
logic from RegisterCpuFeaturesLib code which used for normal boot phase.

Detail see change SHA-1: dcdf1774212d87e2d7feb36286a408ea7475fd7b for
RegisterCpuFeaturesLib.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c          | 316 ++++++++++++++++-------------
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c      |   3 -
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h |   3 +-
 3 files changed, 180 insertions(+), 142 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
index 52ff9679d5..5a35f7a634 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
@@ -38,9 +38,12 @@ typedef struct {
 } MP_ASSEMBLY_ADDRESS_MAP;
 
 //
-// Spin lock used to serialize MemoryMapped operation
+// Flags used when program the register.
 //
-SPIN_LOCK                *mMemoryMappedLock = NULL;
+typedef struct {
+  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
+  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
+} PROGRAM_CPU_REGISTER_FLAGS;
 
 //
 // Signal that SMM BASE relocation is complete.
@@ -62,13 +65,11 @@ AsmGetAddressMap (
 #define LEGACY_REGION_SIZE    (2 * 0x1000)
 #define LEGACY_REGION_BASE    (0xA0000 - LEGACY_REGION_SIZE)
 
+PROGRAM_CPU_REGISTER_FLAGS   mCpuFlags;
 ACPI_CPU_DATA                mAcpiCpuData;
 volatile UINT32              mNumberToFinish;
 MP_CPU_EXCHANGE_INFO         *mExchangeInfo;
 BOOLEAN                      mRestoreSmmConfigurationInS3 = FALSE;
-MP_MSR_LOCK                  *mMsrSpinLocks = NULL;
-UINTN                        mMsrSpinLockCount;
-UINTN                        mMsrCount = 0;
 
 //
 // S3 boot flag
@@ -91,89 +92,6 @@ UINT8                        mApHltLoopCodeTemplate[] = {
                                0xEB, 0xFC               // jmp $-2
                                };
 
-/**
-  Get MSR spin lock by MSR index.
-
-  @param  MsrIndex       MSR index value.
-
-  @return Pointer to MSR spin lock.
-
-**/
-SPIN_LOCK *
-GetMsrSpinLockByIndex (
-  IN UINT32      MsrIndex
-  )
-{
-  UINTN     Index;
-  for (Index = 0; Index < mMsrCount; Index++) {
-    if (MsrIndex == mMsrSpinLocks[Index].MsrIndex) {
-      return mMsrSpinLocks[Index].SpinLock;
-    }
-  }
-  return NULL;
-}
-
-/**
-  Initialize MSR spin lock by MSR index.
-
-  @param  MsrIndex       MSR index value.
-
-**/
-VOID
-InitMsrSpinLockByIndex (
-  IN UINT32      MsrIndex
-  )
-{
-  UINTN    MsrSpinLockCount;
-  UINTN    NewMsrSpinLockCount;
-  UINTN    Index;
-  UINTN    AddedSize;
-
-  if (mMsrSpinLocks == NULL) {
-    MsrSpinLockCount = mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter;
-    mMsrSpinLocks = (MP_MSR_LOCK *) AllocatePool (sizeof (MP_MSR_LOCK) * MsrSpinLockCount);
-    ASSERT (mMsrSpinLocks != NULL);
-    for (Index = 0; Index < MsrSpinLockCount; Index++) {
-      mMsrSpinLocks[Index].SpinLock =
-       (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr + Index * mSemaphoreSize);
-      mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
-    }
-    mMsrSpinLockCount = MsrSpinLockCount;
-    mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter = 0;
-  }
-  if (GetMsrSpinLockByIndex (MsrIndex) == NULL) {
-    //
-    // Initialize spin lock for MSR programming
-    //
-    mMsrSpinLocks[mMsrCount].MsrIndex = MsrIndex;
-    InitializeSpinLock (mMsrSpinLocks[mMsrCount].SpinLock);
-    mMsrCount ++;
-    if (mMsrCount == mMsrSpinLockCount) {
-      //
-      // If MSR spin lock buffer is full, enlarge it
-      //
-      AddedSize = SIZE_4KB;
-      mSmmCpuSemaphores.SemaphoreMsr.Msr =
-                        AllocatePages (EFI_SIZE_TO_PAGES(AddedSize));
-      ASSERT (mSmmCpuSemaphores.SemaphoreMsr.Msr != NULL);
-      NewMsrSpinLockCount = mMsrSpinLockCount + AddedSize / mSemaphoreSize;
-      mMsrSpinLocks = ReallocatePool (
-                        sizeof (MP_MSR_LOCK) * mMsrSpinLockCount,
-                        sizeof (MP_MSR_LOCK) * NewMsrSpinLockCount,
-                        mMsrSpinLocks
-                        );
-      ASSERT (mMsrSpinLocks != NULL);
-      mMsrSpinLockCount = NewMsrSpinLockCount;
-      for (Index = mMsrCount; Index < mMsrSpinLockCount; Index++) {
-        mMsrSpinLocks[Index].SpinLock =
-                 (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
-                 (Index - mMsrCount)  * mSemaphoreSize);
-        mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
-      }
-    }
-  }
-}
-
 /**
   Sync up the MTRR values for all processors.
 
@@ -204,42 +122,89 @@ Returns:
 }
 
 /**
-  Programs registers for the calling processor.
+  Increment semaphore by 1.
 
-  This function programs registers for the calling processor.
+  @param      Sem            IN:  32-bit unsigned integer
 
-  @param  RegisterTables        Pointer to register table of the running processor.
-  @param  RegisterTableCount    Register table count.
+**/
+VOID
+S3ReleaseSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  InterlockedIncrement (Sem);
+}
+
+/**
+  Decrement the semaphore by 1 if it is not zero.
+
+  Performs an atomic decrement operation for semaphore.
+  The compare exchange operation must be performed using
+  MP safe mechanisms.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+S3WaitForSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  UINT32  Value;
+
+  do {
+    Value = *Sem;
+  } while (Value == 0);
+
+  InterlockedDecrement (Sem);
+}
+
+/**
+  Initialize the CPU registers from a register table.
+
+  @param[in]  RegisterTable         The register table for this AP.
+  @param[in]  ApLocation            AP location info for this ap.
+  @param[in]  CpuStatus             CPU status info for this CPU.
+  @param[in]  CpuFlags              Flags data structure used when program the register.
 
+  @note This service could be called by BSP/APs.
 **/
 VOID
-SetProcessorRegister (
-  IN CPU_REGISTER_TABLE        *RegisterTables,
-  IN UINTN                     RegisterTableCount
+EFIAPI
+ProgramProcessorRegister (
+  IN CPU_REGISTER_TABLE           *RegisterTable,
+  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
+  IN CPU_STATUS_INFORMATION       *CpuStatus,
+  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
   )
 {
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
   UINTN                     Index;
   UINTN                     Value;
-  SPIN_LOCK                 *MsrSpinLock;
-  UINT32                    InitApicId;
-  CPU_REGISTER_TABLE        *RegisterTable;
+  CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
+  volatile UINT32           *SemaphorePtr;
+  UINT32                    CoreOffset;
+  UINT32                    PackageOffset;
+  UINT32                    PackageThreadsCount;
+  UINT32                    ApOffset;
+  UINTN                     ProcessorIndex;
+  UINTN                     ApIndex;
+  UINTN                     ValidApCount;
 
-  InitApicId = GetInitialApicId ();
-  RegisterTable = NULL;
-  for (Index = 0; Index < RegisterTableCount; Index++) {
-    if (RegisterTables[Index].InitialApicId == InitApicId) {
-      RegisterTable =  &RegisterTables[Index];
-      break;
-    }
-  }
-  ASSERT (RegisterTable != NULL);
+  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount \
+            + ApLocation->Core * CpuStatus->ThreadCount \
+            + ApLocation->Thread;
 
   //
   // Traverse Register Table of this logical processor
   //
-  RegisterTableEntry = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
-  for (Index = 0; Index < RegisterTable->TableLength; Index++, RegisterTableEntry++) {
+  RegisterTableEntryHead = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
+
+  for (Index = 0; Index < RegisterTable->TableLength; Index++) {
+
+    RegisterTableEntry = &RegisterTableEntryHead[Index];
+    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
+
     //
     // Check the type of specified register
     //
@@ -310,12 +275,6 @@ SetProcessorRegister (
           RegisterTableEntry->Value
           );
       } else {
-        //
-        // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
-        // to make sure MSR read/write operation is atomic.
-        //
-        MsrSpinLock = GetMsrSpinLockByIndex (RegisterTableEntry->Index);
-        AcquireSpinLock (MsrSpinLock);
         //
         // Set the bit section according to bit start and length
         //
@@ -325,21 +284,20 @@ SetProcessorRegister (
           RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
           RegisterTableEntry->Value
           );
-        ReleaseSpinLock (MsrSpinLock);
       }
       break;
     //
     // MemoryMapped operations
     //
     case MemoryMapped:
-      AcquireSpinLock (mMemoryMappedLock);
+      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
       MmioBitFieldWrite32 (
         (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
         RegisterTableEntry->ValidBitStart,
         RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
         (UINT32)RegisterTableEntry->Value
         );
-      ReleaseSpinLock (mMemoryMappedLock);
+      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
       break;
     //
     // Enable or disable cache
@@ -355,12 +313,99 @@ SetProcessorRegister (
       }
       break;
 
+    case Semaphore:
+      SemaphorePtr = CpuFlags->SemaphoreCount;
+      switch (RegisterTableEntry->Value) {
+      case CoreDepType:
+        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount + ApLocation->Core) * CpuStatus->ThreadCount;
+        ApOffset = CoreOffset + ApLocation->Thread;
+        //
+        // First increase semaphore count by 1 for processors in this core.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
+          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset + ProcessorIndex]);
+        }
+        //
+        // Second, check whether the count has reach the check number.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
+          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
+        }
+        break;
+
+      case PackageDepType:
+        PackageOffset = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount;
+        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus->CoreCount;
+        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation->Core + ApLocation->Thread;
+        ValidApCount = CpuStatus->ThreadCount * CpuStatus->ValidCoresInPackages[ApLocation->Package];
+        //
+        // First increase semaphore count by 1 for processors in this package.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
+          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
+        }
+        //
+        // Second, check whether the count has reach the check number.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
+          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
+        }
+        break;
+
+      default:
+        break;
+      }
+      break;
+
     default:
       break;
     }
   }
 }
 
+/**
+
+  Set Processor register for one AP.
+  
+  @param     SmmPreRegisterTable     Use pre register table or register table.
+
+**/
+VOID
+SetRegister (
+  IN BOOLEAN                 SmmPreRegisterTable
+  )
+{
+  CPU_REGISTER_TABLE        *RegisterTable;
+  CPU_REGISTER_TABLE        *RegisterTables;
+  UINT32                    InitApicId;
+  UINTN                     ProcIndex;
+  UINTN                     Index;
+
+  if (SmmPreRegisterTable) {
+    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.PreSmmInitRegisterTable;
+  } else {
+    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.RegisterTable;
+  }
+
+  InitApicId = GetInitialApicId ();
+  RegisterTable = NULL;
+  for (Index = 0; Index < mAcpiCpuData.NumberOfCpus; Index++) {
+    if (RegisterTables[Index].InitialApicId == InitApicId) {
+      RegisterTable =  &RegisterTables[Index];
+      ProcIndex = Index;
+      break;
+    }
+  }
+  ASSERT (RegisterTable != NULL);
+
+  ProgramProcessorRegister (
+    RegisterTable,
+    mAcpiCpuData.ApLocation + ProcIndex,
+    &mAcpiCpuData.CpuStatus,
+    &mCpuFlags
+    );
+}
+
 /**
   AP initialization before then after SMBASE relocation in the S3 boot path.
 **/
@@ -374,7 +419,7 @@ InitializeAp (
 
   LoadMtrrData (mAcpiCpuData.MtrrTable);
 
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
+  SetRegister (TRUE);
 
   //
   // Count down the number with lock mechanism.
@@ -391,7 +436,7 @@ InitializeAp (
   ProgramVirtualWireMode ();
   DisableLvtInterrupts ();
 
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
+  SetRegister (FALSE);
 
   //
   // Place AP into the safe code, count down the number with lock mechanism in the safe code.
@@ -466,7 +511,7 @@ InitializeCpuBeforeRebase (
 {
   LoadMtrrData (mAcpiCpuData.MtrrTable);
 
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
+  SetRegister (TRUE);
 
   ProgramVirtualWireMode ();
 
@@ -502,8 +547,6 @@ InitializeCpuAfterRebase (
   VOID
   )
 {
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
-
   mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
 
   //
@@ -511,6 +554,8 @@ InitializeCpuAfterRebase (
   //
   mInitApsAfterSmmBaseReloc = TRUE;
 
+  SetRegister (FALSE);
+
   while (mNumberToFinish > 0) {
     CpuPause ();
   }
@@ -574,8 +619,6 @@ SmmRestoreCpu (
 
   mSmmS3Flag = TRUE;
 
-  InitializeSpinLock (mMemoryMappedLock);
-
   //
   // See if there is enough context to resume PEI Phase
   //
@@ -790,7 +833,6 @@ CopyRegisterTable (
   )
 {
   UINTN                      Index;
-  UINTN                      Index1;
   CPU_REGISTER_TABLE_ENTRY   *RegisterTableEntry;
 
   CopyMem (DestinationRegisterTableList, SourceRegisterTableList, NumberOfCpus * sizeof (CPU_REGISTER_TABLE));
@@ -802,17 +844,6 @@ CopyRegisterTable (
         );
       ASSERT (RegisterTableEntry != NULL);
       DestinationRegisterTableList[Index].RegisterTableEntry = (EFI_PHYSICAL_ADDRESS)(UINTN)RegisterTableEntry;
-      //
-      // Go though all MSRs in register table to initialize MSR spin lock
-      //
-      for (Index1 = 0; Index1 < DestinationRegisterTableList[Index].TableLength; Index1++, RegisterTableEntry++) {
-        if ((RegisterTableEntry->RegisterType == Msr) && (RegisterTableEntry->ValidBitLength < 64)) {
-          //
-          // Initialize MSR spin lock only for those MSRs need bit field writing
-          //
-          InitMsrSpinLockByIndex (RegisterTableEntry->Index);
-        }
-      }
     }
   }
 }
@@ -832,6 +863,7 @@ GetAcpiCpuData (
   VOID                       *GdtForAp;
   VOID                       *IdtForAp;
   VOID                       *MachineCheckHandlerForAp;
+  CPU_STATUS_INFORMATION     *CpuStatus;
 
   if (!mAcpiS3Enable) {
     return;
@@ -906,6 +938,16 @@ GetAcpiCpuData (
   Gdtr->Base = (UINTN)GdtForAp;
   Idtr->Base = (UINTN)IdtForAp;
   mAcpiCpuData.ApMachineCheckHandlerBase = (EFI_PHYSICAL_ADDRESS)(UINTN)MachineCheckHandlerForAp;
+
+  CpuStatus = &mAcpiCpuData.CpuStatus;
+  CopyMem (CpuStatus, &AcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
+  CpuStatus->ValidCoresInPackages = AllocateCopyPool (sizeof (UINT32) * CpuStatus->PackageCount, AcpiCpuData->CpuStatus.ValidCoresInPackages);
+  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
+  mAcpiCpuData.ApLocation = AllocateCopyPool (mAcpiCpuData.NumberOfCpus * sizeof (EFI_CPU_PHYSICAL_LOCATION), AcpiCpuData->ApLocation);
+  ASSERT (mAcpiCpuData.ApLocation != NULL);
+  InitializeSpinLock((SPIN_LOCK*) &mCpuFlags.MemoryMappedLock);
+  mCpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount * CpuStatus->ThreadCount);
+  ASSERT (mCpuFlags.SemaphoreCount != NULL);
 }
 
 /**
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 9cf508a5c7..42b040531e 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -1303,8 +1303,6 @@ InitializeSmmCpuSemaphores (
   mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock
                                                   = (SPIN_LOCK *)SemaphoreAddr;
   SemaphoreAddr += SemaphoreSize;
-  mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock
-                                                  = (SPIN_LOCK *)SemaphoreAddr;
 
   SemaphoreAddr = (UINTN)SemaphoreBlock + GlobalSemaphoresSize;
   mSmmCpuSemaphores.SemaphoreCpu.Busy    = (SPIN_LOCK *)SemaphoreAddr;
@@ -1321,7 +1319,6 @@ InitializeSmmCpuSemaphores (
 
   mPFLock                       = mSmmCpuSemaphores.SemaphoreGlobal.PFLock;
   mConfigSmmCodeAccessCheckLock = mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock;
-  mMemoryMappedLock             = mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock;
 
   mSemaphoreSize = SemaphoreSize;
 }
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 8c7f4996d1..e2970308fe 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -53,6 +53,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 #include <Library/ReportStatusCodeLib.h>
 #include <Library/SmmCpuFeaturesLib.h>
 #include <Library/PeCoffGetEntryPointLib.h>
+#include <Library/RegisterCpuFeaturesLib.h>
 
 #include <AcpiCpuData.h>
 #include <CpuHotPlugData.h>
@@ -364,7 +365,6 @@ typedef struct {
   volatile BOOLEAN     *AllCpusInSync;
   SPIN_LOCK            *PFLock;
   SPIN_LOCK            *CodeAccessCheckLock;
-  SPIN_LOCK            *MemoryMappedLock;
 } SMM_CPU_SEMAPHORE_GLOBAL;
 
 ///
@@ -409,7 +409,6 @@ extern SMM_CPU_SEMAPHORES                  mSmmCpuSemaphores;
 extern UINTN                               mSemaphoreSize;
 extern SPIN_LOCK                           *mPFLock;
 extern SPIN_LOCK                           *mConfigSmmCodeAccessCheckLock;
-extern SPIN_LOCK                           *mMemoryMappedLock;
 extern EFI_SMRAM_DESCRIPTOR                *mSmmCpuSmramRanges;
 extern UINTN                               mSmmCpuSmramRangeCount;
 extern UINT8                               mPhysicalAddressBits;
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Patch 0/4] Fix performance issue caused by Set MSR task.
  2018-10-15  2:49 [Patch 0/4] Fix performance issue caused by Set MSR task Eric Dong
                   ` (3 preceding siblings ...)
  2018-10-15  2:49 ` [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
@ 2018-10-15 15:51 ` Laszlo Ersek
  2018-10-16  1:39   ` Dong, Eric
  4 siblings, 1 reply; 18+ messages in thread
From: Laszlo Ersek @ 2018-10-15 15:51 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

Hi Eric,

On 10/15/18 04:49, Eric Dong wrote:
> In a system which has multiple cores, current set register value task costs huge times.
> After investigation, current set MSR task costs most of the times. Current logic uses
> SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
> scope attribute which may cause GP fault if multiple APs set MSR at the same time,
> current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
> cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on their scope
> attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
> which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
> must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
> thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
> MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
> and thread 2 like below:
> 
>             Thread 1                 Thread 2
> MSR B          N                        Y
> MSR A          Y                        Y
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
> at this time, MSR B not been executed yet by thread 2. system may trig exception at this
> time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control the MSR
> execute sequence. For the above case, a semaphore will be add between MSR A and B for
> all threads. Semaphore has scope info for it. The possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1) release
> semaphore (+1) for each threads in this core or package(based on the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
> on the scope info for this semaphore). With these two steps, driver can control MSR
> sequence. Sample code logic like below:
> 
>   //
>   // First increase semaphore count by 1 for processors in this package.
>   //
>   for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
>     LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
>   }
>   //
>   // Second, check whether the count has reach the check number.
>   //
>   for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
>     LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
>   }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still register MSR
>    for all threads, exception may raised.

Do you mean that platforms are responsible for updating their register
tables in:
- ACPI_CPU_DATA.PreSmmInitRegisterTable,
- ACPI_CPU_DATA.RegisterTable

so that the tables utilize the new Semaphore REGISTER_TYPE as appropriate?

> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
>    requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
>    PEI instance not works after this change. We plan to support async mode for PEI in phase
>    2 for this task.
> 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver and 
>    RegisterCpuFeaturesLib library because the schedule limitation.

I don't understand what you mean by "schedule limitation". Are you
alluding to the upcoming edk2 stable tag (in November), or some other
schedule?

>    Will merge the code to 
>    RegisterCpuFeaturesLib and export as an API in phase 2 for this task.

While I agree that common code (especially complex code like this)
should belong to libraries, there are platforms that consume
PiSmmCpuDxeSmm, but don't consume RegisterCpuFeaturesLib in any way.

Do you plan to add the new function(s) to a RegisterCpuFeaturesLib
instance, and make PiSmmCpuDxeSmm dependent on RegisterCpuFeaturesLib?

If so, I think it can work, but then the RegisterCpuFeaturesLib instance
in question should do nothing at all in the constructor. On platforms
that don't use this feature at all -- i.e., the Semaphore REGISTER_TYPE
--, there should be no impact.

(BTW, DxeRegisterCpuFeaturesLib is currently restricted to DXE_DRIVER
modules.)

> Extra Notes:
>   I will send the other patch to set MSR base on scope info and check in it before check in
>   this serial.

I don't understand. I assume that you are referring to some concrete
platform (?) where the Semaphore REGISTER_TYPE *must* be used, in order
to successfully boot (and/or perform S3), if this series is applied.

What platform is that?

And, if that other patch is indeed a pre-requisite for *this* set (on
some specific platform anyway), then people on that platform will not be
able to test this series until you post those patches.

My point here is that, on that platform, the testing cannot be performed
in separation, so it's not enough to establish the right dependency
order *just* before check-in. It should be offered on the list as well.

Thanks,
Laszlo

> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> 
> Eric Dong (4):
>   UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
>   UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
>   UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
>     type.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
> 
>  UefiCpuPkg/Include/AcpiCpuData.h                   |  23 +-
>  .../Include/Library/RegisterCpuFeaturesLib.h       |  25 +-
>  .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++---
>  .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
>  .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
>  .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
>  .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
>  .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
>  .../RegisterCpuFeaturesLib.c                       | 452 ++++++++++++++++++---
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c                  | 316 +++++++-------
>  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c              |   3 -
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h         |   3 +-
>  12 files changed, 1063 insertions(+), 264 deletions(-)
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-15  2:49 ` [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
@ 2018-10-15 16:02   ` Laszlo Ersek
  2018-10-16  3:43     ` Dong, Eric
  2018-10-16  2:27   ` Ni, Ruiyu
  1 sibling, 1 reply; 18+ messages in thread
From: Laszlo Ersek @ 2018-10-15 16:02 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

On 10/15/18 04:49, Eric Dong wrote:
> In order to support semaphore related logic, add new definition for it.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/Include/AcpiCpuData.h | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)

(1) If it's possible, I suggest moving the (very nice) description from
the 0/4 cover letter to this patch. The cover letter is not captured in
the git commit history.

I don't insist, but it would be a nice touch, IMO.

> 
> diff --git a/UefiCpuPkg/Include/AcpiCpuData.h b/UefiCpuPkg/Include/AcpiCpuData.h
> index 9e51145c08..b3cf2f664a 100644
> --- a/UefiCpuPkg/Include/AcpiCpuData.h
> +++ b/UefiCpuPkg/Include/AcpiCpuData.h
> @@ -15,6 +15,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>  #ifndef _ACPI_CPU_DATA_H_
>  #define _ACPI_CPU_DATA_H_
>  
> +#include <Protocol/MpService.h>
> +
>  //
>  // Register types in register table
>  //
> @@ -22,9 +24,20 @@ typedef enum {
>    Msr,
>    ControlRegister,
>    MemoryMapped,
> -  CacheControl
> +  CacheControl,
> +  Semaphore
>  } REGISTER_TYPE;
>  
> +//
> +// CPU information.
> +//
> +typedef struct {
> +  UINT32        PackageCount;             // Packages in this CPU.

(2) Is it possible to have multiple packages in a single CPU? If not,
then please clean up the comment.

Did you perhaps mean "number of sockets in the system"?

> +  UINT32        CoreCount;                // Max Core count in the packages.
> +  UINT32        ThreadCount;              // MAx thread count in the cores.

(3) The word "MAx" should be "Max", I think.

> +  UINT32        *ValidCoresInPackages;    // Valid cores in each package.

(4) Is it possible to document the structure of this array (?) in some
detail? Other parts of "UefiCpuPkg/Include/AcpiCpuData.h" are very well
documented.

> +} CPU_STATUS_INFORMATION;
> +
>  //
>  // Element of register table entry
>  //
> @@ -147,6 +160,14 @@ typedef struct {
>    // provided.
>    //
>    UINT32                ApMachineCheckHandlerSize;
> +  //
> +  // CPU information which is required when set the register table.
> +  //
> +  CPU_STATUS_INFORMATION     CpuStatus;
> +  //
> +  // Location info for each ap.

(5) This header file spells "AP" in upper case elsewhere.

> +  //
> +  EFI_CPU_PHYSICAL_LOCATION  *ApLocation;

(6) Is this supposed to be an array? If so, what is the structure of the
array? What is the size?

(7) This is the first field in ACPI_CPU_DATA that has pointer type.
Other pointers are represented as EFI_PHYSICAL_ADDRESS.

What justifies this difference?

>  } ACPI_CPU_DATA;
>  
>  #endif
> 

(8) "UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c" will zero-fill the new fields.
Is that safe?

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  2018-10-15  2:49 ` [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
@ 2018-10-15 17:13   ` Laszlo Ersek
  2018-10-16 14:44     ` Dong, Eric
  2018-10-16  3:16   ` Ni, Ruiyu
  1 sibling, 1 reply; 18+ messages in thread
From: Laszlo Ersek @ 2018-10-15 17:13 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Ruiyu Ni

On 10/15/18 04:49, Eric Dong wrote:
> Because this driver needs to set MSRs saved in normal boot phase, sync semaphore
> logic from RegisterCpuFeaturesLib code which used for normal boot phase.

(My review of this patch is going to be superficial. I'm not trying to
validate the actual algorithm. I'm mostly sanity-checking the code, and
gauging whether it will break platforms that use CpuS3DataDxe.)


> Detail see change SHA-1: dcdf1774212d87e2d7feb36286a408ea7475fd7b for
> RegisterCpuFeaturesLib.

(1) I think it is valid to reference other patches in the same series.
However, the commit hashes are not stable yet -- when you rebase the
series, the commit hashes will change. Therefore, when we refer to a
patch that is not upstream yet (i.e. it is part of the same series), it
is best to spell out the full subject, such as:

UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.


> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c          | 316 ++++++++++++++++-------------
>  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c      |   3 -
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h |   3 +-
>  3 files changed, 180 insertions(+), 142 deletions(-)
> 
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> index 52ff9679d5..5a35f7a634 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> @@ -38,9 +38,12 @@ typedef struct {
>  } MP_ASSEMBLY_ADDRESS_MAP;
>  
>  //
> -// Spin lock used to serialize MemoryMapped operation
> +// Flags used when program the register.
>  //
> -SPIN_LOCK                *mMemoryMappedLock = NULL;
> +typedef struct {
> +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
> +  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
> +} PROGRAM_CPU_REGISTER_FLAGS;
>  
>  //
>  // Signal that SMM BASE relocation is complete.
> @@ -62,13 +65,11 @@ AsmGetAddressMap (
>  #define LEGACY_REGION_SIZE    (2 * 0x1000)
>  #define LEGACY_REGION_BASE    (0xA0000 - LEGACY_REGION_SIZE)
>  
> +PROGRAM_CPU_REGISTER_FLAGS   mCpuFlags;
>  ACPI_CPU_DATA                mAcpiCpuData;
>  volatile UINT32              mNumberToFinish;
>  MP_CPU_EXCHANGE_INFO         *mExchangeInfo;
>  BOOLEAN                      mRestoreSmmConfigurationInS3 = FALSE;
> -MP_MSR_LOCK                  *mMsrSpinLocks = NULL;
> -UINTN                        mMsrSpinLockCount;
> -UINTN                        mMsrCount = 0;
>  
>  //
>  // S3 boot flag
> @@ -91,89 +92,6 @@ UINT8                        mApHltLoopCodeTemplate[] = {
>                                 0xEB, 0xFC               // jmp $-2
>                                 };
>  
> -/**
> -  Get MSR spin lock by MSR index.
> -
> -  @param  MsrIndex       MSR index value.
> -
> -  @return Pointer to MSR spin lock.
> -
> -**/
> -SPIN_LOCK *
> -GetMsrSpinLockByIndex (
> -  IN UINT32      MsrIndex
> -  )
> -{
> -  UINTN     Index;
> -  for (Index = 0; Index < mMsrCount; Index++) {
> -    if (MsrIndex == mMsrSpinLocks[Index].MsrIndex) {
> -      return mMsrSpinLocks[Index].SpinLock;
> -    }
> -  }
> -  return NULL;
> -}
> -
> -/**
> -  Initialize MSR spin lock by MSR index.
> -
> -  @param  MsrIndex       MSR index value.
> -
> -**/
> -VOID
> -InitMsrSpinLockByIndex (
> -  IN UINT32      MsrIndex
> -  )
> -{
> -  UINTN    MsrSpinLockCount;
> -  UINTN    NewMsrSpinLockCount;
> -  UINTN    Index;
> -  UINTN    AddedSize;
> -
> -  if (mMsrSpinLocks == NULL) {
> -    MsrSpinLockCount = mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter;
> -    mMsrSpinLocks = (MP_MSR_LOCK *) AllocatePool (sizeof (MP_MSR_LOCK) * MsrSpinLockCount);
> -    ASSERT (mMsrSpinLocks != NULL);
> -    for (Index = 0; Index < MsrSpinLockCount; Index++) {
> -      mMsrSpinLocks[Index].SpinLock =
> -       (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr + Index * mSemaphoreSize);
> -      mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> -    }
> -    mMsrSpinLockCount = MsrSpinLockCount;
> -    mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter = 0;
> -  }
> -  if (GetMsrSpinLockByIndex (MsrIndex) == NULL) {
> -    //
> -    // Initialize spin lock for MSR programming
> -    //
> -    mMsrSpinLocks[mMsrCount].MsrIndex = MsrIndex;
> -    InitializeSpinLock (mMsrSpinLocks[mMsrCount].SpinLock);
> -    mMsrCount ++;
> -    if (mMsrCount == mMsrSpinLockCount) {
> -      //
> -      // If MSR spin lock buffer is full, enlarge it
> -      //
> -      AddedSize = SIZE_4KB;
> -      mSmmCpuSemaphores.SemaphoreMsr.Msr =
> -                        AllocatePages (EFI_SIZE_TO_PAGES(AddedSize));
> -      ASSERT (mSmmCpuSemaphores.SemaphoreMsr.Msr != NULL);
> -      NewMsrSpinLockCount = mMsrSpinLockCount + AddedSize / mSemaphoreSize;
> -      mMsrSpinLocks = ReallocatePool (
> -                        sizeof (MP_MSR_LOCK) * mMsrSpinLockCount,
> -                        sizeof (MP_MSR_LOCK) * NewMsrSpinLockCount,
> -                        mMsrSpinLocks
> -                        );
> -      ASSERT (mMsrSpinLocks != NULL);
> -      mMsrSpinLockCount = NewMsrSpinLockCount;
> -      for (Index = mMsrCount; Index < mMsrSpinLockCount; Index++) {
> -        mMsrSpinLocks[Index].SpinLock =
> -                 (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
> -                 (Index - mMsrCount)  * mSemaphoreSize);
> -        mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> -      }
> -    }
> -  }
> -}
> -
>  /**
>    Sync up the MTRR values for all processors.
>  
> @@ -204,42 +122,89 @@ Returns:
>  }
>  
>  /**
> -  Programs registers for the calling processor.
> +  Increment semaphore by 1.
>  
> -  This function programs registers for the calling processor.
> +  @param      Sem            IN:  32-bit unsigned integer
>  
> -  @param  RegisterTables        Pointer to register table of the running processor.
> -  @param  RegisterTableCount    Register table count.
> +**/
> +VOID
> +S3ReleaseSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  InterlockedIncrement (Sem);
> +}
> +
> +/**
> +  Decrement the semaphore by 1 if it is not zero.
> +
> +  Performs an atomic decrement operation for semaphore.
> +  The compare exchange operation must be performed using
> +  MP safe mechanisms.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +S3WaitForSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  UINT32  Value;
> +
> +  do {
> +    Value = *Sem;
> +  } while (Value == 0);
> +
> +  InterlockedDecrement (Sem);
> +}

(2) I think this implementation is not correct. If threads T1 and T2 are
spinning in the loop, and thread T3 releases the semaphore, then both T1
and T2 could see (Value==1). They will both exit the loop, they will
both decrement (*Sem), and then (*Sem) will wrap around.

Instead, we should do:

  for (;;) {
    Value = *Sem;
    if (Value == 0) {
      continue;
    }
    if (InterlockedCompareExchange32 (Sem, Value, Value - 1) == Value) {
      break;
    }
  }

This implementation is not protected against the ABA problem, but that's
fine. Namely, it doesn't matter whether, and how, the value of (*Sem)
fluctuates, between fetching it into Value, and setting it to (Value-1).
What matters is that we either perform a transition from Value to
(Value-1), or nothing.


> +
> +/**
> +  Initialize the CPU registers from a register table.
> +
> +  @param[in]  RegisterTable         The register table for this AP.
> +  @param[in]  ApLocation            AP location info for this ap.
> +  @param[in]  CpuStatus             CPU status info for this CPU.
> +  @param[in]  CpuFlags              Flags data structure used when program the register.
>  
> +  @note This service could be called by BSP/APs.
>  **/
>  VOID
> -SetProcessorRegister (
> -  IN CPU_REGISTER_TABLE        *RegisterTables,
> -  IN UINTN                     RegisterTableCount
> +EFIAPI
> +ProgramProcessorRegister (
> +  IN CPU_REGISTER_TABLE           *RegisterTable,
> +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
>    )

(3) Any particular reason for declaring this function as EFIAPI?


>  {
>    CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
>    UINTN                     Index;
>    UINTN                     Value;
> -  SPIN_LOCK                 *MsrSpinLock;
> -  UINT32                    InitApicId;
> -  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> +  volatile UINT32           *SemaphorePtr;
> +  UINT32                    CoreOffset;
> +  UINT32                    PackageOffset;
> +  UINT32                    PackageThreadsCount;
> +  UINT32                    ApOffset;
> +  UINTN                     ProcessorIndex;
> +  UINTN                     ApIndex;
> +  UINTN                     ValidApCount;
>  
> -  InitApicId = GetInitialApicId ();
> -  RegisterTable = NULL;
> -  for (Index = 0; Index < RegisterTableCount; Index++) {
> -    if (RegisterTables[Index].InitialApicId == InitApicId) {
> -      RegisterTable =  &RegisterTables[Index];
> -      break;
> -    }
> -  }
> -  ASSERT (RegisterTable != NULL);
> +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount \
> +            + ApLocation->Core * CpuStatus->ThreadCount \
> +            + ApLocation->Thread;

(4) The backslashes look useless.

In addition, the plus signs should be at the ends of the lines,
according to the edk2 style (operators at the end).

>  
>    //
>    // Traverse Register Table of this logical processor
>    //
> -  RegisterTableEntry = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
> -  for (Index = 0; Index < RegisterTable->TableLength; Index++, RegisterTableEntry++) {
> +  RegisterTableEntryHead = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
> +
> +  for (Index = 0; Index < RegisterTable->TableLength; Index++) {

(OK, I think this should continue working with (TableLength==0), from
CpuS3DataDxe.)

> +
> +    RegisterTableEntry = &RegisterTableEntryHead[Index];
> +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));

(5) "ApIndex" and "Index" have type UINTN; they should not be printed
with "%d". The portable way to print them is to cast them to UINT64, and
use "%lu".


> +
>      //
>      // Check the type of specified register
>      //
> @@ -310,12 +275,6 @@ SetProcessorRegister (
>            RegisterTableEntry->Value
>            );
>        } else {
> -        //
> -        // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
> -        // to make sure MSR read/write operation is atomic.
> -        //
> -        MsrSpinLock = GetMsrSpinLockByIndex (RegisterTableEntry->Index);
> -        AcquireSpinLock (MsrSpinLock);
>          //
>          // Set the bit section according to bit start and length
>          //
> @@ -325,21 +284,20 @@ SetProcessorRegister (
>            RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>            RegisterTableEntry->Value
>            );
> -        ReleaseSpinLock (MsrSpinLock);
>        }
>        break;
>      //
>      // MemoryMapped operations
>      //
>      case MemoryMapped:
> -      AcquireSpinLock (mMemoryMappedLock);
> +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
>        MmioBitFieldWrite32 (
>          (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
>          RegisterTableEntry->ValidBitStart,
>          RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>          (UINT32)RegisterTableEntry->Value
>          );
> -      ReleaseSpinLock (mMemoryMappedLock);
> +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
>        break;
>      //
>      // Enable or disable cache
> @@ -355,12 +313,99 @@ SetProcessorRegister (
>        }
>        break;
>  
> +    case Semaphore:
> +      SemaphorePtr = CpuFlags->SemaphoreCount;
> +      switch (RegisterTableEntry->Value) {
> +      case CoreDepType:
> +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount + ApLocation->Core) * CpuStatus->ThreadCount;
> +        ApOffset = CoreOffset + ApLocation->Thread;
> +        //
> +        // First increase semaphore count by 1 for processors in this core.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset + ProcessorIndex]);

(6) The explicit (UINT32*) cast is confusing and unneeded, please remove it.


> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      case PackageDepType:
> +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount;
> +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus->CoreCount;
> +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation->Core + ApLocation->Thread;
> +        ValidApCount = CpuStatus->ThreadCount * CpuStatus->ValidCoresInPackages[ApLocation->Package];
> +        //
> +        // First increase semaphore count by 1 for processors in this package.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
> +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);

(7) Same as (6).


> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
> +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      default:
> +        break;
> +      }
> +      break;
> +
>      default:
>        break;
>      }
>    }
>  }
>  
> +/**
> +
> +  Set Processor register for one AP.
> +  
> +  @param     SmmPreRegisterTable     Use pre register table or register table.
> +
> +**/
> +VOID
> +SetRegister (
> +  IN BOOLEAN                 SmmPreRegisterTable

(8) For consistency with the "PreSmmInitRegisterTable" field name, I
think this parameter should be named "PreSmmRegisterTable" (in the
leading comment as well).


> +  )
> +{
> +  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE        *RegisterTables;
> +  UINT32                    InitApicId;
> +  UINTN                     ProcIndex;
> +  UINTN                     Index;
> +
> +  if (SmmPreRegisterTable) {
> +    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.PreSmmInitRegisterTable;
> +  } else {
> +    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.RegisterTable;
> +  }
> +
> +  InitApicId = GetInitialApicId ();
> +  RegisterTable = NULL;
> +  for (Index = 0; Index < mAcpiCpuData.NumberOfCpus; Index++) {
> +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> +      RegisterTable =  &RegisterTables[Index];

(9) Unjustified double space after the equal sign.


> +      ProcIndex = Index;
> +      break;
> +    }
> +  }
> +  ASSERT (RegisterTable != NULL);
> +
> +  ProgramProcessorRegister (
> +    RegisterTable,
> +    mAcpiCpuData.ApLocation + ProcIndex,
> +    &mAcpiCpuData.CpuStatus,
> +    &mCpuFlags
> +    );
> +}
> +
>  /**
>    AP initialization before then after SMBASE relocation in the S3 boot path.
>  **/
> @@ -374,7 +419,7 @@ InitializeAp (
>  
>    LoadMtrrData (mAcpiCpuData.MtrrTable);
>  
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (TRUE);
>  
>    //
>    // Count down the number with lock mechanism.
> @@ -391,7 +436,7 @@ InitializeAp (
>    ProgramVirtualWireMode ();
>    DisableLvtInterrupts ();
>  
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (FALSE);
>  
>    //
>    // Place AP into the safe code, count down the number with lock mechanism in the safe code.
> @@ -466,7 +511,7 @@ InitializeCpuBeforeRebase (
>  {
>    LoadMtrrData (mAcpiCpuData.MtrrTable);
>  
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (TRUE);
>  
>    ProgramVirtualWireMode ();
>  
> @@ -502,8 +547,6 @@ InitializeCpuAfterRebase (
>    VOID
>    )
>  {
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> -
>    mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
>  
>    //
> @@ -511,6 +554,8 @@ InitializeCpuAfterRebase (
>    //
>    mInitApsAfterSmmBaseReloc = TRUE;
>  
> +  SetRegister (FALSE);
> +
>    while (mNumberToFinish > 0) {
>      CpuPause ();
>    }

(10) I'm not implying this is incorrect, just asking: can you please
explain why the function call is *moved*?

Does it merit a comment in the code perhaps?


> @@ -574,8 +619,6 @@ SmmRestoreCpu (
>  
>    mSmmS3Flag = TRUE;
>  
> -  InitializeSpinLock (mMemoryMappedLock);
> -
>    //
>    // See if there is enough context to resume PEI Phase
>    //
> @@ -790,7 +833,6 @@ CopyRegisterTable (
>    )
>  {
>    UINTN                      Index;
> -  UINTN                      Index1;
>    CPU_REGISTER_TABLE_ENTRY   *RegisterTableEntry;
>  
>    CopyMem (DestinationRegisterTableList, SourceRegisterTableList, NumberOfCpus * sizeof (CPU_REGISTER_TABLE));
> @@ -802,17 +844,6 @@ CopyRegisterTable (
>          );
>        ASSERT (RegisterTableEntry != NULL);
>        DestinationRegisterTableList[Index].RegisterTableEntry = (EFI_PHYSICAL_ADDRESS)(UINTN)RegisterTableEntry;
> -      //
> -      // Go though all MSRs in register table to initialize MSR spin lock
> -      //
> -      for (Index1 = 0; Index1 < DestinationRegisterTableList[Index].TableLength; Index1++, RegisterTableEntry++) {
> -        if ((RegisterTableEntry->RegisterType == Msr) && (RegisterTableEntry->ValidBitLength < 64)) {
> -          //
> -          // Initialize MSR spin lock only for those MSRs need bit field writing
> -          //
> -          InitMsrSpinLockByIndex (RegisterTableEntry->Index);
> -        }
> -      }
>      }
>    }
>  }
> @@ -832,6 +863,7 @@ GetAcpiCpuData (
>    VOID                       *GdtForAp;
>    VOID                       *IdtForAp;
>    VOID                       *MachineCheckHandlerForAp;
> +  CPU_STATUS_INFORMATION     *CpuStatus;
>  
>    if (!mAcpiS3Enable) {
>      return;
> @@ -906,6 +938,16 @@ GetAcpiCpuData (
>    Gdtr->Base = (UINTN)GdtForAp;
>    Idtr->Base = (UINTN)IdtForAp;
>    mAcpiCpuData.ApMachineCheckHandlerBase = (EFI_PHYSICAL_ADDRESS)(UINTN)MachineCheckHandlerForAp;
> +
> +  CpuStatus = &mAcpiCpuData.CpuStatus;
> +  CopyMem (CpuStatus, &AcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
> +  CpuStatus->ValidCoresInPackages = AllocateCopyPool (sizeof (UINT32) * CpuStatus->PackageCount, AcpiCpuData->CpuStatus.ValidCoresInPackages);

(11) This line is 142 characters long.

Please make sure that all new lines are at most 120 chars long.


(12) I don't understand the multiplication. In the
"ValidCoresInPackages" array, do we have a simple (scalar) core count,
for each socket?

That's what the "ValidApCount" assignment above suggests. Can we perhaps
rename the field so that it says "Count" somewhere?


(13) Without modifying CpuS3DataDxe, this line will crash.


> +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
> +  mAcpiCpuData.ApLocation = AllocateCopyPool (mAcpiCpuData.NumberOfCpus * sizeof (EFI_CPU_PHYSICAL_LOCATION), AcpiCpuData->ApLocation);
> +  ASSERT (mAcpiCpuData.ApLocation != NULL);

(14) This also requires a modification to CpuS3DataDxe.


> +  InitializeSpinLock((SPIN_LOCK*) &mCpuFlags.MemoryMappedLock);
> +  mCpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount * CpuStatus->ThreadCount);
> +  ASSERT (mCpuFlags.SemaphoreCount != NULL);
>  }
>  
>  /**
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> index 9cf508a5c7..42b040531e 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> @@ -1303,8 +1303,6 @@ InitializeSmmCpuSemaphores (
>    mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock
>                                                    = (SPIN_LOCK *)SemaphoreAddr;
>    SemaphoreAddr += SemaphoreSize;
> -  mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock
> -                                                  = (SPIN_LOCK *)SemaphoreAddr;
>  
>    SemaphoreAddr = (UINTN)SemaphoreBlock + GlobalSemaphoresSize;
>    mSmmCpuSemaphores.SemaphoreCpu.Busy    = (SPIN_LOCK *)SemaphoreAddr;
> @@ -1321,7 +1319,6 @@ InitializeSmmCpuSemaphores (
>  
>    mPFLock                       = mSmmCpuSemaphores.SemaphoreGlobal.PFLock;
>    mConfigSmmCodeAccessCheckLock = mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock;
> -  mMemoryMappedLock             = mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock;
>  
>    mSemaphoreSize = SemaphoreSize;
>  }
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> index 8c7f4996d1..e2970308fe 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> @@ -53,6 +53,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>  #include <Library/ReportStatusCodeLib.h>
>  #include <Library/SmmCpuFeaturesLib.h>
>  #include <Library/PeCoffGetEntryPointLib.h>
> +#include <Library/RegisterCpuFeaturesLib.h>
>  
>  #include <AcpiCpuData.h>
>  #include <CpuHotPlugData.h>
> @@ -364,7 +365,6 @@ typedef struct {
>    volatile BOOLEAN     *AllCpusInSync;
>    SPIN_LOCK            *PFLock;
>    SPIN_LOCK            *CodeAccessCheckLock;
> -  SPIN_LOCK            *MemoryMappedLock;
>  } SMM_CPU_SEMAPHORE_GLOBAL;
>  
>  ///
> @@ -409,7 +409,6 @@ extern SMM_CPU_SEMAPHORES                  mSmmCpuSemaphores;
>  extern UINTN                               mSemaphoreSize;
>  extern SPIN_LOCK                           *mPFLock;
>  extern SPIN_LOCK                           *mConfigSmmCodeAccessCheckLock;
> -extern SPIN_LOCK                           *mMemoryMappedLock;
>  extern EFI_SMRAM_DESCRIPTOR                *mSmmCpuSmramRanges;
>  extern UINTN                               mSmmCpuSmramRangeCount;
>  extern UINT8                               mPhysicalAddressBits;
> 

Thanks,
Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 0/4] Fix performance issue caused by Set MSR task.
  2018-10-15 15:51 ` [Patch 0/4] Fix performance issue caused by Set MSR task Laszlo Ersek
@ 2018-10-16  1:39   ` Dong, Eric
  2018-10-17 11:42     ` Laszlo Ersek
  0 siblings, 1 reply; 18+ messages in thread
From: Dong, Eric @ 2018-10-16  1:39 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu

Hi Laszlo,

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Monday, October 15, 2018 11:52 PM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [Patch 0/4] Fix performance issue caused by Set MSR task.
> 
> Hi Eric,
> 
> On 10/15/18 04:49, Eric Dong wrote:
> > In a system which has multiple cores, current set register value task costs
> huge times.
> > After investigation, current set MSR task costs most of the times.
> > Current logic uses SpinLock to let set MSR task as an single thread
> > task for all cores. Because MSR has scope attribute which may cause GP
> > fault if multiple APs set MSR at the same time, current logic use an
> > easiest solution (use SpinLock) to avoid this issue, but it will cost huge times.
> >
> > In order to fix this performance issue, new solution will set MSRs
> > base on their scope attribute. After this, the SpinLock will not
> > needed. Without SpinLock, new issue raised which is caused by MSR
> > dependence. For example, MSR A depends on MSR B which means MSR A
> must
> > been set after MSR B has been set. Also MSR B is package scope level
> > and MSR A is thread scope level. If system has multiple threads,
> > Thread 1 needs to set the thread level MSRs and thread 2 needs to set
> thread and package level MSRs. Set MSRs task for thread 1 and thread 2 like
> below:
> >
> >             Thread 1                 Thread 2
> > MSR B          N                        Y
> > MSR A          Y                        Y
> >
> > If driver don't control execute MSR order, for thread 1, it will
> > execute MSR A first, but at this time, MSR B not been executed yet by
> > thread 2. system may trig exception at this time.
> >
> > In order to fix the above issue, driver introduces semaphore logic to
> > control the MSR execute sequence. For the above case, a semaphore will
> > be add between MSR A and B for all threads. Semaphore has scope info for
> it. The possible scope value is core or package.
> > For each thread, when it meets a semaphore during it set registers, it
> > will 1) release semaphore (+1) for each threads in this core or
> > package(based on the scope info for this
> > semaphore) 2) acquire semaphore (-1) for all the threads in this core
> > or package(based on the scope info for this semaphore). With these two
> > steps, driver can control MSR sequence. Sample code logic like below:
> >
> >   //
> >   // First increase semaphore count by 1 for processors in this package.
> >   //
> >   for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> >     LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> ProcessorIndex]);
> >   }
> >   //
> >   // Second, check whether the count has reach the check number.
> >   //
> >   for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex
> ++) {
> >     LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> >   }
> >
> > Platform Requirement:
> > 1. This change requires register MSR setting base on MSR scope info. If still
> register MSR
> >    for all threads, exception may raised.
> 
> Do you mean that platforms are responsible for updating their register tables
> in:
> - ACPI_CPU_DATA.PreSmmInitRegisterTable,
> - ACPI_CPU_DATA.RegisterTable
> 
> so that the tables utilize the new Semaphore REGISTER_TYPE as appropriate?

Yes, platform should set MSR in these two tables base on MSR's scope info. Just like if the MSR is core level, this MSR should on been add to the AP which control the related core. 
Also if two MSRs have dependence and they have different scope info, a semaphore should been added between these two MSRs. 

> 
> >
> > Known limitation:
> > 1. Current CpuFeatures driver supports DXE instance and PEI instance. But
> semaphore logic
> >    requires Aps execute in async mode which is not supported by PEI driver.
> So CpuFeature
> >    PEI instance not works after this change. We plan to support async mode
> for PEI in phase
> >    2 for this task.
> > 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm
> driver and
> >    RegisterCpuFeaturesLib library because the schedule limitation.
> 
> I don't understand what you mean by "schedule limitation". Are you alluding
> to the upcoming edk2 stable tag (in November), or some other schedule?

Yes, I want to include this change in the upcoming edk2 stable tag. But I can't finish all these changes before it, so for this version, I just duplicate the code.

> 
> >    Will merge the code to
> >    RegisterCpuFeaturesLib and export as an API in phase 2 for this task.
> 
> While I agree that common code (especially complex code like this) should
> belong to libraries, there are platforms that consume PiSmmCpuDxeSmm,
> but don't consume RegisterCpuFeaturesLib in any way.
> 
> Do you plan to add the new function(s) to a RegisterCpuFeaturesLib instance,
> and make PiSmmCpuDxeSmm dependent on RegisterCpuFeaturesLib?

Yes, plan to export one new API in RegisterCpuFeaturesLib to let PiSmmCpuDxeSmm driver to consume it.\
This API used to program the register.

> 
> If so, I think it can work, but then the RegisterCpuFeaturesLib instance in
> question should do nothing at all in the constructor. On platforms that don't
> use this feature at all -- i.e., the Semaphore REGISTER_TYPE --, there should
> be no impact.
> 
> (BTW, DxeRegisterCpuFeaturesLib is currently restricted to DXE_DRIVER
> modules.)
> 

Thanks for your advice. Yes, I have did some POC code to export this API and already met such issue. I also met a dependence issue.
Let's discuss these issue when I do that changes.

> > Extra Notes:
> >   I will send the other patch to set MSR base on scope info and check in it
> before check in
> >   this serial.
> 
> I don't understand. I assume that you are referring to some concrete
> platform (?) where the Semaphore REGISTER_TYPE *must* be used, in order
> to successfully boot (and/or perform S3), if this series is applied.
> 
> What platform is that?

Yes, I used an internal reference platform to  verify the changes.  I have did some change(Set MSR base on scope info) to verify the solution but not finalize the change yet. I checked the boot result and console log to confirmed the code works as expectation. 
When I send this serial changes, I'm not finalize Set MSR base on scope info change, but I want to collect your feedback for this serial as soon as possible (I think you will be the first one who will reply this serial), so I add this note and send out this serial. Now I have finished code change to set MSRs base on its scope info. I will include this patch when I send the  v2 patch for this serial.

> 
> And, if that other patch is indeed a pre-requisite for *this* set (on some
> specific platform anyway), then people on that platform will not be able to
> test this series until you post those patches.
> 
> My point here is that, on that platform, the testing cannot be performed in
> separation, so it's not enough to establish the right dependency order *just*
> before check-in. It should be offered on the list as well.
> 
> Thanks,
> Laszlo
> 
> >
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> >
> > Eric Dong (4):
> >   UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
> >   UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
> >   UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
> >     type.
> >   UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
> >
> >  UefiCpuPkg/Include/AcpiCpuData.h                   |  23 +-
> >  .../Include/Library/RegisterCpuFeaturesLib.h       |  25 +-
> >  .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++---
> >  .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
> >  .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
> >  .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
> >  .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
> >  .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
> >  .../RegisterCpuFeaturesLib.c                       | 452 ++++++++++++++++++---
> >  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c                  | 316 +++++++-------
> >  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c              |   3 -
> >  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h         |   3 +-
> >  12 files changed, 1063 insertions(+), 264 deletions(-)
> >


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-15  2:49 ` [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
  2018-10-15 16:02   ` Laszlo Ersek
@ 2018-10-16  2:27   ` Ni, Ruiyu
  2018-10-16  5:25     ` Dong, Eric
  1 sibling, 1 reply; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-16  2:27 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/15/2018 10:49 AM, Eric Dong wrote:
> In order to support semaphore related logic, add new definition for it.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   UefiCpuPkg/Include/AcpiCpuData.h | 23 ++++++++++++++++++++++-
>   1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/UefiCpuPkg/Include/AcpiCpuData.h b/UefiCpuPkg/Include/AcpiCpuData.h
> index 9e51145c08..b3cf2f664a 100644
> --- a/UefiCpuPkg/Include/AcpiCpuData.h
> +++ b/UefiCpuPkg/Include/AcpiCpuData.h
> @@ -15,6 +15,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>   #ifndef _ACPI_CPU_DATA_H_
>   #define _ACPI_CPU_DATA_H_
>   
> +#include <Protocol/MpService.h>
> +
>   //
>   // Register types in register table
>   //
> @@ -22,9 +24,20 @@ typedef enum {
>     Msr,
>     ControlRegister,
>     MemoryMapped,
> -  CacheControl
> +  CacheControl, > +  Semaphore
I assume the REGISTER_TYPE definition will be move to internal 
(non-public) in phase 2.

>   } REGISTER_TYPE;
>   
> +//
> +// CPU information.
> +//
> +typedef struct {
> +  UINT32        PackageCount;             // Packages in this CPU.
> +  UINT32        CoreCount;                // Max Core count in the packages.
> +  UINT32        ThreadCount;              // MAx thread count in the cores.
> +  UINT32        *ValidCoresInPackages;    // Valid cores in each package.

Can you please add more comments to describe each field above?
PackageCount is easy to understand.
But CoreCount is not. Maybe different packages have different number of 
cores. In this case, what value will CoreCount be?
Similar question to ThreadCount.

What does ValidCoresInPackages mean? Does it hold the valid (non-dead) 
core numbers for each package? So it's a UINT32 array with PackageCount 
elements?
How about using name ValidCoreCountPerPackage?
How about using MaxCoreCount/MaxThreadCount for CoreCount and ThreadCount?

> +} CPU_STATUS_INFORMATION;
> +
>   //
>   // Element of register table entry
>   //
> @@ -147,6 +160,14 @@ typedef struct {
>     // provided.
>     //
>     UINT32                ApMachineCheckHandlerSize;
> +  //
> +  // CPU information which is required when set the register table.
> +  //
> +  CPU_STATUS_INFORMATION     CpuStatus;
> +  //
> +  // Location info for each ap.
> +  //
> +  EFI_CPU_PHYSICAL_LOCATION  *ApLocation;

Please use EFI_PHYSICAL_ADDRESS for ApLocation. It's ok now. But if 
there are more fields below ApLocation, the offset of those fields 
differs between PEI and DXE. That will cause bugs.

>   } ACPI_CPU_DATA;
>   
>   #endif
> 


-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
  2018-10-15  2:49 ` [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
@ 2018-10-16  3:05   ` Ni, Ruiyu
  2018-10-16  7:43     ` Dong, Eric
  0 siblings, 1 reply; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-16  3:05 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/15/2018 10:49 AM, Eric Dong wrote:
> In a system which has multiple cores, current set register value task costs huge times.
> After investigation, current set MSR task costs most of the times. Current logic uses
> SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
> scope attribute which may cause GP fault if multiple APs set MSR at the same time,
> current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
> cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on their scope
> attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
> which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
> must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
> thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
> MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
> and thread 2 like below:
> 
>              Thread 1                 Thread 2
> MSR B          N                        Y
> MSR A          Y                        Y
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
> at this time, MSR B not been executed yet by thread 2. system may trig exception at this
> time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control the MSR
> execute sequence. For the above case, a semaphore will be add between MSR A and B for
> all threads. Semaphore has scope info for it. The possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1) release
> semaphore (+1) for each threads in this core or package(based on the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
> on the scope info for this semaphore). With these two steps, driver can control MSR
> sequence. Sample code logic like below:
> 
>    //
>    // First increase semaphore count by 1 for processors in this package.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
>      LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
>    }
>    //
>    // Second, check whether the count has reach the check number.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
>      LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
>    }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still register MSR
>     for all threads, exception may raised.
> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
>     requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
>     PEI instance not works after this change. We plan to support async mode for PEI in phase
>     2 for this task.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++---
>   .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
>   .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
>   .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
>   .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
>   .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
>   .../RegisterCpuFeaturesLib.c                       | 452 ++++++++++++++++++---
>   7 files changed, 840 insertions(+), 117 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> index ba3fb3250f..f820b4fed7 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> @@ -145,6 +145,20 @@ CpuInitDataInitialize (
>     CPU_FEATURES_INIT_ORDER              *InitOrder;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
>     LIST_ENTRY                           *Entry;
> +  UINT32                               Core;
> +  UINT32                               Package;
> +  UINT32                               Thread;
> +  EFI_CPU_PHYSICAL_LOCATION            *Location;
> +  UINT32                               *CoreArray;
> +  UINTN                                Index;
> +  UINT32                               ValidCount;
> +  UINTN                                CoreIndex;
> +  ACPI_CPU_DATA                        *AcpiCpuData;
> +  CPU_STATUS_INFORMATION               *CpuStatus;
> +
> +  Core    = 0;
> +  Package = 0;
> +  Thread  = 0;
>   
>     CpuFeaturesData = GetCpuFeaturesData ();
>     CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof (CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
> @@ -163,6 +177,16 @@ CpuInitDataInitialize (
>       Entry = Entry->ForwardLink;
>     }
>   
> +  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
> +
> +  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
> +  ASSERT (AcpiCpuData != NULL);
> +  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
> +
> +  CpuStatus = &AcpiCpuData->CpuStatus;
> +  AcpiCpuData->ApLocation = AllocateZeroPool (sizeof (EFI_CPU_PHYSICAL_LOCATION) * NumberOfCpus);
> +  ASSERT (AcpiCpuData->ApLocation != NULL);
> +
>     for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
>       InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
>       InitOrder->FeaturesSupportedMask = AllocateZeroPool (CpuFeaturesData->BitMaskSize);
> @@ -175,7 +199,59 @@ CpuInitDataInitialize (
>         &ProcessorInfoBuffer,
>         sizeof (EFI_PROCESSOR_INFORMATION)
>         );
> +    CopyMem (
> +      AcpiCpuData->ApLocation + ProcessorNumber,
> +      &ProcessorInfoBuffer.Location,
> +      sizeof (EFI_CPU_PHYSICAL_LOCATION)
> +      );
> +

Please add more comments here to describe what the below code tries to 
do and why.

> +    if (Package < ProcessorInfoBuffer.Location.Package) {
> +      Package = ProcessorInfoBuffer.Location.Package;
> +    }
> +    if (Core < ProcessorInfoBuffer.Location.Core) {
> +      Core = ProcessorInfoBuffer.Location.Core;
> +    }
> +    if (Thread < ProcessorInfoBuffer.Location.Thread) {
> +      Thread = ProcessorInfoBuffer.Location.Thread;
> +    }
> +  }
> +  CpuStatus->PackageCount = Package + 1;
> +  CpuStatus->CoreCount    = Core + 1;
> +  CpuStatus->ThreadCount  = Thread + 1;


> +  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, Core : %d, Thread: %d\n",
> +         CpuStatus->PackageCount,
> +         CpuStatus->CoreCount,
> +         CpuStatus->ThreadCount));

Please use MaxCore and MaxThread in debug message. Otherwise it's confusing.

> +
> +  //
> +  // Collect valid core count in each package because not all cores are valid.
> +  //
> +  CpuStatus->ValidCoresInPackages = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount);
> +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);

Please add comments to describe the purpose of CoreArray.
CoreArray is not a good name IMO. How about:
CoreVisited - AllocatePool (sizeof (BOOLEAN) * CpuStatus->MaxCoreCount);

> +  CoreArray = AllocatePool (sizeof (UINT32) * CpuStatus->CoreCount);
> +  ASSERT (CoreArray != NULL);
> +
> +  for (Index = 0; Index <= Package; Index ++ ) {

Please stop using Package/Core/Thread. Use the field in CpuStatus 
structure instead. It makes the code more readable.

> +    ZeroMem (CoreArray, sizeof (UINT32) * (Core + 1));
> +    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
> +      Location = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
> +      if (Location->Package == Index) {
> +        CoreArray[Location->Core] = 1;
> +      }

The above if-clause can be:
          if ((Location->Package == Index) &&
              !CoreVisited[Location->Core])) {
            CpuStatus->ValidCoreCountPerPackage[Index]++;
            CoreVisited[Location->Core] = TRUE;
          }

The for-loop below can be removed.

> +    }
> +    for (CoreIndex = 0, ValidCount = 0; CoreIndex <= Core; CoreIndex ++) {
> +      ValidCount += CoreArray[CoreIndex];
> +    }
> +    CpuStatus->ValidCoresInPackages[Index] = ValidCount;
>     }
> +  FreePool (CoreArray);
> +  for (Index = 0; Index <= Package; Index++) {
> +    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index, CpuStatus->ValidCoresInPackages[Index]));
> +  }
> +
> +  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount* CpuStatus->ThreadCount);
> +  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
> +
>     //
>     // Get support and configuration PCDs
>     //
> @@ -310,7 +386,7 @@ CollectProcessorData (
>     LIST_ENTRY                           *Entry;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
>   
> -  CpuFeaturesData = GetCpuFeaturesData ();
> +  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;

Is the above change more proper in a separate patch?

>     ProcessorNumber = GetProcessorIndex ();
>     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
>     //
> @@ -416,6 +492,15 @@ DumpRegisterTableOnProcessor (
>           RegisterTableEntry->Value
>           ));
>         break;
> +    case Semaphore:
> +      DEBUG ((
> +        DebugPrintErrorLevel,
> +        "Processor: %d: Semaphore: Scope Value: %d\r\n",

How about print the Scope value in string? This makes the debug message 
more meaningful.

> +        ProcessorNumber,
> +        RegisterTableEntry->Value
> +        ));
> +      break;
> +
>       default:
>         break;
>       }
> @@ -441,6 +526,11 @@ AnalysisProcessorFeatures (
>     REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
>     LIST_ENTRY                           *Entry;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
> +  LIST_ENTRY                           *NextEntry;
> +  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
> +  BOOLEAN                              Success;
> +  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
> +  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
>   
>     CpuFeaturesData = GetCpuFeaturesData ();
>     CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData->BitMaskSize);
> @@ -517,8 +607,14 @@ AnalysisProcessorFeatures (
>       //
>       CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
>       Entry = GetFirstNode (&CpuInitOrder->OrderList);
> +    NextEntry = Entry->ForwardLink;
>       while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
>         CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
> +      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
> +        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (NextEntry);
> +      } else {
> +        NextCpuFeatureInOrder = NULL;
> +      }
>         if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask, CpuFeaturesData->SettingPcd)) {
>           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, TRUE);
>           if (EFI_ERROR (Status)) {
> @@ -532,6 +628,8 @@ AnalysisProcessorFeatures (
>               DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask = "));
>               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
>             }
> +        } else {
> +          Success = TRUE;
>           }
>         } else {
>           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, FALSE);
> @@ -542,9 +640,36 @@ AnalysisProcessorFeatures (
>               DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: Mask = "));
>               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
>             }
> +        } else {
> +          Success = TRUE;
>           }
>         }
> -      Entry = Entry->ForwardLink;
> +
> +      if (Success) {
> +        //
> +        // If feature has dependence with the next feature (ONLY care core/package dependency).
> +        // and feature initialize succeed, add sync semaphere here.
> +        //
> +        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
> +        if (NextCpuFeatureInOrder != NULL) {
> +          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
> +        } else {
> +          AfterDep = NoneDepType;
> +        }
> +        //
> +        // Assume only one of the depend is valid.
> +        //
> +        ASSERT (!(BeforeDep > ThreadDepType && AfterDep > ThreadDepType));
> +        if (BeforeDep > ThreadDepType) {
> +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, BeforeDep);
> +        }
> +        if (AfterDep > ThreadDepType) {
> +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, AfterDep);
> +        }
> +      }
> +
> +      Entry     = Entry->ForwardLink;
> +      NextEntry = Entry->ForwardLink;
>       }
>   
>       //
> @@ -561,27 +686,79 @@ AnalysisProcessorFeatures (
>     }
>   }
>   
> +/**
> +  Increment semaphore by 1.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +LibReleaseSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  InterlockedIncrement (Sem);
> +}
> +
> +/**
> +  Decrement the semaphore by 1 if it is not zero.
> +
> +  Performs an atomic decrement operation for semaphore.
> +  The compare exchange operation must be performed using
> +  MP safe mechanisms.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +LibWaitForSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  UINT32  Value;
> +
> +  do {
> +    Value = *Sem;
> +  } while (Value == 0);
> +
> +  InterlockedDecrement (Sem);
> +}
> +
>   /**
>     Initialize the CPU registers from a register table.
>   
> -  @param[in]  ProcessorNumber  The index of the CPU executing this function.
> +  @param[in]  RegisterTable         The register table for this AP.
> +  @param[in]  ApLocation            AP location info for this ap.
> +  @param[in]  CpuStatus             CPU status info for this CPU.
> +  @param[in]  CpuFlags              Flags data structure used when program the register.
>   
>     @note This service could be called by BSP/APs.
>   **/
>   VOID
> +EFIAPI
>   ProgramProcessorRegister (
> -  IN UINTN  ProcessorNumber
> +  IN CPU_REGISTER_TABLE           *RegisterTable,
> +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
>     )
>   {
> -  CPU_FEATURES_DATA         *CpuFeaturesData;
> -  CPU_REGISTER_TABLE        *RegisterTable;
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
>     UINTN                     Index;
>     UINTN                     Value;
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> -
> -  CpuFeaturesData = GetCpuFeaturesData ();
> -  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
> +  volatile UINT32           *SemaphorePtr;
> +  UINT32                    CoreOffset;
> +  UINT32                    PackageOffset;
> +  UINT32                    PackageThreadsCount;
> +  UINT32                    ApOffset;
> +  UINTN                     ProcessorIndex;
> +  UINTN                     ApIndex;
> +  UINTN                     ValidApCount;
> +
> +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount \
> +            + ApLocation->Core * CpuStatus->ThreadCount \
> +            + ApLocation->Thread;
>   
>     //
>     // Traverse Register Table of this logical processor
> @@ -591,6 +768,7 @@ ProgramProcessorRegister (
>     for (Index = 0; Index < RegisterTable->TableLength; Index++) {
>   
>       RegisterTableEntry = &RegisterTableEntryHead[Index];
> +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
How about print the register type in string?

>   
>       //
>       // Check the type of specified register
> @@ -654,10 +832,6 @@ ProgramProcessorRegister (
>       // The specified register is Model Specific Register
>       //
>       case Msr:
> -      //
> -      // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
> -      //
> -      AcquireSpinLock (&CpuFeaturesData->MsrLock);
>         if (RegisterTableEntry->ValidBitLength >= 64) {
>           //
>           // If length is not less than 64 bits, then directly write without reading
> @@ -677,20 +851,19 @@ ProgramProcessorRegister (
>             RegisterTableEntry->Value
>             );
>         }
> -      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
>         break;
>       //
>       // MemoryMapped operations
>       //
>       case MemoryMapped:
> -      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
>         MmioBitFieldWrite32 (
>           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
>           RegisterTableEntry->ValidBitStart,
>           RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>           (UINT32)RegisterTableEntry->Value
>           );
> -      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
>         break;
>       //
>       // Enable or disable cache
> @@ -706,6 +879,50 @@ ProgramProcessorRegister (
>         }
>         break;
>   
> +    case Semaphore:
> +      SemaphorePtr = CpuFlags->SemaphoreCount;
> +      switch (RegisterTableEntry->Value) {
> +      case CoreDepType:
> +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount + ApLocation->Core) * CpuStatus->ThreadCount > +        ApOffset = CoreOffset + ApLocation->Thread;

How about FirstThread and CurrentThread?

> +        //
> +        // First increase semaphore count by 1 for processors in this core.
This comment might not be helpful for reviewer to understand.
How about "Notify all threads in current Core"?

> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
How about "Wait for all threads in current Core"?

Below diagram is also helpful
//
//  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
//  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
//
//  All threads (T0...Tn) waits in P() line and continues running
//  together.
//
//
//  T0             T1            ...           Tn
//
//  V(0...n)       V(0...n)      ...           V(0...n)
//  n * P(0)       n * P(1)      ...           n * P(n)
//

> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      case PackageDepType:
> +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount;

FirstThread?

> +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus->CoreCount;
ThreadCount?

> +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation->Core + ApLocation->Thread;
CurrentThread?

> +        ValidApCount = CpuStatus->ThreadCount * CpuStatus->ValidCoresInPackages[ApLocation->Package];
ValidThreadCount?

> +        //
> +        // First increase semaphore count by 1 for processors in this package.
How about "Notify all threads in current Package"?
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
> +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
How about "Wait for all threads in current Package"?
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
> +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      default:
> +        break;
> +      }
> +      break;
> +
>       default:
>         break;
>       }
> @@ -724,10 +941,36 @@ SetProcessorRegister (
>     IN OUT VOID            *Buffer
>     )
>   {
> -  UINTN                  ProcessorNumber;
> +  CPU_FEATURES_DATA         *CpuFeaturesData;
> +  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE        *RegisterTables;
> +  UINT32                    InitApicId;
> +  UINTN                     ProcIndex;
> +  UINTN                     Index;
> +  ACPI_CPU_DATA             *AcpiCpuData;
>   
> -  ProcessorNumber = GetProcessorIndex ();
> -  ProgramProcessorRegister (ProcessorNumber);
> +  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
> +  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
> +
> +  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData->RegisterTable;
> +
> +  InitApicId = GetInitialApicId ();
> +  RegisterTable = NULL;
> +  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
> +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> +      RegisterTable =  &RegisterTables[Index];
> +      ProcIndex = Index;
> +      break;
> +    }
> +  }
> +  ASSERT (RegisterTable != NULL);
> +
> +  ProgramProcessorRegister (
> +    RegisterTable,
> +    AcpiCpuData->ApLocation + ProcIndex,
> +    &AcpiCpuData->CpuStatus,
> +    &CpuFeaturesData->CpuFlags
> +    );
>   }
>   
>   /**
> @@ -746,6 +989,9 @@ CpuFeaturesDetect (
>   {
>     UINTN                  NumberOfCpus;
>     UINTN                  NumberOfEnabledProcessors;
> +  CPU_FEATURES_DATA      *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData();
>   
>     GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
>   
> @@ -754,49 +1000,13 @@ CpuFeaturesDetect (
>     //
>     // Wakeup all APs for data collection.
>     //
> -  StartupAPsWorker (CollectProcessorData);
> +  StartupAPsWorker (CollectProcessorData, NULL);
>   
>     //
>     // Collect data on BSP
>     //
> -  CollectProcessorData (NULL);
> +  CollectProcessorData (CpuFeaturesData);
>   
>     AnalysisProcessorFeatures (NumberOfCpus);
>   }
>   
> -/**
> -  Performs CPU features Initialization.
> -
> -  This service will invoke MP service to perform CPU features
> -  initialization on BSP/APs per user configuration.
> -
> -  @note This service could be called by BSP only.
> -**/
> -VOID
> -EFIAPI
> -CpuFeaturesInitialize (
> -  VOID
> -  )
> -{
> -  CPU_FEATURES_DATA      *CpuFeaturesData;
> -  UINTN                  OldBspNumber;
> -
> -  CpuFeaturesData = GetCpuFeaturesData ();
> -
> -  OldBspNumber = GetProcessorIndex();
> -  CpuFeaturesData->BspNumber = OldBspNumber;
> -  //
> -  // Wakeup all APs for programming.
> -  //
> -  StartupAPsWorker (SetProcessorRegister);
> -  //
> -  // Programming BSP
> -  //
> -  SetProcessorRegister (NULL);
> -  //
> -  // Switch to new BSP if required
> -  //
> -  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> -    SwitchNewBsp (CpuFeaturesData->BspNumber);
> -  }
> -}
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> index 1f34a3f489..8346f7004f 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> @@ -15,6 +15,7 @@
>   #include <PiDxe.h>
>   
>   #include <Library/UefiBootServicesTableLib.h>
> +#include <Library/UefiLib.h>
>   
>   #include "RegisterCpuFeatures.h"
>   
> @@ -115,14 +116,20 @@ GetProcessorInformation (
>   
>     @param[in]  Procedure               A pointer to the function to be run on
>                                         enabled APs of the system.
> +  @param[in]  MpEvent                 A pointer to the event to be used later
> +                                      to check whether procedure has done.
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     )
>   {
>     EFI_STATUS                           Status;
>     EFI_MP_SERVICES_PROTOCOL             *MpServices;
> +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
>   
>     MpServices = GetMpProtocol ();
>     //
> @@ -132,9 +139,9 @@ StartupAPsWorker (
>                    MpServices,
>                    Procedure,
>                    FALSE,
> -                 NULL,
> +                 (EFI_EVENT)MpEvent,
>                    0,
> -                 NULL,
> +                 CpuFeaturesData,
>                    NULL
>                    );
>     ASSERT_EFI_ERROR (Status);
> @@ -197,3 +204,61 @@ GetNumberOfProcessor (
>     ASSERT_EFI_ERROR (Status);
>   }
>   
> +/**
> +  Performs CPU features Initialization.
> +
> +  This service will invoke MP service to perform CPU features
> +  initialization on BSP/APs per user configuration.
> +
> +  @note This service could be called by BSP only.
> +**/
> +VOID
> +EFIAPI
> +CpuFeaturesInitialize (
> +  VOID
> +  )
> +{
> +  CPU_FEATURES_DATA          *CpuFeaturesData;
> +  UINTN                      OldBspNumber;
> +  EFI_EVENT                  MpEvent;
> +  EFI_STATUS                 Status;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
> +
> +  OldBspNumber = GetProcessorIndex();
> +  CpuFeaturesData->BspNumber = OldBspNumber;
> +
> +  Status = gBS->CreateEvent (
> +                  EVT_NOTIFY_WAIT,
> +                  TPL_CALLBACK,
> +                  EfiEventEmptyFunction,
> +                  NULL,
> +                  &MpEvent
> +                  );
> +  ASSERT_EFI_ERROR (Status);
> +
> +  //
> +  // Wakeup all APs for programming.
> +  //
> +  StartupAPsWorker (SetProcessorRegister, MpEvent);
> +  //
> +  // Programming BSP
> +  //
> +  SetProcessorRegister (CpuFeaturesData);
> +
> +  //
> +  // Wait all processors to finish the task.
> +  //
> +  do {
> +    Status = gBS->CheckEvent (MpEvent);
> +  } while (Status == EFI_NOT_READY);
> +  ASSERT_EFI_ERROR (Status);
> +
> +  //
> +  // Switch to new BSP if required
> +  //
> +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> +  }
> +}
> +
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> index f0f317c945..6693bae575 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> @@ -47,6 +47,9 @@
>     SynchronizationLib
>     UefiBootServicesTableLib
>     IoLib
> +  UefiBootServicesTableLib
> +  UefiLib
> +  LocalApicLib
>   
>   [Protocols]
>     gEfiMpServiceProtocolGuid                                            ## CONSUMES
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> index 82fe268812..799864a136 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> @@ -149,11 +149,15 @@ GetProcessorInformation (
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     )
>   {
>     EFI_STATUS                           Status;
>     EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
> +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
>   
>     //
>     // Get MP Services Protocol
> @@ -175,7 +179,7 @@ StartupAPsWorker (
>                    Procedure,
>                    FALSE,
>                    0,
> -                 NULL
> +                 CpuFeaturesData
>                    );
>     ASSERT_EFI_ERROR (Status);
>   }
> @@ -257,3 +261,50 @@ GetNumberOfProcessor (
>                            );
>     ASSERT_EFI_ERROR (Status);
>   }
> +
> +/**
> +  Performs CPU features Initialization.
> +
> +  This service will invoke MP service to perform CPU features
> +  initialization on BSP/APs per user configuration.
> +
> +  @note This service could be called by BSP only.
> +**/
> +VOID
> +EFIAPI
> +CpuFeaturesInitialize (
> +  VOID
> +  )
> +{
> +  CPU_FEATURES_DATA          *CpuFeaturesData;
> +  UINTN                      OldBspNumber;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
> +
> +  OldBspNumber = GetProcessorIndex();
> +  CpuFeaturesData->BspNumber = OldBspNumber;
> +
> +  //
> +  // Known limitation: In PEI phase, CpuFeatures driver not
> +  // support async mode execute tasks. So semaphore type
> +  // register can't been used for this instance, must use
> +  // DXE type instance.
> +  //
> +
> +  //
> +  // Wakeup all APs for programming.
> +  //
> +  StartupAPsWorker (SetProcessorRegister, NULL);
> +  //
> +  // Programming BSP
> +  //
> +  SetProcessorRegister (CpuFeaturesData);
> +
> +  //
> +  // Switch to new BSP if required
> +  //
> +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> +  }
> +}
> +
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> index fdfef98293..e95f01df0b 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> @@ -49,6 +49,7 @@
>     PeiServicesLib
>     PeiServicesTablePointerLib
>     IoLib
> +  LocalApicLib
>   
>   [Ppis]
>     gEfiPeiMpServicesPpiGuid                                             ## CONSUMES
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> index edd266934f..39457e9730 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> @@ -23,6 +23,7 @@
>   #include <Library/MemoryAllocationLib.h>
>   #include <Library/SynchronizationLib.h>
>   #include <Library/IoLib.h>
> +#include <Library/LocalApicLib.h>
>   
>   #include <AcpiCpuData.h>
>   
> @@ -46,16 +47,26 @@ typedef struct {
>     CPU_FEATURE_INITIALIZE       InitializeFunc;
>     UINT8                        *BeforeFeatureBitMask;
>     UINT8                        *AfterFeatureBitMask;
> +  UINT8                        *CoreBeforeFeatureBitMask;
> +  UINT8                        *CoreAfterFeatureBitMask;
> +  UINT8                        *PackageBeforeFeatureBitMask;
> +  UINT8                        *PackageAfterFeatureBitMask;
>     VOID                         *ConfigData;
>     BOOLEAN                      BeforeAll;
>     BOOLEAN                      AfterAll;
>   } CPU_FEATURES_ENTRY;
>   
> +//
> +// Flags used when program the register.
> +//
> +typedef struct {
> +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
> +  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
> +} PROGRAM_CPU_REGISTER_FLAGS;
> +
>   typedef struct {
>     UINTN                    FeaturesCount;
>     UINT32                   BitMaskSize;
> -  SPIN_LOCK                MsrLock;
> -  SPIN_LOCK                MemoryMappedLock;
>     LIST_ENTRY               FeatureList;
>   
>     CPU_FEATURES_INIT_ORDER  *InitOrder;
> @@ -64,9 +75,14 @@ typedef struct {
>     UINT8                    *ConfigurationPcd;
>     UINT8                    *SettingPcd;
>   
> +  UINT32                   NumberOfCpus;
> +  ACPI_CPU_DATA            *AcpiCpuData;
> +
>     CPU_REGISTER_TABLE       *RegisterTable;
>     CPU_REGISTER_TABLE       *PreSmmRegisterTable;
>     UINTN                    BspNumber;
> +
> +  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
>   } CPU_FEATURES_DATA;
>   
>   #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
> @@ -118,10 +134,13 @@ GetProcessorInformation (
>   
>     @param[in]  Procedure               A pointer to the function to be run on
>                                         enabled APs of the system.
> +  @param[in]  MpEvent                 A pointer to the event to be used later
> +                                      to check whether procedure has done.
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     );
>   
>   /**
> @@ -170,4 +189,30 @@ DumpCpuFeature (
>     IN CPU_FEATURES_ENTRY  *CpuFeature
>     );
>   
> +/**
> +  Return feature dependence result.
> +
> +  @param[in]  CpuFeature        Pointer to CPU feature.
> +  @param[in]  Before            Check before dependence or after.
> +
> +  @retval     return the dependence result.
> +**/
> +CPU_FEATURE_DEPENDENCE_TYPE
> +DetectFeatureScope (
> +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> +  IN BOOLEAN                    Before
> +  );
> +
> +/**
> +  Programs registers for the calling processor.
> +
> +  @param[in,out] Buffer  The pointer to private data buffer.
> +
> +**/
> +VOID
> +EFIAPI
> +SetProcessorRegister (
> +  IN OUT VOID            *Buffer
> +  );
> +
>   #endif
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> index fa7e107e39..f9e3178dc1 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> @@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
>     return FALSE;
>   }
>   
> +/**
> +  Return feature dependence result.
> +
> +  @param[in]  CpuFeature        Pointer to CPU feature.
> +  @param[in]  Before            Check before dependence or after.
> +
> +  @retval     return the dependence result.
> +**/
> +CPU_FEATURE_DEPENDENCE_TYPE
> +DetectFeatureScope (
> +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> +  IN BOOLEAN                    Before
> +  )
> +{
> +  if (Before) {
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      return PackageDepType;
> +    }
> +
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      return CoreDepType;
> +    }
> +
> +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> +      return ThreadDepType;
> +    }
> +
> +    return NoneDepType;
> +  }
> +
> +  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +    return PackageDepType;
> +  }
> +
> +  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +    return CoreDepType;
> +  }
> +
> +  if (CpuFeature->AfterFeatureBitMask != NULL) {
> +    return ThreadDepType;
> +  }
> +
> +  return NoneDepType;
> +}
> +
> +/**
> +  Clear dependence for the specified type.
> +
> +  @param[in]  CurrentFeature     Cpu feature need to clear.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +**/
> +VOID
> +ClearFeatureScope (
> +  IN CPU_FEATURES_ENTRY           *CpuFeature,
> +  IN BOOLEAN                      Before
> +  )
> +{
> +  if (Before) {
> +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->BeforeFeatureBitMask);
> +      CpuFeature->BeforeFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
> +      CpuFeature->CoreBeforeFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
> +      CpuFeature->PackageBeforeFeatureBitMask = NULL;
> +    }
> +  } else {
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->PackageAfterFeatureBitMask);
> +      CpuFeature->PackageAfterFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->CoreAfterFeatureBitMask);
> +      CpuFeature->CoreAfterFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->AfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->AfterFeatureBitMask);
> +      CpuFeature->AfterFeatureBitMask = NULL;
> +    }
> +  }
> +}
> +
> +/**
> +  Base on dependence relationship to asjust feature dependence.
> +
> +  ONLY when the feature before(or after) the find feature also has
> +  dependence with the find feature. In this case, driver need to base
> +  on dependce relationship to decide how to insert current feature and
> +  adjust the feature dependence.
> +
> +  @param[in]  PreviousFeature    CPU feature current before the find one.
> +  @param[in]  CurrentFeature     Cpu feature need to adjust.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +  @retval   TRUE   means the current feature dependence has been adjusted.
> +
> +  @retval   FALSE  means the previous feature dependence has been adjusted.
> +                   or previous feature has no dependence with the find one.
> +
> +**/
> +BOOLEAN
> +AdjustFeaturesDependence (
> +  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
> +  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
> +  IN     BOOLEAN                    Before
> +  )
> +{
> +  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
> +  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
> +
> +  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
> +  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
> +
> +  //
> +  // If previous feature has no dependence with the find featue.
> +  // return FALSE.
> +  //
> +  if (PreDependType == NoneDepType) {
> +    return FALSE;
> +  }
> +
> +  //
> +  // If both feature have dependence, keep the one which needs use more
> +  // processors and clear the dependence for the other one.
> +  //
> +  if (PreDependType >= CurrentDependType) {
> +    ClearFeatureScope (CurrentFeature, Before);
> +    return TRUE;
> +  } else {
> +    ClearFeatureScope (PreviousFeature, Before);
> +    return FALSE;
> +  }
> +}
> +
> +/**
> +  Base on dependence relationship to asjust feature order.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  FindEntry          The entry this feature depend on.
> +  @param[in]  CurrentEntry       The entry for this feature.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +**/
> +VOID
> +AdjustEntry (
> +  IN      LIST_ENTRY                *FeatureList,
> +  IN OUT  LIST_ENTRY                *FindEntry,
> +  IN OUT  LIST_ENTRY                *CurrentEntry,
> +  IN      BOOLEAN                   Before
> +  )
> +{
> +  LIST_ENTRY                *PreviousEntry;
> +  CPU_FEATURES_ENTRY        *PreviousFeature;
> +  CPU_FEATURES_ENTRY        *CurrentFeature;
> +
> +  //
> +  // For CPU feature which has core or package type dependence, later code need to insert
> +  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
> +  // So if driver finds both feature A and B need to execute before feature C, driver will
> +  // base on dependence type of feature A and B to update the logic here.
> +  // For example, feature A has package type dependence and feature B has core type dependence,
> +  // because package type dependence need to wait for more processors which has strong dependence
> +  // than core type dependence. So driver will adjust the feature order to B -> A -> C. and driver
> +  // will remove the feature dependence in feature B.
> +  // Driver just needs to make sure before feature C been executed, feature A has finished its task
> +  // in all all thread. Feature A finished in all threads also means feature B have finshed in all
> +  // threads.
> +  //
> +  if (Before) {
> +    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
> +  } else {
> 
> +    PreviousEntry = GetNextNode (FeatureList, FindEntry);
> +  }
> +
> +  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> +  RemoveEntryList (CurrentEntry);
> +
> +  if (IsNull (FeatureList, PreviousEntry)) {
> +    //
> +    // If not exist the previous or next entry, just insert the current entry.
> +    //
> +    if (Before) {
> +      InsertTailList (FindEntry, CurrentEntry);
> +    } else {
> +      InsertHeadList (FindEntry, CurrentEntry);
> +    }
> +  } else {
> +    //
> +    // If exist the previous or next entry, need to check it before insert curent entry.
> +    //
> +    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
> +
> +    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature, Before)) {
> +      //
> +      // Return TRUE means current feature dependence has been cleared and the previous
> +      // feature dependence has been kept and used. So insert current feature before (or after)
> +      // the previous feature.
> +      //
> +      if (Before) {
> +        InsertTailList (PreviousEntry, CurrentEntry);
> +      } else {
> +        InsertHeadList (PreviousEntry, CurrentEntry);
> +      }
> +    } else {
> +      if (Before) {
> +        InsertTailList (FindEntry, CurrentEntry);
> +      } else {
> +        InsertHeadList (FindEntry, CurrentEntry);
> +      }
> +    }
> +  }
> +}
> 
> +
> +/**
> +  Checks and adjusts current CPU features per dependency relationship.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> +  @param[in]  FeatureMask        The feature bit mask.
> +
> +  @retval     return Swapped info.
> +**/
> +BOOLEAN
> +InsertToBeforeEntry (
> +  IN LIST_ENTRY              *FeatureList,
> +  IN LIST_ENTRY              *CurrentEntry,
> +  IN UINT8                   *FeatureMask
> +  )
> +{
> +  LIST_ENTRY                 *CheckEntry;
> +  CPU_FEATURES_ENTRY         *CheckFeature;
> +  BOOLEAN                    Swapped;
> +
> +  Swapped = FALSE;
> +
> +  //
> +  // Check all features dispatched before this entry
> +  //
> +  CheckEntry = GetFirstNode (FeatureList);
> +  while (CheckEntry != CurrentEntry) {
> +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
> +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
> +      Swapped = TRUE;
> +      break;
> +    }
> +    CheckEntry = CheckEntry->ForwardLink;
> +  }
> +
> +  return Swapped;
> +}
> +
> +/**
> +  Checks and adjusts current CPU features per dependency relationship.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> +  @param[in]  FeatureMask        The feature bit mask.
> +
> +  @retval     return Swapped info.
> +**/
> +BOOLEAN
> +InsertToAfterEntry (
> +  IN LIST_ENTRY              *FeatureList,
> +  IN LIST_ENTRY              *CurrentEntry,
> +  IN UINT8                   *FeatureMask
> +  )
> +{
> +  LIST_ENTRY                 *CheckEntry;
> +  CPU_FEATURES_ENTRY         *CheckFeature;
> +  BOOLEAN                    Swapped;
> +
> +  Swapped = FALSE;
> +
> +  //
> +  // Check all features dispatched after this entry
> +  //
> +  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> +  while (!IsNull (FeatureList, CheckEntry)) {
> +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
> +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
> +      Swapped = TRUE;
> +      break;
> +    }
> +    CheckEntry = CheckEntry->ForwardLink;
> +  }
> +
> +  return Swapped;
> +}
> +
>   /**
>     Checks and adjusts CPU features order per dependency relationship.
>   
> @@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
>     CPU_FEATURES_ENTRY         *CheckFeature;
>     BOOLEAN                    Swapped;
>     LIST_ENTRY                 *TempEntry;
> +  LIST_ENTRY                 *NextEntry;
>   
>     CurrentEntry = GetFirstNode (FeatureList);
>     while (!IsNull (FeatureList, CurrentEntry)) {
>       Swapped = FALSE;
>       CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> +    NextEntry = CurrentEntry->ForwardLink;
>       if (CpuFeature->BeforeAll) {
>         //
>         // Check all features dispatched before this entry
> @@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
>           CheckEntry = CheckEntry->ForwardLink;
>         }
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
> @@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
>           CheckEntry = CheckEntry->ForwardLink;
>         }
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
>   
>       if (CpuFeature->BeforeFeatureBitMask != NULL) {
> -      //
> -      // Check all features dispatched before this entry
> -      //
> -      CheckEntry = GetFirstNode (FeatureList);
> -      while (CheckEntry != CurrentEntry) {
> -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->BeforeFeatureBitMask)) {
> -          //
> -          // If there is dependency, swap them
> -          //
> -          RemoveEntryList (CurrentEntry);
> -          InsertTailList (CheckEntry, CurrentEntry);
> -          Swapped = TRUE;
> -          break;
> -        }
> -        CheckEntry = CheckEntry->ForwardLink;
> -      }
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->BeforeFeatureBitMask);
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
>   
>       if (CpuFeature->AfterFeatureBitMask != NULL) {
> -      //
> -      // Check all features dispatched after this entry
> -      //
> -      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> -      while (!IsNull (FeatureList, CheckEntry)) {
> -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->AfterFeatureBitMask)) {
> -          //
> -          // If there is dependency, swap them
> -          //
> -          TempEntry = GetNextNode (FeatureList, CurrentEntry);
> -          RemoveEntryList (CurrentEntry);
> -          InsertHeadList (CheckEntry, CurrentEntry);
> -          CurrentEntry = TempEntry;
> -          Swapped = TRUE;
> -          break;
> -        }
> -        CheckEntry = CheckEntry->ForwardLink;
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->AfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
>         }
> +    }
> +
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->CoreBeforeFeatureBitMask);
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
> -    //
> -    // No swap happened, check the next feature
> -    //
> +
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->CoreAfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->PackageBeforeFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->PackageAfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
>       CurrentEntry = CurrentEntry->ForwardLink;
>     }
>   }
> @@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
>     CpuFeaturesData = GetCpuFeaturesData ();
>     if (CpuFeaturesData->FeaturesCount == 0) {
>       InitializeListHead (&CpuFeaturesData->FeatureList);
> -    InitializeSpinLock (&CpuFeaturesData->MsrLock);
> -    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
>       CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
>     }
>     ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
> @@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
>         }
>         CpuFeatureEntry->AfterFeatureBitMask = CpuFeature->AfterFeatureBitMask;
>       }
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
> +      }
> +      CpuFeatureEntry->CoreBeforeFeatureBitMask = CpuFeature->CoreBeforeFeatureBitMask;
> +    }
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
> +      }
> +      CpuFeatureEntry->CoreAfterFeatureBitMask = CpuFeature->CoreAfterFeatureBitMask;
> +    }
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
> +      }
> +      CpuFeatureEntry->PackageBeforeFeatureBitMask = CpuFeature->PackageBeforeFeatureBitMask;
> +    }
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
> +      }
> +      CpuFeatureEntry->PackageAfterFeatureBitMask = CpuFeature->PackageAfterFeatureBitMask;
> +    }
> +
>       CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
>       CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
>   
> @@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
>     @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is not
>                                       supported due to a circular dependency between
>                                       BEFORE and AFTER features.
> +  @retval  RETURN_NOT_READY         CPU feature PCD PcdCpuFeaturesUserConfiguration
> +                                    not updated by Platform driver yet.
>   
>     @note This service could be called by BSP only.
>   **/
> @@ -431,12 +755,20 @@ RegisterCpuFeature (
>     UINT8                      *FeatureMask;
>     UINT8                      *BeforeFeatureBitMask;
>     UINT8                      *AfterFeatureBitMask;
> +  UINT8                      *CoreBeforeFeatureBitMask;
> +  UINT8                      *CoreAfterFeatureBitMask;
> +  UINT8                      *PackageBeforeFeatureBitMask;
> +  UINT8                      *PackageAfterFeatureBitMask;
>     BOOLEAN                    BeforeAll;
>     BOOLEAN                    AfterAll;
>   
> -  FeatureMask          = NULL;
> -  BeforeFeatureBitMask = NULL;
> -  AfterFeatureBitMask  = NULL;
> +  FeatureMask                 = NULL;
> +  BeforeFeatureBitMask        = NULL;

How about renaming BeforeFeatureBitMask to ThreadBeforeFeatureBitMask?
I think the renaming together with redefining the macro 
CPU_FEATURE_BEFORE as CPU_FEATURE_THREAD_BEFORE can be in a separate patch.

> +  AfterFeatureBitMask         = NULL;
> +  CoreBeforeFeatureBitMask    = NULL;
> +  CoreAfterFeatureBitMask     = NULL;
> +  PackageBeforeFeatureBitMask  = NULL;
> +  PackageAfterFeatureBitMask   = NULL;
>     BeforeAll            = FALSE;
>     AfterAll             = FALSE;
>   
> @@ -449,6 +781,10 @@ RegisterCpuFeature (
>                       != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
>       ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL))
>                       != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));

Implementation can avoid using CPU_FEATURE_BEFORE and CPU_FEATURE_AFTER.
Use CPU_FEATURE_THREAD_BEFORE and CPU_FEATURE_THREAD_AFTER.

> +    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER))
> +                    != (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER));
> +    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER))
> +                    != (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER));
>       if (Feature < CPU_FEATURE_BEFORE) {
>         BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE : FALSE;
>         AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
> @@ -459,6 +795,14 @@ RegisterCpuFeature (
>         SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature & ~CPU_FEATURE_BEFORE, BitMaskSize);
>       } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
>         SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature & ~CPU_FEATURE_AFTER, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
> +      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature & ~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
> +      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature & ~CPU_FEATURE_CORE_AFTER, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
> +      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
> +      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
>       }
>       Feature = VA_ARG (Marker, UINT32);
>     }
> @@ -466,15 +810,19 @@ RegisterCpuFeature (
>   
>     CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
>     ASSERT (CpuFeature != NULL);
> -  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
> -  CpuFeature->FeatureMask          = FeatureMask;
> -  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
> -  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
> -  CpuFeature->BeforeAll            = BeforeAll;
> -  CpuFeature->AfterAll             = AfterAll;
> -  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
> -  CpuFeature->SupportFunc          = SupportFunc;
> -  CpuFeature->InitializeFunc       = InitializeFunc;
> +  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
> +  CpuFeature->FeatureMask                 = FeatureMask;
> +  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
> +  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
> +  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
> +  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
> +  CpuFeature->PackageBeforeFeatureBitMask = PackageBeforeFeatureBitMask;
> +  CpuFeature->PackageAfterFeatureBitMask  = PackageAfterFeatureBitMask;
> +  CpuFeature->BeforeAll                   = BeforeAll;
> +  CpuFeature->AfterAll                    = AfterAll;
> +  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
> +  CpuFeature->SupportFunc                 = SupportFunc;
> +  CpuFeature->InitializeFunc              = InitializeFunc;
>     if (FeatureName != NULL) {
>       CpuFeature->FeatureName          = AllocatePool (CPU_FEATURE_NAME_SIZE);
>       ASSERT (CpuFeature->FeatureName != NULL);
> 


-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  2018-10-15  2:49 ` [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
  2018-10-15 17:13   ` Laszlo Ersek
@ 2018-10-16  3:16   ` Ni, Ruiyu
  2018-10-16 23:52     ` Dong, Eric
  1 sibling, 1 reply; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-16  3:16 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/15/2018 10:49 AM, Eric Dong wrote:
> Because this driver needs to set MSRs saved in normal boot phase, sync semaphore
> logic from RegisterCpuFeaturesLib code which used for normal boot phase.
> 
> Detail see change SHA-1: dcdf1774212d87e2d7feb36286a408ea7475fd7b for
> RegisterCpuFeaturesLib.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c          | 316 ++++++++++++++++-------------
>   UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c      |   3 -
>   UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h |   3 +-
>   3 files changed, 180 insertions(+), 142 deletions(-)
> 
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> index 52ff9679d5..5a35f7a634 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> @@ -38,9 +38,12 @@ typedef struct {
>   } MP_ASSEMBLY_ADDRESS_MAP;
>   
>   //
> -// Spin lock used to serialize MemoryMapped operation
> +// Flags used when program the register.
>   //
> -SPIN_LOCK                *mMemoryMappedLock = NULL;
> +typedef struct {
> +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
> +  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
> +} PROGRAM_CPU_REGISTER_FLAGS;
>   
>   //
>   // Signal that SMM BASE relocation is complete.
> @@ -62,13 +65,11 @@ AsmGetAddressMap (
>   #define LEGACY_REGION_SIZE    (2 * 0x1000)
>   #define LEGACY_REGION_BASE    (0xA0000 - LEGACY_REGION_SIZE)
>   
> +PROGRAM_CPU_REGISTER_FLAGS   mCpuFlags;
>   ACPI_CPU_DATA                mAcpiCpuData;
>   volatile UINT32              mNumberToFinish;
>   MP_CPU_EXCHANGE_INFO         *mExchangeInfo;
>   BOOLEAN                      mRestoreSmmConfigurationInS3 = FALSE;
> -MP_MSR_LOCK                  *mMsrSpinLocks = NULL;
> -UINTN                        mMsrSpinLockCount;
> -UINTN                        mMsrCount = 0;
>   
>   //
>   // S3 boot flag
> @@ -91,89 +92,6 @@ UINT8                        mApHltLoopCodeTemplate[] = {
>                                  0xEB, 0xFC               // jmp $-2
>                                  };
>   
> -/**
> -  Get MSR spin lock by MSR index.
> -
> -  @param  MsrIndex       MSR index value.
> -
> -  @return Pointer to MSR spin lock.
> -
> -**/
> -SPIN_LOCK *
> -GetMsrSpinLockByIndex (
> -  IN UINT32      MsrIndex
> -  )
> -{
> -  UINTN     Index;
> -  for (Index = 0; Index < mMsrCount; Index++) {
> -    if (MsrIndex == mMsrSpinLocks[Index].MsrIndex) {
> -      return mMsrSpinLocks[Index].SpinLock;
> -    }
> -  }
> -  return NULL;
> -}
> -
> -/**
> -  Initialize MSR spin lock by MSR index.
> -
> -  @param  MsrIndex       MSR index value.
> -
> -**/
> -VOID
> -InitMsrSpinLockByIndex (
> -  IN UINT32      MsrIndex
> -  )
> -{
> -  UINTN    MsrSpinLockCount;
> -  UINTN    NewMsrSpinLockCount;
> -  UINTN    Index;
> -  UINTN    AddedSize;
> -
> -  if (mMsrSpinLocks == NULL) {
> -    MsrSpinLockCount = mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter;
> -    mMsrSpinLocks = (MP_MSR_LOCK *) AllocatePool (sizeof (MP_MSR_LOCK) * MsrSpinLockCount);
> -    ASSERT (mMsrSpinLocks != NULL);
> -    for (Index = 0; Index < MsrSpinLockCount; Index++) {
> -      mMsrSpinLocks[Index].SpinLock =
> -       (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr + Index * mSemaphoreSize);
> -      mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> -    }
> -    mMsrSpinLockCount = MsrSpinLockCount;
> -    mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter = 0;
> -  }
> -  if (GetMsrSpinLockByIndex (MsrIndex) == NULL) {
> -    //
> -    // Initialize spin lock for MSR programming
> -    //
> -    mMsrSpinLocks[mMsrCount].MsrIndex = MsrIndex;
> -    InitializeSpinLock (mMsrSpinLocks[mMsrCount].SpinLock);
> -    mMsrCount ++;
> -    if (mMsrCount == mMsrSpinLockCount) {
> -      //
> -      // If MSR spin lock buffer is full, enlarge it
> -      //
> -      AddedSize = SIZE_4KB;
> -      mSmmCpuSemaphores.SemaphoreMsr.Msr =
> -                        AllocatePages (EFI_SIZE_TO_PAGES(AddedSize));
> -      ASSERT (mSmmCpuSemaphores.SemaphoreMsr.Msr != NULL);
> -      NewMsrSpinLockCount = mMsrSpinLockCount + AddedSize / mSemaphoreSize;
> -      mMsrSpinLocks = ReallocatePool (
> -                        sizeof (MP_MSR_LOCK) * mMsrSpinLockCount,
> -                        sizeof (MP_MSR_LOCK) * NewMsrSpinLockCount,
> -                        mMsrSpinLocks
> -                        );
> -      ASSERT (mMsrSpinLocks != NULL);
> -      mMsrSpinLockCount = NewMsrSpinLockCount;
> -      for (Index = mMsrCount; Index < mMsrSpinLockCount; Index++) {
> -        mMsrSpinLocks[Index].SpinLock =
> -                 (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
> -                 (Index - mMsrCount)  * mSemaphoreSize);
> -        mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> -      }
> -    }
> -  }
> -}
> -
>   /**
>     Sync up the MTRR values for all processors.
>   
> @@ -204,42 +122,89 @@ Returns:
>   }
>   
>   /**
> -  Programs registers for the calling processor.
> +  Increment semaphore by 1.
>   
> -  This function programs registers for the calling processor.
> +  @param      Sem            IN:  32-bit unsigned integer
>   
> -  @param  RegisterTables        Pointer to register table of the running processor.
> -  @param  RegisterTableCount    Register table count.
> +**/
> +VOID
> +S3ReleaseSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  InterlockedIncrement (Sem);
> +}
> +
> +/**
> +  Decrement the semaphore by 1 if it is not zero.
> +
> +  Performs an atomic decrement operation for semaphore.
> +  The compare exchange operation must be performed using
> +  MP safe mechanisms.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +S3WaitForSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  UINT32  Value;
> +
> +  do {
> +    Value = *Sem;
> +  } while (Value == 0);
> +
> +  InterlockedDecrement (Sem);

The code here is not safe. Please reference ReleaseSemaphore() 
implementation in PiSmmCpuDxeSmm/MpService.c.

> +}
> +
> +/**
> +  Initialize the CPU registers from a register table.
> +
> +  @param[in]  RegisterTable         The register table for this AP.
> +  @param[in]  ApLocation            AP location info for this ap.
> +  @param[in]  CpuStatus             CPU status info for this CPU.
> +  @param[in]  CpuFlags              Flags data structure used when program the register.
>   
> +  @note This service could be called by BSP/APs.
>   **/
>   VOID
> -SetProcessorRegister (
> -  IN CPU_REGISTER_TABLE        *RegisterTables,
> -  IN UINTN                     RegisterTableCount
> +EFIAPI
> +ProgramProcessorRegister (
> +  IN CPU_REGISTER_TABLE           *RegisterTable,
> +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
>     )
>   {
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
>     UINTN                     Index;
>     UINTN                     Value;
> -  SPIN_LOCK                 *MsrSpinLock;
> -  UINT32                    InitApicId;
> -  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> +  volatile UINT32           *SemaphorePtr;
> +  UINT32                    CoreOffset;
> +  UINT32                    PackageOffset;
> +  UINT32                    PackageThreadsCount;
> +  UINT32                    ApOffset;
> +  UINTN                     ProcessorIndex;
> +  UINTN                     ApIndex;
> +  UINTN                     ValidApCount;
>   
> -  InitApicId = GetInitialApicId ();
> -  RegisterTable = NULL;
> -  for (Index = 0; Index < RegisterTableCount; Index++) {
> -    if (RegisterTables[Index].InitialApicId == InitApicId) {
> -      RegisterTable =  &RegisterTables[Index];
> -      break;
> -    }
> -  }
> -  ASSERT (RegisterTable != NULL);
> +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount \
> +            + ApLocation->Core * CpuStatus->ThreadCount \
> +            + ApLocation->Thread;
Please avoid using AP. Use Thread instead.
>   
>     //
>     // Traverse Register Table of this logical processor
>     //
> -  RegisterTableEntry = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
> -  for (Index = 0; Index < RegisterTable->TableLength; Index++, RegisterTableEntry++) {
> +  RegisterTableEntryHead = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
> +
> +  for (Index = 0; Index < RegisterTable->TableLength; Index++) {
> +
> +    RegisterTableEntry = &RegisterTableEntryHead[Index];
> +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));

Please dump the register type as string.

> +
>       //
>       // Check the type of specified register
>       //
> @@ -310,12 +275,6 @@ SetProcessorRegister (
>             RegisterTableEntry->Value
>             );
>         } else {
> -        //
> -        // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
> -        // to make sure MSR read/write operation is atomic.
> -        //
> -        MsrSpinLock = GetMsrSpinLockByIndex (RegisterTableEntry->Index);
> -        AcquireSpinLock (MsrSpinLock);
>           //
>           // Set the bit section according to bit start and length
>           //
> @@ -325,21 +284,20 @@ SetProcessorRegister (
>             RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>             RegisterTableEntry->Value
>             );
> -        ReleaseSpinLock (MsrSpinLock);
>         }
>         break;
>       //
>       // MemoryMapped operations
>       //
>       case MemoryMapped:
> -      AcquireSpinLock (mMemoryMappedLock);
> +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
>         MmioBitFieldWrite32 (
>           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
>           RegisterTableEntry->ValidBitStart,
>           RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>           (UINT32)RegisterTableEntry->Value
>           );
> -      ReleaseSpinLock (mMemoryMappedLock);
> +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
>         break;
>       //
>       // Enable or disable cache
> @@ -355,12 +313,99 @@ SetProcessorRegister (
>         }
>         break;
>   
> +    case Semaphore:

Please refer to the comment to patch #3.

> +      SemaphorePtr = CpuFlags->SemaphoreCount;
> +      switch (RegisterTableEntry->Value) {
> +      case CoreDepType:
> +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount + ApLocation->Core) * CpuStatus->ThreadCount;
> +        ApOffset = CoreOffset + ApLocation->Thread;
> +        //
> +        // First increase semaphore count by 1 for processors in this core.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      case PackageDepType:
> +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount;
> +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus->CoreCount;
> +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation->Core + ApLocation->Thread;
> +        ValidApCount = CpuStatus->ThreadCount * CpuStatus->ValidCoresInPackages[ApLocation->Package];
> +        //
> +        // First increase semaphore count by 1 for processors in this package.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
> +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
> +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      default:
> +        break;
> +      }
> +      break;
> +
>       default:
>         break;
>       }
>     }
>   }
>   
> +/**
> +
> +  Set Processor register for one AP.
> +
> +  @param     SmmPreRegisterTable     Use pre register table or register table.
> +
> +**/
> +VOID
> +SetRegister (
> +  IN BOOLEAN                 SmmPreRegisterTable
> +  )
> +{
> +  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE        *RegisterTables;
> +  UINT32                    InitApicId;
> +  UINTN                     ProcIndex;
> +  UINTN                     Index;
> +
> +  if (SmmPreRegisterTable) {
> +    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.PreSmmInitRegisterTable;
> +  } else {
> +    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.RegisterTable;
> +  }
> +
> +  InitApicId = GetInitialApicId ();
> +  RegisterTable = NULL;
> +  for (Index = 0; Index < mAcpiCpuData.NumberOfCpus; Index++) {
> +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> +      RegisterTable =  &RegisterTables[Index];
> +      ProcIndex = Index;
> +      break;
> +    }
> +  }
> +  ASSERT (RegisterTable != NULL);
> +
> +  ProgramProcessorRegister (
> +    RegisterTable,
> +    mAcpiCpuData.ApLocation + ProcIndex,
> +    &mAcpiCpuData.CpuStatus,
> +    &mCpuFlags
> +    );
> +}
> +
>   /**
>     AP initialization before then after SMBASE relocation in the S3 boot path.
>   **/
> @@ -374,7 +419,7 @@ InitializeAp (
>   
>     LoadMtrrData (mAcpiCpuData.MtrrTable);
>   
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (TRUE);
>   
>     //
>     // Count down the number with lock mechanism.
> @@ -391,7 +436,7 @@ InitializeAp (
>     ProgramVirtualWireMode ();
>     DisableLvtInterrupts ();
>   
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (FALSE);
>   
>     //
>     // Place AP into the safe code, count down the number with lock mechanism in the safe code.
> @@ -466,7 +511,7 @@ InitializeCpuBeforeRebase (
>   {
>     LoadMtrrData (mAcpiCpuData.MtrrTable);
>   
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (TRUE);
>   
>     ProgramVirtualWireMode ();
>   
> @@ -502,8 +547,6 @@ InitializeCpuAfterRebase (
>     VOID
>     )
>   {
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> -
>     mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
>   
>     //
> @@ -511,6 +554,8 @@ InitializeCpuAfterRebase (
>     //
>     mInitApsAfterSmmBaseReloc = TRUE;
>   
> +  SetRegister (FALSE);
> +
>     while (mNumberToFinish > 0) {
>       CpuPause ();
>     }
> @@ -574,8 +619,6 @@ SmmRestoreCpu (
>   
>     mSmmS3Flag = TRUE;
>   
> -  InitializeSpinLock (mMemoryMappedLock);
> -
>     //
>     // See if there is enough context to resume PEI Phase
>     //
> @@ -790,7 +833,6 @@ CopyRegisterTable (
>     )
>   {
>     UINTN                      Index;
> -  UINTN                      Index1;
>     CPU_REGISTER_TABLE_ENTRY   *RegisterTableEntry;
>   
>     CopyMem (DestinationRegisterTableList, SourceRegisterTableList, NumberOfCpus * sizeof (CPU_REGISTER_TABLE));
> @@ -802,17 +844,6 @@ CopyRegisterTable (
>           );
>         ASSERT (RegisterTableEntry != NULL);
>         DestinationRegisterTableList[Index].RegisterTableEntry = (EFI_PHYSICAL_ADDRESS)(UINTN)RegisterTableEntry;
> -      //
> -      // Go though all MSRs in register table to initialize MSR spin lock
> -      //
> -      for (Index1 = 0; Index1 < DestinationRegisterTableList[Index].TableLength; Index1++, RegisterTableEntry++) {
> -        if ((RegisterTableEntry->RegisterType == Msr) && (RegisterTableEntry->ValidBitLength < 64)) {
> -          //
> -          // Initialize MSR spin lock only for those MSRs need bit field writing
> -          //
> -          InitMsrSpinLockByIndex (RegisterTableEntry->Index);
> -        }
> -      }
>       }
>     }
>   }
> @@ -832,6 +863,7 @@ GetAcpiCpuData (
>     VOID                       *GdtForAp;
>     VOID                       *IdtForAp;
>     VOID                       *MachineCheckHandlerForAp;
> +  CPU_STATUS_INFORMATION     *CpuStatus;
>   
>     if (!mAcpiS3Enable) {
>       return;
> @@ -906,6 +938,16 @@ GetAcpiCpuData (
>     Gdtr->Base = (UINTN)GdtForAp;
>     Idtr->Base = (UINTN)IdtForAp;
>     mAcpiCpuData.ApMachineCheckHandlerBase = (EFI_PHYSICAL_ADDRESS)(UINTN)MachineCheckHandlerForAp;
> +
> +  CpuStatus = &mAcpiCpuData.CpuStatus;
> +  CopyMem (CpuStatus, &AcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
> +  CpuStatus->ValidCoresInPackages = AllocateCopyPool (sizeof (UINT32) * CpuStatus->PackageCount, AcpiCpuData->CpuStatus.ValidCoresInPackages);
> +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
> +  mAcpiCpuData.ApLocation = AllocateCopyPool (mAcpiCpuData.NumberOfCpus * sizeof (EFI_CPU_PHYSICAL_LOCATION), AcpiCpuData->ApLocation);
> +  ASSERT (mAcpiCpuData.ApLocation != NULL);
> +  InitializeSpinLock((SPIN_LOCK*) &mCpuFlags.MemoryMappedLock);
> +  mCpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount * CpuStatus->ThreadCount);
> +  ASSERT (mCpuFlags.SemaphoreCount != NULL);
>   }
>   
>   /**
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> index 9cf508a5c7..42b040531e 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> @@ -1303,8 +1303,6 @@ InitializeSmmCpuSemaphores (
>     mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock
>                                                     = (SPIN_LOCK *)SemaphoreAddr;
>     SemaphoreAddr += SemaphoreSize;
> -  mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock
> -                                                  = (SPIN_LOCK *)SemaphoreAddr;
>   
>     SemaphoreAddr = (UINTN)SemaphoreBlock + GlobalSemaphoresSize;
>     mSmmCpuSemaphores.SemaphoreCpu.Busy    = (SPIN_LOCK *)SemaphoreAddr;
> @@ -1321,7 +1319,6 @@ InitializeSmmCpuSemaphores (
>   
>     mPFLock                       = mSmmCpuSemaphores.SemaphoreGlobal.PFLock;
>     mConfigSmmCodeAccessCheckLock = mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock;
> -  mMemoryMappedLock             = mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock;
>   
>     mSemaphoreSize = SemaphoreSize;
>   }
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> index 8c7f4996d1..e2970308fe 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> @@ -53,6 +53,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>   #include <Library/ReportStatusCodeLib.h>
>   #include <Library/SmmCpuFeaturesLib.h>
>   #include <Library/PeCoffGetEntryPointLib.h>
> +#include <Library/RegisterCpuFeaturesLib.h>
>   
>   #include <AcpiCpuData.h>
>   #include <CpuHotPlugData.h>
> @@ -364,7 +365,6 @@ typedef struct {
>     volatile BOOLEAN     *AllCpusInSync;
>     SPIN_LOCK            *PFLock;
>     SPIN_LOCK            *CodeAccessCheckLock;
> -  SPIN_LOCK            *MemoryMappedLock;
>   } SMM_CPU_SEMAPHORE_GLOBAL;
>   
>   ///
> @@ -409,7 +409,6 @@ extern SMM_CPU_SEMAPHORES                  mSmmCpuSemaphores;
>   extern UINTN                               mSemaphoreSize;
>   extern SPIN_LOCK                           *mPFLock;
>   extern SPIN_LOCK                           *mConfigSmmCodeAccessCheckLock;
> -extern SPIN_LOCK                           *mMemoryMappedLock;
>   extern EFI_SMRAM_DESCRIPTOR                *mSmmCpuSmramRanges;
>   extern UINTN                               mSmmCpuSmramRangeCount;
>   extern UINT8                               mPhysicalAddressBits;
> 


-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-15 16:02   ` Laszlo Ersek
@ 2018-10-16  3:43     ` Dong, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Dong, Eric @ 2018-10-16  3:43 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu

Hi Laszlo,

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, October 16, 2018 12:03 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add
> Semaphore related Information.
> 
> On 10/15/18 04:49, Eric Dong wrote:
> > In order to support semaphore related logic, add new definition for it.
> >
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> > ---
> >  UefiCpuPkg/Include/AcpiCpuData.h | 23 ++++++++++++++++++++++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> (1) If it's possible, I suggest moving the (very nice) description from the 0/4
> cover letter to this patch. The cover letter is not captured in the git commit
> history.
> 
> I don't insist, but it would be a nice touch, IMO.

Code change for this patch can't show all the information for the description. I add this description in 3/4 change.  
But this change is the first one for this serial, I think it's ok to add description here. Will add it in V2 change.

> 
> >
> > diff --git a/UefiCpuPkg/Include/AcpiCpuData.h
> > b/UefiCpuPkg/Include/AcpiCpuData.h
> > index 9e51145c08..b3cf2f664a 100644
> > --- a/UefiCpuPkg/Include/AcpiCpuData.h
> > +++ b/UefiCpuPkg/Include/AcpiCpuData.h
> > @@ -15,6 +15,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF
> ANY KIND, EITHER EXPRESS OR IMPLIED.
> >  #ifndef _ACPI_CPU_DATA_H_
> >  #define _ACPI_CPU_DATA_H_
> >
> > +#include <Protocol/MpService.h>
> > +
> >  //
> >  // Register types in register table
> >  //
> > @@ -22,9 +24,20 @@ typedef enum {
> >    Msr,
> >    ControlRegister,
> >    MemoryMapped,
> > -  CacheControl
> > +  CacheControl,
> > +  Semaphore
> >  } REGISTER_TYPE;
> >
> > +//
> > +// CPU information.
> > +//
> > +typedef struct {
> > +  UINT32        PackageCount;             // Packages in this CPU.
> 
> (2) Is it possible to have multiple packages in a single CPU? If not, then please
> clean up the comment.
> 
> Did you perhaps mean "number of sockets in the system"?

Yes, I means sockets in the system, i think socket == package. Just like below definition in MdePkg\Include\Protocol\MpService.h file, it use package instead of socket.
	///
	/// Structure that describes the pyhiscal location of a logical CPU.
	///
	typedef struct {
	  ///
	  /// Zero-based physical package number that identifies the cartridge of the processor.
	  ///
	  UINT32  Package;
	  ///
	  /// Zero-based physical core number within package of the processor.
	  ///
	  UINT32  Core;
	  ///
	  /// Zero-based logical thread number within core of the processor.
	  ///
	  UINT32  Thread;
	} EFI_CPU_PHYSICAL_LOCATION;
	

> 
> > +  UINT32        CoreCount;                // Max Core count in the packages.
> > +  UINT32        ThreadCount;              // MAx thread count in the cores.
> 
> (3) The word "MAx" should be "Max", I think.

Yes, will update it in next version.

> 
> > +  UINT32        *ValidCoresInPackages;    // Valid cores in each package.
> 
> (4) Is it possible to document the structure of this array (?) in some detail?
> Other parts of "UefiCpuPkg/Include/AcpiCpuData.h" are very well
> documented.

Yes, will add description in next version.

> 
> > +} CPU_STATUS_INFORMATION;
> > +
> >  //
> >  // Element of register table entry
> >  //
> > @@ -147,6 +160,14 @@ typedef struct {
> >    // provided.
> >    //
> >    UINT32                ApMachineCheckHandlerSize;
> > +  //
> > +  // CPU information which is required when set the register table.
> > +  //
> > +  CPU_STATUS_INFORMATION     CpuStatus;
> > +  //
> > +  // Location info for each ap.
> 
> (5) This header file spells "AP" in upper case elsewhere.

Ok, will update it in next version

> 
> > +  //
> > +  EFI_CPU_PHYSICAL_LOCATION  *ApLocation;
> 
> (6) Is this supposed to be an array? If so, what is the structure of the array?
> What is the size?

Yes, it's point to an array.  Will add comments in this definition in next version.

> 
> (7) This is the first field in ACPI_CPU_DATA that has pointer type.
> Other pointers are represented as EFI_PHYSICAL_ADDRESS.
> 
> What justifies this difference?

Yes, here I should use EFI_PHYSICAL_ADDRESS instead of pointer type. Will update it in my next change.

> >  } ACPI_CPU_DATA;
> >
> >  #endif
> >
> 
> (8) "UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c" will zero-fill the new fields.
> Is that safe?

It's not safe, I missed code change in CpuS3DataDxe, it should keep these data if OldAcpiCpuData already exist. Will update it in the next version.

> 
> Thanks
> Laszlo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-16  2:27   ` Ni, Ruiyu
@ 2018-10-16  5:25     ` Dong, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Dong, Eric @ 2018-10-16  5:25 UTC (permalink / raw)
  To: Ni, Ruiyu, edk2-devel@lists.01.org; +Cc: Laszlo Ersek

Hi Ruiyu,

> -----Original Message-----
> From: Ni, Ruiyu
> Sent: Tuesday, October 16, 2018 10:27 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Laszlo Ersek <lersek@redhat.com>
> Subject: Re: [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add
> Semaphore related Information.
> 
> On 10/15/2018 10:49 AM, Eric Dong wrote:
> > In order to support semaphore related logic, add new definition for it.
> >
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> > ---
> >   UefiCpuPkg/Include/AcpiCpuData.h | 23 ++++++++++++++++++++++-
> >   1 file changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/UefiCpuPkg/Include/AcpiCpuData.h
> > b/UefiCpuPkg/Include/AcpiCpuData.h
> > index 9e51145c08..b3cf2f664a 100644
> > --- a/UefiCpuPkg/Include/AcpiCpuData.h
> > +++ b/UefiCpuPkg/Include/AcpiCpuData.h
> > @@ -15,6 +15,8 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF
> ANY KIND, EITHER EXPRESS OR IMPLIED.
> >   #ifndef _ACPI_CPU_DATA_H_
> >   #define _ACPI_CPU_DATA_H_
> >
> > +#include <Protocol/MpService.h>
> > +
> >   //
> >   // Register types in register table
> >   //
> > @@ -22,9 +24,20 @@ typedef enum {
> >     Msr,
> >     ControlRegister,
> >     MemoryMapped,
> > -  CacheControl
> > +  CacheControl, > +  Semaphore
> I assume the REGISTER_TYPE definition will be move to internal
> (non-public) in phase 2.
> 

Yes.

> >   } REGISTER_TYPE;
> >
> > +//
> > +// CPU information.
> > +//
> > +typedef struct {
> > +  UINT32        PackageCount;             // Packages in this CPU.
> > +  UINT32        CoreCount;                // Max Core count in the packages.
> > +  UINT32        ThreadCount;              // MAx thread count in the cores.
> > +  UINT32        *ValidCoresInPackages;    // Valid cores in each package.
> 
> Can you please add more comments to describe each field above?

Will add more comments in the next version.

> PackageCount is easy to understand.
> But CoreCount is not. Maybe different packages have different number of
> cores. In this case, what value will CoreCount be?
> Similar question to ThreadCount.

CoreCount means the max core count in the CPU.  ThreadCount means max thread count in the CPU. Will add comments in next version change.

> 
> What does ValidCoresInPackages mean? Does it hold the valid (non-dead)
> core numbers for each package? So it's a UINT32 array with PackageCount
> elements?

Yes.

> How about using name ValidCoreCountPerPackage?
> How about using MaxCoreCount/MaxThreadCount for CoreCount and
> ThreadCount?
> 

Ok, will use these names in next version.

> > +} CPU_STATUS_INFORMATION;
> > +
> >   //
> >   // Element of register table entry
> >   //
> > @@ -147,6 +160,14 @@ typedef struct {
> >     // provided.
> >     //
> >     UINT32                ApMachineCheckHandlerSize;
> > +  //
> > +  // CPU information which is required when set the register table.
> > +  //
> > +  CPU_STATUS_INFORMATION     CpuStatus;
> > +  //
> > +  // Location info for each ap.
> > +  //
> > +  EFI_CPU_PHYSICAL_LOCATION  *ApLocation;
> 
> Please use EFI_PHYSICAL_ADDRESS for ApLocation. It's ok now. But if there
> are more fields below ApLocation, the offset of those fields differs between
> PEI and DXE. That will cause bugs.
> 

Yes,  update code in next version.

> >   } ACPI_CPU_DATA;
> >
> >   #endif
> >
> 
> 
> --
> Thanks,
> Ray

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
  2018-10-16  3:05   ` Ni, Ruiyu
@ 2018-10-16  7:43     ` Dong, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Dong, Eric @ 2018-10-16  7:43 UTC (permalink / raw)
  To: Ni, Ruiyu, edk2-devel@lists.01.org; +Cc: Laszlo Ersek

Hi Ruiyu,

> -----Original Message-----
> From: Ni, Ruiyu
> Sent: Tuesday, October 16, 2018 11:05 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Laszlo Ersek <lersek@redhat.com>
> Subject: Re: [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to
> support semaphore type.
> 
> On 10/15/2018 10:49 AM, Eric Dong wrote:
> > In a system which has multiple cores, current set register value task costs
> huge times.
> > After investigation, current set MSR task costs most of the times. Current
> logic uses
> > SpinLock to let set MSR task as an single thread task for all cores. Because
> MSR has
> > scope attribute which may cause GP fault if multiple APs set MSR at the
> same time,
> > current logic use an easiest solution (use SpinLock) to avoid this issue, but it
> will
> > cost huge times.
> >
> > In order to fix this performance issue, new solution will set MSRs base on
> their scope
> > attribute. After this, the SpinLock will not needed. Without SpinLock, new
> issue raised
> > which is caused by MSR dependence. For example, MSR A depends on
> MSR B which means MSR A
> > must been set after MSR B has been set. Also MSR B is package scope level
> and MSR A is
> > thread scope level. If system has multiple threads, Thread 1 needs to set
> the thread level
> > MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs
> task for thread 1
> > and thread 2 like below:
> >
> >              Thread 1                 Thread 2
> > MSR B          N                        Y
> > MSR A          Y                        Y
> >
> > If driver don't control execute MSR order, for thread 1, it will execute MSR
> A first, but
> > at this time, MSR B not been executed yet by thread 2. system may trig
> exception at this
> > time.
> >
> > In order to fix the above issue, driver introduces semaphore logic to control
> the MSR
> > execute sequence. For the above case, a semaphore will be add between
> MSR A and B for
> > all threads. Semaphore has scope info for it. The possible scope value is
> core or package.
> > For each thread, when it meets a semaphore during it set registers, it will 1)
> release
> > semaphore (+1) for each threads in this core or package(based on the
> scope info for this
> > semaphore) 2) acquire semaphore (-1) for all the threads in this core or
> package(based
> > on the scope info for this semaphore). With these two steps, driver can
> control MSR
> > sequence. Sample code logic like below:
> >
> >    //
> >    // First increase semaphore count by 1 for processors in this package.
> >    //
> >    for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> >      LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> ProcessorIndex]);
> >    }
> >    //
> >    // Second, check whether the count has reach the check number.
> >    //
> >    for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex
> ++) {
> >      LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> >    }
> >
> > Platform Requirement:
> > 1. This change requires register MSR setting base on MSR scope info. If still
> register MSR
> >     for all threads, exception may raised.
> >
> > Known limitation:
> > 1. Current CpuFeatures driver supports DXE instance and PEI instance. But
> semaphore logic
> >     requires Aps execute in async mode which is not supported by PEI driver.
> So CpuFeature
> >     PEI instance not works after this change. We plan to support async mode
> for PEI in phase
> >     2 for this task.
> >
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> > ---
> >   .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++--
> -
> >   .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
> >   .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
> >   .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
> >   .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
> >   .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
> >   .../RegisterCpuFeaturesLib.c                       | 452 ++++++++++++++++++---
> >   7 files changed, 840 insertions(+), 117 deletions(-)
> >
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > index ba3fb3250f..f820b4fed7 100644
> > --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > @@ -145,6 +145,20 @@ CpuInitDataInitialize (
> >     CPU_FEATURES_INIT_ORDER              *InitOrder;
> >     CPU_FEATURES_DATA                    *CpuFeaturesData;
> >     LIST_ENTRY                           *Entry;
> > +  UINT32                               Core;
> > +  UINT32                               Package;
> > +  UINT32                               Thread;
> > +  EFI_CPU_PHYSICAL_LOCATION            *Location;
> > +  UINT32                               *CoreArray;
> > +  UINTN                                Index;
> > +  UINT32                               ValidCount;
> > +  UINTN                                CoreIndex;
> > +  ACPI_CPU_DATA                        *AcpiCpuData;
> > +  CPU_STATUS_INFORMATION               *CpuStatus;
> > +
> > +  Core    = 0;
> > +  Package = 0;
> > +  Thread  = 0;
> >
> >     CpuFeaturesData = GetCpuFeaturesData ();
> >     CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof
> (CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
> > @@ -163,6 +177,16 @@ CpuInitDataInitialize (
> >       Entry = Entry->ForwardLink;
> >     }
> >
> > +  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
> > +
> > +  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64
> (PcdCpuS3DataAddress);
> > +  ASSERT (AcpiCpuData != NULL);
> > +  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
> > +
> > +  CpuStatus = &AcpiCpuData->CpuStatus;
> > +  AcpiCpuData->ApLocation = AllocateZeroPool (sizeof
> (EFI_CPU_PHYSICAL_LOCATION) * NumberOfCpus);
> > +  ASSERT (AcpiCpuData->ApLocation != NULL);
> > +
> >     for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus;
> ProcessorNumber++) {
> >       InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
> >       InitOrder->FeaturesSupportedMask = AllocateZeroPool
> (CpuFeaturesData->BitMaskSize);
> > @@ -175,7 +199,59 @@ CpuInitDataInitialize (
> >         &ProcessorInfoBuffer,
> >         sizeof (EFI_PROCESSOR_INFORMATION)
> >         );
> > +    CopyMem (
> > +      AcpiCpuData->ApLocation + ProcessorNumber,
> > +      &ProcessorInfoBuffer.Location,
> > +      sizeof (EFI_CPU_PHYSICAL_LOCATION)
> > +      );
> > +
> 
> Please add more comments here to describe what the below code tries to
> do and why.

Yes, will add comments in next version change.

> 
> > +    if (Package < ProcessorInfoBuffer.Location.Package) {
> > +      Package = ProcessorInfoBuffer.Location.Package;
> > +    }
> > +    if (Core < ProcessorInfoBuffer.Location.Core) {
> > +      Core = ProcessorInfoBuffer.Location.Core;
> > +    }
> > +    if (Thread < ProcessorInfoBuffer.Location.Thread) {
> > +      Thread = ProcessorInfoBuffer.Location.Thread;
> > +    }
> > +  }
> > +  CpuStatus->PackageCount = Package + 1;
> > +  CpuStatus->CoreCount    = Core + 1;
> > +  CpuStatus->ThreadCount  = Thread + 1;
> 
> 
> > +  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, Core : %d,
> Thread: %d\n",
> > +         CpuStatus->PackageCount,
> > +         CpuStatus->CoreCount,
> > +         CpuStatus->ThreadCount));
> 
> Please use MaxCore and MaxThread in debug message. Otherwise it's
> confusing.

Yes, will update in next version change.

> 
> > +
> > +  //
> > +  // Collect valid core count in each package because not all cores are valid.
> > +  //
> > +  CpuStatus->ValidCoresInPackages = AllocateZeroPool (sizeof (UINT32) *
> CpuStatus->PackageCount);
> > +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
> 
> Please add comments to describe the purpose of CoreArray.
> CoreArray is not a good name IMO. How about:
> CoreVisited - AllocatePool (sizeof (BOOLEAN) * CpuStatus->MaxCoreCount);
> 
> > +  CoreArray = AllocatePool (sizeof (UINT32) * CpuStatus->CoreCount);
> > +  ASSERT (CoreArray != NULL);
> > +
> > +  for (Index = 0; Index <= Package; Index ++ ) {
> 
> Please stop using Package/Core/Thread. Use the field in CpuStatus
> structure instead. It makes the code more readable.

Ok, will update in next version.

> 
> > +    ZeroMem (CoreArray, sizeof (UINT32) * (Core + 1));
> > +    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus;
> ProcessorNumber++) {
> > +      Location = &CpuFeaturesData-
> >InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
> > +      if (Location->Package == Index) {
> > +        CoreArray[Location->Core] = 1;
> > +      }
> 
> The above if-clause can be:
>           if ((Location->Package == Index) &&
>               !CoreVisited[Location->Core])) {
>             CpuStatus->ValidCoreCountPerPackage[Index]++;
>             CoreVisited[Location->Core] = TRUE;
>           }
> 
> The for-loop below can be removed.

Thanks for the enhancement, will update the code logic in next version.

> 
> > +    }
> > +    for (CoreIndex = 0, ValidCount = 0; CoreIndex <= Core; CoreIndex ++) {
> > +      ValidCount += CoreArray[CoreIndex];
> > +    }
> > +    CpuStatus->ValidCoresInPackages[Index] = ValidCount;
> >     }
> > +  FreePool (CoreArray);
> > +  for (Index = 0; Index <= Package; Index++) {
> > +    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index,
> CpuStatus->ValidCoresInPackages[Index]));
> > +  }
> > +
> > +  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool
> (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount*
> CpuStatus->ThreadCount);
> > +  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
> > +
> >     //
> >     // Get support and configuration PCDs
> >     //
> > @@ -310,7 +386,7 @@ CollectProcessorData (
> >     LIST_ENTRY                           *Entry;
> >     CPU_FEATURES_DATA                    *CpuFeaturesData;
> >
> > -  CpuFeaturesData = GetCpuFeaturesData ();
> > +  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;
> 
> Is the above change more proper in a separate patch?
> 
> >     ProcessorNumber = GetProcessorIndex ();
> >     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
> >     //
> > @@ -416,6 +492,15 @@ DumpRegisterTableOnProcessor (
> >           RegisterTableEntry->Value
> >           ));
> >         break;
> > +    case Semaphore:
> > +      DEBUG ((
> > +        DebugPrintErrorLevel,
> > +        "Processor: %d: Semaphore: Scope Value: %d\r\n",
> 
> How about print the Scope value in string? This makes the debug message
> more meaningful.

Ok, will do it in next version change

> 
> > +        ProcessorNumber,
> > +        RegisterTableEntry->Value
> > +        ));
> > +      break;
> > +
> >       default:
> >         break;
> >       }
> > @@ -441,6 +526,11 @@ AnalysisProcessorFeatures (
> >     REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
> >     LIST_ENTRY                           *Entry;
> >     CPU_FEATURES_DATA                    *CpuFeaturesData;
> > +  LIST_ENTRY                           *NextEntry;
> > +  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
> > +  BOOLEAN                              Success;
> > +  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
> > +  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
> >
> >     CpuFeaturesData = GetCpuFeaturesData ();
> >     CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData-
> >BitMaskSize);
> > @@ -517,8 +607,14 @@ AnalysisProcessorFeatures (
> >       //
> >       CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
> >       Entry = GetFirstNode (&CpuInitOrder->OrderList);
> > +    NextEntry = Entry->ForwardLink;
> >       while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
> >         CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
> > +      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
> > +        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK
> (NextEntry);
> > +      } else {
> > +        NextCpuFeatureInOrder = NULL;
> > +      }
> >         if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask,
> CpuFeaturesData->SettingPcd)) {
> >           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber,
> CpuInfo, CpuFeatureInOrder->ConfigData, TRUE);
> >           if (EFI_ERROR (Status)) {
> > @@ -532,6 +628,8 @@ AnalysisProcessorFeatures (
> >               DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask
> = "));
> >               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
> >             }
> > +        } else {
> > +          Success = TRUE;
> >           }
> >         } else {
> >           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber,
> CpuInfo, CpuFeatureInOrder->ConfigData, FALSE);
> > @@ -542,9 +640,36 @@ AnalysisProcessorFeatures (
> >               DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: Mask
> = "));
> >               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
> >             }
> > +        } else {
> > +          Success = TRUE;
> >           }
> >         }
> > -      Entry = Entry->ForwardLink;
> > +
> > +      if (Success) {
> > +        //
> > +        // If feature has dependence with the next feature (ONLY care
> core/package dependency).
> > +        // and feature initialize succeed, add sync semaphere here.
> > +        //
> > +        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
> > +        if (NextCpuFeatureInOrder != NULL) {
> > +          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
> > +        } else {
> > +          AfterDep = NoneDepType;
> > +        }
> > +        //
> > +        // Assume only one of the depend is valid.
> > +        //
> > +        ASSERT (!(BeforeDep > ThreadDepType && AfterDep >
> ThreadDepType));
> > +        if (BeforeDep > ThreadDepType) {
> > +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0,
> BeforeDep);
> > +        }
> > +        if (AfterDep > ThreadDepType) {
> > +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0,
> AfterDep);
> > +        }
> > +      }
> > +
> > +      Entry     = Entry->ForwardLink;
> > +      NextEntry = Entry->ForwardLink;
> >       }
> >
> >       //
> > @@ -561,27 +686,79 @@ AnalysisProcessorFeatures (
> >     }
> >   }
> >
> > +/**
> > +  Increment semaphore by 1.
> > +
> > +  @param      Sem            IN:  32-bit unsigned integer
> > +
> > +**/
> > +VOID
> > +LibReleaseSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  InterlockedIncrement (Sem);
> > +}
> > +
> > +/**
> > +  Decrement the semaphore by 1 if it is not zero.
> > +
> > +  Performs an atomic decrement operation for semaphore.
> > +  The compare exchange operation must be performed using
> > +  MP safe mechanisms.
> > +
> > +  @param      Sem            IN:  32-bit unsigned integer
> > +
> > +**/
> > +VOID
> > +LibWaitForSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  UINT32  Value;
> > +
> > +  do {
> > +    Value = *Sem;
> > +  } while (Value == 0);
> > +
> > +  InterlockedDecrement (Sem);
> > +}
> > +
> >   /**
> >     Initialize the CPU registers from a register table.
> >
> > -  @param[in]  ProcessorNumber  The index of the CPU executing this
> function.
> > +  @param[in]  RegisterTable         The register table for this AP.
> > +  @param[in]  ApLocation            AP location info for this ap.
> > +  @param[in]  CpuStatus             CPU status info for this CPU.
> > +  @param[in]  CpuFlags              Flags data structure used when program the
> register.
> >
> >     @note This service could be called by BSP/APs.
> >   **/
> >   VOID
> > +EFIAPI
> >   ProgramProcessorRegister (
> > -  IN UINTN  ProcessorNumber
> > +  IN CPU_REGISTER_TABLE           *RegisterTable,
> > +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> > +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> > +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
> >     )
> >   {
> > -  CPU_FEATURES_DATA         *CpuFeaturesData;
> > -  CPU_REGISTER_TABLE        *RegisterTable;
> >     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
> >     UINTN                     Index;
> >     UINTN                     Value;
> >     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> > -
> > -  CpuFeaturesData = GetCpuFeaturesData ();
> > -  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
> > +  volatile UINT32           *SemaphorePtr;
> > +  UINT32                    CoreOffset;
> > +  UINT32                    PackageOffset;
> > +  UINT32                    PackageThreadsCount;
> > +  UINT32                    ApOffset;
> > +  UINTN                     ProcessorIndex;
> > +  UINTN                     ApIndex;
> > +  UINTN                     ValidApCount;
> > +
> > +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus-
> >ThreadCount \
> > +            + ApLocation->Core * CpuStatus->ThreadCount \
> > +            + ApLocation->Thread;
> >
> >     //
> >     // Traverse Register Table of this logical processor
> > @@ -591,6 +768,7 @@ ProgramProcessorRegister (
> >     for (Index = 0; Index < RegisterTable->TableLength; Index++) {
> >
> >       RegisterTableEntry = &RegisterTableEntryHead[Index];
> > +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type
> = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
> How about print the register type in string?

Yes, will do it in the next version.

> 
> >
> >       //
> >       // Check the type of specified register
> > @@ -654,10 +832,6 @@ ProgramProcessorRegister (
> >       // The specified register is Model Specific Register
> >       //
> >       case Msr:
> > -      //
> > -      // Get lock to avoid Package/Core scope MSRs programming issue in
> parallel execution mode
> > -      //
> > -      AcquireSpinLock (&CpuFeaturesData->MsrLock);
> >         if (RegisterTableEntry->ValidBitLength >= 64) {
> >           //
> >           // If length is not less than 64 bits, then directly write without reading
> > @@ -677,20 +851,19 @@ ProgramProcessorRegister (
> >             RegisterTableEntry->Value
> >             );
> >         }
> > -      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
> >         break;
> >       //
> >       // MemoryMapped operations
> >       //
> >       case MemoryMapped:
> > -      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
> > +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
> >         MmioBitFieldWrite32 (
> >           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry-
> >HighIndex, 32)),
> >           RegisterTableEntry->ValidBitStart,
> >           RegisterTableEntry->ValidBitStart + RegisterTableEntry-
> >ValidBitLength - 1,
> >           (UINT32)RegisterTableEntry->Value
> >           );
> > -      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
> > +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
> >         break;
> >       //
> >       // Enable or disable cache
> > @@ -706,6 +879,50 @@ ProgramProcessorRegister (
> >         }
> >         break;
> >
> > +    case Semaphore:
> > +      SemaphorePtr = CpuFlags->SemaphoreCount;
> > +      switch (RegisterTableEntry->Value) {
> > +      case CoreDepType:
> > +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount +
> ApLocation->Core) * CpuStatus->ThreadCount > +        ApOffset = CoreOffset
> + ApLocation->Thread;
> 
> How about FirstThread and CurrentThread?

Ok, will use this new names.

> 
> > +        //
> > +        // First increase semaphore count by 1 for processors in this core.
> This comment might not be helpful for reviewer to understand.
> How about "Notify all threads in current Core"?
> 
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset +
> ProcessorIndex]);
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> How about "Wait for all threads in current Core"?

Ok, will do this change in next version changes.

> 
> Below diagram is also helpful
> //
> //  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
> //  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
> //
> //  All threads (T0...Tn) waits in P() line and continues running
> //  together.
> //
> //
> //  T0             T1            ...           Tn
> //
> //  V(0...n)       V(0...n)      ...           V(0...n)
> //  n * P(0)       n * P(1)      ...           n * P(n)
> //
> 

Ok, will do this change in next version changes.

> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      case PackageDepType:
> > +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount *
> CpuStatus->ThreadCount;
> 
> FirstThread?

Ok, will do this change in next version changes.

> 
> > +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus-
> >CoreCount;
> ThreadCount?
> 
> > +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation-
> >Core + ApLocation->Thread;
> CurrentThread?

Ok, will do this change in next version changes.

> 
> > +        ValidApCount = CpuStatus->ThreadCount * CpuStatus-
> >ValidCoresInPackages[ApLocation->Package];
> ValidThreadCount?

Ok, will do this change in next version changes.

> 
> > +        //
> > +        // First increase semaphore count by 1 for processors in this package.
> How about "Notify all threads in current Package"?

Ok, will do this change in next version changes.

> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> > +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> ProcessorIndex]);
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> How about "Wait for all threads in current Package"?

Ok, will do this change in next version changes.

> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount;
> ProcessorIndex ++) {
> > +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      default:
> > +        break;
> > +      }
> > +      break;
> > +
> >       default:
> >         break;
> >       }
> > @@ -724,10 +941,36 @@ SetProcessorRegister (
> >     IN OUT VOID            *Buffer
> >     )
> >   {
> > -  UINTN                  ProcessorNumber;
> > +  CPU_FEATURES_DATA         *CpuFeaturesData;
> > +  CPU_REGISTER_TABLE        *RegisterTable;
> > +  CPU_REGISTER_TABLE        *RegisterTables;
> > +  UINT32                    InitApicId;
> > +  UINTN                     ProcIndex;
> > +  UINTN                     Index;
> > +  ACPI_CPU_DATA             *AcpiCpuData;
> >
> > -  ProcessorNumber = GetProcessorIndex ();
> > -  ProgramProcessorRegister (ProcessorNumber);
> > +  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
> > +  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
> > +
> > +  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData-
> >RegisterTable;
> > +
> > +  InitApicId = GetInitialApicId ();
> > +  RegisterTable = NULL;
> > +  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
> > +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> > +      RegisterTable =  &RegisterTables[Index];
> > +      ProcIndex = Index;
> > +      break;
> > +    }
> > +  }
> > +  ASSERT (RegisterTable != NULL);
> > +
> > +  ProgramProcessorRegister (
> > +    RegisterTable,
> > +    AcpiCpuData->ApLocation + ProcIndex,
> > +    &AcpiCpuData->CpuStatus,
> > +    &CpuFeaturesData->CpuFlags
> > +    );
> >   }
> >
> >   /**
> > @@ -746,6 +989,9 @@ CpuFeaturesDetect (
> >   {
> >     UINTN                  NumberOfCpus;
> >     UINTN                  NumberOfEnabledProcessors;
> > +  CPU_FEATURES_DATA      *CpuFeaturesData;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData();
> >
> >     GetNumberOfProcessor (&NumberOfCpus,
> &NumberOfEnabledProcessors);
> >
> > @@ -754,49 +1000,13 @@ CpuFeaturesDetect (
> >     //
> >     // Wakeup all APs for data collection.
> >     //
> > -  StartupAPsWorker (CollectProcessorData);
> > +  StartupAPsWorker (CollectProcessorData, NULL);
> >
> >     //
> >     // Collect data on BSP
> >     //
> > -  CollectProcessorData (NULL);
> > +  CollectProcessorData (CpuFeaturesData);
> >
> >     AnalysisProcessorFeatures (NumberOfCpus);
> >   }
> >
> > -/**
> > -  Performs CPU features Initialization.
> > -
> > -  This service will invoke MP service to perform CPU features
> > -  initialization on BSP/APs per user configuration.
> > -
> > -  @note This service could be called by BSP only.
> > -**/
> > -VOID
> > -EFIAPI
> > -CpuFeaturesInitialize (
> > -  VOID
> > -  )
> > -{
> > -  CPU_FEATURES_DATA      *CpuFeaturesData;
> > -  UINTN                  OldBspNumber;
> > -
> > -  CpuFeaturesData = GetCpuFeaturesData ();
> > -
> > -  OldBspNumber = GetProcessorIndex();
> > -  CpuFeaturesData->BspNumber = OldBspNumber;
> > -  //
> > -  // Wakeup all APs for programming.
> > -  //
> > -  StartupAPsWorker (SetProcessorRegister);
> > -  //
> > -  // Programming BSP
> > -  //
> > -  SetProcessorRegister (NULL);
> > -  //
> > -  // Switch to new BSP if required
> > -  //
> > -  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> > -    SwitchNewBsp (CpuFeaturesData->BspNumber);
> > -  }
> > -}
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> > index 1f34a3f489..8346f7004f 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> > @@ -15,6 +15,7 @@
> >   #include <PiDxe.h>
> >
> >   #include <Library/UefiBootServicesTableLib.h>
> > +#include <Library/UefiLib.h>
> >
> >   #include "RegisterCpuFeatures.h"
> >
> > @@ -115,14 +116,20 @@ GetProcessorInformation (
> >
> >     @param[in]  Procedure               A pointer to the function to be run on
> >                                         enabled APs of the system.
> > +  @param[in]  MpEvent                 A pointer to the event to be used later
> > +                                      to check whether procedure has done.
> >   **/
> >   VOID
> >   StartupAPsWorker (
> > -  IN  EFI_AP_PROCEDURE                 Procedure
> > +  IN  EFI_AP_PROCEDURE                 Procedure,
> > +  IN  VOID                             *MpEvent
> >     )
> >   {
> >     EFI_STATUS                           Status;
> >     EFI_MP_SERVICES_PROTOCOL             *MpServices;
> > +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> >
> >     MpServices = GetMpProtocol ();
> >     //
> > @@ -132,9 +139,9 @@ StartupAPsWorker (
> >                    MpServices,
> >                    Procedure,
> >                    FALSE,
> > -                 NULL,
> > +                 (EFI_EVENT)MpEvent,
> >                    0,
> > -                 NULL,
> > +                 CpuFeaturesData,
> >                    NULL
> >                    );
> >     ASSERT_EFI_ERROR (Status);
> > @@ -197,3 +204,61 @@ GetNumberOfProcessor (
> >     ASSERT_EFI_ERROR (Status);
> >   }
> >
> > +/**
> > +  Performs CPU features Initialization.
> > +
> > +  This service will invoke MP service to perform CPU features
> > +  initialization on BSP/APs per user configuration.
> > +
> > +  @note This service could be called by BSP only.
> > +**/
> > +VOID
> > +EFIAPI
> > +CpuFeaturesInitialize (
> > +  VOID
> > +  )
> > +{
> > +  CPU_FEATURES_DATA          *CpuFeaturesData;
> > +  UINTN                      OldBspNumber;
> > +  EFI_EVENT                  MpEvent;
> > +  EFI_STATUS                 Status;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> > +
> > +  OldBspNumber = GetProcessorIndex();
> > +  CpuFeaturesData->BspNumber = OldBspNumber;
> > +
> > +  Status = gBS->CreateEvent (
> > +                  EVT_NOTIFY_WAIT,
> > +                  TPL_CALLBACK,
> > +                  EfiEventEmptyFunction,
> > +                  NULL,
> > +                  &MpEvent
> > +                  );
> > +  ASSERT_EFI_ERROR (Status);
> > +
> > +  //
> > +  // Wakeup all APs for programming.
> > +  //
> > +  StartupAPsWorker (SetProcessorRegister, MpEvent);
> > +  //
> > +  // Programming BSP
> > +  //
> > +  SetProcessorRegister (CpuFeaturesData);
> > +
> > +  //
> > +  // Wait all processors to finish the task.
> > +  //
> > +  do {
> > +    Status = gBS->CheckEvent (MpEvent);
> > +  } while (Status == EFI_NOT_READY);
> > +  ASSERT_EFI_ERROR (Status);
> > +
> > +  //
> > +  // Switch to new BSP if required
> > +  //
> > +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> > +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> > +  }
> > +}
> > +
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> > index f0f317c945..6693bae575 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.i
> nf
> > @@ -47,6 +47,9 @@
> >     SynchronizationLib
> >     UefiBootServicesTableLib
> >     IoLib
> > +  UefiBootServicesTableLib
> > +  UefiLib
> > +  LocalApicLib
> >
> >   [Protocols]
> >     gEfiMpServiceProtocolGuid                                            ## CONSUMES
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> > index 82fe268812..799864a136 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> > @@ -149,11 +149,15 @@ GetProcessorInformation (
> >   **/
> >   VOID
> >   StartupAPsWorker (
> > -  IN  EFI_AP_PROCEDURE                 Procedure
> > +  IN  EFI_AP_PROCEDURE                 Procedure,
> > +  IN  VOID                             *MpEvent
> >     )
> >   {
> >     EFI_STATUS                           Status;
> >     EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
> > +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> >
> >     //
> >     // Get MP Services Protocol
> > @@ -175,7 +179,7 @@ StartupAPsWorker (
> >                    Procedure,
> >                    FALSE,
> >                    0,
> > -                 NULL
> > +                 CpuFeaturesData
> >                    );
> >     ASSERT_EFI_ERROR (Status);
> >   }
> > @@ -257,3 +261,50 @@ GetNumberOfProcessor (
> >                            );
> >     ASSERT_EFI_ERROR (Status);
> >   }
> > +
> > +/**
> > +  Performs CPU features Initialization.
> > +
> > +  This service will invoke MP service to perform CPU features
> > +  initialization on BSP/APs per user configuration.
> > +
> > +  @note This service could be called by BSP only.
> > +**/
> > +VOID
> > +EFIAPI
> > +CpuFeaturesInitialize (
> > +  VOID
> > +  )
> > +{
> > +  CPU_FEATURES_DATA          *CpuFeaturesData;
> > +  UINTN                      OldBspNumber;
> > +
> > +  CpuFeaturesData = GetCpuFeaturesData ();
> > +
> > +  OldBspNumber = GetProcessorIndex();
> > +  CpuFeaturesData->BspNumber = OldBspNumber;
> > +
> > +  //
> > +  // Known limitation: In PEI phase, CpuFeatures driver not
> > +  // support async mode execute tasks. So semaphore type
> > +  // register can't been used for this instance, must use
> > +  // DXE type instance.
> > +  //
> > +
> > +  //
> > +  // Wakeup all APs for programming.
> > +  //
> > +  StartupAPsWorker (SetProcessorRegister, NULL);
> > +  //
> > +  // Programming BSP
> > +  //
> > +  SetProcessorRegister (CpuFeaturesData);
> > +
> > +  //
> > +  // Switch to new BSP if required
> > +  //
> > +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> > +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> > +  }
> > +}
> > +
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> > index fdfef98293..e95f01df0b 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.in
> f
> > @@ -49,6 +49,7 @@
> >     PeiServicesLib
> >     PeiServicesTablePointerLib
> >     IoLib
> > +  LocalApicLib
> >
> >   [Ppis]
> >     gEfiPeiMpServicesPpiGuid                                             ## CONSUMES
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> > index edd266934f..39457e9730 100644
> > --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> > +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> > @@ -23,6 +23,7 @@
> >   #include <Library/MemoryAllocationLib.h>
> >   #include <Library/SynchronizationLib.h>
> >   #include <Library/IoLib.h>
> > +#include <Library/LocalApicLib.h>
> >
> >   #include <AcpiCpuData.h>
> >
> > @@ -46,16 +47,26 @@ typedef struct {
> >     CPU_FEATURE_INITIALIZE       InitializeFunc;
> >     UINT8                        *BeforeFeatureBitMask;
> >     UINT8                        *AfterFeatureBitMask;
> > +  UINT8                        *CoreBeforeFeatureBitMask;
> > +  UINT8                        *CoreAfterFeatureBitMask;
> > +  UINT8                        *PackageBeforeFeatureBitMask;
> > +  UINT8                        *PackageAfterFeatureBitMask;
> >     VOID                         *ConfigData;
> >     BOOLEAN                      BeforeAll;
> >     BOOLEAN                      AfterAll;
> >   } CPU_FEATURES_ENTRY;
> >
> > +//
> > +// Flags used when program the register.
> > +//
> > +typedef struct {
> > +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program
> mmio
> > +  volatile UINT32          *SemaphoreCount;      // Semaphore used to
> program semaphore.
> > +} PROGRAM_CPU_REGISTER_FLAGS;
> > +
> >   typedef struct {
> >     UINTN                    FeaturesCount;
> >     UINT32                   BitMaskSize;
> > -  SPIN_LOCK                MsrLock;
> > -  SPIN_LOCK                MemoryMappedLock;
> >     LIST_ENTRY               FeatureList;
> >
> >     CPU_FEATURES_INIT_ORDER  *InitOrder;
> > @@ -64,9 +75,14 @@ typedef struct {
> >     UINT8                    *ConfigurationPcd;
> >     UINT8                    *SettingPcd;
> >
> > +  UINT32                   NumberOfCpus;
> > +  ACPI_CPU_DATA            *AcpiCpuData;
> > +
> >     CPU_REGISTER_TABLE       *RegisterTable;
> >     CPU_REGISTER_TABLE       *PreSmmRegisterTable;
> >     UINTN                    BspNumber;
> > +
> > +  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
> >   } CPU_FEATURES_DATA;
> >
> >   #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
> > @@ -118,10 +134,13 @@ GetProcessorInformation (
> >
> >     @param[in]  Procedure               A pointer to the function to be run on
> >                                         enabled APs of the system.
> > +  @param[in]  MpEvent                 A pointer to the event to be used later
> > +                                      to check whether procedure has done.
> >   **/
> >   VOID
> >   StartupAPsWorker (
> > -  IN  EFI_AP_PROCEDURE                 Procedure
> > +  IN  EFI_AP_PROCEDURE                 Procedure,
> > +  IN  VOID                             *MpEvent
> >     );
> >
> >   /**
> > @@ -170,4 +189,30 @@ DumpCpuFeature (
> >     IN CPU_FEATURES_ENTRY  *CpuFeature
> >     );
> >
> > +/**
> > +  Return feature dependence result.
> > +
> > +  @param[in]  CpuFeature        Pointer to CPU feature.
> > +  @param[in]  Before            Check before dependence or after.
> > +
> > +  @retval     return the dependence result.
> > +**/
> > +CPU_FEATURE_DEPENDENCE_TYPE
> > +DetectFeatureScope (
> > +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> > +  IN BOOLEAN                    Before
> > +  );
> > +
> > +/**
> > +  Programs registers for the calling processor.
> > +
> > +  @param[in,out] Buffer  The pointer to private data buffer.
> > +
> > +**/
> > +VOID
> > +EFIAPI
> > +SetProcessorRegister (
> > +  IN OUT VOID            *Buffer
> > +  );
> > +
> >   #endif
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> > index fa7e107e39..f9e3178dc1 100644
> > ---
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> > +++
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> > @@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
> >     return FALSE;
> >   }
> >
> > +/**
> > +  Return feature dependence result.
> > +
> > +  @param[in]  CpuFeature        Pointer to CPU feature.
> > +  @param[in]  Before            Check before dependence or after.
> > +
> > +  @retval     return the dependence result.
> > +**/
> > +CPU_FEATURE_DEPENDENCE_TYPE
> > +DetectFeatureScope (
> > +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> > +  IN BOOLEAN                    Before
> > +  )
> > +{
> > +  if (Before) {
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      return PackageDepType;
> > +    }
> > +
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      return CoreDepType;
> > +    }
> > +
> > +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> > +      return ThreadDepType;
> > +    }
> > +
> > +    return NoneDepType;
> > +  }
> > +
> > +  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +    return PackageDepType;
> > +  }
> > +
> > +  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +    return CoreDepType;
> > +  }
> > +
> > +  if (CpuFeature->AfterFeatureBitMask != NULL) {
> > +    return ThreadDepType;
> > +  }
> > +
> > +  return NoneDepType;
> > +}
> > +
> > +/**
> > +  Clear dependence for the specified type.
> > +
> > +  @param[in]  CurrentFeature     Cpu feature need to clear.
> > +  @param[in]  Before             Before or after dependence relationship.
> > +
> > +**/
> > +VOID
> > +ClearFeatureScope (
> > +  IN CPU_FEATURES_ENTRY           *CpuFeature,
> > +  IN BOOLEAN                      Before
> > +  )
> > +{
> > +  if (Before) {
> > +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->BeforeFeatureBitMask);
> > +      CpuFeature->BeforeFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
> > +      CpuFeature->CoreBeforeFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
> > +      CpuFeature->PackageBeforeFeatureBitMask = NULL;
> > +    }
> > +  } else {
> > +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->PackageAfterFeatureBitMask);
> > +      CpuFeature->PackageAfterFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->CoreAfterFeatureBitMask);
> > +      CpuFeature->CoreAfterFeatureBitMask = NULL;
> > +    }
> > +    if (CpuFeature->AfterFeatureBitMask != NULL) {
> > +      FreePool (CpuFeature->AfterFeatureBitMask);
> > +      CpuFeature->AfterFeatureBitMask = NULL;
> > +    }
> > +  }
> > +}
> > +
> > +/**
> > +  Base on dependence relationship to asjust feature dependence.
> > +
> > +  ONLY when the feature before(or after) the find feature also has
> > +  dependence with the find feature. In this case, driver need to base
> > +  on dependce relationship to decide how to insert current feature and
> > +  adjust the feature dependence.
> > +
> > +  @param[in]  PreviousFeature    CPU feature current before the find one.
> > +  @param[in]  CurrentFeature     Cpu feature need to adjust.
> > +  @param[in]  Before             Before or after dependence relationship.
> > +
> > +  @retval   TRUE   means the current feature dependence has been
> adjusted.
> > +
> > +  @retval   FALSE  means the previous feature dependence has been
> adjusted.
> > +                   or previous feature has no dependence with the find one.
> > +
> > +**/
> > +BOOLEAN
> > +AdjustFeaturesDependence (
> > +  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
> > +  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
> > +  IN     BOOLEAN                    Before
> > +  )
> > +{
> > +  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
> > +  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
> > +
> > +  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
> > +  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
> > +
> > +  //
> > +  // If previous feature has no dependence with the find featue.
> > +  // return FALSE.
> > +  //
> > +  if (PreDependType == NoneDepType) {
> > +    return FALSE;
> > +  }
> > +
> > +  //
> > +  // If both feature have dependence, keep the one which needs use
> more
> > +  // processors and clear the dependence for the other one.
> > +  //
> > +  if (PreDependType >= CurrentDependType) {
> > +    ClearFeatureScope (CurrentFeature, Before);
> > +    return TRUE;
> > +  } else {
> > +    ClearFeatureScope (PreviousFeature, Before);
> > +    return FALSE;
> > +  }
> > +}
> > +
> > +/**
> > +  Base on dependence relationship to asjust feature order.
> > +
> > +  @param[in]  FeatureList        Pointer to CPU feature list
> > +  @param[in]  FindEntry          The entry this feature depend on.
> > +  @param[in]  CurrentEntry       The entry for this feature.
> > +  @param[in]  Before             Before or after dependence relationship.
> > +
> > +**/
> > +VOID
> > +AdjustEntry (
> > +  IN      LIST_ENTRY                *FeatureList,
> > +  IN OUT  LIST_ENTRY                *FindEntry,
> > +  IN OUT  LIST_ENTRY                *CurrentEntry,
> > +  IN      BOOLEAN                   Before
> > +  )
> > +{
> > +  LIST_ENTRY                *PreviousEntry;
> > +  CPU_FEATURES_ENTRY        *PreviousFeature;
> > +  CPU_FEATURES_ENTRY        *CurrentFeature;
> > +
> > +  //
> > +  // For CPU feature which has core or package type dependence, later
> code need to insert
> > +  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
> > +  // So if driver finds both feature A and B need to execute before feature
> C, driver will
> > +  // base on dependence type of feature A and B to update the logic here.
> > +  // For example, feature A has package type dependence and feature B
> has core type dependence,
> > +  // because package type dependence need to wait for more processors
> which has strong dependence
> > +  // than core type dependence. So driver will adjust the feature order to
> B -> A -> C. and driver
> > +  // will remove the feature dependence in feature B.
> > +  // Driver just needs to make sure before feature C been executed,
> feature A has finished its task
> > +  // in all all thread. Feature A finished in all threads also means feature B
> have finshed in all
> > +  // threads.
> > +  //
> > +  if (Before) {
> > +    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
> > +  } else {
> >
> > +    PreviousEntry = GetNextNode (FeatureList, FindEntry);
> > +  }
> > +
> > +  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> > +  RemoveEntryList (CurrentEntry);
> > +
> > +  if (IsNull (FeatureList, PreviousEntry)) {
> > +    //
> > +    // If not exist the previous or next entry, just insert the current entry.
> > +    //
> > +    if (Before) {
> > +      InsertTailList (FindEntry, CurrentEntry);
> > +    } else {
> > +      InsertHeadList (FindEntry, CurrentEntry);
> > +    }
> > +  } else {
> > +    //
> > +    // If exist the previous or next entry, need to check it before insert
> curent entry.
> > +    //
> > +    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
> > +
> > +    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature,
> Before)) {
> > +      //
> > +      // Return TRUE means current feature dependence has been cleared
> and the previous
> > +      // feature dependence has been kept and used. So insert current
> feature before (or after)
> > +      // the previous feature.
> > +      //
> > +      if (Before) {
> > +        InsertTailList (PreviousEntry, CurrentEntry);
> > +      } else {
> > +        InsertHeadList (PreviousEntry, CurrentEntry);
> > +      }
> > +    } else {
> > +      if (Before) {
> > +        InsertTailList (FindEntry, CurrentEntry);
> > +      } else {
> > +        InsertHeadList (FindEntry, CurrentEntry);
> > +      }
> > +    }
> > +  }
> > +}
> >
> > +
> > +/**
> > +  Checks and adjusts current CPU features per dependency relationship.
> > +
> > +  @param[in]  FeatureList        Pointer to CPU feature list
> > +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> > +  @param[in]  FeatureMask        The feature bit mask.
> > +
> > +  @retval     return Swapped info.
> > +**/
> > +BOOLEAN
> > +InsertToBeforeEntry (
> > +  IN LIST_ENTRY              *FeatureList,
> > +  IN LIST_ENTRY              *CurrentEntry,
> > +  IN UINT8                   *FeatureMask
> > +  )
> > +{
> > +  LIST_ENTRY                 *CheckEntry;
> > +  CPU_FEATURES_ENTRY         *CheckFeature;
> > +  BOOLEAN                    Swapped;
> > +
> > +  Swapped = FALSE;
> > +
> > +  //
> > +  // Check all features dispatched before this entry
> > +  //
> > +  CheckEntry = GetFirstNode (FeatureList);
> > +  while (CheckEntry != CurrentEntry) {
> > +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask))
> {
> > +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
> > +      Swapped = TRUE;
> > +      break;
> > +    }
> > +    CheckEntry = CheckEntry->ForwardLink;
> > +  }
> > +
> > +  return Swapped;
> > +}
> > +
> > +/**
> > +  Checks and adjusts current CPU features per dependency relationship.
> > +
> > +  @param[in]  FeatureList        Pointer to CPU feature list
> > +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> > +  @param[in]  FeatureMask        The feature bit mask.
> > +
> > +  @retval     return Swapped info.
> > +**/
> > +BOOLEAN
> > +InsertToAfterEntry (
> > +  IN LIST_ENTRY              *FeatureList,
> > +  IN LIST_ENTRY              *CurrentEntry,
> > +  IN UINT8                   *FeatureMask
> > +  )
> > +{
> > +  LIST_ENTRY                 *CheckEntry;
> > +  CPU_FEATURES_ENTRY         *CheckFeature;
> > +  BOOLEAN                    Swapped;
> > +
> > +  Swapped = FALSE;
> > +
> > +  //
> > +  // Check all features dispatched after this entry
> > +  //
> > +  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> > +  while (!IsNull (FeatureList, CheckEntry)) {
> > +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask))
> {
> > +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
> > +      Swapped = TRUE;
> > +      break;
> > +    }
> > +    CheckEntry = CheckEntry->ForwardLink;
> > +  }
> > +
> > +  return Swapped;
> > +}
> > +
> >   /**
> >     Checks and adjusts CPU features order per dependency relationship.
> >
> > @@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
> >     CPU_FEATURES_ENTRY         *CheckFeature;
> >     BOOLEAN                    Swapped;
> >     LIST_ENTRY                 *TempEntry;
> > +  LIST_ENTRY                 *NextEntry;
> >
> >     CurrentEntry = GetFirstNode (FeatureList);
> >     while (!IsNull (FeatureList, CurrentEntry)) {
> >       Swapped = FALSE;
> >       CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> > +    NextEntry = CurrentEntry->ForwardLink;
> >       if (CpuFeature->BeforeAll) {
> >         //
> >         // Check all features dispatched before this entry
> > @@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
> >           CheckEntry = CheckEntry->ForwardLink;
> >         }
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> > @@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
> >           CheckEntry = CheckEntry->ForwardLink;
> >         }
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> >
> >       if (CpuFeature->BeforeFeatureBitMask != NULL) {
> > -      //
> > -      // Check all features dispatched before this entry
> > -      //
> > -      CheckEntry = GetFirstNode (FeatureList);
> > -      while (CheckEntry != CurrentEntry) {
> > -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature-
> >BeforeFeatureBitMask)) {
> > -          //
> > -          // If there is dependency, swap them
> > -          //
> > -          RemoveEntryList (CurrentEntry);
> > -          InsertTailList (CheckEntry, CurrentEntry);
> > -          Swapped = TRUE;
> > -          break;
> > -        }
> > -        CheckEntry = CheckEntry->ForwardLink;
> > -      }
> > +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry,
> CpuFeature->BeforeFeatureBitMask);
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> >
> >       if (CpuFeature->AfterFeatureBitMask != NULL) {
> > -      //
> > -      // Check all features dispatched after this entry
> > -      //
> > -      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> > -      while (!IsNull (FeatureList, CheckEntry)) {
> > -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> > -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature-
> >AfterFeatureBitMask)) {
> > -          //
> > -          // If there is dependency, swap them
> > -          //
> > -          TempEntry = GetNextNode (FeatureList, CurrentEntry);
> > -          RemoveEntryList (CurrentEntry);
> > -          InsertHeadList (CheckEntry, CurrentEntry);
> > -          CurrentEntry = TempEntry;
> > -          Swapped = TRUE;
> > -          break;
> > -        }
> > -        CheckEntry = CheckEntry->ForwardLink;
> > +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature-
> >AfterFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> >         }
> > +    }
> > +
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry,
> CpuFeature->CoreBeforeFeatureBitMask);
> >         if (Swapped) {
> > +        CurrentEntry = NextEntry;
> >           continue;
> >         }
> >       }
> > -    //
> > -    // No swap happened, check the next feature
> > -    //
> > +
> > +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature-
> >CoreAfterFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> > +      }
> > +    }
> > +
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry,
> CpuFeature->PackageBeforeFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> > +      }
> > +    }
> > +
> > +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature-
> >PackageAfterFeatureBitMask);
> > +      if (Swapped) {
> > +        CurrentEntry = NextEntry;
> > +        continue;
> > +      }
> > +    }
> > +
> >       CurrentEntry = CurrentEntry->ForwardLink;
> >     }
> >   }
> > @@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
> >     CpuFeaturesData = GetCpuFeaturesData ();
> >     if (CpuFeaturesData->FeaturesCount == 0) {
> >       InitializeListHead (&CpuFeaturesData->FeatureList);
> > -    InitializeSpinLock (&CpuFeaturesData->MsrLock);
> > -    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
> > +    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
> >       CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
> >     }
> >     ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
> > @@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
> >         }
> >         CpuFeatureEntry->AfterFeatureBitMask = CpuFeature-
> >AfterFeatureBitMask;
> >       }
> > +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->CoreBeforeFeatureBitMask = CpuFeature-
> >CoreBeforeFeatureBitMask;
> > +    }
> > +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->CoreAfterFeatureBitMask = CpuFeature-
> >CoreAfterFeatureBitMask;
> > +    }
> > +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->PackageBeforeFeatureBitMask = CpuFeature-
> >PackageBeforeFeatureBitMask;
> > +    }
> > +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> > +      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
> > +        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
> > +      }
> > +      CpuFeatureEntry->PackageAfterFeatureBitMask = CpuFeature-
> >PackageAfterFeatureBitMask;
> > +    }
> > +
> >       CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
> >       CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
> >
> > @@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
> >     @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is
> not
> >                                       supported due to a circular dependency between
> >                                       BEFORE and AFTER features.
> > +  @retval  RETURN_NOT_READY         CPU feature PCD
> PcdCpuFeaturesUserConfiguration
> > +                                    not updated by Platform driver yet.
> >
> >     @note This service could be called by BSP only.
> >   **/
> > @@ -431,12 +755,20 @@ RegisterCpuFeature (
> >     UINT8                      *FeatureMask;
> >     UINT8                      *BeforeFeatureBitMask;
> >     UINT8                      *AfterFeatureBitMask;
> > +  UINT8                      *CoreBeforeFeatureBitMask;
> > +  UINT8                      *CoreAfterFeatureBitMask;
> > +  UINT8                      *PackageBeforeFeatureBitMask;
> > +  UINT8                      *PackageAfterFeatureBitMask;
> >     BOOLEAN                    BeforeAll;
> >     BOOLEAN                    AfterAll;
> >
> > -  FeatureMask          = NULL;
> > -  BeforeFeatureBitMask = NULL;
> > -  AfterFeatureBitMask  = NULL;
> > +  FeatureMask                 = NULL;
> > +  BeforeFeatureBitMask        = NULL;
> 
> How about renaming BeforeFeatureBitMask to
> ThreadBeforeFeatureBitMask?
> I think the renaming together with redefining the macro
> CPU_FEATURE_BEFORE as CPU_FEATURE_THREAD_BEFORE can be in a
> separate patch.
> 

Ok, will separate the patch in next version changes.

> > +  AfterFeatureBitMask         = NULL;
> > +  CoreBeforeFeatureBitMask    = NULL;
> > +  CoreAfterFeatureBitMask     = NULL;
> > +  PackageBeforeFeatureBitMask  = NULL;
> > +  PackageAfterFeatureBitMask   = NULL;
> >     BeforeAll            = FALSE;
> >     AfterAll             = FALSE;
> >
> > @@ -449,6 +781,10 @@ RegisterCpuFeature (
> >                       != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
> >       ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL |
> CPU_FEATURE_AFTER_ALL))
> >                       != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));
> 
> Implementation can avoid using CPU_FEATURE_BEFORE and
> CPU_FEATURE_AFTER.
> Use CPU_FEATURE_THREAD_BEFORE and CPU_FEATURE_THREAD_AFTER.

Ok, will do this change in the separate patch in next version changes.


> 
> > +    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE |
> CPU_FEATURE_CORE_AFTER))
> > +                    != (CPU_FEATURE_CORE_BEFORE |
> CPU_FEATURE_CORE_AFTER));
> > +    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE |
> CPU_FEATURE_PACKAGE_AFTER))
> > +                    != (CPU_FEATURE_PACKAGE_BEFORE |
> CPU_FEATURE_PACKAGE_AFTER));
> >       if (Feature < CPU_FEATURE_BEFORE) {
> >         BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE :
> FALSE;
> >         AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
> > @@ -459,6 +795,14 @@ RegisterCpuFeature (
> >         SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature &
> ~CPU_FEATURE_BEFORE, BitMaskSize);
> >       } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
> >         SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature &
> ~CPU_FEATURE_AFTER, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
> > +      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature &
> ~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
> > +      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature &
> ~CPU_FEATURE_CORE_AFTER, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
> > +      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature &
> ~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
> > +    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
> > +      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature &
> ~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
> >       }
> >       Feature = VA_ARG (Marker, UINT32);
> >     }
> > @@ -466,15 +810,19 @@ RegisterCpuFeature (
> >
> >     CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
> >     ASSERT (CpuFeature != NULL);
> > -  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
> > -  CpuFeature->FeatureMask          = FeatureMask;
> > -  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
> > -  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
> > -  CpuFeature->BeforeAll            = BeforeAll;
> > -  CpuFeature->AfterAll             = AfterAll;
> > -  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
> > -  CpuFeature->SupportFunc          = SupportFunc;
> > -  CpuFeature->InitializeFunc       = InitializeFunc;
> > +  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
> > +  CpuFeature->FeatureMask                 = FeatureMask;
> > +  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
> > +  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
> > +  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
> > +  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
> > +  CpuFeature->PackageBeforeFeatureBitMask =
> PackageBeforeFeatureBitMask;
> > +  CpuFeature->PackageAfterFeatureBitMask  =
> PackageAfterFeatureBitMask;
> > +  CpuFeature->BeforeAll                   = BeforeAll;
> > +  CpuFeature->AfterAll                    = AfterAll;
> > +  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
> > +  CpuFeature->SupportFunc                 = SupportFunc;
> > +  CpuFeature->InitializeFunc              = InitializeFunc;
> >     if (FeatureName != NULL) {
> >       CpuFeature->FeatureName          = AllocatePool
> (CPU_FEATURE_NAME_SIZE);
> >       ASSERT (CpuFeature->FeatureName != NULL);
> >
> 
> 
> --
> Thanks,
> Ray

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  2018-10-15 17:13   ` Laszlo Ersek
@ 2018-10-16 14:44     ` Dong, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Dong, Eric @ 2018-10-16 14:44 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu

Hi Laszlo,

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, October 16, 2018 1:13 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>
> Subject: Re: [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support
> semaphore type.
> 
> On 10/15/18 04:49, Eric Dong wrote:
> > Because this driver needs to set MSRs saved in normal boot phase, sync
> > semaphore logic from RegisterCpuFeaturesLib code which used for normal
> boot phase.
> 
> (My review of this patch is going to be superficial. I'm not trying to validate the
> actual algorithm. I'm mostly sanity-checking the code, and gauging whether it
> will break platforms that use CpuS3DataDxe.)
> 

Reasonable, thanks for your efforts.

> 
> > Detail see change SHA-1: dcdf1774212d87e2d7feb36286a408ea7475fd7b for
> > RegisterCpuFeaturesLib.
> 
> (1) I think it is valid to reference other patches in the same series.
> However, the commit hashes are not stable yet -- when you rebase the series,
> the commit hashes will change. Therefore, when we refer to a patch that is not
> upstream yet (i.e. it is part of the same series), it is best to spell out the full
> subject, such as:
> 
> UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
> 

I aware this value change when do the rebase action. I plan to update the value when I do the rebase action.  Your suggestion is good. I can also use the change header to specify the change. I will use it in my next change.

> 
> >
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> > ---
> >  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c          | 316 ++++++++++++++++-------
> ------
> >  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c      |   3 -
> >  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h |   3 +-
> >  3 files changed, 180 insertions(+), 142 deletions(-)
> >
> > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > index 52ff9679d5..5a35f7a634 100644
> > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > @@ -38,9 +38,12 @@ typedef struct {
> >  } MP_ASSEMBLY_ADDRESS_MAP;
> >
> >  //
> > -// Spin lock used to serialize MemoryMapped operation
> > +// Flags used when program the register.
> >  //
> > -SPIN_LOCK                *mMemoryMappedLock = NULL;
> > +typedef struct {
> > +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program
> mmio
> > +  volatile UINT32          *SemaphoreCount;      // Semaphore used to program
> semaphore.
> > +} PROGRAM_CPU_REGISTER_FLAGS;
> >
> >  //
> >  // Signal that SMM BASE relocation is complete.
> > @@ -62,13 +65,11 @@ AsmGetAddressMap (
> >  #define LEGACY_REGION_SIZE    (2 * 0x1000)
> >  #define LEGACY_REGION_BASE    (0xA0000 - LEGACY_REGION_SIZE)
> >
> > +PROGRAM_CPU_REGISTER_FLAGS   mCpuFlags;
> >  ACPI_CPU_DATA                mAcpiCpuData;
> >  volatile UINT32              mNumberToFinish;
> >  MP_CPU_EXCHANGE_INFO         *mExchangeInfo;
> >  BOOLEAN                      mRestoreSmmConfigurationInS3 = FALSE;
> > -MP_MSR_LOCK                  *mMsrSpinLocks = NULL;
> > -UINTN                        mMsrSpinLockCount;
> > -UINTN                        mMsrCount = 0;
> >
> >  //
> >  // S3 boot flag
> > @@ -91,89 +92,6 @@ UINT8                        mApHltLoopCodeTemplate[] = {
> >                                 0xEB, 0xFC               // jmp $-2
> >                                 };
> >
> > -/**
> > -  Get MSR spin lock by MSR index.
> > -
> > -  @param  MsrIndex       MSR index value.
> > -
> > -  @return Pointer to MSR spin lock.
> > -
> > -**/
> > -SPIN_LOCK *
> > -GetMsrSpinLockByIndex (
> > -  IN UINT32      MsrIndex
> > -  )
> > -{
> > -  UINTN     Index;
> > -  for (Index = 0; Index < mMsrCount; Index++) {
> > -    if (MsrIndex == mMsrSpinLocks[Index].MsrIndex) {
> > -      return mMsrSpinLocks[Index].SpinLock;
> > -    }
> > -  }
> > -  return NULL;
> > -}
> > -
> > -/**
> > -  Initialize MSR spin lock by MSR index.
> > -
> > -  @param  MsrIndex       MSR index value.
> > -
> > -**/
> > -VOID
> > -InitMsrSpinLockByIndex (
> > -  IN UINT32      MsrIndex
> > -  )
> > -{
> > -  UINTN    MsrSpinLockCount;
> > -  UINTN    NewMsrSpinLockCount;
> > -  UINTN    Index;
> > -  UINTN    AddedSize;
> > -
> > -  if (mMsrSpinLocks == NULL) {
> > -    MsrSpinLockCount =
> mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter;
> > -    mMsrSpinLocks = (MP_MSR_LOCK *) AllocatePool (sizeof (MP_MSR_LOCK)
> * MsrSpinLockCount);
> > -    ASSERT (mMsrSpinLocks != NULL);
> > -    for (Index = 0; Index < MsrSpinLockCount; Index++) {
> > -      mMsrSpinLocks[Index].SpinLock =
> > -       (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
> Index * mSemaphoreSize);
> > -      mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> > -    }
> > -    mMsrSpinLockCount = MsrSpinLockCount;
> > -    mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter = 0;
> > -  }
> > -  if (GetMsrSpinLockByIndex (MsrIndex) == NULL) {
> > -    //
> > -    // Initialize spin lock for MSR programming
> > -    //
> > -    mMsrSpinLocks[mMsrCount].MsrIndex = MsrIndex;
> > -    InitializeSpinLock (mMsrSpinLocks[mMsrCount].SpinLock);
> > -    mMsrCount ++;
> > -    if (mMsrCount == mMsrSpinLockCount) {
> > -      //
> > -      // If MSR spin lock buffer is full, enlarge it
> > -      //
> > -      AddedSize = SIZE_4KB;
> > -      mSmmCpuSemaphores.SemaphoreMsr.Msr =
> > -                        AllocatePages (EFI_SIZE_TO_PAGES(AddedSize));
> > -      ASSERT (mSmmCpuSemaphores.SemaphoreMsr.Msr != NULL);
> > -      NewMsrSpinLockCount = mMsrSpinLockCount + AddedSize /
> mSemaphoreSize;
> > -      mMsrSpinLocks = ReallocatePool (
> > -                        sizeof (MP_MSR_LOCK) * mMsrSpinLockCount,
> > -                        sizeof (MP_MSR_LOCK) * NewMsrSpinLockCount,
> > -                        mMsrSpinLocks
> > -                        );
> > -      ASSERT (mMsrSpinLocks != NULL);
> > -      mMsrSpinLockCount = NewMsrSpinLockCount;
> > -      for (Index = mMsrCount; Index < mMsrSpinLockCount; Index++) {
> > -        mMsrSpinLocks[Index].SpinLock =
> > -                 (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
> > -                 (Index - mMsrCount)  * mSemaphoreSize);
> > -        mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> > -      }
> > -    }
> > -  }
> > -}
> > -
> >  /**
> >    Sync up the MTRR values for all processors.
> >
> > @@ -204,42 +122,89 @@ Returns:
> >  }
> >
> >  /**
> > -  Programs registers for the calling processor.
> > +  Increment semaphore by 1.
> >
> > -  This function programs registers for the calling processor.
> > +  @param      Sem            IN:  32-bit unsigned integer
> >
> > -  @param  RegisterTables        Pointer to register table of the running
> processor.
> > -  @param  RegisterTableCount    Register table count.
> > +**/
> > +VOID
> > +S3ReleaseSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  InterlockedIncrement (Sem);
> > +}
> > +
> > +/**
> > +  Decrement the semaphore by 1 if it is not zero.
> > +
> > +  Performs an atomic decrement operation for semaphore.
> > +  The compare exchange operation must be performed using  MP safe
> > + mechanisms.
> > +
> > +  @param      Sem            IN:  32-bit unsigned integer
> > +
> > +**/
> > +VOID
> > +S3WaitForSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  UINT32  Value;
> > +
> > +  do {
> > +    Value = *Sem;
> > +  } while (Value == 0);
> > +
> > +  InterlockedDecrement (Sem);
> > +}
> 
> (2) I think this implementation is not correct. If threads T1 and T2 are spinning in
> the loop, and thread T3 releases the semaphore, then both T1 and T2 could see
> (Value==1). They will both exit the loop, they will both decrement (*Sem), and
> then (*Sem) will wrap around.
> 
> Instead, we should do:
> 
>   for (;;) {
>     Value = *Sem;
>     if (Value == 0) {
>       continue;
>     }
>     if (InterlockedCompareExchange32 (Sem, Value, Value - 1) == Value) {
>       break;
>     }
>   }
> 
> This implementation is not protected against the ABA problem, but that's fine.
> Namely, it doesn't matter whether, and how, the value of (*Sem) fluctuates,
> between fetching it into Value, and setting it to (Value-1).
> What matters is that we either perform a transition from Value to (Value-1), or
> nothing.
> 

Good catch. Thanks for your sample code. I will update the code in my next changes.

> 
> > +
> > +/**
> > +  Initialize the CPU registers from a register table.
> > +
> > +  @param[in]  RegisterTable         The register table for this AP.
> > +  @param[in]  ApLocation            AP location info for this ap.
> > +  @param[in]  CpuStatus             CPU status info for this CPU.
> > +  @param[in]  CpuFlags              Flags data structure used when program the
> register.
> >
> > +  @note This service could be called by BSP/APs.
> >  **/
> >  VOID
> > -SetProcessorRegister (
> > -  IN CPU_REGISTER_TABLE        *RegisterTables,
> > -  IN UINTN                     RegisterTableCount
> > +EFIAPI
> > +ProgramProcessorRegister (
> > +  IN CPU_REGISTER_TABLE           *RegisterTable,
> > +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> > +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> > +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
> >    )
> 
> (3) Any particular reason for declaring this function as EFIAPI?

I plan to export this function as an API from RegisterCpuFeaturesLib, I have did some POC code change. But this version changes will not export it as an API. I forgot to remove EFIAPI when I duplicate the code here. Will remove it in my next version changes.

> 
> 
> >  {
> >    CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
> >    UINTN                     Index;
> >    UINTN                     Value;
> > -  SPIN_LOCK                 *MsrSpinLock;
> > -  UINT32                    InitApicId;
> > -  CPU_REGISTER_TABLE        *RegisterTable;
> > +  CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> > +  volatile UINT32           *SemaphorePtr;
> > +  UINT32                    CoreOffset;
> > +  UINT32                    PackageOffset;
> > +  UINT32                    PackageThreadsCount;
> > +  UINT32                    ApOffset;
> > +  UINTN                     ProcessorIndex;
> > +  UINTN                     ApIndex;
> > +  UINTN                     ValidApCount;
> >
> > -  InitApicId = GetInitialApicId ();
> > -  RegisterTable = NULL;
> > -  for (Index = 0; Index < RegisterTableCount; Index++) {
> > -    if (RegisterTables[Index].InitialApicId == InitApicId) {
> > -      RegisterTable =  &RegisterTables[Index];
> > -      break;
> > -    }
> > -  }
> > -  ASSERT (RegisterTable != NULL);
> > +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus-
> >ThreadCount \
> > +            + ApLocation->Core * CpuStatus->ThreadCount \
> > +            + ApLocation->Thread;
> 
> (4) The backslashes look useless.
> 
> In addition, the plus signs should be at the ends of the lines, according to the
> edk2 style (operators at the end).

Will update then in my next version code changes.

> 
> >
> >    //
> >    // Traverse Register Table of this logical processor
> >    //
> > -  RegisterTableEntry = (CPU_REGISTER_TABLE_ENTRY *) (UINTN)
> > RegisterTable->RegisterTableEntry;
> > -  for (Index = 0; Index < RegisterTable->TableLength; Index++,
> > RegisterTableEntry++) {
> > +  RegisterTableEntryHead = (CPU_REGISTER_TABLE_ENTRY *) (UINTN)
> > + RegisterTable->RegisterTableEntry;
> > +
> > +  for (Index = 0; Index < RegisterTable->TableLength; Index++) {
> 
> (OK, I think this should continue working with (TableLength==0), from
> CpuS3DataDxe.)
> 
> > +
> > +    RegisterTableEntry = &RegisterTableEntryHead[Index];
> > +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type =
> > + %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
> 
> (5) "ApIndex" and "Index" have type UINTN; they should not be printed with
> "%d". The portable way to print them is to cast them to UINT64, and use "%lu".
> 

ok, will update then in my next version code changes.

> 
> > +
> >      //
> >      // Check the type of specified register
> >      //
> > @@ -310,12 +275,6 @@ SetProcessorRegister (
> >            RegisterTableEntry->Value
> >            );
> >        } else {
> > -        //
> > -        // Get lock to avoid Package/Core scope MSRs programming issue in
> parallel execution mode
> > -        // to make sure MSR read/write operation is atomic.
> > -        //
> > -        MsrSpinLock = GetMsrSpinLockByIndex (RegisterTableEntry->Index);
> > -        AcquireSpinLock (MsrSpinLock);
> >          //
> >          // Set the bit section according to bit start and length
> >          //
> > @@ -325,21 +284,20 @@ SetProcessorRegister (
> >            RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength -
> 1,
> >            RegisterTableEntry->Value
> >            );
> > -        ReleaseSpinLock (MsrSpinLock);
> >        }
> >        break;
> >      //
> >      // MemoryMapped operations
> >      //
> >      case MemoryMapped:
> > -      AcquireSpinLock (mMemoryMappedLock);
> > +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
> >        MmioBitFieldWrite32 (
> >          (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry-
> >HighIndex, 32)),
> >          RegisterTableEntry->ValidBitStart,
> >          RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength -
> 1,
> >          (UINT32)RegisterTableEntry->Value
> >          );
> > -      ReleaseSpinLock (mMemoryMappedLock);
> > +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
> >        break;
> >      //
> >      // Enable or disable cache
> > @@ -355,12 +313,99 @@ SetProcessorRegister (
> >        }
> >        break;
> >
> > +    case Semaphore:
> > +      SemaphorePtr = CpuFlags->SemaphoreCount;
> > +      switch (RegisterTableEntry->Value) {
> > +      case CoreDepType:
> > +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount +
> ApLocation->Core) * CpuStatus->ThreadCount;
> > +        ApOffset = CoreOffset + ApLocation->Thread;
> > +        //
> > +        // First increase semaphore count by 1 for processors in this core.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset +
> > + ProcessorIndex]);
> 
> (6) The explicit (UINT32*) cast is confusing and unneeded, please remove it.

ok, will update then in my next version code changes.

> 
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      case PackageDepType:
> > +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount *
> CpuStatus->ThreadCount;
> > +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus-
> >CoreCount;
> > +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation-
> >Core + ApLocation->Thread;
> > +        ValidApCount = CpuStatus->ThreadCount * CpuStatus-
> >ValidCoresInPackages[ApLocation->Package];
> > +        //
> > +        // First increase semaphore count by 1 for processors in this package.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> > +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset
> > + + ProcessorIndex]);
> 
> (7) Same as (6).

ok, will update then in my next version code changes.

> 
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex
> ++) {
> > +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      default:
> > +        break;
> > +      }
> > +      break;
> > +
> >      default:
> >        break;
> >      }
> >    }
> >  }
> >
> > +/**
> > +
> > +  Set Processor register for one AP.
> > +
> > +  @param     SmmPreRegisterTable     Use pre register table or register table.
> > +
> > +**/
> > +VOID
> > +SetRegister (
> > +  IN BOOLEAN                 SmmPreRegisterTable
> 
> (8) For consistency with the "PreSmmInitRegisterTable" field name, I think this
> parameter should be named "PreSmmRegisterTable" (in the leading comment as
> well).

ok, will update then in my next version code changes.
> 
> 
> > +  )
> > +{
> > +  CPU_REGISTER_TABLE        *RegisterTable;
> > +  CPU_REGISTER_TABLE        *RegisterTables;
> > +  UINT32                    InitApicId;
> > +  UINTN                     ProcIndex;
> > +  UINTN                     Index;
> > +
> > +  if (SmmPreRegisterTable) {
> > +    RegisterTables = (CPU_REGISTER_TABLE
> > + *)(UINTN)mAcpiCpuData.PreSmmInitRegisterTable;
> > +  } else {
> > +    RegisterTables = (CPU_REGISTER_TABLE
> > + *)(UINTN)mAcpiCpuData.RegisterTable;
> > +  }
> > +
> > +  InitApicId = GetInitialApicId ();
> > +  RegisterTable = NULL;
> > +  for (Index = 0; Index < mAcpiCpuData.NumberOfCpus; Index++) {
> > +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> > +      RegisterTable =  &RegisterTables[Index];
> 
> (9) Unjustified double space after the equal sign.

ok, will update then in my next version code changes.
> 
> 
> > +      ProcIndex = Index;
> > +      break;
> > +    }
> > +  }
> > +  ASSERT (RegisterTable != NULL);
> > +
> > +  ProgramProcessorRegister (
> > +    RegisterTable,
> > +    mAcpiCpuData.ApLocation + ProcIndex,
> > +    &mAcpiCpuData.CpuStatus,
> > +    &mCpuFlags
> > +    );
> > +}
> > +
> >  /**
> >    AP initialization before then after SMBASE relocation in the S3 boot path.
> >  **/
> > @@ -374,7 +419,7 @@ InitializeAp (
> >
> >    LoadMtrrData (mAcpiCpuData.MtrrTable);
> >
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> > +  SetRegister (TRUE);
> >
> >    //
> >    // Count down the number with lock mechanism.
> > @@ -391,7 +436,7 @@ InitializeAp (
> >    ProgramVirtualWireMode ();
> >    DisableLvtInterrupts ();
> >
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> > +  SetRegister (FALSE);
> >
> >    //
> >    // Place AP into the safe code, count down the number with lock mechanism
> in the safe code.
> > @@ -466,7 +511,7 @@ InitializeCpuBeforeRebase (  {
> >    LoadMtrrData (mAcpiCpuData.MtrrTable);
> >
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> > +  SetRegister (TRUE);
> >
> >    ProgramVirtualWireMode ();
> >
> > @@ -502,8 +547,6 @@ InitializeCpuAfterRebase (
> >    VOID
> >    )
> >  {
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> > -
> >    mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
> >
> >    //
> > @@ -511,6 +554,8 @@ InitializeCpuAfterRebase (
> >    //
> >    mInitApsAfterSmmBaseReloc = TRUE;
> >
> > +  SetRegister (FALSE);
> > +
> >    while (mNumberToFinish > 0) {
> >      CpuPause ();
> >    }
> 
> (10) I'm not implying this is incorrect, just asking: can you please explain why the
> function call is *moved*?
> 
> Does it merit a comment in the code perhaps?

Yes, this change used to let Aps continue it procedure before BSP begin to set the register. Old logic will let AP continue their tasks after BSP finished set its register. This is a must have change if the semaphore logic is working. Because if the dependence type is package type, the semaphore will wait for all Aps in one package finishing their tasks before set next register for all APs. If the Aps not begin its task during BSP doing its task, the BSP thread will hang because it is waiting for other Aps in the same package finishing their task.

I will add this info in the comments.

> 
> 
> > @@ -574,8 +619,6 @@ SmmRestoreCpu (
> >
> >    mSmmS3Flag = TRUE;
> >
> > -  InitializeSpinLock (mMemoryMappedLock);
> > -
> >    //
> >    // See if there is enough context to resume PEI Phase
> >    //
> > @@ -790,7 +833,6 @@ CopyRegisterTable (
> >    )
> >  {
> >    UINTN                      Index;
> > -  UINTN                      Index1;
> >    CPU_REGISTER_TABLE_ENTRY   *RegisterTableEntry;
> >
> >    CopyMem (DestinationRegisterTableList, SourceRegisterTableList,
> > NumberOfCpus * sizeof (CPU_REGISTER_TABLE)); @@ -802,17 +844,6 @@
> CopyRegisterTable (
> >          );
> >        ASSERT (RegisterTableEntry != NULL);
> >        DestinationRegisterTableList[Index].RegisterTableEntry =
> (EFI_PHYSICAL_ADDRESS)(UINTN)RegisterTableEntry;
> > -      //
> > -      // Go though all MSRs in register table to initialize MSR spin lock
> > -      //
> > -      for (Index1 = 0; Index1 < DestinationRegisterTableList[Index].TableLength;
> Index1++, RegisterTableEntry++) {
> > -        if ((RegisterTableEntry->RegisterType == Msr) && (RegisterTableEntry-
> >ValidBitLength < 64)) {
> > -          //
> > -          // Initialize MSR spin lock only for those MSRs need bit field writing
> > -          //
> > -          InitMsrSpinLockByIndex (RegisterTableEntry->Index);
> > -        }
> > -      }
> >      }
> >    }
> >  }
> > @@ -832,6 +863,7 @@ GetAcpiCpuData (
> >    VOID                       *GdtForAp;
> >    VOID                       *IdtForAp;
> >    VOID                       *MachineCheckHandlerForAp;
> > +  CPU_STATUS_INFORMATION     *CpuStatus;
> >
> >    if (!mAcpiS3Enable) {
> >      return;
> > @@ -906,6 +938,16 @@ GetAcpiCpuData (
> >    Gdtr->Base = (UINTN)GdtForAp;
> >    Idtr->Base = (UINTN)IdtForAp;
> >    mAcpiCpuData.ApMachineCheckHandlerBase =
> > (EFI_PHYSICAL_ADDRESS)(UINTN)MachineCheckHandlerForAp;
> > +
> > +  CpuStatus = &mAcpiCpuData.CpuStatus;  CopyMem (CpuStatus,
> > + &AcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
> > + CpuStatus->ValidCoresInPackages = AllocateCopyPool (sizeof (UINT32)
> > + * CpuStatus->PackageCount,
> > + AcpiCpuData->CpuStatus.ValidCoresInPackages);
> 
> (11) This line is 142 characters long.
> 
> Please make sure that all new lines are at most 120 chars long.

got it, will refine the code in my next changes.

> 
> 
> (12) I don't understand the multiplication. In the "ValidCoresInPackages" array,
> do we have a simple (scalar) core count, for each socket?
Yes, ValidCoresInPackages is a pointer which point to an array. This array saves valid core count for each package(socket) in the CPU. In server platform, it has multiple packages and different packages may have different valid cores. This info is required by semaphore register.

> 
> That's what the "ValidApCount" assignment above suggests. Can we perhaps
> rename the field so that it says "Count" somewhere?
> 
> 
> (13) Without modifying CpuS3DataDxe, this line will crash.	
Yes, I missed the change in CpuS3DataDxe, will include the change in my next version.

> 
> 
> > +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
> > + mAcpiCpuData.ApLocation = AllocateCopyPool
> > + (mAcpiCpuData.NumberOfCpus * sizeof (EFI_CPU_PHYSICAL_LOCATION),
> > + AcpiCpuData->ApLocation);  ASSERT (mAcpiCpuData.ApLocation != NULL);
> 
> (14) This also requires a modification to CpuS3DataDxe.
Yes, I missed the change in CpuS3DataDxe, will include the change in my next version.

> 
> 
> > +  InitializeSpinLock((SPIN_LOCK*) &mCpuFlags.MemoryMappedLock);
> > + mCpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) *
> > + CpuStatus->PackageCount * CpuStatus->CoreCount *
> > + CpuStatus->ThreadCount);  ASSERT (mCpuFlags.SemaphoreCount != NULL);
> >  }
> >
> >  /**
> > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > index 9cf508a5c7..42b040531e 100644
> > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > @@ -1303,8 +1303,6 @@ InitializeSmmCpuSemaphores (
> >    mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock
> >                                                    = (SPIN_LOCK *)SemaphoreAddr;
> >    SemaphoreAddr += SemaphoreSize;
> > -  mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock
> > -                                                  = (SPIN_LOCK *)SemaphoreAddr;
> >
> >    SemaphoreAddr = (UINTN)SemaphoreBlock + GlobalSemaphoresSize;
> >    mSmmCpuSemaphores.SemaphoreCpu.Busy    = (SPIN_LOCK
> *)SemaphoreAddr;
> > @@ -1321,7 +1319,6 @@ InitializeSmmCpuSemaphores (
> >
> >    mPFLock                       = mSmmCpuSemaphores.SemaphoreGlobal.PFLock;
> >    mConfigSmmCodeAccessCheckLock =
> mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock;
> > -  mMemoryMappedLock             =
> mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock;
> >
> >    mSemaphoreSize = SemaphoreSize;
> >  }
> > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > index 8c7f4996d1..e2970308fe 100644
> > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > @@ -53,6 +53,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY
> KIND, EITHER EXPRESS OR IMPLIED.
> >  #include <Library/ReportStatusCodeLib.h>  #include
> > <Library/SmmCpuFeaturesLib.h>  #include
> > <Library/PeCoffGetEntryPointLib.h>
> > +#include <Library/RegisterCpuFeaturesLib.h>
> >
> >  #include <AcpiCpuData.h>
> >  #include <CpuHotPlugData.h>
> > @@ -364,7 +365,6 @@ typedef struct {
> >    volatile BOOLEAN     *AllCpusInSync;
> >    SPIN_LOCK            *PFLock;
> >    SPIN_LOCK            *CodeAccessCheckLock;
> > -  SPIN_LOCK            *MemoryMappedLock;
> >  } SMM_CPU_SEMAPHORE_GLOBAL;
> >
> >  ///
> > @@ -409,7 +409,6 @@ extern SMM_CPU_SEMAPHORES
> mSmmCpuSemaphores;
> >  extern UINTN                               mSemaphoreSize;
> >  extern SPIN_LOCK                           *mPFLock;
> >  extern SPIN_LOCK                           *mConfigSmmCodeAccessCheckLock;
> > -extern SPIN_LOCK                           *mMemoryMappedLock;
> >  extern EFI_SMRAM_DESCRIPTOR                *mSmmCpuSmramRanges;
> >  extern UINTN                               mSmmCpuSmramRangeCount;
> >  extern UINT8                               mPhysicalAddressBits;
> >
> 
> Thanks,
> Laszlo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  2018-10-16  3:16   ` Ni, Ruiyu
@ 2018-10-16 23:52     ` Dong, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Dong, Eric @ 2018-10-16 23:52 UTC (permalink / raw)
  To: Ni, Ruiyu, edk2-devel@lists.01.org; +Cc: Laszlo Ersek

Hi Ruiyu,

> -----Original Message-----
> From: Ni, Ruiyu
> Sent: Tuesday, October 16, 2018 11:16 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Laszlo Ersek <lersek@redhat.com>
> Subject: Re: [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to
> support semaphore type.
> 
> On 10/15/2018 10:49 AM, Eric Dong wrote:
> > Because this driver needs to set MSRs saved in normal boot phase, sync
> > semaphore logic from RegisterCpuFeaturesLib code which used for normal
> boot phase.
> >
> > Detail see change SHA-1: dcdf1774212d87e2d7feb36286a408ea7475fd7b for
> > RegisterCpuFeaturesLib.
> >
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> > ---
> >   UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c          | 316 ++++++++++++++++-
> ------------
> >   UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c      |   3 -
> >   UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h |   3 +-
> >   3 files changed, 180 insertions(+), 142 deletions(-)
> >
> > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > index 52ff9679d5..5a35f7a634 100644
> > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> > @@ -38,9 +38,12 @@ typedef struct {
> >   } MP_ASSEMBLY_ADDRESS_MAP;
> >
> >   //
> > -// Spin lock used to serialize MemoryMapped operation
> > +// Flags used when program the register.
> >   //
> > -SPIN_LOCK                *mMemoryMappedLock = NULL;
> > +typedef struct {
> > +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program
> mmio
> > +  volatile UINT32          *SemaphoreCount;      // Semaphore used to
> program semaphore.
> > +} PROGRAM_CPU_REGISTER_FLAGS;
> >
> >   //
> >   // Signal that SMM BASE relocation is complete.
> > @@ -62,13 +65,11 @@ AsmGetAddressMap (
> >   #define LEGACY_REGION_SIZE    (2 * 0x1000)
> >   #define LEGACY_REGION_BASE    (0xA0000 - LEGACY_REGION_SIZE)
> >
> > +PROGRAM_CPU_REGISTER_FLAGS   mCpuFlags;
> >   ACPI_CPU_DATA                mAcpiCpuData;
> >   volatile UINT32              mNumberToFinish;
> >   MP_CPU_EXCHANGE_INFO         *mExchangeInfo;
> >   BOOLEAN                      mRestoreSmmConfigurationInS3 = FALSE;
> > -MP_MSR_LOCK                  *mMsrSpinLocks = NULL;
> > -UINTN                        mMsrSpinLockCount;
> > -UINTN                        mMsrCount = 0;
> >
> >   //
> >   // S3 boot flag
> > @@ -91,89 +92,6 @@ UINT8                        mApHltLoopCodeTemplate[] = {
> >                                  0xEB, 0xFC               // jmp $-2
> >                                  };
> >
> > -/**
> > -  Get MSR spin lock by MSR index.
> > -
> > -  @param  MsrIndex       MSR index value.
> > -
> > -  @return Pointer to MSR spin lock.
> > -
> > -**/
> > -SPIN_LOCK *
> > -GetMsrSpinLockByIndex (
> > -  IN UINT32      MsrIndex
> > -  )
> > -{
> > -  UINTN     Index;
> > -  for (Index = 0; Index < mMsrCount; Index++) {
> > -    if (MsrIndex == mMsrSpinLocks[Index].MsrIndex) {
> > -      return mMsrSpinLocks[Index].SpinLock;
> > -    }
> > -  }
> > -  return NULL;
> > -}
> > -
> > -/**
> > -  Initialize MSR spin lock by MSR index.
> > -
> > -  @param  MsrIndex       MSR index value.
> > -
> > -**/
> > -VOID
> > -InitMsrSpinLockByIndex (
> > -  IN UINT32      MsrIndex
> > -  )
> > -{
> > -  UINTN    MsrSpinLockCount;
> > -  UINTN    NewMsrSpinLockCount;
> > -  UINTN    Index;
> > -  UINTN    AddedSize;
> > -
> > -  if (mMsrSpinLocks == NULL) {
> > -    MsrSpinLockCount =
> mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter;
> > -    mMsrSpinLocks = (MP_MSR_LOCK *) AllocatePool (sizeof
> (MP_MSR_LOCK) * MsrSpinLockCount);
> > -    ASSERT (mMsrSpinLocks != NULL);
> > -    for (Index = 0; Index < MsrSpinLockCount; Index++) {
> > -      mMsrSpinLocks[Index].SpinLock =
> > -       (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
> Index * mSemaphoreSize);
> > -      mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> > -    }
> > -    mMsrSpinLockCount = MsrSpinLockCount;
> > -    mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter = 0;
> > -  }
> > -  if (GetMsrSpinLockByIndex (MsrIndex) == NULL) {
> > -    //
> > -    // Initialize spin lock for MSR programming
> > -    //
> > -    mMsrSpinLocks[mMsrCount].MsrIndex = MsrIndex;
> > -    InitializeSpinLock (mMsrSpinLocks[mMsrCount].SpinLock);
> > -    mMsrCount ++;
> > -    if (mMsrCount == mMsrSpinLockCount) {
> > -      //
> > -      // If MSR spin lock buffer is full, enlarge it
> > -      //
> > -      AddedSize = SIZE_4KB;
> > -      mSmmCpuSemaphores.SemaphoreMsr.Msr =
> > -                        AllocatePages (EFI_SIZE_TO_PAGES(AddedSize));
> > -      ASSERT (mSmmCpuSemaphores.SemaphoreMsr.Msr != NULL);
> > -      NewMsrSpinLockCount = mMsrSpinLockCount + AddedSize /
> mSemaphoreSize;
> > -      mMsrSpinLocks = ReallocatePool (
> > -                        sizeof (MP_MSR_LOCK) * mMsrSpinLockCount,
> > -                        sizeof (MP_MSR_LOCK) * NewMsrSpinLockCount,
> > -                        mMsrSpinLocks
> > -                        );
> > -      ASSERT (mMsrSpinLocks != NULL);
> > -      mMsrSpinLockCount = NewMsrSpinLockCount;
> > -      for (Index = mMsrCount; Index < mMsrSpinLockCount; Index++) {
> > -        mMsrSpinLocks[Index].SpinLock =
> > -                 (SPIN_LOCK
> *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
> > -                 (Index - mMsrCount)  * mSemaphoreSize);
> > -        mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> > -      }
> > -    }
> > -  }
> > -}
> > -
> >   /**
> >     Sync up the MTRR values for all processors.
> >
> > @@ -204,42 +122,89 @@ Returns:
> >   }
> >
> >   /**
> > -  Programs registers for the calling processor.
> > +  Increment semaphore by 1.
> >
> > -  This function programs registers for the calling processor.
> > +  @param      Sem            IN:  32-bit unsigned integer
> >
> > -  @param  RegisterTables        Pointer to register table of the running
> processor.
> > -  @param  RegisterTableCount    Register table count.
> > +**/
> > +VOID
> > +S3ReleaseSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  InterlockedIncrement (Sem);
> > +}
> > +
> > +/**
> > +  Decrement the semaphore by 1 if it is not zero.
> > +
> > +  Performs an atomic decrement operation for semaphore.
> > +  The compare exchange operation must be performed using  MP safe
> > + mechanisms.
> > +
> > +  @param      Sem            IN:  32-bit unsigned integer
> > +
> > +**/
> > +VOID
> > +S3WaitForSemaphore (
> > +  IN OUT  volatile UINT32           *Sem
> > +  )
> > +{
> > +  UINT32  Value;
> > +
> > +  do {
> > +    Value = *Sem;
> > +  } while (Value == 0);
> > +
> > +  InterlockedDecrement (Sem);
> 
> The code here is not safe. Please reference ReleaseSemaphore()
> implementation in PiSmmCpuDxeSmm/MpService.c.

Yes, will update code logic in my next version changes.

> 
> > +}
> > +
> > +/**
> > +  Initialize the CPU registers from a register table.
> > +
> > +  @param[in]  RegisterTable         The register table for this AP.
> > +  @param[in]  ApLocation            AP location info for this ap.
> > +  @param[in]  CpuStatus             CPU status info for this CPU.
> > +  @param[in]  CpuFlags              Flags data structure used when program the
> register.
> >
> > +  @note This service could be called by BSP/APs.
> >   **/
> >   VOID
> > -SetProcessorRegister (
> > -  IN CPU_REGISTER_TABLE        *RegisterTables,
> > -  IN UINTN                     RegisterTableCount
> > +EFIAPI
> > +ProgramProcessorRegister (
> > +  IN CPU_REGISTER_TABLE           *RegisterTable,
> > +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> > +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> > +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
> >     )
> >   {
> >     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
> >     UINTN                     Index;
> >     UINTN                     Value;
> > -  SPIN_LOCK                 *MsrSpinLock;
> > -  UINT32                    InitApicId;
> > -  CPU_REGISTER_TABLE        *RegisterTable;
> > +  CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> > +  volatile UINT32           *SemaphorePtr;
> > +  UINT32                    CoreOffset;
> > +  UINT32                    PackageOffset;
> > +  UINT32                    PackageThreadsCount;
> > +  UINT32                    ApOffset;
> > +  UINTN                     ProcessorIndex;
> > +  UINTN                     ApIndex;
> > +  UINTN                     ValidApCount;
> >
> > -  InitApicId = GetInitialApicId ();
> > -  RegisterTable = NULL;
> > -  for (Index = 0; Index < RegisterTableCount; Index++) {
> > -    if (RegisterTables[Index].InitialApicId == InitApicId) {
> > -      RegisterTable =  &RegisterTables[Index];
> > -      break;
> > -    }
> > -  }
> > -  ASSERT (RegisterTable != NULL);
> > +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus-
> >ThreadCount \
> > +            + ApLocation->Core * CpuStatus->ThreadCount \
> > +            + ApLocation->Thread;
> Please avoid using AP. Use Thread instead.

Got it. Will use thread for consistent.
> >
> >     //
> >     // Traverse Register Table of this logical processor
> >     //
> > -  RegisterTableEntry = (CPU_REGISTER_TABLE_ENTRY *) (UINTN)
> > RegisterTable->RegisterTableEntry;
> > -  for (Index = 0; Index < RegisterTable->TableLength; Index++,
> > RegisterTableEntry++) {
> > +  RegisterTableEntryHead = (CPU_REGISTER_TABLE_ENTRY *) (UINTN)
> > + RegisterTable->RegisterTableEntry;
> > +
> > +  for (Index = 0; Index < RegisterTable->TableLength; Index++) {
> > +
> > +    RegisterTableEntry = &RegisterTableEntryHead[Index];
> > +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type =
> > + %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
> 
> Please dump the register type as string.

Yes, will update in my next version changes.

> 
> > +
> >       //
> >       // Check the type of specified register
> >       //
> > @@ -310,12 +275,6 @@ SetProcessorRegister (
> >             RegisterTableEntry->Value
> >             );
> >         } else {
> > -        //
> > -        // Get lock to avoid Package/Core scope MSRs programming issue in
> parallel execution mode
> > -        // to make sure MSR read/write operation is atomic.
> > -        //
> > -        MsrSpinLock = GetMsrSpinLockByIndex (RegisterTableEntry->Index);
> > -        AcquireSpinLock (MsrSpinLock);
> >           //
> >           // Set the bit section according to bit start and length
> >           //
> > @@ -325,21 +284,20 @@ SetProcessorRegister (
> >             RegisterTableEntry->ValidBitStart + RegisterTableEntry-
> >ValidBitLength - 1,
> >             RegisterTableEntry->Value
> >             );
> > -        ReleaseSpinLock (MsrSpinLock);
> >         }
> >         break;
> >       //
> >       // MemoryMapped operations
> >       //
> >       case MemoryMapped:
> > -      AcquireSpinLock (mMemoryMappedLock);
> > +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
> >         MmioBitFieldWrite32 (
> >           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry-
> >HighIndex, 32)),
> >           RegisterTableEntry->ValidBitStart,
> >           RegisterTableEntry->ValidBitStart + RegisterTableEntry-
> >ValidBitLength - 1,
> >           (UINT32)RegisterTableEntry->Value
> >           );
> > -      ReleaseSpinLock (mMemoryMappedLock);
> > +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
> >         break;
> >       //
> >       // Enable or disable cache
> > @@ -355,12 +313,99 @@ SetProcessorRegister (
> >         }
> >         break;
> >
> > +    case Semaphore:
> 
> Please refer to the comment to patch #3.

Got it.

> 
> > +      SemaphorePtr = CpuFlags->SemaphoreCount;
> > +      switch (RegisterTableEntry->Value) {
> > +      case CoreDepType:
> > +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount +
> ApLocation->Core) * CpuStatus->ThreadCount;
> > +        ApOffset = CoreOffset + ApLocation->Thread;
> > +        //
> > +        // First increase semaphore count by 1 for processors in this core.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset +
> ProcessorIndex]);
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount;
> ProcessorIndex ++) {
> > +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      case PackageDepType:
> > +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount *
> CpuStatus->ThreadCount;
> > +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus-
> >CoreCount;
> > +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation-
> >Core + ApLocation->Thread;
> > +        ValidApCount = CpuStatus->ThreadCount * CpuStatus-
> >ValidCoresInPackages[ApLocation->Package];
> > +        //
> > +        // First increase semaphore count by 1 for processors in this package.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> > +          S3ReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> ProcessorIndex]);
> > +        }
> > +        //
> > +        // Second, check whether the count has reach the check number.
> > +        //
> > +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount;
> ProcessorIndex ++) {
> > +          S3WaitForSemaphore (&SemaphorePtr[ApOffset]);
> > +        }
> > +        break;
> > +
> > +      default:
> > +        break;
> > +      }
> > +      break;
> > +
> >       default:
> >         break;
> >       }
> >     }
> >   }
> >
> > +/**
> > +
> > +  Set Processor register for one AP.
> > +
> > +  @param     SmmPreRegisterTable     Use pre register table or register
> table.
> > +
> > +**/
> > +VOID
> > +SetRegister (
> > +  IN BOOLEAN                 SmmPreRegisterTable
> > +  )
> > +{
> > +  CPU_REGISTER_TABLE        *RegisterTable;
> > +  CPU_REGISTER_TABLE        *RegisterTables;
> > +  UINT32                    InitApicId;
> > +  UINTN                     ProcIndex;
> > +  UINTN                     Index;
> > +
> > +  if (SmmPreRegisterTable) {
> > +    RegisterTables = (CPU_REGISTER_TABLE
> > + *)(UINTN)mAcpiCpuData.PreSmmInitRegisterTable;
> > +  } else {
> > +    RegisterTables = (CPU_REGISTER_TABLE
> > + *)(UINTN)mAcpiCpuData.RegisterTable;
> > +  }
> > +
> > +  InitApicId = GetInitialApicId ();
> > +  RegisterTable = NULL;
> > +  for (Index = 0; Index < mAcpiCpuData.NumberOfCpus; Index++) {
> > +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> > +      RegisterTable =  &RegisterTables[Index];
> > +      ProcIndex = Index;
> > +      break;
> > +    }
> > +  }
> > +  ASSERT (RegisterTable != NULL);
> > +
> > +  ProgramProcessorRegister (
> > +    RegisterTable,
> > +    mAcpiCpuData.ApLocation + ProcIndex,
> > +    &mAcpiCpuData.CpuStatus,
> > +    &mCpuFlags
> > +    );
> > +}
> > +
> >   /**
> >     AP initialization before then after SMBASE relocation in the S3 boot path.
> >   **/
> > @@ -374,7 +419,7 @@ InitializeAp (
> >
> >     LoadMtrrData (mAcpiCpuData.MtrrTable);
> >
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> > +  SetRegister (TRUE);
> >
> >     //
> >     // Count down the number with lock mechanism.
> > @@ -391,7 +436,7 @@ InitializeAp (
> >     ProgramVirtualWireMode ();
> >     DisableLvtInterrupts ();
> >
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> > +  SetRegister (FALSE);
> >
> >     //
> >     // Place AP into the safe code, count down the number with lock
> mechanism in the safe code.
> > @@ -466,7 +511,7 @@ InitializeCpuBeforeRebase (
> >   {
> >     LoadMtrrData (mAcpiCpuData.MtrrTable);
> >
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> > +  SetRegister (TRUE);
> >
> >     ProgramVirtualWireMode ();
> >
> > @@ -502,8 +547,6 @@ InitializeCpuAfterRebase (
> >     VOID
> >     )
> >   {
> > -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN)
> > mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> > -
> >     mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
> >
> >     //
> > @@ -511,6 +554,8 @@ InitializeCpuAfterRebase (
> >     //
> >     mInitApsAfterSmmBaseReloc = TRUE;
> >
> > +  SetRegister (FALSE);
> > +
> >     while (mNumberToFinish > 0) {
> >       CpuPause ();
> >     }
> > @@ -574,8 +619,6 @@ SmmRestoreCpu (
> >
> >     mSmmS3Flag = TRUE;
> >
> > -  InitializeSpinLock (mMemoryMappedLock);
> > -
> >     //
> >     // See if there is enough context to resume PEI Phase
> >     //
> > @@ -790,7 +833,6 @@ CopyRegisterTable (
> >     )
> >   {
> >     UINTN                      Index;
> > -  UINTN                      Index1;
> >     CPU_REGISTER_TABLE_ENTRY   *RegisterTableEntry;
> >
> >     CopyMem (DestinationRegisterTableList, SourceRegisterTableList,
> > NumberOfCpus * sizeof (CPU_REGISTER_TABLE)); @@ -802,17 +844,6 @@
> CopyRegisterTable (
> >           );
> >         ASSERT (RegisterTableEntry != NULL);
> >         DestinationRegisterTableList[Index].RegisterTableEntry =
> (EFI_PHYSICAL_ADDRESS)(UINTN)RegisterTableEntry;
> > -      //
> > -      // Go though all MSRs in register table to initialize MSR spin lock
> > -      //
> > -      for (Index1 = 0; Index1 <
> DestinationRegisterTableList[Index].TableLength; Index1++,
> RegisterTableEntry++) {
> > -        if ((RegisterTableEntry->RegisterType == Msr) &&
> (RegisterTableEntry->ValidBitLength < 64)) {
> > -          //
> > -          // Initialize MSR spin lock only for those MSRs need bit field writing
> > -          //
> > -          InitMsrSpinLockByIndex (RegisterTableEntry->Index);
> > -        }
> > -      }
> >       }
> >     }
> >   }
> > @@ -832,6 +863,7 @@ GetAcpiCpuData (
> >     VOID                       *GdtForAp;
> >     VOID                       *IdtForAp;
> >     VOID                       *MachineCheckHandlerForAp;
> > +  CPU_STATUS_INFORMATION     *CpuStatus;
> >
> >     if (!mAcpiS3Enable) {
> >       return;
> > @@ -906,6 +938,16 @@ GetAcpiCpuData (
> >     Gdtr->Base = (UINTN)GdtForAp;
> >     Idtr->Base = (UINTN)IdtForAp;
> >     mAcpiCpuData.ApMachineCheckHandlerBase =
> > (EFI_PHYSICAL_ADDRESS)(UINTN)MachineCheckHandlerForAp;
> > +
> > +  CpuStatus = &mAcpiCpuData.CpuStatus;  CopyMem (CpuStatus,
> > + &AcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
> > + CpuStatus->ValidCoresInPackages = AllocateCopyPool (sizeof (UINT32)
> > + * CpuStatus->PackageCount,
> > + AcpiCpuData->CpuStatus.ValidCoresInPackages);
> > +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);
> > + mAcpiCpuData.ApLocation = AllocateCopyPool
> > + (mAcpiCpuData.NumberOfCpus * sizeof (EFI_CPU_PHYSICAL_LOCATION),
> > + AcpiCpuData->ApLocation);  ASSERT (mAcpiCpuData.ApLocation != NULL);
> > +  InitializeSpinLock((SPIN_LOCK*) &mCpuFlags.MemoryMappedLock);
> > + mCpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) *
> > + CpuStatus->PackageCount * CpuStatus->CoreCount *
> > + CpuStatus->ThreadCount);  ASSERT (mCpuFlags.SemaphoreCount !=
> NULL);
> >   }
> >
> >   /**
> > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > index 9cf508a5c7..42b040531e 100644
> > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> > @@ -1303,8 +1303,6 @@ InitializeSmmCpuSemaphores (
> >     mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock
> >                                                     = (SPIN_LOCK *)SemaphoreAddr;
> >     SemaphoreAddr += SemaphoreSize;
> > -  mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock
> > -                                                  = (SPIN_LOCK *)SemaphoreAddr;
> >
> >     SemaphoreAddr = (UINTN)SemaphoreBlock + GlobalSemaphoresSize;
> >     mSmmCpuSemaphores.SemaphoreCpu.Busy    = (SPIN_LOCK
> *)SemaphoreAddr;
> > @@ -1321,7 +1319,6 @@ InitializeSmmCpuSemaphores (
> >
> >     mPFLock                       = mSmmCpuSemaphores.SemaphoreGlobal.PFLock;
> >     mConfigSmmCodeAccessCheckLock =
> mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock;
> > -  mMemoryMappedLock             =
> mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock;
> >
> >     mSemaphoreSize = SemaphoreSize;
> >   }
> > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > index 8c7f4996d1..e2970308fe 100644
> > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> > @@ -53,6 +53,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF
> ANY KIND, EITHER EXPRESS OR IMPLIED.
> >   #include <Library/ReportStatusCodeLib.h>
> >   #include <Library/SmmCpuFeaturesLib.h>
> >   #include <Library/PeCoffGetEntryPointLib.h>
> > +#include <Library/RegisterCpuFeaturesLib.h>
> >
> >   #include <AcpiCpuData.h>
> >   #include <CpuHotPlugData.h>
> > @@ -364,7 +365,6 @@ typedef struct {
> >     volatile BOOLEAN     *AllCpusInSync;
> >     SPIN_LOCK            *PFLock;
> >     SPIN_LOCK            *CodeAccessCheckLock;
> > -  SPIN_LOCK            *MemoryMappedLock;
> >   } SMM_CPU_SEMAPHORE_GLOBAL;
> >
> >   ///
> > @@ -409,7 +409,6 @@ extern SMM_CPU_SEMAPHORES
> mSmmCpuSemaphores;
> >   extern UINTN                               mSemaphoreSize;
> >   extern SPIN_LOCK                           *mPFLock;
> >   extern SPIN_LOCK                           *mConfigSmmCodeAccessCheckLock;
> > -extern SPIN_LOCK                           *mMemoryMappedLock;
> >   extern EFI_SMRAM_DESCRIPTOR                *mSmmCpuSmramRanges;
> >   extern UINTN                               mSmmCpuSmramRangeCount;
> >   extern UINT8                               mPhysicalAddressBits;
> >
> 
> 
> --
> Thanks,
> Ray

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch 0/4] Fix performance issue caused by Set MSR task.
  2018-10-16  1:39   ` Dong, Eric
@ 2018-10-17 11:42     ` Laszlo Ersek
  0 siblings, 0 replies; 18+ messages in thread
From: Laszlo Ersek @ 2018-10-17 11:42 UTC (permalink / raw)
  To: Dong, Eric, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu

On 10/16/18 03:39, Dong, Eric wrote:
> Hi Laszlo,
> 
> [...]

Thanks for your answers, it's all much clearer now.

Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-10-17 11:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-15  2:49 [Patch 0/4] Fix performance issue caused by Set MSR task Eric Dong
2018-10-15  2:49 ` [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
2018-10-15 16:02   ` Laszlo Ersek
2018-10-16  3:43     ` Dong, Eric
2018-10-16  2:27   ` Ni, Ruiyu
2018-10-16  5:25     ` Dong, Eric
2018-10-15  2:49 ` [Patch 2/4] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
2018-10-15  2:49 ` [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
2018-10-16  3:05   ` Ni, Ruiyu
2018-10-16  7:43     ` Dong, Eric
2018-10-15  2:49 ` [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
2018-10-15 17:13   ` Laszlo Ersek
2018-10-16 14:44     ` Dong, Eric
2018-10-16  3:16   ` Ni, Ruiyu
2018-10-16 23:52     ` Dong, Eric
2018-10-15 15:51 ` [Patch 0/4] Fix performance issue caused by Set MSR task Laszlo Ersek
2018-10-16  1:39   ` Dong, Eric
2018-10-17 11:42     ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox