public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [Patch v2 0/6] Fix performance issue caused by Set MSR task.
@ 2018-10-17  2:16 Eric Dong
  2018-10-17  2:16 ` [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Eric Dong @ 2018-10-17  2:16 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

V2 changes include:
1. Include the change for CpuCommonFeaturesLib which used to set MSR base on its scope info.
2. Include the change for CpuS3DataDxe driver which also handle the AcpiCpuData data.
3. Update code base on feedback for V1 changes.

V1 changes include:
In a system which has multiple cores, current set register value task costs huge times.
After investigation, current set MSR task costs most of the times. Current logic uses SpinLock to let set MSR task as an single thread task for all cores. Because MSR has scope attribute which may cause GP fault if multiple APs set MSR at the same time, current logic use an easiest solution (use SpinLock) to avoid this issue, but it will cost huge times.

In order to fix this performance issue, new solution will set MSRs base on their scope attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A must been set after MSR B has been set. Also MSR B is package scope level and MSR A is thread scope level. If system has multiple threads, Thread 1 needs to set the thread level MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1 and thread 2 like below:

            Thread 1                 Thread 2
MSR B          N                        Y
MSR A          Y                        Y

If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but at this time, MSR B not been executed yet by thread 2. system may trig exception at this time.

In order to fix the above issue, driver introduces semaphore logic to control the MSR execute sequence. For the above case, a semaphore will be add between MSR A and B for all threads. Semaphore has scope info for it. The possible scope value is core or package.
For each thread, when it meets a semaphore during it set registers, it will 1) release semaphore (+1) for each threads in this core or package(based on the scope info for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based on the scope info for this semaphore). With these two steps, driver can control MSR sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
    LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
    LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
   PEI instance not works after this change. We plan to support async mode for PEI in phase
   2 for this task.
2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver and 
   RegisterCpuFeaturesLib library because the schedule limitation. Will merge the code to 
   RegisterCpuFeaturesLib and export as an API in phase 2 for this task.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>


Eric Dong (6):
  UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
  UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
    type.
  UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
  UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info.

 UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c                |   2 +
 UefiCpuPkg/Include/AcpiCpuData.h                   |  45 +-
 .../Include/Library/RegisterCpuFeaturesLib.h       |  25 +-
 UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c      |   8 +
 UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c     |  12 +
 .../Library/CpuCommonFeaturesLib/ExecuteDisable.c  |  10 +
 .../Library/CpuCommonFeaturesLib/FastStrings.c     |  12 +
 .../Library/CpuCommonFeaturesLib/FeatureControl.c  |  38 ++
 .../CpuCommonFeaturesLib/LimitCpuIdMaxval.c        |  14 +
 .../Library/CpuCommonFeaturesLib/MachineCheck.c    |  38 ++
 .../Library/CpuCommonFeaturesLib/MonitorMwait.c    |  15 +
 .../Library/CpuCommonFeaturesLib/PendingBreak.c    |  11 +
 UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c     |  11 +
 .../Library/CpuCommonFeaturesLib/ProcTrace.c       |  11 +
 UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c   |  10 +
 .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 438 ++++++++++++++++---
 .../DxeRegisterCpuFeaturesLib.c                    |  71 ++-
 .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
 .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
 .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
 .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  61 ++-
 .../RegisterCpuFeaturesLib.c                       | 484 ++++++++++++++++++---
 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c                  | 406 +++++++++++------
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c              |   3 -
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h         |   3 +-
 25 files changed, 1505 insertions(+), 282 deletions(-)

-- 
2.15.0.windows.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
@ 2018-10-17  2:16 ` Eric Dong
  2018-10-18  3:04   ` Ni, Ruiyu
  2018-10-18  3:10   ` Ni, Ruiyu
  2018-10-17  2:16 ` [Patch v2 2/6] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 18+ messages in thread
From: Eric Dong @ 2018-10-17  2:16 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

v2 changes:
1. Add more description about why we do this change.
2. Change structure field type from pointer to EFI_PHYSICAL_ADDRESS because it will
   be share between PEI and DXE.

In order to support semaphore related logic, add new definition for it.

In a system which has multiple cores, current set register value task costs huge times.
After investigation, current set MSR task costs most of the times. Current logic uses
SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
scope attribute which may cause GP fault if multiple APs set MSR at the same time,
current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
cost huge times.

In order to fix this performance issue, new solution will set MSRs base on their scope
attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
and thread 2 like below:

            Thread 1                 Thread 2
MSR B          N                        Y
MSR A          Y                        Y

If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
at this time, MSR B not been executed yet by thread 2. system may trig exception at this
time.

In order to fix the above issue, driver introduces semaphore logic to control the MSR
execute sequence. For the above case, a semaphore will be add between MSR A and B for
all threads. Semaphore has scope info for it. The possible scope value is core or package.
For each thread, when it meets a semaphore during it set registers, it will 1) release
semaphore (+1) for each threads in this core or package(based on the scope info for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
on the scope info for this semaphore). With these two steps, driver can control MSR
sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
    LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
    LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
   PEI instance not works after this change. We plan to support async mode for PEI in phase
   2 for this task.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Include/AcpiCpuData.h | 45 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/UefiCpuPkg/Include/AcpiCpuData.h b/UefiCpuPkg/Include/AcpiCpuData.h
index 9e51145c08..f1439dcf9a 100644
--- a/UefiCpuPkg/Include/AcpiCpuData.h
+++ b/UefiCpuPkg/Include/AcpiCpuData.h
@@ -22,9 +22,42 @@ typedef enum {
   Msr,
   ControlRegister,
   MemoryMapped,
-  CacheControl
+  CacheControl,
+  //
+  // Semaphore type used to control the execute sequence of the Msr.
+  // It will be insert between two Msr which has execute dependence.
+  //
+  Semaphore
 } REGISTER_TYPE;
 
+//
+// CPU information.
+//
+typedef struct {
+  //
+  // Record the package count in this CPU.
+  //
+  UINT32                      PackageCount;
+  //
+  // Record the max core count in this CPU.
+  // Different packages may have different core count, this value
+  // save the max core count in all the packages.
+  //
+  UINT32                      MaxCoreCount;
+  //
+  // Record the max thread count in this CPU.
+  // Different cores may have different thread count, this value
+  // save the max thread count in all the cores.
+  //
+  UINT32                      MaxThreadCount;
+  //
+  // This fild is an pointer type which point to an array.
+  // This array used to save the valid cores in different packages in this CPU.
+  // The array count is the package number in this CPU.
+  //
+  EFI_PHYSICAL_ADDRESS        ValidCoresPerPackages;
+} CPU_STATUS_INFORMATION;
+
 //
 // Element of register table entry
 //
@@ -147,6 +180,16 @@ typedef struct {
   // provided.
   //
   UINT32                ApMachineCheckHandlerSize;
+  //
+  // CPU information which is required when set the register table.
+  //
+  CPU_STATUS_INFORMATION     CpuStatus;
+  //
+  // Location info for each AP.
+  // It points to an array which saves all APs location info in this CPU.
+  // The array count is the AP count in this CPU.
+  //
+  EFI_PHYSICAL_ADDRESS  ApLocation;
 } ACPI_CPU_DATA;
 
 #endif
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch v2 2/6] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
  2018-10-17  2:16 ` [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
@ 2018-10-17  2:16 ` Eric Dong
  2018-10-18  3:31   ` Ni, Ruiyu
  2018-10-17  2:16 ` [Patch v2 3/6] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Eric Dong @ 2018-10-17  2:16 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

Add new core/package dependence types which consumed by different MSRs.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 .../Include/Library/RegisterCpuFeaturesLib.h       | 25 ++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h b/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
index 9331e49d13..e6f0ebe4bc 100644
--- a/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
+++ b/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
@@ -73,10 +73,17 @@
 #define CPU_FEATURE_PPIN                            (32+11)
 #define CPU_FEATURE_PROC_TRACE                      (32+12)
 
-#define CPU_FEATURE_BEFORE_ALL                      BIT27
-#define CPU_FEATURE_AFTER_ALL                       BIT28
-#define CPU_FEATURE_BEFORE                          BIT29
-#define CPU_FEATURE_AFTER                           BIT30
+#define CPU_FEATURE_BEFORE_ALL                      BIT23
+#define CPU_FEATURE_AFTER_ALL                       BIT24
+#define CPU_FEATURE_BEFORE                          BIT25
+#define CPU_FEATURE_AFTER                           BIT26
+
+#define CPU_FEATURE_THREAD_BEFORE                   CPU_FEATURE_BEFORE
+#define CPU_FEATURE_THREAD_AFTER                    CPU_FEATURE_AFTER
+#define CPU_FEATURE_CORE_BEFORE                     BIT27
+#define CPU_FEATURE_CORE_AFTER                      BIT28
+#define CPU_FEATURE_PACKAGE_BEFORE                  BIT29
+#define CPU_FEATURE_PACKAGE_AFTER                   BIT30
 #define CPU_FEATURE_END                             MAX_UINT32
 /// @}
 
@@ -116,6 +123,16 @@ typedef struct {
   CPUID_VERSION_INFO_EDX               CpuIdVersionInfoEdx;
 } REGISTER_CPU_FEATURE_INFORMATION;
 
+//
+// Describe the dependency type for different features.
+//
+typedef enum {
+  NoneDepType,
+  ThreadDepType,
+  CoreDepType,
+  PackageDepType
+} CPU_FEATURE_DEPENDENCE_TYPE;
+
 /**
   Determines if a CPU feature is enabled in PcdCpuFeaturesSupport bit mask.
   If a CPU feature is disabled in PcdCpuFeaturesSupport then all the code/data
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch v2 3/6] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
  2018-10-17  2:16 ` [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
  2018-10-17  2:16 ` [Patch v2 2/6] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
@ 2018-10-17  2:16 ` Eric Dong
  2018-10-18  5:46   ` Ni, Ruiyu
  2018-10-17  2:16 ` [Patch v2 4/6] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Eric Dong @ 2018-10-17  2:16 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

V2 changes include:
1. Add more description for the code part which need easy to understand.
2. Refine some code base on feedback for V1 changes.

V1 changes include:
In a system which has multiple cores, current set register value task costs huge times.
After investigation, current set MSR task costs most of the times. Current logic uses
SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
scope attribute which may cause GP fault if multiple APs set MSR at the same time,
current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
cost huge times.

In order to fix this performance issue, new solution will set MSRs base on their scope
attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
and thread 2 like below:

            Thread 1                 Thread 2
MSR B          N                        Y
MSR A          Y                        Y

If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
at this time, MSR B not been executed yet by thread 2. system may trig exception at this
time.

In order to fix the above issue, driver introduces semaphore logic to control the MSR
execute sequence. For the above case, a semaphore will be add between MSR A and B for
all threads. Semaphore has scope info for it. The possible scope value is core or package.
For each thread, when it meets a semaphore during it set registers, it will 1) release
semaphore (+1) for each threads in this core or package(based on the scope info for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
on the scope info for this semaphore). With these two steps, driver can control MSR
sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
    LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
    LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
   PEI instance not works after this change. We plan to support async mode for PEI in phase
   2 for this task.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 438 ++++++++++++++++---
 .../DxeRegisterCpuFeaturesLib.c                    |  71 ++-
 .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
 .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
 .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
 .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  61 ++-
 .../RegisterCpuFeaturesLib.c                       | 484 ++++++++++++++++++---
 7 files changed, 980 insertions(+), 133 deletions(-)

diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
index ba3fb3250f..2bf2254602 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
@@ -145,6 +145,19 @@ CpuInitDataInitialize (
   CPU_FEATURES_INIT_ORDER              *InitOrder;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
   LIST_ENTRY                           *Entry;
+  UINT32                               Core;
+  UINT32                               Package;
+  UINT32                               Thread;
+  EFI_CPU_PHYSICAL_LOCATION            *Location;
+  BOOLEAN                              *CoresVisited;
+  UINTN                                Index;
+  ACPI_CPU_DATA                        *AcpiCpuData;
+  CPU_STATUS_INFORMATION               *CpuStatus;
+  UINT32                               *ValidCorePerPackage;
+
+  Core    = 0;
+  Package = 0;
+  Thread  = 0;
 
   CpuFeaturesData = GetCpuFeaturesData ();
   CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof (CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
@@ -163,6 +176,17 @@ CpuInitDataInitialize (
     Entry = Entry->ForwardLink;
   }
 
+  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
+
+  AcpiCpuData = GetAcpiCpuData ();
+  ASSERT (AcpiCpuData != NULL);
+  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
+
+  CpuStatus = &AcpiCpuData->CpuStatus;
+  Location = AllocateZeroPool (sizeof (EFI_CPU_PHYSICAL_LOCATION) * NumberOfCpus);
+  ASSERT (Location != NULL);
+  AcpiCpuData->ApLocation = (EFI_PHYSICAL_ADDRESS)(UINTN)Location;
+
   for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
     InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
     InitOrder->FeaturesSupportedMask = AllocateZeroPool (CpuFeaturesData->BitMaskSize);
@@ -175,7 +199,76 @@ CpuInitDataInitialize (
       &ProcessorInfoBuffer,
       sizeof (EFI_PROCESSOR_INFORMATION)
       );
+    CopyMem (
+      Location + ProcessorNumber,
+      &ProcessorInfoBuffer.Location,
+      sizeof (EFI_CPU_PHYSICAL_LOCATION)
+      );
+
+    //
+    // Collect CPU package count info.
+    //
+    if (Package < ProcessorInfoBuffer.Location.Package) {
+      Package = ProcessorInfoBuffer.Location.Package;
+    }
+    //
+    // Collect CPU max core count info.
+    //
+    if (Core < ProcessorInfoBuffer.Location.Core) {
+      Core = ProcessorInfoBuffer.Location.Core;
+    }
+    //
+    // Collect CPU max thread count info.
+    //
+    if (Thread < ProcessorInfoBuffer.Location.Thread) {
+      Thread = ProcessorInfoBuffer.Location.Thread;
+    }
   }
+  CpuStatus->PackageCount    = Package + 1;
+  CpuStatus->MaxCoreCount    = Core + 1;
+  CpuStatus->MaxThreadCount  = Thread + 1;
+  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, MaxCore : %d, MaxThread: %d\n",
+         CpuStatus->PackageCount,
+         CpuStatus->MaxCoreCount,
+         CpuStatus->MaxThreadCount));
+
+  //
+  // Collect valid core count in each package because not all cores are valid.
+  //
+  ValidCorePerPackage= AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount);
+  ASSERT (ValidCorePerPackage != 0);
+  CpuStatus->ValidCoresPerPackages = (EFI_PHYSICAL_ADDRESS)(UINTN)ValidCorePerPackage;
+  CoresVisited = AllocatePool (sizeof (UINT32) * CpuStatus->MaxCoreCount);
+  ASSERT (CoresVisited != NULL);
+
+  for (Index = 0; Index < CpuStatus->PackageCount; Index ++ ) {
+    ZeroMem (CoresVisited, sizeof (UINT32) * CpuStatus->MaxCoreCount);
+    //
+    // Collect valid cores in Current package.
+    //
+    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
+      Location = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
+      if (Location->Package == Index && !CoresVisited[Location->Core] ) {
+        //
+        // The ValidCores position for Location->Core is valid.
+        // The possible values in ValidCores[Index] are 0 or 1.
+        // FALSE means no valid threads in this Core.
+        // TRUE means have valid threads in this core, no matter the thead count is 1 or more.
+        //
+        CoresVisited[Location->Core] = TRUE;
+        ValidCorePerPackage[Index]++;
+      }
+    }
+  }
+  FreePool (CoresVisited);
+
+  for (Index = 0; Index <= Package; Index++) {
+    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index, ValidCorePerPackage[Index]));
+  }
+
+  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount);
+  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
+
   //
   // Get support and configuration PCDs
   //
@@ -310,7 +403,7 @@ CollectProcessorData (
   LIST_ENTRY                           *Entry;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
 
-  CpuFeaturesData = GetCpuFeaturesData ();
+  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;
   ProcessorNumber = GetProcessorIndex ();
   CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
   //
@@ -340,6 +433,33 @@ CollectProcessorData (
   }
 }
 
+/**
+  Get the string value for the input dependence type.
+
+  @param[in]  DependType  Input the dependence type.
+
+  @retval     Return the dependence type string.
+**/
+CHAR16 *
+GetDependTypeString (
+  IN CPU_FEATURE_DEPENDENCE_TYPE       DependType
+  )
+{
+  switch (DependType) {
+  case ThreadDepType:
+    return L"ThreadDepType";
+
+  case CoreDepType:
+    return L"CoreDepType";
+
+  case PackageDepType:
+    return L"PackageDepType";
+
+  default:
+    return L"NoneDepType";
+  }
+}
+
 /**
   Dump the contents of a CPU register table.
 
@@ -416,6 +536,15 @@ DumpRegisterTableOnProcessor (
         RegisterTableEntry->Value
         ));
       break;
+    case Semaphore:
+      DEBUG ((
+        DebugPrintErrorLevel,
+        "Processor: %d: Semaphore: Scope Value: %s\r\n",
+        ProcessorNumber,
+        GetDependTypeString ((CPU_FEATURE_DEPENDENCE_TYPE)RegisterTableEntry->Value)
+        ));
+      break;
+
     default:
       break;
     }
@@ -441,6 +570,11 @@ AnalysisProcessorFeatures (
   REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
   LIST_ENTRY                           *Entry;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
+  LIST_ENTRY                           *NextEntry;
+  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
+  BOOLEAN                              Success;
+  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
+  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
 
   CpuFeaturesData = GetCpuFeaturesData ();
   CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData->BitMaskSize);
@@ -517,8 +651,14 @@ AnalysisProcessorFeatures (
     //
     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
     Entry = GetFirstNode (&CpuInitOrder->OrderList);
+    NextEntry = Entry->ForwardLink;
     while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
       CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
+      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
+        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (NextEntry);
+      } else {
+        NextCpuFeatureInOrder = NULL;
+      }
       if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask, CpuFeaturesData->SettingPcd)) {
         Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, TRUE);
         if (EFI_ERROR (Status)) {
@@ -532,6 +672,8 @@ AnalysisProcessorFeatures (
             DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask = "));
             DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
           }
+        } else {
+          Success = TRUE;
         }
       } else {
         Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, FALSE);
@@ -542,9 +684,36 @@ AnalysisProcessorFeatures (
             DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: Mask = "));
             DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
           }
+        } else {
+          Success = TRUE;
         }
       }
-      Entry = Entry->ForwardLink;
+
+      if (Success) {
+        //
+        // If feature has dependence with the next feature (ONLY care core/package dependency).
+        // and feature initialize succeed, add sync semaphere here.
+        //
+        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
+        if (NextCpuFeatureInOrder != NULL) {
+          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
+        } else {
+          AfterDep = NoneDepType;
+        }
+        //
+        // Assume only one of the depend is valid.
+        //
+        ASSERT (!(BeforeDep > ThreadDepType && AfterDep > ThreadDepType));
+        if (BeforeDep > ThreadDepType) {
+          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, BeforeDep);
+        }
+        if (AfterDep > ThreadDepType) {
+          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, AfterDep);
+        }
+      }
+
+      Entry     = Entry->ForwardLink;
+      NextEntry = Entry->ForwardLink;
     }
 
     //
@@ -561,27 +730,116 @@ AnalysisProcessorFeatures (
   }
 }
 
+/**
+  Increment semaphore by 1.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+LibReleaseSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  InterlockedIncrement (Sem);
+}
+
+/**
+  Decrement the semaphore by 1 if it is not zero.
+
+  Performs an atomic decrement operation for semaphore.
+  The compare exchange operation must be performed using
+  MP safe mechanisms.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+LibWaitForSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  UINT32  Value;
+
+  do {
+    Value = *Sem;
+  } while (Value == 0 ||
+           InterlockedCompareExchange32 (
+             Sem,
+             Value,
+             Value - 1
+             ) != Value);
+}
+
+/**
+  Get the string value for the input register type.
+
+  @param[in]  RegisterType  Input the register type.
+
+  @retval     Return the register type string.
+**/
+CHAR16 *
+GetRegisterTypeString (
+  IN REGISTER_TYPE       RegisterType
+  )
+{
+  switch (RegisterType) {
+  case Msr:
+    return L"Msr";
+
+  case ControlRegister:
+    return L"ControlRegister";
+
+  case MemoryMapped:
+    return L"MemoryMapped";
+
+  case CacheControl:
+    return L"CacheControl";
+  
+  case Semaphore:
+    return L"Semaphore";
+
+  default:
+    ASSERT (FALSE);
+    return L"NoneType";
+  }
+}
+
 /**
   Initialize the CPU registers from a register table.
 
-  @param[in]  ProcessorNumber  The index of the CPU executing this function.
+  @param[in]  RegisterTable         The register table for this AP.
+  @param[in]  ApLocation            AP location info for this ap.
+  @param[in]  CpuStatus             CPU status info for this CPU.
+  @param[in]  CpuFlags              Flags data structure used when program the register.
 
   @note This service could be called by BSP/APs.
 **/
 VOID
 ProgramProcessorRegister (
-  IN UINTN  ProcessorNumber
+  IN CPU_REGISTER_TABLE           *RegisterTable,
+  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
+  IN CPU_STATUS_INFORMATION       *CpuStatus,
+  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
   )
 {
-  CPU_FEATURES_DATA         *CpuFeaturesData;
-  CPU_REGISTER_TABLE        *RegisterTable;
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
   UINTN                     Index;
   UINTN                     Value;
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
-
-  CpuFeaturesData = GetCpuFeaturesData ();
-  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
+  volatile UINT32           *SemaphorePtr;
+  UINT32                    FirstThread;
+  UINT32                    PackageThreadsCount;
+  UINT32                    CurrentThread;
+  UINTN                     ProcessorIndex;
+  UINTN                     ThreadIndex;
+  UINTN                     ValidThreadCount;
+  UINT32                    *ValidCorePerPackage;
+
+  ThreadIndex = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount \
+            + ApLocation->Core * CpuStatus->MaxThreadCount \
+            + ApLocation->Thread;
 
   //
   // Traverse Register Table of this logical processor
@@ -591,6 +849,13 @@ ProgramProcessorRegister (
   for (Index = 0; Index < RegisterTable->TableLength; Index++) {
 
     RegisterTableEntry = &RegisterTableEntryHead[Index];
+    DEBUG ((
+      DEBUG_INFO,
+      "Processor = %ul, Entry Index %ul, Type = %s!\n",
+      (UINT64)ThreadIndex,
+      (UINT64)Index,
+      GetRegisterTypeString((REGISTER_TYPE)RegisterTableEntry->RegisterType)
+      ));
 
     //
     // Check the type of specified register
@@ -654,10 +919,6 @@ ProgramProcessorRegister (
     // The specified register is Model Specific Register
     //
     case Msr:
-      //
-      // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
-      //
-      AcquireSpinLock (&CpuFeaturesData->MsrLock);
       if (RegisterTableEntry->ValidBitLength >= 64) {
         //
         // If length is not less than 64 bits, then directly write without reading
@@ -677,20 +938,19 @@ ProgramProcessorRegister (
           RegisterTableEntry->Value
           );
       }
-      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
       break;
     //
     // MemoryMapped operations
     //
     case MemoryMapped:
-      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
+      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
       MmioBitFieldWrite32 (
         (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
         RegisterTableEntry->ValidBitStart,
         RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
         (UINT32)RegisterTableEntry->Value
         );
-      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
+      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
       break;
     //
     // Enable or disable cache
@@ -706,6 +966,77 @@ ProgramProcessorRegister (
       }
       break;
 
+    case Semaphore:
+      // Semaphore works logic like below:
+      //
+      //  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
+      //  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
+      //
+      //  All threads (T0...Tn) waits in P() line and continues running
+      //  together.
+      //
+      //
+      //  T0             T1            ...           Tn
+      //
+      //  V(0...n)       V(0...n)      ...           V(0...n)
+      //  n * P(0)       n * P(1)      ...           n * P(n)
+      //
+      SemaphorePtr = CpuFlags->SemaphoreCount;
+      switch (RegisterTableEntry->Value) {
+      case CoreDepType:
+        //
+        // Get Offset info for the first thread in the core which current thread belongs to.
+        //
+        FirstThread = (ApLocation->Package * CpuStatus->MaxCoreCount + ApLocation->Core) * CpuStatus->MaxThreadCount;
+        CurrentThread = FirstThread + ApLocation->Thread;
+        //
+        // First Notify all threads in current Core that this thread has ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
+          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[FirstThread + ProcessorIndex]);
+        }
+        //
+        // Second, check whether all valid threads in current core have ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
+          LibWaitForSemaphore (&SemaphorePtr[CurrentThread]);
+        }
+        break;
+
+      case PackageDepType:
+        ValidCorePerPackage = (UINT32 *)(UINTN)CpuStatus->ValidCoresPerPackages;
+        //
+        // Get Offset info for the first thread in the package which current thread belongs to.
+        //
+        FirstThread = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount;
+        //
+        // Get the possible threads count for current package.
+        //
+        PackageThreadsCount = CpuStatus->MaxThreadCount * CpuStatus->MaxCoreCount;
+        CurrentThread = FirstThread + CpuStatus->MaxThreadCount * ApLocation->Core + ApLocation->Thread;
+        //
+        // Get the valid thread count for current package.
+        //
+        ValidThreadCount = CpuStatus->MaxThreadCount * ValidCorePerPackage[ApLocation->Package];
+        //
+        // First Notify all threads in current package that this thread has ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
+          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[FirstThread + ProcessorIndex]);
+        }
+        //
+        // Second, check whether all valid threads in current package have ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < ValidThreadCount; ProcessorIndex ++) {
+          LibWaitForSemaphore (&SemaphorePtr[CurrentThread]);
+        }
+        break;
+
+      default:
+        break;
+      }
+      break;
+
     default:
       break;
     }
@@ -724,10 +1055,36 @@ SetProcessorRegister (
   IN OUT VOID            *Buffer
   )
 {
-  UINTN                  ProcessorNumber;
+  CPU_FEATURES_DATA         *CpuFeaturesData;
+  CPU_REGISTER_TABLE        *RegisterTable;
+  CPU_REGISTER_TABLE        *RegisterTables;
+  UINT32                    InitApicId;
+  UINTN                     ProcIndex;
+  UINTN                     Index;
+  ACPI_CPU_DATA             *AcpiCpuData;
 
-  ProcessorNumber = GetProcessorIndex ();
-  ProgramProcessorRegister (ProcessorNumber);
+  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
+  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
+
+  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData->RegisterTable;
+
+  InitApicId = GetInitialApicId ();
+  RegisterTable = NULL;
+  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
+    if (RegisterTables[Index].InitialApicId == InitApicId) {
+      RegisterTable =  &RegisterTables[Index];
+      ProcIndex = Index;
+      break;
+    }
+  }
+  ASSERT (RegisterTable != NULL);
+
+  ProgramProcessorRegister (
+    RegisterTable,
+    (EFI_CPU_PHYSICAL_LOCATION *)(UINTN)AcpiCpuData->ApLocation + ProcIndex,
+    &AcpiCpuData->CpuStatus,
+    &CpuFeaturesData->CpuFlags
+    );
 }
 
 /**
@@ -746,6 +1103,9 @@ CpuFeaturesDetect (
 {
   UINTN                  NumberOfCpus;
   UINTN                  NumberOfEnabledProcessors;
+  CPU_FEATURES_DATA      *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData();
 
   GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
 
@@ -754,49 +1114,13 @@ CpuFeaturesDetect (
   //
   // Wakeup all APs for data collection.
   //
-  StartupAPsWorker (CollectProcessorData);
+  StartupAPsWorker (CollectProcessorData, NULL);
 
   //
   // Collect data on BSP
   //
-  CollectProcessorData (NULL);
+  CollectProcessorData (CpuFeaturesData);
 
   AnalysisProcessorFeatures (NumberOfCpus);
 }
 
-/**
-  Performs CPU features Initialization.
-
-  This service will invoke MP service to perform CPU features
-  initialization on BSP/APs per user configuration.
-
-  @note This service could be called by BSP only.
-**/
-VOID
-EFIAPI
-CpuFeaturesInitialize (
-  VOID
-  )
-{
-  CPU_FEATURES_DATA      *CpuFeaturesData;
-  UINTN                  OldBspNumber;
-
-  CpuFeaturesData = GetCpuFeaturesData ();
-
-  OldBspNumber = GetProcessorIndex();
-  CpuFeaturesData->BspNumber = OldBspNumber;
-  //
-  // Wakeup all APs for programming.
-  //
-  StartupAPsWorker (SetProcessorRegister);
-  //
-  // Programming BSP
-  //
-  SetProcessorRegister (NULL);
-  //
-  // Switch to new BSP if required
-  //
-  if (CpuFeaturesData->BspNumber != OldBspNumber) {
-    SwitchNewBsp (CpuFeaturesData->BspNumber);
-  }
-}
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
index 1f34a3f489..8346f7004f 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
@@ -15,6 +15,7 @@
 #include <PiDxe.h>
 
 #include <Library/UefiBootServicesTableLib.h>
+#include <Library/UefiLib.h>
 
 #include "RegisterCpuFeatures.h"
 
@@ -115,14 +116,20 @@ GetProcessorInformation (
 
   @param[in]  Procedure               A pointer to the function to be run on
                                       enabled APs of the system.
+  @param[in]  MpEvent                 A pointer to the event to be used later
+                                      to check whether procedure has done.
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   )
 {
   EFI_STATUS                           Status;
   EFI_MP_SERVICES_PROTOCOL             *MpServices;
+  CPU_FEATURES_DATA                    *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
 
   MpServices = GetMpProtocol ();
   //
@@ -132,9 +139,9 @@ StartupAPsWorker (
                  MpServices,
                  Procedure,
                  FALSE,
-                 NULL,
+                 (EFI_EVENT)MpEvent,
                  0,
-                 NULL,
+                 CpuFeaturesData,
                  NULL
                  );
   ASSERT_EFI_ERROR (Status);
@@ -197,3 +204,61 @@ GetNumberOfProcessor (
   ASSERT_EFI_ERROR (Status);
 }
 
+/**
+  Performs CPU features Initialization.
+
+  This service will invoke MP service to perform CPU features
+  initialization on BSP/APs per user configuration.
+
+  @note This service could be called by BSP only.
+**/
+VOID
+EFIAPI
+CpuFeaturesInitialize (
+  VOID
+  )
+{
+  CPU_FEATURES_DATA          *CpuFeaturesData;
+  UINTN                      OldBspNumber;
+  EFI_EVENT                  MpEvent;
+  EFI_STATUS                 Status;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
+
+  OldBspNumber = GetProcessorIndex();
+  CpuFeaturesData->BspNumber = OldBspNumber;
+
+  Status = gBS->CreateEvent (
+                  EVT_NOTIFY_WAIT,
+                  TPL_CALLBACK,
+                  EfiEventEmptyFunction,
+                  NULL,
+                  &MpEvent
+                  );
+  ASSERT_EFI_ERROR (Status);
+
+  //
+  // Wakeup all APs for programming.
+  //
+  StartupAPsWorker (SetProcessorRegister, MpEvent);
+  //
+  // Programming BSP
+  //
+  SetProcessorRegister (CpuFeaturesData);
+
+  //
+  // Wait all processors to finish the task.
+  //
+  do {
+    Status = gBS->CheckEvent (MpEvent);
+  } while (Status == EFI_NOT_READY);
+  ASSERT_EFI_ERROR (Status);
+
+  //
+  // Switch to new BSP if required
+  //
+  if (CpuFeaturesData->BspNumber != OldBspNumber) {
+    SwitchNewBsp (CpuFeaturesData->BspNumber);
+  }
+}
+
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
index f0f317c945..6693bae575 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
@@ -47,6 +47,9 @@
   SynchronizationLib
   UefiBootServicesTableLib
   IoLib
+  UefiBootServicesTableLib
+  UefiLib
+  LocalApicLib
 
 [Protocols]
   gEfiMpServiceProtocolGuid                                            ## CONSUMES
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
index 82fe268812..799864a136 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
@@ -149,11 +149,15 @@ GetProcessorInformation (
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   )
 {
   EFI_STATUS                           Status;
   EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
+  CPU_FEATURES_DATA                    *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
 
   //
   // Get MP Services Protocol
@@ -175,7 +179,7 @@ StartupAPsWorker (
                  Procedure,
                  FALSE,
                  0,
-                 NULL
+                 CpuFeaturesData
                  );
   ASSERT_EFI_ERROR (Status);
 }
@@ -257,3 +261,50 @@ GetNumberOfProcessor (
                          );
   ASSERT_EFI_ERROR (Status);
 }
+
+/**
+  Performs CPU features Initialization.
+
+  This service will invoke MP service to perform CPU features
+  initialization on BSP/APs per user configuration.
+
+  @note This service could be called by BSP only.
+**/
+VOID
+EFIAPI
+CpuFeaturesInitialize (
+  VOID
+  )
+{
+  CPU_FEATURES_DATA          *CpuFeaturesData;
+  UINTN                      OldBspNumber;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
+
+  OldBspNumber = GetProcessorIndex();
+  CpuFeaturesData->BspNumber = OldBspNumber;
+
+  //
+  // Known limitation: In PEI phase, CpuFeatures driver not
+  // support async mode execute tasks. So semaphore type
+  // register can't been used for this instance, must use
+  // DXE type instance.
+  //
+
+  //
+  // Wakeup all APs for programming.
+  //
+  StartupAPsWorker (SetProcessorRegister, NULL);
+  //
+  // Programming BSP
+  //
+  SetProcessorRegister (CpuFeaturesData);
+
+  //
+  // Switch to new BSP if required
+  //
+  if (CpuFeaturesData->BspNumber != OldBspNumber) {
+    SwitchNewBsp (CpuFeaturesData->BspNumber);
+  }
+}
+
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
index fdfef98293..e95f01df0b 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
@@ -49,6 +49,7 @@
   PeiServicesLib
   PeiServicesTablePointerLib
   IoLib
+  LocalApicLib
 
 [Ppis]
   gEfiPeiMpServicesPpiGuid                                             ## CONSUMES
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
index edd266934f..ad970a00b2 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
@@ -23,6 +23,7 @@
 #include <Library/MemoryAllocationLib.h>
 #include <Library/SynchronizationLib.h>
 #include <Library/IoLib.h>
+#include <Library/LocalApicLib.h>
 
 #include <AcpiCpuData.h>
 
@@ -46,16 +47,26 @@ typedef struct {
   CPU_FEATURE_INITIALIZE       InitializeFunc;
   UINT8                        *BeforeFeatureBitMask;
   UINT8                        *AfterFeatureBitMask;
+  UINT8                        *CoreBeforeFeatureBitMask;
+  UINT8                        *CoreAfterFeatureBitMask;
+  UINT8                        *PackageBeforeFeatureBitMask;
+  UINT8                        *PackageAfterFeatureBitMask;
   VOID                         *ConfigData;
   BOOLEAN                      BeforeAll;
   BOOLEAN                      AfterAll;
 } CPU_FEATURES_ENTRY;
 
+//
+// Flags used when program the register.
+//
+typedef struct {
+  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
+  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
+} PROGRAM_CPU_REGISTER_FLAGS;
+
 typedef struct {
   UINTN                    FeaturesCount;
   UINT32                   BitMaskSize;
-  SPIN_LOCK                MsrLock;
-  SPIN_LOCK                MemoryMappedLock;
   LIST_ENTRY               FeatureList;
 
   CPU_FEATURES_INIT_ORDER  *InitOrder;
@@ -64,9 +75,14 @@ typedef struct {
   UINT8                    *ConfigurationPcd;
   UINT8                    *SettingPcd;
 
+  UINT32                   NumberOfCpus;
+  ACPI_CPU_DATA            *AcpiCpuData;
+
   CPU_REGISTER_TABLE       *RegisterTable;
   CPU_REGISTER_TABLE       *PreSmmRegisterTable;
   UINTN                    BspNumber;
+
+  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
 } CPU_FEATURES_DATA;
 
 #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
@@ -118,10 +134,13 @@ GetProcessorInformation (
 
   @param[in]  Procedure               A pointer to the function to be run on
                                       enabled APs of the system.
+  @param[in]  MpEvent                 A pointer to the event to be used later
+                                      to check whether procedure has done.
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   );
 
 /**
@@ -170,4 +189,40 @@ DumpCpuFeature (
   IN CPU_FEATURES_ENTRY  *CpuFeature
   );
 
+/**
+  Return feature dependence result.
+
+  @param[in]  CpuFeature        Pointer to CPU feature.
+  @param[in]  Before            Check before dependence or after.
+
+  @retval     return the dependence result.
+**/
+CPU_FEATURE_DEPENDENCE_TYPE
+DetectFeatureScope (
+  IN CPU_FEATURES_ENTRY         *CpuFeature,
+  IN BOOLEAN                    Before
+  );
+
+/**
+  Programs registers for the calling processor.
+
+  @param[in,out] Buffer  The pointer to private data buffer.
+
+**/
+VOID
+EFIAPI
+SetProcessorRegister (
+  IN OUT VOID            *Buffer
+  );
+
+/**
+  Return ACPI_CPU_DATA data.
+
+  @return  Pointer to ACPI_CPU_DATA data.
+**/
+ACPI_CPU_DATA *
+GetAcpiCpuData (
+  VOID
+  );
+
 #endif
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
index fa7e107e39..d395f98f32 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
@@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
   return FALSE;
 }
 
+/**
+  Return feature dependence result.
+
+  @param[in]  CpuFeature        Pointer to CPU feature.
+  @param[in]  Before            Check before dependence or after.
+
+  @retval     return the dependence result.
+**/
+CPU_FEATURE_DEPENDENCE_TYPE
+DetectFeatureScope (
+  IN CPU_FEATURES_ENTRY         *CpuFeature,
+  IN BOOLEAN                    Before
+  )
+{
+  if (Before) {
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      return PackageDepType;
+    }
+
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      return CoreDepType;
+    }
+
+    if (CpuFeature->BeforeFeatureBitMask != NULL) {
+      return ThreadDepType;
+    }
+
+    return NoneDepType;
+  }
+
+  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+    return PackageDepType;
+  }
+
+  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+    return CoreDepType;
+  }
+
+  if (CpuFeature->AfterFeatureBitMask != NULL) {
+    return ThreadDepType;
+  }
+
+  return NoneDepType;
+}
+
+/**
+  Clear dependence for the specified type.
+
+  @param[in]  CurrentFeature     Cpu feature need to clear.
+  @param[in]  Before             Before or after dependence relationship.
+
+**/
+VOID
+ClearFeatureScope (
+  IN CPU_FEATURES_ENTRY           *CpuFeature,
+  IN BOOLEAN                      Before
+  )
+{
+  if (Before) {
+    if (CpuFeature->BeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->BeforeFeatureBitMask);
+      CpuFeature->BeforeFeatureBitMask = NULL;
+    }
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
+      CpuFeature->CoreBeforeFeatureBitMask = NULL;
+    }
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
+      CpuFeature->PackageBeforeFeatureBitMask = NULL;
+    }
+  } else {
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->PackageAfterFeatureBitMask);
+      CpuFeature->PackageAfterFeatureBitMask = NULL;
+    }
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->CoreAfterFeatureBitMask);
+      CpuFeature->CoreAfterFeatureBitMask = NULL;
+    }
+    if (CpuFeature->AfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->AfterFeatureBitMask);
+      CpuFeature->AfterFeatureBitMask = NULL;
+    }
+  }
+}
+
+/**
+  Base on dependence relationship to asjust feature dependence.
+
+  ONLY when the feature before(or after) the find feature also has 
+  dependence with the find feature. In this case, driver need to base
+  on dependce relationship to decide how to insert current feature and
+  adjust the feature dependence.
+
+  @param[in]  PreviousFeature    CPU feature current before the find one.
+  @param[in]  CurrentFeature     Cpu feature need to adjust.
+  @param[in]  Before             Before or after dependence relationship.
+
+  @retval   TRUE   means the current feature dependence has been adjusted.
+
+  @retval   FALSE  means the previous feature dependence has been adjusted.
+                   or previous feature has no dependence with the find one.
+
+**/
+BOOLEAN
+AdjustFeaturesDependence (
+  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
+  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
+  IN     BOOLEAN                    Before
+  )
+{
+  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
+  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
+
+  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
+  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
+
+  //
+  // If previous feature has no dependence with the find featue.
+  // return FALSE.
+  //
+  if (PreDependType == NoneDepType) {
+    return FALSE;
+  }
+
+  //
+  // If both feature have dependence, keep the one which needs use more 
+  // processors and clear the dependence for the other one.
+  //
+  if (PreDependType >= CurrentDependType) {
+    ClearFeatureScope (CurrentFeature, Before);
+    return TRUE;
+  } else {
+    ClearFeatureScope (PreviousFeature, Before);
+    return FALSE;
+  }
+}
+
+/**
+  Base on dependence relationship to asjust feature order.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  FindEntry          The entry this feature depend on.
+  @param[in]  CurrentEntry       The entry for this feature.
+  @param[in]  Before             Before or after dependence relationship.
+
+**/
+VOID
+AdjustEntry (
+  IN      LIST_ENTRY                *FeatureList,
+  IN OUT  LIST_ENTRY                *FindEntry,
+  IN OUT  LIST_ENTRY                *CurrentEntry,
+  IN      BOOLEAN                   Before
+  )
+{
+  LIST_ENTRY                *PreviousEntry;
+  CPU_FEATURES_ENTRY        *PreviousFeature;
+  CPU_FEATURES_ENTRY        *CurrentFeature;
+
+  //
+  // For CPU feature which has core or package type dependence, later code need to insert
+  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
+  // So if driver finds both feature A and B need to execute before feature C, driver will
+  // base on dependence type of feature A and B to update the logic here.
+  // For example, feature A has package type dependence and feature B has core type dependence,
+  // because package type dependence need to wait for more processors which has strong dependence
+  // than core type dependence. So driver will adjust the feature order to B -> A -> C. and driver 
+  // will remove the feature dependence in feature B. 
+  // Driver just needs to make sure before feature C been executed, feature A has finished its task
+  // in all all thread. Feature A finished in all threads also means feature B have finshed in all
+  // threads.
+  //
+  if (Before) {
+    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
+  } else {

+    PreviousEntry = GetNextNode (FeatureList, FindEntry);
+  }
+
+  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
+  RemoveEntryList (CurrentEntry);
+
+  if (IsNull (FeatureList, PreviousEntry)) {
+    //
+    // If not exist the previous or next entry, just insert the current entry.
+    //
+    if (Before) {
+      InsertTailList (FindEntry, CurrentEntry);
+    } else {
+      InsertHeadList (FindEntry, CurrentEntry);
+    }
+  } else {
+    //
+    // If exist the previous or next entry, need to check it before insert curent entry.
+    //
+    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
+
+    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature, Before)) {
+      //
+      // Return TRUE means current feature dependence has been cleared and the previous
+      // feature dependence has been kept and used. So insert current feature before (or after)
+      // the previous feature.
+      //
+      if (Before) {
+        InsertTailList (PreviousEntry, CurrentEntry);
+      } else {
+        InsertHeadList (PreviousEntry, CurrentEntry);
+      }
+    } else {
+      if (Before) {
+        InsertTailList (FindEntry, CurrentEntry);
+      } else {
+        InsertHeadList (FindEntry, CurrentEntry);
+      }
+    }
+  }
+}

+
+/**
+  Checks and adjusts current CPU features per dependency relationship.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  CurrentEntry       Pointer to current checked CPU feature
+  @param[in]  FeatureMask        The feature bit mask.
+
+  @retval     return Swapped info.
+**/
+BOOLEAN
+InsertToBeforeEntry (
+  IN LIST_ENTRY              *FeatureList,
+  IN LIST_ENTRY              *CurrentEntry,
+  IN UINT8                   *FeatureMask
+  )
+{
+  LIST_ENTRY                 *CheckEntry;
+  CPU_FEATURES_ENTRY         *CheckFeature;
+  BOOLEAN                    Swapped;
+
+  Swapped = FALSE;
+
+  //
+  // Check all features dispatched before this entry
+  //
+  CheckEntry = GetFirstNode (FeatureList);
+  while (CheckEntry != CurrentEntry) {
+    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
+    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
+      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
+      Swapped = TRUE;
+      break;
+    }
+    CheckEntry = CheckEntry->ForwardLink;
+  }
+
+  return Swapped;
+}
+
+/**
+  Checks and adjusts current CPU features per dependency relationship.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  CurrentEntry       Pointer to current checked CPU feature
+  @param[in]  FeatureMask        The feature bit mask.
+
+  @retval     return Swapped info.
+**/
+BOOLEAN
+InsertToAfterEntry (
+  IN LIST_ENTRY              *FeatureList,
+  IN LIST_ENTRY              *CurrentEntry,
+  IN UINT8                   *FeatureMask
+  )
+{
+  LIST_ENTRY                 *CheckEntry;
+  CPU_FEATURES_ENTRY         *CheckFeature;
+  BOOLEAN                    Swapped;
+
+  Swapped = FALSE;
+
+  //
+  // Check all features dispatched after this entry
+  //
+  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
+  while (!IsNull (FeatureList, CheckEntry)) {
+    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
+    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
+      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
+      Swapped = TRUE;
+      break;
+    }
+    CheckEntry = CheckEntry->ForwardLink;
+  }
+
+  return Swapped;
+}
+
 /**
   Checks and adjusts CPU features order per dependency relationship.
 
@@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
   CPU_FEATURES_ENTRY         *CheckFeature;
   BOOLEAN                    Swapped;
   LIST_ENTRY                 *TempEntry;
+  LIST_ENTRY                 *NextEntry;
 
   CurrentEntry = GetFirstNode (FeatureList);
   while (!IsNull (FeatureList, CurrentEntry)) {
     Swapped = FALSE;
     CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
+    NextEntry = CurrentEntry->ForwardLink;
     if (CpuFeature->BeforeAll) {
       //
       // Check all features dispatched before this entry
@@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
         CheckEntry = CheckEntry->ForwardLink;
       }
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
@@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
         CheckEntry = CheckEntry->ForwardLink;
       }
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
 
     if (CpuFeature->BeforeFeatureBitMask != NULL) {
-      //
-      // Check all features dispatched before this entry
-      //
-      CheckEntry = GetFirstNode (FeatureList);
-      while (CheckEntry != CurrentEntry) {
-        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
-        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->BeforeFeatureBitMask)) {
-          //
-          // If there is dependency, swap them
-          //
-          RemoveEntryList (CurrentEntry);
-          InsertTailList (CheckEntry, CurrentEntry);
-          Swapped = TRUE;
-          break;
-        }
-        CheckEntry = CheckEntry->ForwardLink;
-      }
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->BeforeFeatureBitMask);
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
 
     if (CpuFeature->AfterFeatureBitMask != NULL) {
-      //
-      // Check all features dispatched after this entry
-      //
-      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
-      while (!IsNull (FeatureList, CheckEntry)) {
-        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
-        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->AfterFeatureBitMask)) {
-          //
-          // If there is dependency, swap them
-          //
-          TempEntry = GetNextNode (FeatureList, CurrentEntry);
-          RemoveEntryList (CurrentEntry);
-          InsertHeadList (CheckEntry, CurrentEntry);
-          CurrentEntry = TempEntry;
-          Swapped = TRUE;
-          break;
-        }
-        CheckEntry = CheckEntry->ForwardLink;
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->AfterFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->CoreBeforeFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
       }
+    }
+
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->CoreAfterFeatureBitMask);
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
-    //
-    // No swap happened, check the next feature
-    //
+
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->PackageBeforeFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->PackageAfterFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
     CurrentEntry = CurrentEntry->ForwardLink;
   }
 }
@@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
   CpuFeaturesData = GetCpuFeaturesData ();
   if (CpuFeaturesData->FeaturesCount == 0) {
     InitializeListHead (&CpuFeaturesData->FeatureList);
-    InitializeSpinLock (&CpuFeaturesData->MsrLock);
-    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
+    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
     CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
   }
   ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
@@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
       }
       CpuFeatureEntry->AfterFeatureBitMask = CpuFeature->AfterFeatureBitMask;
     }
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
+      }
+      CpuFeatureEntry->CoreBeforeFeatureBitMask = CpuFeature->CoreBeforeFeatureBitMask;
+    }
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
+      }
+      CpuFeatureEntry->CoreAfterFeatureBitMask = CpuFeature->CoreAfterFeatureBitMask;
+    }
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
+      }
+      CpuFeatureEntry->PackageBeforeFeatureBitMask = CpuFeature->PackageBeforeFeatureBitMask;
+    }
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
+      }
+      CpuFeatureEntry->PackageAfterFeatureBitMask = CpuFeature->PackageAfterFeatureBitMask;
+    }
+
     CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
     CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
 
@@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
   @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is not
                                     supported due to a circular dependency between
                                     BEFORE and AFTER features.
+  @retval  RETURN_NOT_READY         CPU feature PCD PcdCpuFeaturesUserConfiguration
+                                    not updated by Platform driver yet.
 
   @note This service could be called by BSP only.
 **/
@@ -431,12 +755,20 @@ RegisterCpuFeature (
   UINT8                      *FeatureMask;
   UINT8                      *BeforeFeatureBitMask;
   UINT8                      *AfterFeatureBitMask;
+  UINT8                      *CoreBeforeFeatureBitMask;
+  UINT8                      *CoreAfterFeatureBitMask;
+  UINT8                      *PackageBeforeFeatureBitMask;
+  UINT8                      *PackageAfterFeatureBitMask;
   BOOLEAN                    BeforeAll;
   BOOLEAN                    AfterAll;
 
-  FeatureMask          = NULL;
-  BeforeFeatureBitMask = NULL;
-  AfterFeatureBitMask  = NULL;
+  FeatureMask                 = NULL;
+  BeforeFeatureBitMask        = NULL;
+  AfterFeatureBitMask         = NULL;
+  CoreBeforeFeatureBitMask    = NULL;
+  CoreAfterFeatureBitMask     = NULL;
+  PackageBeforeFeatureBitMask  = NULL;
+  PackageAfterFeatureBitMask   = NULL;
   BeforeAll            = FALSE;
   AfterAll             = FALSE;
 
@@ -449,6 +781,10 @@ RegisterCpuFeature (
                     != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
     ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL))
                     != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));
+    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER))
+                    != (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER));
+    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER))
+                    != (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER));
     if (Feature < CPU_FEATURE_BEFORE) {
       BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE : FALSE;
       AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
@@ -459,6 +795,14 @@ RegisterCpuFeature (
       SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature & ~CPU_FEATURE_BEFORE, BitMaskSize);
     } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
       SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature & ~CPU_FEATURE_AFTER, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
+      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature & ~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
+      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature & ~CPU_FEATURE_CORE_AFTER, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
+      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
+      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
     }
     Feature = VA_ARG (Marker, UINT32);
   }
@@ -466,15 +810,19 @@ RegisterCpuFeature (
 
   CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
   ASSERT (CpuFeature != NULL);
-  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
-  CpuFeature->FeatureMask          = FeatureMask;
-  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
-  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
-  CpuFeature->BeforeAll            = BeforeAll;
-  CpuFeature->AfterAll             = AfterAll;
-  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
-  CpuFeature->SupportFunc          = SupportFunc;
-  CpuFeature->InitializeFunc       = InitializeFunc;
+  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
+  CpuFeature->FeatureMask                 = FeatureMask;
+  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
+  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
+  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
+  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
+  CpuFeature->PackageBeforeFeatureBitMask = PackageBeforeFeatureBitMask;
+  CpuFeature->PackageAfterFeatureBitMask  = PackageAfterFeatureBitMask;
+  CpuFeature->BeforeAll                   = BeforeAll;
+  CpuFeature->AfterAll                    = AfterAll;
+  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
+  CpuFeature->SupportFunc                 = SupportFunc;
+  CpuFeature->InitializeFunc              = InitializeFunc;
   if (FeatureName != NULL) {
     CpuFeature->FeatureName          = AllocatePool (CPU_FEATURE_NAME_SIZE);
     ASSERT (CpuFeature->FeatureName != NULL);
@@ -489,13 +837,12 @@ RegisterCpuFeature (
 }
 
 /**
-  Allocates boot service data to save ACPI_CPU_DATA.
+  Return ACPI_CPU_DATA data.
 
-  @return  Pointer to allocated ACPI_CPU_DATA.
+  @return  Pointer to ACPI_CPU_DATA data.
 **/
-STATIC
 ACPI_CPU_DATA *
-AllocateAcpiCpuData (
+GetAcpiCpuData (
   VOID
   )
 {
@@ -508,9 +855,20 @@ AllocateAcpiCpuData (
   UINTN                                Index;
   EFI_PROCESSOR_INFORMATION            ProcessorInfoBuffer;
 
+  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
+  if (AcpiCpuData != NULL) {
+    return AcpiCpuData;
+  }
+
   AcpiCpuData  = AllocatePages (EFI_SIZE_TO_PAGES (sizeof (ACPI_CPU_DATA)));
   ASSERT (AcpiCpuData != NULL);
 
+  //
+  // Set PcdCpuS3DataAddress to the base address of the ACPI_CPU_DATA structure
+  //
+  Status = PcdSet64S (PcdCpuS3DataAddress, (UINT64)(UINTN)AcpiCpuData);
+  ASSERT_EFI_ERROR (Status);
+
   GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
   AcpiCpuData->NumberOfCpus = (UINT32)NumberOfCpus;
 
@@ -606,7 +964,6 @@ CpuRegisterTableWriteWorker (
   IN UINT64                  Value
   )
 {
-  EFI_STATUS               Status;
   CPU_FEATURES_DATA        *CpuFeaturesData;
   ACPI_CPU_DATA            *AcpiCpuData;
   CPU_REGISTER_TABLE       *RegisterTable;
@@ -614,17 +971,8 @@ CpuRegisterTableWriteWorker (
 
   CpuFeaturesData = GetCpuFeaturesData ();
   if (CpuFeaturesData->RegisterTable == NULL) {
-    AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
-    if (AcpiCpuData == NULL) {
-      AcpiCpuData = AllocateAcpiCpuData ();
-      ASSERT (AcpiCpuData != NULL);
-      //
-      // Set PcdCpuS3DataAddress to the base address of the ACPI_CPU_DATA structure
-      //
-      Status = PcdSet64S (PcdCpuS3DataAddress, (UINT64)(UINTN)AcpiCpuData);
-      ASSERT_EFI_ERROR (Status);
-    }
-    ASSERT (AcpiCpuData->RegisterTable != 0);
+    AcpiCpuData = GetAcpiCpuData ();
+    ASSERT ((AcpiCpuData != NULL) && (AcpiCpuData->RegisterTable != 0));
     CpuFeaturesData->RegisterTable = (CPU_REGISTER_TABLE *) (UINTN) AcpiCpuData->RegisterTable;
     CpuFeaturesData->PreSmmRegisterTable = (CPU_REGISTER_TABLE *) (UINTN) AcpiCpuData->PreSmmInitRegisterTable;
   }
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch v2 4/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
                   ` (2 preceding siblings ...)
  2018-10-17  2:16 ` [Patch v2 3/6] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
@ 2018-10-17  2:16 ` Eric Dong
  2018-10-18  5:54   ` Ni, Ruiyu
  2018-10-17  2:16 ` [Patch v2 5/6] UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed Eric Dong
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Eric Dong @ 2018-10-17  2:16 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

Because this driver needs to set MSRs saved in normal boot phase, sync semaphore
logic from RegisterCpuFeaturesLib code which used for normal boot phase.

Detail see below change for RegisterCpuFeaturesLib:
  UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c          | 406 +++++++++++++++++++----------
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c      |   3 -
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h |   3 +-
 3 files changed, 268 insertions(+), 144 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
index 52ff9679d5..42a889f08f 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
@@ -38,9 +38,12 @@ typedef struct {
 } MP_ASSEMBLY_ADDRESS_MAP;
 
 //
-// Spin lock used to serialize MemoryMapped operation
+// Flags used when program the register.
 //
-SPIN_LOCK                *mMemoryMappedLock = NULL;
+typedef struct {
+  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
+  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
+} PROGRAM_CPU_REGISTER_FLAGS;
 
 //
 // Signal that SMM BASE relocation is complete.
@@ -62,13 +65,11 @@ AsmGetAddressMap (
 #define LEGACY_REGION_SIZE    (2 * 0x1000)
 #define LEGACY_REGION_BASE    (0xA0000 - LEGACY_REGION_SIZE)
 
+PROGRAM_CPU_REGISTER_FLAGS   mCpuFlags;
 ACPI_CPU_DATA                mAcpiCpuData;
 volatile UINT32              mNumberToFinish;
 MP_CPU_EXCHANGE_INFO         *mExchangeInfo;
 BOOLEAN                      mRestoreSmmConfigurationInS3 = FALSE;
-MP_MSR_LOCK                  *mMsrSpinLocks = NULL;
-UINTN                        mMsrSpinLockCount;
-UINTN                        mMsrCount = 0;
 
 //
 // S3 boot flag
@@ -91,89 +92,6 @@ UINT8                        mApHltLoopCodeTemplate[] = {
                                0xEB, 0xFC               // jmp $-2
                                };
 
-/**
-  Get MSR spin lock by MSR index.
-
-  @param  MsrIndex       MSR index value.
-
-  @return Pointer to MSR spin lock.
-
-**/
-SPIN_LOCK *
-GetMsrSpinLockByIndex (
-  IN UINT32      MsrIndex
-  )
-{
-  UINTN     Index;
-  for (Index = 0; Index < mMsrCount; Index++) {
-    if (MsrIndex == mMsrSpinLocks[Index].MsrIndex) {
-      return mMsrSpinLocks[Index].SpinLock;
-    }
-  }
-  return NULL;
-}
-
-/**
-  Initialize MSR spin lock by MSR index.
-
-  @param  MsrIndex       MSR index value.
-
-**/
-VOID
-InitMsrSpinLockByIndex (
-  IN UINT32      MsrIndex
-  )
-{
-  UINTN    MsrSpinLockCount;
-  UINTN    NewMsrSpinLockCount;
-  UINTN    Index;
-  UINTN    AddedSize;
-
-  if (mMsrSpinLocks == NULL) {
-    MsrSpinLockCount = mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter;
-    mMsrSpinLocks = (MP_MSR_LOCK *) AllocatePool (sizeof (MP_MSR_LOCK) * MsrSpinLockCount);
-    ASSERT (mMsrSpinLocks != NULL);
-    for (Index = 0; Index < MsrSpinLockCount; Index++) {
-      mMsrSpinLocks[Index].SpinLock =
-       (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr + Index * mSemaphoreSize);
-      mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
-    }
-    mMsrSpinLockCount = MsrSpinLockCount;
-    mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter = 0;
-  }
-  if (GetMsrSpinLockByIndex (MsrIndex) == NULL) {
-    //
-    // Initialize spin lock for MSR programming
-    //
-    mMsrSpinLocks[mMsrCount].MsrIndex = MsrIndex;
-    InitializeSpinLock (mMsrSpinLocks[mMsrCount].SpinLock);
-    mMsrCount ++;
-    if (mMsrCount == mMsrSpinLockCount) {
-      //
-      // If MSR spin lock buffer is full, enlarge it
-      //
-      AddedSize = SIZE_4KB;
-      mSmmCpuSemaphores.SemaphoreMsr.Msr =
-                        AllocatePages (EFI_SIZE_TO_PAGES(AddedSize));
-      ASSERT (mSmmCpuSemaphores.SemaphoreMsr.Msr != NULL);
-      NewMsrSpinLockCount = mMsrSpinLockCount + AddedSize / mSemaphoreSize;
-      mMsrSpinLocks = ReallocatePool (
-                        sizeof (MP_MSR_LOCK) * mMsrSpinLockCount,
-                        sizeof (MP_MSR_LOCK) * NewMsrSpinLockCount,
-                        mMsrSpinLocks
-                        );
-      ASSERT (mMsrSpinLocks != NULL);
-      mMsrSpinLockCount = NewMsrSpinLockCount;
-      for (Index = mMsrCount; Index < mMsrSpinLockCount; Index++) {
-        mMsrSpinLocks[Index].SpinLock =
-                 (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
-                 (Index - mMsrCount)  * mSemaphoreSize);
-        mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
-      }
-    }
-  }
-}
-
 /**
   Sync up the MTRR values for all processors.
 
@@ -204,42 +122,131 @@ Returns:
 }
 
 /**
-  Programs registers for the calling processor.
+  Increment semaphore by 1.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+S3ReleaseSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  InterlockedIncrement (Sem);
+}
+
+/**
+  Decrement the semaphore by 1 if it is not zero.
 
-  This function programs registers for the calling processor.
+  Performs an atomic decrement operation for semaphore.
+  The compare exchange operation must be performed using
+  MP safe mechanisms.
 
-  @param  RegisterTables        Pointer to register table of the running processor.
-  @param  RegisterTableCount    Register table count.
+  @param      Sem            IN:  32-bit unsigned integer
 
 **/
 VOID
-SetProcessorRegister (
-  IN CPU_REGISTER_TABLE        *RegisterTables,
-  IN UINTN                     RegisterTableCount
+S3WaitForSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  UINT32  Value;
+
+  do {
+    Value = *Sem;
+  } while (Value == 0 ||
+           InterlockedCompareExchange32 (
+             Sem,
+             Value,
+             Value - 1
+             ) != Value);
+}
+
+/**
+  Get the string value for the input register type.
+
+  @param[in]  RegisterType  Input the register type.
+
+  @retval     Return the register type string.
+**/
+CHAR16 *
+GetRegisterTypeString (
+  IN REGISTER_TYPE       RegisterType
+  )
+{
+  switch (RegisterType) {
+  case Msr:
+    return L"Msr";
+
+  case ControlRegister:
+    return L"ControlRegister";
+
+  case MemoryMapped:
+    return L"MemoryMapped";
+
+  case CacheControl:
+    return L"CacheControl";
+  
+  case Semaphore:
+    return L"Semaphore";
+
+  default:
+    ASSERT (FALSE);
+    return L"NoneType";
+  }
+}
+
+/**
+  Initialize the CPU registers from a register table.
+
+  @param[in]  RegisterTable         The register table for this AP.
+  @param[in]  ApLocation            AP location info for this ap.
+  @param[in]  CpuStatus             CPU status info for this CPU.
+  @param[in]  CpuFlags              Flags data structure used when program the register.
+
+  @note This service could be called by BSP/APs.
+**/
+VOID
+ProgramProcessorRegister (
+  IN CPU_REGISTER_TABLE           *RegisterTable,
+  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
+  IN CPU_STATUS_INFORMATION       *CpuStatus,
+  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
   )
 {
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
   UINTN                     Index;
   UINTN                     Value;
-  SPIN_LOCK                 *MsrSpinLock;
-  UINT32                    InitApicId;
-  CPU_REGISTER_TABLE        *RegisterTable;
-
-  InitApicId = GetInitialApicId ();
-  RegisterTable = NULL;
-  for (Index = 0; Index < RegisterTableCount; Index++) {
-    if (RegisterTables[Index].InitialApicId == InitApicId) {
-      RegisterTable =  &RegisterTables[Index];
-      break;
-    }
-  }
-  ASSERT (RegisterTable != NULL);
+  CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
+  volatile UINT32           *SemaphorePtr;
+  UINT32                    FirstThread;
+  UINT32                    PackageThreadsCount;
+  UINT32                    CurrentThread;
+  UINTN                     ProcessorIndex;
+  UINTN                     ThreadIndex;
+  UINTN                     ValidThreadCount;
+  UINT32                    *ValidCorePerPackage;
+
+  ThreadIndex = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount +
+              ApLocation->Core * CpuStatus->MaxThreadCount +
+              ApLocation->Thread;
 
   //
   // Traverse Register Table of this logical processor
   //
-  RegisterTableEntry = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
-  for (Index = 0; Index < RegisterTable->TableLength; Index++, RegisterTableEntry++) {
+  RegisterTableEntryHead = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
+
+  for (Index = 0; Index < RegisterTable->TableLength; Index++) {
+
+    RegisterTableEntry = &RegisterTableEntryHead[Index];
+    DEBUG ((
+      DEBUG_INFO,
+      "Processor = %lu, Entry Index %lu, Type = %s!\n",
+      (UINT64)ThreadIndex,
+      (UINT64)Index,
+      GetRegisterTypeString((REGISTER_TYPE)RegisterTableEntry->RegisterType)
+      ));
+
     //
     // Check the type of specified register
     //
@@ -310,12 +317,6 @@ SetProcessorRegister (
           RegisterTableEntry->Value
           );
       } else {
-        //
-        // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
-        // to make sure MSR read/write operation is atomic.
-        //
-        MsrSpinLock = GetMsrSpinLockByIndex (RegisterTableEntry->Index);
-        AcquireSpinLock (MsrSpinLock);
         //
         // Set the bit section according to bit start and length
         //
@@ -325,21 +326,20 @@ SetProcessorRegister (
           RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
           RegisterTableEntry->Value
           );
-        ReleaseSpinLock (MsrSpinLock);
       }
       break;
     //
     // MemoryMapped operations
     //
     case MemoryMapped:
-      AcquireSpinLock (mMemoryMappedLock);
+      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
       MmioBitFieldWrite32 (
         (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
         RegisterTableEntry->ValidBitStart,
         RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
         (UINT32)RegisterTableEntry->Value
         );
-      ReleaseSpinLock (mMemoryMappedLock);
+      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
       break;
     //
     // Enable or disable cache
@@ -355,12 +355,126 @@ SetProcessorRegister (
       }
       break;
 
+    case Semaphore:
+      // Semaphore works logic like below:
+      //
+      //  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
+      //  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
+      //
+      //  All threads (T0...Tn) waits in P() line and continues running
+      //  together.
+      //
+      //
+      //  T0             T1            ...           Tn
+      //
+      //  V(0...n)       V(0...n)      ...           V(0...n)
+      //  n * P(0)       n * P(1)      ...           n * P(n)
+      //
+      SemaphorePtr = CpuFlags->SemaphoreCount;
+      switch (RegisterTableEntry->Value) {
+      case CoreDepType:
+        //
+        // Get Offset info for the first thread in the core which current thread belongs to.
+        //
+        FirstThread = (ApLocation->Package * CpuStatus->MaxCoreCount + ApLocation->Core) * CpuStatus->MaxThreadCount;
+        CurrentThread = FirstThread + ApLocation->Thread;
+        //
+        // First Notify all threads in current Core that this thread has ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
+          S3ReleaseSemaphore (&SemaphorePtr[FirstThread + ProcessorIndex]);
+        }
+        //
+        // Second, check whether all valid threads in current core have ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
+          S3WaitForSemaphore (&SemaphorePtr[CurrentThread]);
+        }
+        break;
+
+      case PackageDepType:
+        ValidCorePerPackage = (UINT32 *)(UINTN)CpuStatus->ValidCoresPerPackages;
+        //
+        // Get Offset info for the first thread in the package which current thread belongs to.
+        //
+        FirstThread = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount;
+        //
+        // Get the possible threads count for current package.
+        //
+        PackageThreadsCount = CpuStatus->MaxThreadCount * CpuStatus->MaxCoreCount;
+        CurrentThread = FirstThread + CpuStatus->MaxThreadCount * ApLocation->Core + ApLocation->Thread;
+        //
+        // Get the valid thread count for current package.
+        //
+        ValidThreadCount = CpuStatus->MaxThreadCount * ValidCorePerPackage[ApLocation->Package];
+        //
+        // First Notify all threads in current package that this thread has ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
+          S3ReleaseSemaphore (&SemaphorePtr[FirstThread + ProcessorIndex]);
+        }
+        //
+        // Second, check whether all valid threads in current package have ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < ValidThreadCount; ProcessorIndex ++) {
+          S3WaitForSemaphore (&SemaphorePtr[CurrentThread]);
+        }
+        break;
+
+      default:
+        break;
+      }
+      break;
+
     default:
       break;
     }
   }
 }
 
+/**
+
+  Set Processor register for one AP.
+  
+  @param     PreSmmRegisterTable     Use pre Smm register table or register table.
+
+**/
+VOID
+SetRegister (
+  IN BOOLEAN                 PreSmmRegisterTable
+  )
+{
+  CPU_REGISTER_TABLE        *RegisterTable;
+  CPU_REGISTER_TABLE        *RegisterTables;
+  UINT32                    InitApicId;
+  UINTN                     ProcIndex;
+  UINTN                     Index;
+
+  if (PreSmmRegisterTable) {
+    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.PreSmmInitRegisterTable;
+  } else {
+    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.RegisterTable;
+  }
+
+  InitApicId = GetInitialApicId ();
+  RegisterTable = NULL;
+  for (Index = 0; Index < mAcpiCpuData.NumberOfCpus; Index++) {
+    if (RegisterTables[Index].InitialApicId == InitApicId) {
+      RegisterTable = &RegisterTables[Index];
+      ProcIndex = Index;
+      break;
+    }
+  }
+  ASSERT (RegisterTable != NULL);
+
+  ProgramProcessorRegister (
+    RegisterTable,
+    (EFI_CPU_PHYSICAL_LOCATION *)(UINTN)mAcpiCpuData.ApLocation + ProcIndex,
+    &mAcpiCpuData.CpuStatus,
+    &mCpuFlags
+    );
+}
+
 /**
   AP initialization before then after SMBASE relocation in the S3 boot path.
 **/
@@ -374,7 +488,7 @@ InitializeAp (
 
   LoadMtrrData (mAcpiCpuData.MtrrTable);
 
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
+  SetRegister (TRUE);
 
   //
   // Count down the number with lock mechanism.
@@ -391,7 +505,7 @@ InitializeAp (
   ProgramVirtualWireMode ();
   DisableLvtInterrupts ();
 
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
+  SetRegister (FALSE);
 
   //
   // Place AP into the safe code, count down the number with lock mechanism in the safe code.
@@ -466,7 +580,7 @@ InitializeCpuBeforeRebase (
 {
   LoadMtrrData (mAcpiCpuData.MtrrTable);
 
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
+  SetRegister (TRUE);
 
   ProgramVirtualWireMode ();
 
@@ -502,15 +616,24 @@ InitializeCpuAfterRebase (
   VOID
   )
 {
-  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
-
   mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
 
   //
-  // Signal that SMM base relocation is complete and to continue initialization.
+  // Signal that SMM base relocation is complete and to continue initialization for all APs.
   //
   mInitApsAfterSmmBaseReloc = TRUE;
 
+  //
+  // Must begin set register after all APs have continue their initialization.
+  // This is a requirement to support semaphore mechanism in register table.
+  // Because if semaphore's dependence type is package type, semaphore will wait
+  // for all Aps in one package finishing their tasks before set next register
+  // for all APs. If the Aps not begin its task during BSP doing its task, the
+  // BSP thread will hang because it is waiting for other Aps in the same
+  // package finishing their task.
+  //
+  SetRegister (FALSE);
+
   while (mNumberToFinish > 0) {
     CpuPause ();
   }
@@ -574,8 +697,6 @@ SmmRestoreCpu (
 
   mSmmS3Flag = TRUE;
 
-  InitializeSpinLock (mMemoryMappedLock);
-
   //
   // See if there is enough context to resume PEI Phase
   //
@@ -790,7 +911,6 @@ CopyRegisterTable (
   )
 {
   UINTN                      Index;
-  UINTN                      Index1;
   CPU_REGISTER_TABLE_ENTRY   *RegisterTableEntry;
 
   CopyMem (DestinationRegisterTableList, SourceRegisterTableList, NumberOfCpus * sizeof (CPU_REGISTER_TABLE));
@@ -802,17 +922,6 @@ CopyRegisterTable (
         );
       ASSERT (RegisterTableEntry != NULL);
       DestinationRegisterTableList[Index].RegisterTableEntry = (EFI_PHYSICAL_ADDRESS)(UINTN)RegisterTableEntry;
-      //
-      // Go though all MSRs in register table to initialize MSR spin lock
-      //
-      for (Index1 = 0; Index1 < DestinationRegisterTableList[Index].TableLength; Index1++, RegisterTableEntry++) {
-        if ((RegisterTableEntry->RegisterType == Msr) && (RegisterTableEntry->ValidBitLength < 64)) {
-          //
-          // Initialize MSR spin lock only for those MSRs need bit field writing
-          //
-          InitMsrSpinLockByIndex (RegisterTableEntry->Index);
-        }
-      }
     }
   }
 }
@@ -832,6 +941,7 @@ GetAcpiCpuData (
   VOID                       *GdtForAp;
   VOID                       *IdtForAp;
   VOID                       *MachineCheckHandlerForAp;
+  CPU_STATUS_INFORMATION     *CpuStatus;
 
   if (!mAcpiS3Enable) {
     return;
@@ -906,6 +1016,24 @@ GetAcpiCpuData (
   Gdtr->Base = (UINTN)GdtForAp;
   Idtr->Base = (UINTN)IdtForAp;
   mAcpiCpuData.ApMachineCheckHandlerBase = (EFI_PHYSICAL_ADDRESS)(UINTN)MachineCheckHandlerForAp;
+
+  CpuStatus = &mAcpiCpuData.CpuStatus;
+  CopyMem (CpuStatus, &AcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
+  CpuStatus->ValidCoresPerPackages = (EFI_PHYSICAL_ADDRESS)(UINTN)AllocateCopyPool (
+                                       sizeof (UINT32) * CpuStatus->PackageCount,
+                                       (UINT32 *)(UINTN)AcpiCpuData->CpuStatus.ValidCoresPerPackages
+                                       );
+  ASSERT (CpuStatus->ValidCoresPerPackages != 0);
+  mAcpiCpuData.ApLocation = (EFI_PHYSICAL_ADDRESS)(UINTN)AllocateCopyPool (
+                              mAcpiCpuData.NumberOfCpus * sizeof (EFI_CPU_PHYSICAL_LOCATION),
+                              (EFI_CPU_PHYSICAL_LOCATION *)(UINTN)AcpiCpuData->ApLocation
+                              );
+  ASSERT (mAcpiCpuData.ApLocation != 0);
+  InitializeSpinLock((SPIN_LOCK*) &mCpuFlags.MemoryMappedLock);
+  mCpuFlags.SemaphoreCount = AllocateZeroPool (
+                               sizeof (UINT32) * CpuStatus->PackageCount *
+                               CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount);
+  ASSERT (mCpuFlags.SemaphoreCount != NULL);
 }
 
 /**
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 9cf508a5c7..42b040531e 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -1303,8 +1303,6 @@ InitializeSmmCpuSemaphores (
   mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock
                                                   = (SPIN_LOCK *)SemaphoreAddr;
   SemaphoreAddr += SemaphoreSize;
-  mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock
-                                                  = (SPIN_LOCK *)SemaphoreAddr;
 
   SemaphoreAddr = (UINTN)SemaphoreBlock + GlobalSemaphoresSize;
   mSmmCpuSemaphores.SemaphoreCpu.Busy    = (SPIN_LOCK *)SemaphoreAddr;
@@ -1321,7 +1319,6 @@ InitializeSmmCpuSemaphores (
 
   mPFLock                       = mSmmCpuSemaphores.SemaphoreGlobal.PFLock;
   mConfigSmmCodeAccessCheckLock = mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock;
-  mMemoryMappedLock             = mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock;
 
   mSemaphoreSize = SemaphoreSize;
 }
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 8c7f4996d1..e2970308fe 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -53,6 +53,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
 #include <Library/ReportStatusCodeLib.h>
 #include <Library/SmmCpuFeaturesLib.h>
 #include <Library/PeCoffGetEntryPointLib.h>
+#include <Library/RegisterCpuFeaturesLib.h>
 
 #include <AcpiCpuData.h>
 #include <CpuHotPlugData.h>
@@ -364,7 +365,6 @@ typedef struct {
   volatile BOOLEAN     *AllCpusInSync;
   SPIN_LOCK            *PFLock;
   SPIN_LOCK            *CodeAccessCheckLock;
-  SPIN_LOCK            *MemoryMappedLock;
 } SMM_CPU_SEMAPHORE_GLOBAL;
 
 ///
@@ -409,7 +409,6 @@ extern SMM_CPU_SEMAPHORES                  mSmmCpuSemaphores;
 extern UINTN                               mSemaphoreSize;
 extern SPIN_LOCK                           *mPFLock;
 extern SPIN_LOCK                           *mConfigSmmCodeAccessCheckLock;
-extern SPIN_LOCK                           *mMemoryMappedLock;
 extern EFI_SMRAM_DESCRIPTOR                *mSmmCpuSmramRanges;
 extern UINTN                               mSmmCpuSmramRangeCount;
 extern UINT8                               mPhysicalAddressBits;
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch v2 5/6] UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
                   ` (3 preceding siblings ...)
  2018-10-17  2:16 ` [Patch v2 4/6] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
@ 2018-10-17  2:16 ` Eric Dong
  2018-10-18  5:57   ` Ni, Ruiyu
  2018-10-17  2:16 ` [Patch v2 6/6] UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info Eric Dong
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Eric Dong @ 2018-10-17  2:16 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

AcpiCpuData add new fields, keep these fields if old data already existed.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c b/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c
index ef98239844..1b847e453a 100644
--- a/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c
+++ b/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c
@@ -259,6 +259,8 @@ CpuS3DataInitialize (
   if (OldAcpiCpuData != NULL) {
     AcpiCpuData->RegisterTable           = OldAcpiCpuData->RegisterTable;
     AcpiCpuData->PreSmmInitRegisterTable = OldAcpiCpuData->PreSmmInitRegisterTable;
+    AcpiCpuData->ApLocation              = OldAcpiCpuData->ApLocation;
+    CopyMem (&AcpiCpuData->CpuStatus, &OldAcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
   } else {
     //
     // Allocate buffer for empty RegisterTable and PreSmmInitRegisterTable for all CPUs
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Patch v2 6/6] UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
                   ` (4 preceding siblings ...)
  2018-10-17  2:16 ` [Patch v2 5/6] UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed Eric Dong
@ 2018-10-17  2:16 ` Eric Dong
  2018-10-18  6:01   ` Ni, Ruiyu
  2018-10-17 17:33 ` [Patch v2 0/6] Fix performance issue caused by Set MSR task Laszlo Ersek
  2018-10-18  2:12 ` Ni, Ruiyu
  7 siblings, 1 reply; 18+ messages in thread
From: Eric Dong @ 2018-10-17  2:16 UTC (permalink / raw)
  To: edk2-devel; +Cc: Ruiyu Ni, Laszlo Ersek

Because MSR has scope attribute, driver has no needs to set
MSR for all APs if MSR scope is core or package type. This patch
updates code to base on the MSR scope value to add MSR to the register
table.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.dong@intel.com>
---
 UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c      |  8 +++++
 UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c     | 12 +++++++
 .../Library/CpuCommonFeaturesLib/ExecuteDisable.c  | 10 ++++++
 .../Library/CpuCommonFeaturesLib/FastStrings.c     | 12 +++++++
 .../Library/CpuCommonFeaturesLib/FeatureControl.c  | 38 ++++++++++++++++++++++
 .../CpuCommonFeaturesLib/LimitCpuIdMaxval.c        | 14 ++++++++
 .../Library/CpuCommonFeaturesLib/MachineCheck.c    | 38 ++++++++++++++++++++++
 .../Library/CpuCommonFeaturesLib/MonitorMwait.c    | 15 +++++++++
 .../Library/CpuCommonFeaturesLib/PendingBreak.c    | 11 +++++++
 UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c     | 11 +++++++
 .../Library/CpuCommonFeaturesLib/ProcTrace.c       | 11 +++++++
 UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c   | 10 ++++++
 12 files changed, 190 insertions(+)

diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c
index 47116355a8..1beaebe69c 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c
@@ -67,6 +67,14 @@ C1eInitialize (
   IN BOOLEAN                           State
   )
 {
+  //
+  // The scope of C1EEnable bit in the MSR_NEHALEM_POWER_CTL is Package, only program
+  // MSR_FEATURE_CONFIG for thread 0 core 0 in each package.
+  //
+  if ((CpuInfo->ProcessorInfo.Location.Thread != 0) || (CpuInfo->ProcessorInfo.Location.Core != 0)) {
+  return RETURN_SUCCESS;
+  }
+
   CPU_REGISTER_TABLE_WRITE_FIELD (
     ProcessorNumber,
     Msr,
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c
index 2038171a14..f30117d2c5 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c
@@ -69,6 +69,18 @@ EistInitialize (
   IN BOOLEAN                           State
   )
 {
+  //
+  // The scope of the MSR_IA32_MISC_ENABLE is core for below processor type, only program
+  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
+  //
+  if (IS_ATOM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_CORE2_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   CPU_REGISTER_TABLE_WRITE_FIELD (
     ProcessorNumber,
     Msr,
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c
index 921656a1e8..ff06cb9b60 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c
@@ -79,6 +79,16 @@ ExecuteDisableInitialize (
   IN BOOLEAN                           State
   )
 {
+  //
+  // The scope of the MSR_IA32_EFER is core for below processor type, only program
+  // MSR_IA32_EFER for thread 0 in each core.
+  //
+  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   CPU_REGISTER_TABLE_WRITE_FIELD (
     ProcessorNumber,
     Msr,
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c
index 029bcf87b3..2682093c23 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c
@@ -40,6 +40,18 @@ FastStringsInitialize (
   IN BOOLEAN                           State
   )
 {
+  //
+  // The scope of FastStrings bit in the MSR_IA32_MISC_ENABLE is core for below processor type, only program
+  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
+  //
+  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   CPU_REGISTER_TABLE_WRITE_FIELD (
     ProcessorNumber,
     Msr,
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c
index d28c4ec51a..8c1eb5eb4f 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c
@@ -96,6 +96,19 @@ VmxInitialize (
 {
   MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
 
+  //
+  // The scope of EnableVmxOutsideSmx bit in the MSR_IA32_FEATURE_CONTROL is core for
+  // below processor type, only program MSR_IA32_FEATURE_CONTROL for thread 0 in each
+  // core.
+  //
+  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   ASSERT (ConfigData != NULL);
   MsrRegister = (MSR_IA32_FEATURE_CONTROL_REGISTER *) ConfigData;
   if (MsrRegister[ProcessorNumber].Bits.Lock == 0) {
@@ -171,6 +184,19 @@ LockFeatureControlRegisterInitialize (
 {
   MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
 
+  //
+  // The scope of Lock bit in the MSR_IA32_FEATURE_CONTROL is core for
+  // below processor type, only program MSR_IA32_FEATURE_CONTROL for thread 0 in each
+  // core.
+  //
+  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   ASSERT (ConfigData != NULL);
   MsrRegister = (MSR_IA32_FEATURE_CONTROL_REGISTER *) ConfigData;
   if (MsrRegister[ProcessorNumber].Bits.Lock == 0) {
@@ -248,6 +274,18 @@ SmxInitialize (
   MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
   RETURN_STATUS                        Status;
 
+  //
+  // The scope of Lock bit in the MSR_IA32_FEATURE_CONTROL is core for
+  // below processor type, only program MSR_IA32_FEATURE_CONTROL for thread 0 in each
+  // core.
+  //
+  if (IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   Status = RETURN_SUCCESS;
 
   if (State && (!IsCpuFeatureInSetting (CPU_FEATURE_VMX))) {
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c
index 3d41efe9e9..eab1fb538c 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c
@@ -70,6 +70,20 @@ LimitCpuidMaxvalInitialize (
   IN BOOLEAN                           State
   )
 {
+  //
+  // The scope of LimitCpuidMaxval bit in the MSR_IA32_MISC_ENABLE is core for below
+  // processor type, only program MSR_IA32_MISC_ENABLE for thread 0 in each core.
+  //
+  if (IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||

+      IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_CORE2_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   CPU_REGISTER_TABLE_WRITE_FIELD (
     ProcessorNumber,
     Msr,
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c
index c4eca062fd..f8bee53819 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c
@@ -140,6 +140,32 @@ McaInitialize (
   MSR_IA32_MCG_CAP_REGISTER  McgCap;
   UINT32                     BankIndex;
 
+  //
+  // The scope of MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS is core for below processor type, only program
+  // MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS for thread 0 in each core.
+  //
+  if (IS_ATOM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_SANDY_BRIDGE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_SKYLAKE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_XEON_PHI_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
+  //
+  // The scope of MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS is package for below processor type, only program
+  // MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS for thread 0 core 0 in each package.
+  //
+  if (IS_NEHALEM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if ((CpuInfo->ProcessorInfo.Location.Thread != 0) || (CpuInfo->ProcessorInfo.Location.Core != 0)) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   if (State) {
     McgCap.Uint64 = AsmReadMsr64 (MSR_IA32_MCG_CAP);
     for (BankIndex = 0; BankIndex < (UINT32) McgCap.Bits.Count; BankIndex++) {
@@ -301,6 +327,18 @@ LmceInitialize (
 {
   MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
 
+  //
+  // The scope of FastStrings bit in the MSR_IA32_MISC_ENABLE is core for below processor type, only program 
+  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
+  //
+  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   ASSERT (ConfigData != NULL);
   MsrRegister = (MSR_IA32_FEATURE_CONTROL_REGISTER *) ConfigData;
   if (MsrRegister[ProcessorNumber].Bits.Lock == 0) {
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c
index 1d43bd128a..530748bf46 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c
@@ -67,6 +67,21 @@ MonitorMwaitInitialize (
   IN BOOLEAN                           State
   )
 {
+  //
+  // The scope of the MSR_IA32_MISC_ENABLE is core for below processor type, only program
+  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
+  //
+  if (IS_CORE2_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_ATOM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   CPU_REGISTER_TABLE_WRITE_FIELD (
     ProcessorNumber,
     Msr,
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c
index 8cafba4f4a..2e0d2bdeca 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c
@@ -74,6 +74,17 @@ PendingBreakInitialize (
   IN BOOLEAN                           State
   )
 {
+  //
+  // The scope of the MSR_ATOM_IA32_MISC_ENABLE is core for below processor type, only program
+  // MSR_ATOM_IA32_MISC_ENABLE for thread 0 in each core.
+  //
+  // Support function has check the processer type for this feature, no need to check again
+  // here.
+  //
+  if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+    return RETURN_SUCCESS;
+  }
+
   //
   // ATOM, CORE2, CORE, PENTIUM_4 and IS_PENTIUM_M_PROCESSOR have the same MSR index,
   // Simply use MSR_ATOM_IA32_MISC_ENABLE here
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c
index 721470cdfe..d6219f4f3f 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c
@@ -101,6 +101,17 @@ PpinInitialize (
     return MsrPpinCtrl.Bits.Enable_PPIN == State ? RETURN_SUCCESS : RETURN_DEVICE_ERROR;
   }
 
+  //
+  // Support function already check the processor which support PPIN feature, so this function not need
+  // to check the processor again.
+  //
+  // The scope of the MSR_IVY_BRIDGE_PPIN_CTL is package level, only program MSR_IVY_BRIDGE_PPIN_CTL for
+  // thread 0 core 0 in each package.
+  //
+  if ((CpuInfo->ProcessorInfo.Location.Thread != 0) || (CpuInfo->ProcessorInfo.Location.Core != 0)) {
+    return RETURN_SUCCESS;
+  }
+
   CPU_REGISTER_TABLE_WRITE_FIELD (
     ProcessorNumber,
     Msr,
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c
index 98490c6777..cf34ad4d1f 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c
@@ -191,6 +191,17 @@ ProcTraceInitialize (
   MSR_IA32_RTIT_OUTPUT_MASK_PTRS_REGISTER  OutputMaskPtrsReg;
   RTIT_TOPA_TABLE_ENTRY                *TopaEntryPtr;
 
+  //
+  // The scope of the MSR_IA32_RTIT_* is core for below processor type, only program
+  // MSR_IA32_RTIT_* for thread 0 in each core.
+  //
+  if (IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
+      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   ProcTraceData = (PROC_TRACE_DATA *) ConfigData;
   ASSERT (ProcTraceData != NULL);
 
diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c
index b4a453c352..342b45f25b 100644
--- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c
+++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c
@@ -102,6 +102,16 @@ X2ApicInitialize (
 {
   BOOLEAN                            *X2ApicEnabled;
 
+  //
+  // The scope of the MSR_IA32_APIC_BASE is core for below processor type, only program
+  // MSR_IA32_APIC_BASE for thread 0 in each core.
+  //
+  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
+    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
+      return RETURN_SUCCESS;
+    }
+  }
+
   ASSERT (ConfigData != NULL);
   X2ApicEnabled = (BOOLEAN *) ConfigData;
   if (X2ApicEnabled[ProcessorNumber]) {
-- 
2.15.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
                   ` (5 preceding siblings ...)
  2018-10-17  2:16 ` [Patch v2 6/6] UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info Eric Dong
@ 2018-10-17 17:33 ` Laszlo Ersek
  2018-10-18  7:36   ` Dong, Eric
  2018-10-18  2:12 ` Ni, Ruiyu
  7 siblings, 1 reply; 18+ messages in thread
From: Laszlo Ersek @ 2018-10-17 17:33 UTC (permalink / raw)
  To: Eric Dong; +Cc: Ruiyu Ni, edk2-devel-01

Hi Eric,

On 10/17/18 04:16, Eric Dong wrote:
> V2 changes include:
> 1. Include the change for CpuCommonFeaturesLib which used to set MSR base on its scope info.
> 2. Include the change for CpuS3DataDxe driver which also handle the AcpiCpuData data.
> 3. Update code base on feedback for V1 changes.

reviewing this version of the series is on my TODO list. I was
out-of-office yesterday (somewhat unexpectedly) and this morning I had
130 emails just in my inbox (not counting other list traffic).

I've more or less managed to get to the bottom of that mail dump by now,
but as a result, it's too late to start reviewing your v2 still today.
Hopefully I can cover it tomorrow.

(I might fetch another email batch this evening, and handle small items.
Thus, you could see further emails from me on the list tonight.)

If Ray reviewed your v2 today (while I was busy unburying myself from my
inbox), and you are ready to post a v3 based on Ray's comments, please
do so. Then I'll skip v2 and catch up with v3.

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
  2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
                   ` (6 preceding siblings ...)
  2018-10-17 17:33 ` [Patch v2 0/6] Fix performance issue caused by Set MSR task Laszlo Ersek
@ 2018-10-18  2:12 ` Ni, Ruiyu
  2018-10-18  2:35   ` Dong, Eric
  7 siblings, 1 reply; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  2:12 UTC (permalink / raw)
  To: Dong, Eric, edk2-devel@lists.01.org; +Cc: Laszlo Ersek

Eric,
Can you post your changes to github yours mirror repo?
I found #3/6 cannot be applied to my code properly.

Thanks/Ray

> -----Original Message-----
> From: Dong, Eric
> Sent: Wednesday, October 17, 2018 10:16 AM
> To: edk2-devel@lists.01.org
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>; Laszlo Ersek <lersek@redhat.com>
> Subject: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
> 
> V2 changes include:
> 1. Include the change for CpuCommonFeaturesLib which used to set MSR
> base on its scope info.
> 2. Include the change for CpuS3DataDxe driver which also handle the
> AcpiCpuData data.
> 3. Update code base on feedback for V1 changes.
> 
> V1 changes include:
> In a system which has multiple cores, current set register value task costs
> huge times.
> After investigation, current set MSR task costs most of the times. Current
> logic uses SpinLock to let set MSR task as an single thread task for all cores.
> Because MSR has scope attribute which may cause GP fault if multiple APs
> set MSR at the same time, current logic use an easiest solution (use SpinLock)
> to avoid this issue, but it will cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on
> their scope attribute. After this, the SpinLock will not needed. Without
> SpinLock, new issue raised which is caused by MSR dependence. For
> example, MSR A depends on MSR B which means MSR A must been set after
> MSR B has been set. Also MSR B is package scope level and MSR A is thread
> scope level. If system has multiple threads, Thread 1 needs to set the thread
> level MSRs and thread 2 needs to set thread and package level MSRs. Set
> MSRs task for thread 1 and thread 2 like below:
> 
>             Thread 1                 Thread 2
> MSR B          N                        Y
> MSR A          Y                        Y
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A
> first, but at this time, MSR B not been executed yet by thread 2. system may
> trig exception at this time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control
> the MSR execute sequence. For the above case, a semaphore will be add
> between MSR A and B for all threads. Semaphore has scope info for it. The
> possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1)
> release semaphore (+1) for each threads in this core or package(based on
> the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or
> package(based on the scope info for this semaphore). With these two steps,
> driver can control MSR sequence. Sample code logic like below:
> 
>   //
>   // First increase semaphore count by 1 for processors in this package.
>   //
>   for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
>     LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> ProcessorIndex]);
>   }
>   //
>   // Second, check whether the count has reach the check number.
>   //
>   for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++)
> {
>     LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
>   }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still
> register MSR
>    for all threads, exception may raised.
> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But
> semaphore logic
>    requires Aps execute in async mode which is not supported by PEI driver.
> So CpuFeature
>    PEI instance not works after this change. We plan to support async mode
> for PEI in phase
>    2 for this task.
> 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver
> and
>    RegisterCpuFeaturesLib library because the schedule limitation. Will merge
> the code to
>    RegisterCpuFeaturesLib and export as an API in phase 2 for this task.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> 
> 
> Eric Dong (6):
>   UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
>   UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
>   UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
>     type.
>   UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
>   UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
>   UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info.
> 
>  UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c                |   2 +
>  UefiCpuPkg/Include/AcpiCpuData.h                   |  45 +-
>  .../Include/Library/RegisterCpuFeaturesLib.h       |  25 +-
>  UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c      |   8 +
>  UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c     |  12 +
>  .../Library/CpuCommonFeaturesLib/ExecuteDisable.c  |  10 +
>  .../Library/CpuCommonFeaturesLib/FastStrings.c     |  12 +
>  .../Library/CpuCommonFeaturesLib/FeatureControl.c  |  38 ++
>  .../CpuCommonFeaturesLib/LimitCpuIdMaxval.c        |  14 +
>  .../Library/CpuCommonFeaturesLib/MachineCheck.c    |  38 ++
>  .../Library/CpuCommonFeaturesLib/MonitorMwait.c    |  15 +
>  .../Library/CpuCommonFeaturesLib/PendingBreak.c    |  11 +
>  UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c     |  11 +
>  .../Library/CpuCommonFeaturesLib/ProcTrace.c       |  11 +
>  UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c   |  10 +
>  .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 438
> ++++++++++++++++---
>  .../DxeRegisterCpuFeaturesLib.c                    |  71 ++-
>  .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
>  .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
>  .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
>  .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  61 ++-
>  .../RegisterCpuFeaturesLib.c                       | 484 ++++++++++++++++++---
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c                  | 406 +++++++++++------
>  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c              |   3 -
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h         |   3 +-
>  25 files changed, 1505 insertions(+), 282 deletions(-)
> 
> --
> 2.15.0.windows.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
  2018-10-18  2:12 ` Ni, Ruiyu
@ 2018-10-18  2:35   ` Dong, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Dong, Eric @ 2018-10-18  2:35 UTC (permalink / raw)
  To: Ni, Ruiyu, edk2-devel@lists.01.org; +Cc: Laszlo Ersek

Hi Ruiyu,

Below link with my changes:

https://github.com/ydong10/edk2/tree/MSR

Thanks,
Eric

> -----Original Message-----
> From: Ni, Ruiyu
> Sent: Thursday, October 18, 2018 10:13 AM
> To: Dong, Eric <eric.dong@intel.com>; edk2-devel@lists.01.org
> Cc: Laszlo Ersek <lersek@redhat.com>
> Subject: RE: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
> 
> Eric,
> Can you post your changes to github yours mirror repo?
> I found #3/6 cannot be applied to my code properly.
> 
> Thanks/Ray
> 
> > -----Original Message-----
> > From: Dong, Eric
> > Sent: Wednesday, October 17, 2018 10:16 AM
> > To: edk2-devel@lists.01.org
> > Cc: Ni, Ruiyu <ruiyu.ni@intel.com>; Laszlo Ersek <lersek@redhat.com>
> > Subject: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
> >
> > V2 changes include:
> > 1. Include the change for CpuCommonFeaturesLib which used to set MSR
> > base on its scope info.
> > 2. Include the change for CpuS3DataDxe driver which also handle the
> > AcpiCpuData data.
> > 3. Update code base on feedback for V1 changes.
> >
> > V1 changes include:
> > In a system which has multiple cores, current set register value task
> > costs huge times.
> > After investigation, current set MSR task costs most of the times.
> > Current logic uses SpinLock to let set MSR task as an single thread task for
> all cores.
> > Because MSR has scope attribute which may cause GP fault if multiple
> > APs set MSR at the same time, current logic use an easiest solution
> > (use SpinLock) to avoid this issue, but it will cost huge times.
> >
> > In order to fix this performance issue, new solution will set MSRs
> > base on their scope attribute. After this, the SpinLock will not
> > needed. Without SpinLock, new issue raised which is caused by MSR
> > dependence. For example, MSR A depends on MSR B which means MSR A
> must
> > been set after MSR B has been set. Also MSR B is package scope level
> > and MSR A is thread scope level. If system has multiple threads,
> > Thread 1 needs to set the thread level MSRs and thread 2 needs to set
> > thread and package level MSRs. Set MSRs task for thread 1 and thread 2
> like below:
> >
> >             Thread 1                 Thread 2
> > MSR B          N                        Y
> > MSR A          Y                        Y
> >
> > If driver don't control execute MSR order, for thread 1, it will
> > execute MSR A first, but at this time, MSR B not been executed yet by
> > thread 2. system may trig exception at this time.
> >
> > In order to fix the above issue, driver introduces semaphore logic to
> > control the MSR execute sequence. For the above case, a semaphore will
> > be add between MSR A and B for all threads. Semaphore has scope info
> > for it. The possible scope value is core or package.
> > For each thread, when it meets a semaphore during it set registers, it
> > will 1) release semaphore (+1) for each threads in this core or
> > package(based on the scope info for this
> > semaphore) 2) acquire semaphore (-1) for all the threads in this core
> > or package(based on the scope info for this semaphore). With these two
> > steps, driver can control MSR sequence. Sample code logic like below:
> >
> >   //
> >   // First increase semaphore count by 1 for processors in this package.
> >   //
> >   for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> > ProcessorIndex ++) {
> >     LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset +
> > ProcessorIndex]);
> >   }
> >   //
> >   // Second, check whether the count has reach the check number.
> >   //
> >   for (ProcessorIndex = 0; ProcessorIndex < ValidApCount;
> > ProcessorIndex ++) {
> >     LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> >   }
> >
> > Platform Requirement:
> > 1. This change requires register MSR setting base on MSR scope info.
> > If still register MSR
> >    for all threads, exception may raised.
> >
> > Known limitation:
> > 1. Current CpuFeatures driver supports DXE instance and PEI instance.
> > But semaphore logic
> >    requires Aps execute in async mode which is not supported by PEI driver.
> > So CpuFeature
> >    PEI instance not works after this change. We plan to support async
> > mode for PEI in phase
> >    2 for this task.
> > 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm
> > driver and
> >    RegisterCpuFeaturesLib library because the schedule limitation.
> > Will merge the code to
> >    RegisterCpuFeaturesLib and export as an API in phase 2 for this task.
> >
> > Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong <eric.dong@intel.com>
> >
> >
> > Eric Dong (6):
> >   UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
> >   UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
> >   UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore
> >     type.
> >   UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
> >   UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
> >   UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info.
> >
> >  UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c                |   2 +
> >  UefiCpuPkg/Include/AcpiCpuData.h                   |  45 +-
> >  .../Include/Library/RegisterCpuFeaturesLib.h       |  25 +-
> >  UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c      |   8 +
> >  UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c     |  12 +
> >  .../Library/CpuCommonFeaturesLib/ExecuteDisable.c  |  10 +
> >  .../Library/CpuCommonFeaturesLib/FastStrings.c     |  12 +
> >  .../Library/CpuCommonFeaturesLib/FeatureControl.c  |  38 ++
> >  .../CpuCommonFeaturesLib/LimitCpuIdMaxval.c        |  14 +
> >  .../Library/CpuCommonFeaturesLib/MachineCheck.c    |  38 ++
> >  .../Library/CpuCommonFeaturesLib/MonitorMwait.c    |  15 +
> >  .../Library/CpuCommonFeaturesLib/PendingBreak.c    |  11 +
> >  UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c     |  11 +
> >  .../Library/CpuCommonFeaturesLib/ProcTrace.c       |  11 +
> >  UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c   |  10 +
> >  .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 438
> > ++++++++++++++++---
> >  .../DxeRegisterCpuFeaturesLib.c                    |  71 ++-
> >  .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
> >  .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
> >  .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
> >  .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  61 ++-
> >  .../RegisterCpuFeaturesLib.c                       | 484 ++++++++++++++++++---
> >  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c                  | 406 +++++++++++-----
> -
> >  UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c              |   3 -
> >  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h         |   3 +-
> >  25 files changed, 1505 insertions(+), 282 deletions(-)
> >
> > --
> > 2.15.0.windows.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-17  2:16 ` [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
@ 2018-10-18  3:04   ` Ni, Ruiyu
  2018-10-18  3:10   ` Ni, Ruiyu
  1 sibling, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  3:04 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

Thanks for the detailed comments.
Very minor comments in below.

On 10/17/2018 10:16 AM, Eric Dong wrote:
> v2 changes:
> 1. Add more description about why we do this change.
> 2. Change structure field type from pointer to EFI_PHYSICAL_ADDRESS because it will
>     be share between PEI and DXE.
> 
> In order to support semaphore related logic, add new definition for it.
> 
> In a system which has multiple cores, current set register value task costs huge times.
> After investigation, current set MSR task costs most of the times. Current logic uses
> SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
> scope attribute which may cause GP fault if multiple APs set MSR at the same time,
> current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
> cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on their scope
> attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
> which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
> must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
> thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
> MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
> and thread 2 like below:
> 
>              Thread 1                 Thread 2
> MSR B          N                        Y
> MSR A          Y                        Y
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
> at this time, MSR B not been executed yet by thread 2. system may trig exception at this
> time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control the MSR
> execute sequence. For the above case, a semaphore will be add between MSR A and B for
> all threads. Semaphore has scope info for it. The possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1) release
> semaphore (+1) for each threads in this core or package(based on the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
> on the scope info for this semaphore). With these two steps, driver can control MSR
> sequence. Sample code logic like below:
> 
>    //
>    // First increase semaphore count by 1 for processors in this package.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
>      LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
>    }
>    //
>    // Second, check whether the count has reach the check number.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
>      LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
>    }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still register MSR
>     for all threads, exception may raised.
> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
>     requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
>     PEI instance not works after this change. We plan to support async mode for PEI in phase
>     2 for this task.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   UefiCpuPkg/Include/AcpiCpuData.h | 45 +++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/UefiCpuPkg/Include/AcpiCpuData.h b/UefiCpuPkg/Include/AcpiCpuData.h
> index 9e51145c08..f1439dcf9a 100644
> --- a/UefiCpuPkg/Include/AcpiCpuData.h
> +++ b/UefiCpuPkg/Include/AcpiCpuData.h
> @@ -22,9 +22,42 @@ typedef enum {
>     Msr,
>     ControlRegister,
>     MemoryMapped,
> -  CacheControl
> +  CacheControl,

Suggest to leave a blank line here. This can make the comment block in 
below looks closer to the field (Semaphore).
Similar to all the comment blocks below.
> +  //
> +  // Semaphore type used to control the execute sequence of the Msr.
> +  // It will be insert between two Msr which has execute dependence.
> +  //
> +  Semaphore
>   } REGISTER_TYPE;
>   
> +//
> +// CPU information.
> +//
> +typedef struct {
> +  //
> +  // Record the package count in this CPU.
> +  //
> +  UINT32                      PackageCount;
> +  //
> +  // Record the max core count in this CPU.
> +  // Different packages may have different core count, this value
> +  // save the max core count in all the packages.
> +  //
> +  UINT32                      MaxCoreCount;
> +  //
> +  // Record the max thread count in this CPU.
> +  // Different cores may have different thread count, this value
> +  // save the max thread count in all the cores.
> +  //
> +  UINT32                      MaxThreadCount;
> +  //
> +  // This fild is an pointer type which point to an array.

"This field points to an array."?

> +  // This array used to save the valid cores in different packages in this CPU.
> +  // The array count is the package number in this CPU.
"This array saves valid core count (type UINT32) of each package."
"The array has PackageCount elements."
> +  //
> +  EFI_PHYSICAL_ADDRESS        ValidCoresPerPackages;
> +} CPU_STATUS_INFORMATION;
> +
>   //
>   // Element of register table entry
>   //
> @@ -147,6 +180,16 @@ typedef struct {
>     // provided.
>     //
>     UINT32                ApMachineCheckHandlerSize;
> +  //
> +  // CPU information which is required when set the register table.
> +  //
> +  CPU_STATUS_INFORMATION     CpuStatus;
> +  //
> +  // Location info for each AP.
> +  // It points to an array which saves all APs location info in this CPU.
Suggest to remove "in this CPU". Because for a multiple sockets system, 
there are multiple CPUs.
> +  // The array count is the AP count in this CPU.
> +  //
> +  EFI_PHYSICAL_ADDRESS  ApLocation;
>   } ACPI_CPU_DATA;
>   
>   #endif
> 


-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information.
  2018-10-17  2:16 ` [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
  2018-10-18  3:04   ` Ni, Ruiyu
@ 2018-10-18  3:10   ` Ni, Ruiyu
  1 sibling, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  3:10 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/17/2018 10:16 AM, Eric Dong wrote:
> v2 changes:
> 1. Add more description about why we do this change.
> 2. Change structure field type from pointer to EFI_PHYSICAL_ADDRESS because it will
>     be share between PEI and DXE.
> 
> In order to support semaphore related logic, add new definition for it.
> 
> In a system which has multiple cores, current set register value task costs huge times.
> After investigation, current set MSR task costs most of the times. Current logic uses
> SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
> scope attribute which may cause GP fault if multiple APs set MSR at the same time,
> current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
> cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on their scope
> attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
> which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
> must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
> thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
> MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
> and thread 2 like below:
> 
>              Thread 1                 Thread 2
> MSR B          N                        Y
> MSR A          Y                        Y
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
> at this time, MSR B not been executed yet by thread 2. system may trig exception at this
> time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control the MSR
> execute sequence. For the above case, a semaphore will be add between MSR A and B for
> all threads. Semaphore has scope info for it. The possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1) release
> semaphore (+1) for each threads in this core or package(based on the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
> on the scope info for this semaphore). With these two steps, driver can control MSR
> sequence. Sample code logic like below:
> 
>    //
>    // First increase semaphore count by 1 for processors in this package.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
>      LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
>    }
>    //
>    // Second, check whether the count has reach the check number.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
>      LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
>    }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still register MSR
>     for all threads, exception may raised.
> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
>     requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
>     PEI instance not works after this change. We plan to support async mode for PEI in phase
>     2 for this task.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   UefiCpuPkg/Include/AcpiCpuData.h | 45 +++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/UefiCpuPkg/Include/AcpiCpuData.h b/UefiCpuPkg/Include/AcpiCpuData.h
> index 9e51145c08..f1439dcf9a 100644
> --- a/UefiCpuPkg/Include/AcpiCpuData.h
> +++ b/UefiCpuPkg/Include/AcpiCpuData.h
> @@ -22,9 +22,42 @@ typedef enum {
>     Msr,
>     ControlRegister,
>     MemoryMapped,
> -  CacheControl
> +  CacheControl,
> +  //
> +  // Semaphore type used to control the execute sequence of the Msr.
> +  // It will be insert between two Msr which has execute dependence.
> +  //
> +  Semaphore
>   } REGISTER_TYPE;
>   
> +//
> +// CPU information.
> +//
> +typedef struct {
> +  //
> +  // Record the package count in this CPU.
> +  //
> +  UINT32                      PackageCount;
> +  //
> +  // Record the max core count in this CPU.
> +  // Different packages may have different core count, this value
> +  // save the max core count in all the packages.
> +  //
> +  UINT32                      MaxCoreCount;
> +  //
> +  // Record the max thread count in this CPU.
> +  // Different cores may have different thread count, this value
> +  // save the max thread count in all the cores.
> +  //
> +  UINT32                      MaxThreadCount;
> +  //
> +  // This fild is an pointer type which point to an array.
> +  // This array used to save the valid cores in different packages in this CPU.
> +  // The array count is the package number in this CPU.
> +  //
> +  EFI_PHYSICAL_ADDRESS        ValidCoresPerPackages;
ValidCoreCountPerPackage?
Using the new name can tell people that this array holds the core *count*.

> +} CPU_STATUS_INFORMATION;
> +
>   //
>   // Element of register table entry
>   //
> @@ -147,6 +180,16 @@ typedef struct {
>     // provided.
>     //
>     UINT32                ApMachineCheckHandlerSize;
> +  //
> +  // CPU information which is required when set the register table.
> +  //
> +  CPU_STATUS_INFORMATION     CpuStatus;
> +  //
> +  // Location info for each AP.
> +  // It points to an array which saves all APs location info in this CPU.
> +  // The array count is the AP count in this CPU.
> +  //
> +  EFI_PHYSICAL_ADDRESS  ApLocation;
>   } ACPI_CPU_DATA;
>   
>   #endif
> 


-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 2/6] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types.
  2018-10-17  2:16 ` [Patch v2 2/6] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
@ 2018-10-18  3:31   ` Ni, Ruiyu
  0 siblings, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  3:31 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/17/2018 10:16 AM, Eric Dong wrote:
> Add new core/package dependence types which consumed by different MSRs.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   .../Include/Library/RegisterCpuFeaturesLib.h       | 25 ++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h b/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
> index 9331e49d13..e6f0ebe4bc 100644
> --- a/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
> +++ b/UefiCpuPkg/Include/Library/RegisterCpuFeaturesLib.h
> @@ -73,10 +73,17 @@
>   #define CPU_FEATURE_PPIN                            (32+11)
>   #define CPU_FEATURE_PROC_TRACE                      (32+12)
>   
> -#define CPU_FEATURE_BEFORE_ALL                      BIT27
> -#define CPU_FEATURE_AFTER_ALL                       BIT28
> -#define CPU_FEATURE_BEFORE                          BIT29
> -#define CPU_FEATURE_AFTER                           BIT30
> +#define CPU_FEATURE_BEFORE_ALL                      BIT23
> +#define CPU_FEATURE_AFTER_ALL                       BIT24


We could add comments to emphasize that CPU_FEATURE_BEFORE and 
CPU_FEATURE_AFTER only mean Thread scope before and Thread scope after.

And after the whole patch serials are checked in, I prefer to mark the 
below two macros as deprecated and avoid using them in core code any more.

> +#define CPU_FEATURE_BEFORE                          BIT25
> +#define CPU_FEATURE_AFTER                           BIT26
> +


> +#define CPU_FEATURE_THREAD_BEFORE                   CPU_FEATURE_BEFORE
> +#define CPU_FEATURE_THREAD_AFTER                    CPU_FEATURE_AFTER
> +#define CPU_FEATURE_CORE_BEFORE                     BIT27
> +#define CPU_FEATURE_CORE_AFTER                      BIT28
> +#define CPU_FEATURE_PACKAGE_BEFORE                  BIT29
> +#define CPU_FEATURE_PACKAGE_AFTER                   BIT30
>   #define CPU_FEATURE_END                             MAX_UINT32
>   /// @}
>   
> @@ -116,6 +123,16 @@ typedef struct {
>     CPUID_VERSION_INFO_EDX               CpuIdVersionInfoEdx;
>   } REGISTER_CPU_FEATURE_INFORMATION;
>   
> +//
> +// Describe the dependency type for different features.

Can you add comments to say like below?
"The value set to CPU_REGISTER_TABLE_ENTRY.Value when the REGISTER_TYPE 
is Semaphore."

And maybe move the enum definition to AcpiCpuData.h because the 
definition of CPU_REGISTER_TABLE_ENTRY and REGISTER_TYPE are all defined 
there.

> +//
> +typedef enum {
> +  NoneDepType,
> +  ThreadDepType,
> +  CoreDepType,
> +  PackageDepType
> +} CPU_FEATURE_DEPENDENCE_TYPE; > +
>   /**
>     Determines if a CPU feature is enabled in PcdCpuFeaturesSupport bit mask.
>     If a CPU feature is disabled in PcdCpuFeaturesSupport then all the code/data
> 


-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 3/6] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
  2018-10-17  2:16 ` [Patch v2 3/6] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
@ 2018-10-18  5:46   ` Ni, Ruiyu
  0 siblings, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  5:46 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/17/2018 10:16 AM, Eric Dong wrote:
> V2 changes include:
> 1. Add more description for the code part which need easy to understand.
> 2. Refine some code base on feedback for V1 changes.
> 
> V1 changes include:
> In a system which has multiple cores, current set register value task costs huge times.
> After investigation, current set MSR task costs most of the times. Current logic uses
> SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
> scope attribute which may cause GP fault if multiple APs set MSR at the same time,
> current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
> cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on their scope
> attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
> which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
> must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
> thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
> MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
> and thread 2 like below:
> 
>              Thread 1                 Thread 2
> MSR B          N                        Y
> MSR A          Y                        Y
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
> at this time, MSR B not been executed yet by thread 2. system may trig exception at this
> time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control the MSR
> execute sequence. For the above case, a semaphore will be add between MSR A and B for
> all threads. Semaphore has scope info for it. The possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1) release
> semaphore (+1) for each threads in this core or package(based on the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
> on the scope info for this semaphore). With these two steps, driver can control MSR
> sequence. Sample code logic like below:
> 
>    //
>    // First increase semaphore count by 1 for processors in this package.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
>      LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
>    }
>    //
>    // Second, check whether the count has reach the check number.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
>      LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
>    }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still register MSR
>     for all threads, exception may raised.
> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
>     requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
>     PEI instance not works after this change. We plan to support async mode for PEI in phase
>     2 for this task.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 438 ++++++++++++++++---
>   .../DxeRegisterCpuFeaturesLib.c                    |  71 ++-
>   .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
>   .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
>   .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
>   .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  61 ++-
>   .../RegisterCpuFeaturesLib.c                       | 484 ++++++++++++++++++---
>   7 files changed, 980 insertions(+), 133 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> index ba3fb3250f..2bf2254602 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> @@ -145,6 +145,19 @@ CpuInitDataInitialize (
>     CPU_FEATURES_INIT_ORDER              *InitOrder;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
>     LIST_ENTRY                           *Entry;
> +  UINT32                               Core;
> +  UINT32                               Package;
> +  UINT32                               Thread;
> +  EFI_CPU_PHYSICAL_LOCATION            *Location;
> +  BOOLEAN                              *CoresVisited;
> +  UINTN                                Index;
> +  ACPI_CPU_DATA                        *AcpiCpuData;
> +  CPU_STATUS_INFORMATION               *CpuStatus;
> +  UINT32                               *ValidCorePerPackage;
> +
> +  Core    = 0;
> +  Package = 0;
> +  Thread  = 0;
>   
>     CpuFeaturesData = GetCpuFeaturesData ();
>     CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof (CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
> @@ -163,6 +176,17 @@ CpuInitDataInitialize (
>       Entry = Entry->ForwardLink;
>     }
>   
> +  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
> +
> +  AcpiCpuData = GetAcpiCpuData ();
> +  ASSERT (AcpiCpuData != NULL);
> +  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
> +
> +  CpuStatus = &AcpiCpuData->CpuStatus;
> +  Location = AllocateZeroPool (sizeof (EFI_CPU_PHYSICAL_LOCATION) * NumberOfCpus);
> +  ASSERT (Location != NULL);
> +  AcpiCpuData->ApLocation = (EFI_PHYSICAL_ADDRESS)(UINTN)Location;
> +
>     for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
>       InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
>       InitOrder->FeaturesSupportedMask = AllocateZeroPool (CpuFeaturesData->BitMaskSize);
> @@ -175,7 +199,76 @@ CpuInitDataInitialize (
>         &ProcessorInfoBuffer,
>         sizeof (EFI_PROCESSOR_INFORMATION)
>         );
> +    CopyMem (
> +      Location + ProcessorNumber,
&Location[ProcessorNumber]?

> +      &ProcessorInfoBuffer.Location,
> +      sizeof (EFI_CPU_PHYSICAL_LOCATION)
> +      );
> +
> +    //
> +    // Collect CPU package count info.
> +    //
> +    if (Package < ProcessorInfoBuffer.Location.Package) {
> +      Package = ProcessorInfoBuffer.Location.Package;
> +    }
> +    //
> +    // Collect CPU max core count info.
> +    //
> +    if (Core < ProcessorInfoBuffer.Location.Core) {
> +      Core = ProcessorInfoBuffer.Location.Core;
> +    }
> +    //
> +    // Collect CPU max thread count info.
> +    //
> +    if (Thread < ProcessorInfoBuffer.Location.Thread) {
> +      Thread = ProcessorInfoBuffer.Location.Thread;
> +    }
>     }
> +  CpuStatus->PackageCount    = Package + 1;
> +  CpuStatus->MaxCoreCount    = Core + 1;
> +  CpuStatus->MaxThreadCount  = Thread + 1;
> +  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, MaxCore : %d, MaxThread: %d\n",
> +         CpuStatus->PackageCount,
> +         CpuStatus->MaxCoreCount,
> +         CpuStatus->MaxThreadCount));
> +
> +  //
> +  // Collect valid core count in each package because not all cores are valid.
> +  //
> +  ValidCorePerPackage= AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount);
> +  ASSERT (ValidCorePerPackage != 0);
> +  CpuStatus->ValidCoresPerPackages = (EFI_PHYSICAL_ADDRESS)(UINTN)ValidCorePerPackage;
> +  CoresVisited = AllocatePool (sizeof (UINT32) * CpuStatus->MaxCoreCount);
Should be sizeof (BOOLEAN).

> +  ASSERT (CoresVisited != NULL);
> +
> +  for (Index = 0; Index < CpuStatus->PackageCount; Index ++ ) {
> +    ZeroMem (CoresVisited, sizeof (UINT32) * CpuStatus->MaxCoreCount);
> +    //
> +    // Collect valid cores in Current package.
> +    //
> +    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
> +      Location = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
> +      if (Location->Package == Index && !CoresVisited[Location->Core] ) {
> +        //
> +        // The ValidCores position for Location->Core is valid.
> +        // The possible values in ValidCores[Index] are 0 or 1.
> +        // FALSE means no valid threads in this Core.
> +        // TRUE means have valid threads in this core, no matter the thead count is 1 or more.
> +        //
> +        CoresVisited[Location->Core] = TRUE;
> +        ValidCorePerPackage[Index]++;
> +      }
> +    }
> +  }
> +  FreePool (CoresVisited);
> +
> +  for (Index = 0; Index <= Package; Index++) {
> +    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index, ValidCorePerPackage[Index]));
> +  }
> +
> +  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount);
> +  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
> +
>     //
>     // Get support and configuration PCDs
>     //
> @@ -310,7 +403,7 @@ CollectProcessorData (
>     LIST_ENTRY                           *Entry;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
>   
> -  CpuFeaturesData = GetCpuFeaturesData ();
> +  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;
>     ProcessorNumber = GetProcessorIndex ();
>     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
>     //
> @@ -340,6 +433,33 @@ CollectProcessorData (
>     }
>   }
>   
> +/**
> +  Get the string value for the input dependence type.
> +
> +  @param[in]  DependType  Input the dependence type.
> +
> +  @retval     Return the dependence type string.
> +**/
> +CHAR16 *
> +GetDependTypeString (
> +  IN CPU_FEATURE_DEPENDENCE_TYPE       DependType
> +  )
> +{
> +  switch (DependType) {
> +  case ThreadDepType:
> +    return L"ThreadDepType";
> +
> +  case CoreDepType:
> +    return L"CoreDepType";
> +
> +  case PackageDepType:
> +    return L"PackageDepType";
> +
> +  default:
> +    return L"NoneDepType";
> +  }
> +}

We could eliminate GetDependTypeString() by using a global string array.
E.g.:
CHAR16 *mDependTypeStr[] = { L"None", L"Thread", L"Core", L"Package", 
L"Invalid" };

When using it, we can use
  mDependTypeStr[MIN (DependType, InvalidDepType)].

This means we need to add another value "InvalidDepType" to the enum 
definition.

> +
>   /**
>     Dump the contents of a CPU register table.
>   
> @@ -416,6 +536,15 @@ DumpRegisterTableOnProcessor (
>           RegisterTableEntry->Value
>           ));
>         break;
> +    case Semaphore:
> +      DEBUG ((
> +        DebugPrintErrorLevel,
> +        "Processor: %d: Semaphore: Scope Value: %s\r\n",
> +        ProcessorNumber,
> +        GetDependTypeString ((CPU_FEATURE_DEPENDENCE_TYPE)RegisterTableEntry->Value)
mDependTypeStr[MIN (RegisterTableEntry->Value, InvalidDepType)]
> +        ));
> +      break;
> +
>       default:
>         break;
>       }
> @@ -441,6 +570,11 @@ AnalysisProcessorFeatures (
>     REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
>     LIST_ENTRY                           *Entry;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
> +  LIST_ENTRY                           *NextEntry;
> +  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
> +  BOOLEAN                              Success;
> +  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
> +  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
>   
>     CpuFeaturesData = GetCpuFeaturesData ();
>     CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData->BitMaskSize);
> @@ -517,8 +651,14 @@ AnalysisProcessorFeatures (
>       //
>       CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
>       Entry = GetFirstNode (&CpuInitOrder->OrderList);
> +    NextEntry = Entry->ForwardLink;
>       while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
>         CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
> +      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
> +        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (NextEntry);
> +      } else {
> +        NextCpuFeatureInOrder = NULL;
> +      }
>         if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask, CpuFeaturesData->SettingPcd)) {
>           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, TRUE);
>           if (EFI_ERROR (Status)) {
> @@ -532,6 +672,8 @@ AnalysisProcessorFeatures (
>               DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask = "));
>               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
>             }
> +        } else {
> +          Success = TRUE;
>           }
>         } else {
>           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, FALSE);
> @@ -542,9 +684,36 @@ AnalysisProcessorFeatures (
>               DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: Mask = "));
>               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
>             }
> +        } else {
> +          Success = TRUE;
>           }
>         }
> -      Entry = Entry->ForwardLink;
> +
> +      if (Success) {
> +        //
> +        // If feature has dependence with the next feature (ONLY care core/package dependency).
> +        // and feature initialize succeed, add sync semaphere here.
> +        //
> +        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
> +        if (NextCpuFeatureInOrder != NULL) {
> +          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
> +        } else {
> +          AfterDep = NoneDepType;
> +        }
> +        //
> +        // Assume only one of the depend is valid.
> +        //
> +        ASSERT (!(BeforeDep > ThreadDepType && AfterDep > ThreadDepType));
> +        if (BeforeDep > ThreadDepType) {
> +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, BeforeDep);
> +        }
> +        if (AfterDep > ThreadDepType) {
> +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, AfterDep);
> +        }
> +      }
> +
> +      Entry     = Entry->ForwardLink;
> +      NextEntry = Entry->ForwardLink;
>       }
>   
>       //
> @@ -561,27 +730,116 @@ AnalysisProcessorFeatures (
>     }
>   }
>   
> +/**
> +  Increment semaphore by 1.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +LibReleaseSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  InterlockedIncrement (Sem);
> +}
> +
> +/**
> +  Decrement the semaphore by 1 if it is not zero.
> +
> +  Performs an atomic decrement operation for semaphore.
> +  The compare exchange operation must be performed using
> +  MP safe mechanisms.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +LibWaitForSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  UINT32  Value;
> +
> +  do {
> +    Value = *Sem;
> +  } while (Value == 0 ||
> +           InterlockedCompareExchange32 (
> +             Sem,
> +             Value,
> +             Value - 1
> +             ) != Value);
> +}

Looks good.

> +
> +/**
> +  Get the string value for the input register type.
> +
> +  @param[in]  RegisterType  Input the register type.
> +
> +  @retval     Return the register type string.
> +**/
> +CHAR16 *
> +GetRegisterTypeString (
> +  IN REGISTER_TYPE       RegisterType
> +  )
> +{
> +  switch (RegisterType) {
> +  case Msr:
> +    return L"Msr";
> +
> +  case ControlRegister:
> +    return L"ControlRegister";
> +
> +  case MemoryMapped:
> +    return L"MemoryMapped";
> +
> +  case CacheControl:
> +    return L"CacheControl";
> +
> +  case Semaphore:
> +    return L"Semaphore";
> +
> +  default:
> +    ASSERT (FALSE);
> +    return L"NoneType";
> +  }
> +}

Similar comments as above. We could use a global string array to avoid 
adding a small function.
Similarly, we need to add a Invalid value to that enum definition.


> +
>   /**
>     Initialize the CPU registers from a register table.
>   
> -  @param[in]  ProcessorNumber  The index of the CPU executing this function.
> +  @param[in]  RegisterTable         The register table for this AP.
> +  @param[in]  ApLocation            AP location info for this ap.
> +  @param[in]  CpuStatus             CPU status info for this CPU.
> +  @param[in]  CpuFlags              Flags data structure used when program the register.
>   
>     @note This service could be called by BSP/APs.
>   **/
>   VOID
>   ProgramProcessorRegister (
> -  IN UINTN  ProcessorNumber
> +  IN CPU_REGISTER_TABLE           *RegisterTable,
> +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
>     )
>   {
> -  CPU_FEATURES_DATA         *CpuFeaturesData;
> -  CPU_REGISTER_TABLE        *RegisterTable;
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
>     UINTN                     Index;
>     UINTN                     Value;
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> -
> -  CpuFeaturesData = GetCpuFeaturesData ();
> -  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
> +  volatile UINT32           *SemaphorePtr;
> +  UINT32                    FirstThread;
> +  UINT32                    PackageThreadsCount;
> +  UINT32                    CurrentThread;
> +  UINTN                     ProcessorIndex;
> +  UINTN                     ThreadIndex;
> +  UINTN                     ValidThreadCount;
> +  UINT32                    *ValidCorePerPackage;
> +
> +  ThreadIndex = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount \
> +            + ApLocation->Core * CpuStatus->MaxThreadCount \
> +            + ApLocation->Thread;
>   
>     //
>     // Traverse Register Table of this logical processor
> @@ -591,6 +849,13 @@ ProgramProcessorRegister (
>     for (Index = 0; Index < RegisterTable->TableLength; Index++) {
>   
>       RegisterTableEntry = &RegisterTableEntryHead[Index];
> +    DEBUG ((
> +      DEBUG_INFO,
> +      "Processor = %ul, Entry Index %ul, Type = %s!\n",
> +      (UINT64)ThreadIndex,
> +      (UINT64)Index,
> +      GetRegisterTypeString((REGISTER_TYPE)RegisterTableEntry->RegisterType)
> +      ));
>   
>       //
>       // Check the type of specified register
> @@ -654,10 +919,6 @@ ProgramProcessorRegister (
>       // The specified register is Model Specific Register
>       //
>       case Msr:
> -      //
> -      // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
> -      //
> -      AcquireSpinLock (&CpuFeaturesData->MsrLock);
>         if (RegisterTableEntry->ValidBitLength >= 64) {
>           //
>           // If length is not less than 64 bits, then directly write without reading
> @@ -677,20 +938,19 @@ ProgramProcessorRegister (
>             RegisterTableEntry->Value
>             );
>         }
> -      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
>         break;
>       //
>       // MemoryMapped operations
>       //
>       case MemoryMapped:
> -      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
>         MmioBitFieldWrite32 (
>           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
>           RegisterTableEntry->ValidBitStart,
>           RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>           (UINT32)RegisterTableEntry->Value
>           );
> -      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
>         break;
>       //
>       // Enable or disable cache
> @@ -706,6 +966,77 @@ ProgramProcessorRegister (
>         }
>         break;
>   
> +    case Semaphore:
> +      // Semaphore works logic like below:
> +      //
> +      //  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
> +      //  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
> +      //
> +      //  All threads (T0...Tn) waits in P() line and continues running
> +      //  together.
> +      //
> +      //
> +      //  T0             T1            ...           Tn
> +      //
> +      //  V(0...n)       V(0...n)      ...           V(0...n)
> +      //  n * P(0)       n * P(1)      ...           n * P(n)
> +      //
> +      SemaphorePtr = CpuFlags->SemaphoreCount;
> +      switch (RegisterTableEntry->Value) {
> +      case CoreDepType:
> +        //
> +        // Get Offset info for the first thread in the core which current thread belongs to.
> +        //
> +        FirstThread = (ApLocation->Package * CpuStatus->MaxCoreCount + ApLocation->Core) * CpuStatus->MaxThreadCount;
> +        CurrentThread = FirstThread + ApLocation->Thread;
> +        //
> +        // First Notify all threads in current Core that this thread has ready.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
> +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[FirstThread + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether all valid threads in current core have ready.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
> +          LibWaitForSemaphore (&SemaphorePtr[CurrentThread]);
> +        }
> +        break;
> +
> +      case PackageDepType:
> +        ValidCorePerPackage = (UINT32 *)(UINTN)CpuStatus->ValidCoresPerPackages;
> +        //
> +        // Get Offset info for the first thread in the package which current thread belongs to.
> +        //
> +        FirstThread = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount;
> +        //
> +        // Get the possible threads count for current package.
> +        //
> +        PackageThreadsCount = CpuStatus->MaxThreadCount * CpuStatus->MaxCoreCount;
> +        CurrentThread = FirstThread + CpuStatus->MaxThreadCount * ApLocation->Core + ApLocation->Thread;
> +        //
> +        // Get the valid thread count for current package.
> +        //
> +        ValidThreadCount = CpuStatus->MaxThreadCount * ValidCorePerPackage[ApLocation->Package];
> +        //
> +        // First Notify all threads in current package that this thread has ready.

We can add comments here to explain why notifying all threads in current 
package but wait only ValidThreadCount times in second step.

> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
> +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[FirstThread + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether all valid threads in current package have ready.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < ValidThreadCount; ProcessorIndex ++) {
> +          LibWaitForSemaphore (&SemaphorePtr[CurrentThread]);
> +        }
> +        break;
> +
> +      default:
> +        break;
> +      }
> +      break;
> +
>       default:
>         break;
>       }
> @@ -724,10 +1055,36 @@ SetProcessorRegister (
>     IN OUT VOID            *Buffer
>     )
>   {
> -  UINTN                  ProcessorNumber;
> +  CPU_FEATURES_DATA         *CpuFeaturesData;
> +  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE        *RegisterTables;
> +  UINT32                    InitApicId;
> +  UINTN                     ProcIndex;
> +  UINTN                     Index;
> +  ACPI_CPU_DATA             *AcpiCpuData;
>   
> -  ProcessorNumber = GetProcessorIndex ();
> -  ProgramProcessorRegister (ProcessorNumber);
> +  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
> +  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
> +
> +  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData->RegisterTable;
> +
> +  InitApicId = GetInitialApicId ();
> +  RegisterTable = NULL;
> +  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
> +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> +      RegisterTable =  &RegisterTables[Index];
> +      ProcIndex = Index;
> +      break;
> +    }
> +  }
> +  ASSERT (RegisterTable != NULL);
> +
> +  ProgramProcessorRegister (
> +    RegisterTable,
> +    (EFI_CPU_PHYSICAL_LOCATION *)(UINTN)AcpiCpuData->ApLocation + ProcIndex,
> +    &AcpiCpuData->CpuStatus,
> +    &CpuFeaturesData->CpuFlags
> +    );
>   }
>   
>   /**
> @@ -746,6 +1103,9 @@ CpuFeaturesDetect (
>   {
>     UINTN                  NumberOfCpus;
>     UINTN                  NumberOfEnabledProcessors;
> +  CPU_FEATURES_DATA      *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData();
>   
>     GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
>   
> @@ -754,49 +1114,13 @@ CpuFeaturesDetect (
>     //
>     // Wakeup all APs for data collection.
>     //
> -  StartupAPsWorker (CollectProcessorData);
> +  StartupAPsWorker (CollectProcessorData, NULL);
>   
>     //
>     // Collect data on BSP
>     //
> -  CollectProcessorData (NULL);
> +  CollectProcessorData (CpuFeaturesData);
>   
>     AnalysisProcessorFeatures (NumberOfCpus);
>   }
>   
> -/**
> -  Performs CPU features Initialization.
> -
> -  This service will invoke MP service to perform CPU features
> -  initialization on BSP/APs per user configuration.
> -
> -  @note This service could be called by BSP only.
> -**/
> -VOID
> -EFIAPI
> -CpuFeaturesInitialize (
> -  VOID
> -  )
> -{
> -  CPU_FEATURES_DATA      *CpuFeaturesData;
> -  UINTN                  OldBspNumber;
> -
> -  CpuFeaturesData = GetCpuFeaturesData ();
> -
> -  OldBspNumber = GetProcessorIndex();
> -  CpuFeaturesData->BspNumber = OldBspNumber;
> -  //
> -  // Wakeup all APs for programming.
> -  //
> -  StartupAPsWorker (SetProcessorRegister);
> -  //
> -  // Programming BSP
> -  //
> -  SetProcessorRegister (NULL);
> -  //
> -  // Switch to new BSP if required
> -  //
> -  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> -    SwitchNewBsp (CpuFeaturesData->BspNumber);
> -  }
> -}
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> index 1f34a3f489..8346f7004f 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> @@ -15,6 +15,7 @@
>   #include <PiDxe.h>
>   
>   #include <Library/UefiBootServicesTableLib.h>
> +#include <Library/UefiLib.h>
>   
>   #include "RegisterCpuFeatures.h"
>   
> @@ -115,14 +116,20 @@ GetProcessorInformation (
>   
>     @param[in]  Procedure               A pointer to the function to be run on
>                                         enabled APs of the system.
> +  @param[in]  MpEvent                 A pointer to the event to be used later
> +                                      to check whether procedure has done.
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent

EFI_EVENT MpEvent?


>     )
>   {
>     EFI_STATUS                           Status;
>     EFI_MP_SERVICES_PROTOCOL             *MpServices;
> +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
>   
>     MpServices = GetMpProtocol ();
>     //
> @@ -132,9 +139,9 @@ StartupAPsWorker (
>                    MpServices,
>                    Procedure,
>                    FALSE,
> -                 NULL,
> +                 (EFI_EVENT)MpEvent,

So no typecast is needed here.

>                    0,
> -                 NULL,
> +                 CpuFeaturesData,
>                    NULL
>                    );
>     ASSERT_EFI_ERROR (Status);
> @@ -197,3 +204,61 @@ GetNumberOfProcessor (
>     ASSERT_EFI_ERROR (Status);
>   }
>   
> +/**
> +  Performs CPU features Initialization.
> +
> +  This service will invoke MP service to perform CPU features
> +  initialization on BSP/APs per user configuration.
> +
> +  @note This service could be called by BSP only.
> +**/
> +VOID
> +EFIAPI
> +CpuFeaturesInitialize (
> +  VOID
> +  )
> +{
> +  CPU_FEATURES_DATA          *CpuFeaturesData;
> +  UINTN                      OldBspNumber;
> +  EFI_EVENT                  MpEvent;
> +  EFI_STATUS                 Status;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
> +
> +  OldBspNumber = GetProcessorIndex();
> +  CpuFeaturesData->BspNumber = OldBspNumber;
> +
> +  Status = gBS->CreateEvent (
> +                  EVT_NOTIFY_WAIT,
> +                  TPL_CALLBACK,
> +                  EfiEventEmptyFunction,
> +                  NULL,
> +                  &MpEvent
> +                  );
> +  ASSERT_EFI_ERROR (Status);
> +
> +  //
> +  // Wakeup all APs for programming.
> +  //
> +  StartupAPsWorker (SetProcessorRegister, MpEvent);
> +  //
> +  // Programming BSP
> +  //
> +  SetProcessorRegister (CpuFeaturesData);
> +
> +  //
> +  // Wait all processors to finish the task.
> +  //
> +  do {
> +    Status = gBS->CheckEvent (MpEvent);
> +  } while (Status == EFI_NOT_READY);
> +  ASSERT_EFI_ERROR (Status);
> +
> +  //
> +  // Switch to new BSP if required
> +  //
> +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> +  }
> +}
> +
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> index f0f317c945..6693bae575 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> @@ -47,6 +47,9 @@
>     SynchronizationLib
>     UefiBootServicesTableLib
>     IoLib
> +  UefiBootServicesTableLib
> +  UefiLib
> +  LocalApicLib

LocalApicLib is already listed in INF.
>   
>   [Protocols]
>     gEfiMpServiceProtocolGuid                                            ## CONSUMES
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> index 82fe268812..799864a136 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> @@ -149,11 +149,15 @@ GetProcessorInformation (
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     )
>   {
>     EFI_STATUS                           Status;
>     EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
> +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
>   
>     //
>     // Get MP Services Protocol
> @@ -175,7 +179,7 @@ StartupAPsWorker (
>                    Procedure,
>                    FALSE,
>                    0,
> -                 NULL
> +                 CpuFeaturesData
>                    );
>     ASSERT_EFI_ERROR (Status);
>   }
> @@ -257,3 +261,50 @@ GetNumberOfProcessor (
>                            );
>     ASSERT_EFI_ERROR (Status);
>   }
> +
> +/**
> +  Performs CPU features Initialization.
> +
> +  This service will invoke MP service to perform CPU features
> +  initialization on BSP/APs per user configuration.
> +
> +  @note This service could be called by BSP only.
> +**/
> +VOID
> +EFIAPI
> +CpuFeaturesInitialize (
> +  VOID
> +  )
> +{
> +  CPU_FEATURES_DATA          *CpuFeaturesData;
> +  UINTN                      OldBspNumber;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
> +
> +  OldBspNumber = GetProcessorIndex();
> +  CpuFeaturesData->BspNumber = OldBspNumber;
> +
> +  //
> +  // Known limitation: In PEI phase, CpuFeatures driver not
> +  // support async mode execute tasks. So semaphore type
> +  // register can't been used for this instance, must use
> +  // DXE type instance.
> +  //
> +
> +  //
> +  // Wakeup all APs for programming.
> +  //
> +  StartupAPsWorker (SetProcessorRegister, NULL);
> +  //
> +  // Programming BSP
> +  //
> +  SetProcessorRegister (CpuFeaturesData);
> +
> +  //
> +  // Switch to new BSP if required
> +  //
> +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> +  }
> +}
> +
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> index fdfef98293..e95f01df0b 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> @@ -49,6 +49,7 @@
>     PeiServicesLib
>     PeiServicesTablePointerLib
>     IoLib
> +  LocalApicLib

LocalApicLib is already listed in INF so no change is needed here.

>   
>   [Ppis]
>     gEfiPeiMpServicesPpiGuid                                             ## CONSUMES
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> index edd266934f..ad970a00b2 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> @@ -23,6 +23,7 @@
>   #include <Library/MemoryAllocationLib.h>
>   #include <Library/SynchronizationLib.h>
>   #include <Library/IoLib.h>
> +#include <Library/LocalApicLib.h>
>   
>   #include <AcpiCpuData.h>
>   
> @@ -46,16 +47,26 @@ typedef struct {
>     CPU_FEATURE_INITIALIZE       InitializeFunc;
>     UINT8                        *BeforeFeatureBitMask;
>     UINT8                        *AfterFeatureBitMask;
> +  UINT8                        *CoreBeforeFeatureBitMask;
> +  UINT8                        *CoreAfterFeatureBitMask;
> +  UINT8                        *PackageBeforeFeatureBitMask;
> +  UINT8                        *PackageAfterFeatureBitMask;
>     VOID                         *ConfigData;
>     BOOLEAN                      BeforeAll;
>     BOOLEAN                      AfterAll;
>   } CPU_FEATURES_ENTRY;
>   
> +//
> +// Flags used when program the register.
> +//
> +typedef struct {
> +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
> +  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
> +} PROGRAM_CPU_REGISTER_FLAGS;
> +
>   typedef struct {
>     UINTN                    FeaturesCount;
>     UINT32                   BitMaskSize;
> -  SPIN_LOCK                MsrLock;
> -  SPIN_LOCK                MemoryMappedLock;
>     LIST_ENTRY               FeatureList;
>   
>     CPU_FEATURES_INIT_ORDER  *InitOrder;
> @@ -64,9 +75,14 @@ typedef struct {
>     UINT8                    *ConfigurationPcd;
>     UINT8                    *SettingPcd;
>   
> +  UINT32                   NumberOfCpus;
> +  ACPI_CPU_DATA            *AcpiCpuData;
> +
>     CPU_REGISTER_TABLE       *RegisterTable;
>     CPU_REGISTER_TABLE       *PreSmmRegisterTable;
>     UINTN                    BspNumber;
> +
> +  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
>   } CPU_FEATURES_DATA;
>   
>   #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
> @@ -118,10 +134,13 @@ GetProcessorInformation (
>   
>     @param[in]  Procedure               A pointer to the function to be run on
>                                         enabled APs of the system.
> +  @param[in]  MpEvent                 A pointer to the event to be used later
> +                                      to check whether procedure has done.
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     );
>   
>   /**
> @@ -170,4 +189,40 @@ DumpCpuFeature (
>     IN CPU_FEATURES_ENTRY  *CpuFeature
>     );
>   
> +/**
> +  Return feature dependence result.
> +
> +  @param[in]  CpuFeature        Pointer to CPU feature.
> +  @param[in]  Before            Check before dependence or after.
> +
> +  @retval     return the dependence result.
> +**/
> +CPU_FEATURE_DEPENDENCE_TYPE
> +DetectFeatureScope (
> +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> +  IN BOOLEAN                    Before
> +  );
> +
> +/**
> +  Programs registers for the calling processor.
> +
> +  @param[in,out] Buffer  The pointer to private data buffer.
> +
> +**/
> +VOID
> +EFIAPI
> +SetProcessorRegister (
> +  IN OUT VOID            *Buffer
> +  );
> +
> +/**
> +  Return ACPI_CPU_DATA data.
> +
> +  @return  Pointer to ACPI_CPU_DATA data.
> +**/
> +ACPI_CPU_DATA *
> +GetAcpiCpuData (
> +  VOID
> +  );
> +
>   #endif
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> index fa7e107e39..d395f98f32 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> @@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
>     return FALSE;
>   }
>   
> +/**
> +  Return feature dependence result.
> +
> +  @param[in]  CpuFeature        Pointer to CPU feature.
> +  @param[in]  Before            Check before dependence or after.
> +
> +  @retval     return the dependence result.
> +**/
> +CPU_FEATURE_DEPENDENCE_TYPE
> +DetectFeatureScope (
> +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> +  IN BOOLEAN                    Before
> +  )
> +{
> +  if (Before) {
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      return PackageDepType;
> +    }
> +
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      return CoreDepType;
> +    }
> +
> +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> +      return ThreadDepType;
> +    }
> +
> +    return NoneDepType;
> +  }
> +
> +  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +    return PackageDepType;
> +  }
> +
> +  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +    return CoreDepType;
> +  }
> +
> +  if (CpuFeature->AfterFeatureBitMask != NULL) {
> +    return ThreadDepType;
> +  }
> +
> +  return NoneDepType;
> +}
> +
> +/**
> +  Clear dependence for the specified type.
> +
> +  @param[in]  CurrentFeature     Cpu feature need to clear.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +**/
> +VOID
> +ClearFeatureScope (
> +  IN CPU_FEATURES_ENTRY           *CpuFeature,
> +  IN BOOLEAN                      Before
> +  )
> +{
> +  if (Before) {
> +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->BeforeFeatureBitMask);
> +      CpuFeature->BeforeFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
> +      CpuFeature->CoreBeforeFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
> +      CpuFeature->PackageBeforeFeatureBitMask = NULL;
> +    }
> +  } else {
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->PackageAfterFeatureBitMask);
> +      CpuFeature->PackageAfterFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->CoreAfterFeatureBitMask);
> +      CpuFeature->CoreAfterFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->AfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->AfterFeatureBitMask);
> +      CpuFeature->AfterFeatureBitMask = NULL;
> +    }
> +  }
> +}
> +
> +/**
> +  Base on dependence relationship to asjust feature dependence.
> +
> +  ONLY when the feature before(or after) the find feature also has
> +  dependence with the find feature. In this case, driver need to base
> +  on dependce relationship to decide how to insert current feature and
> +  adjust the feature dependence.
> +
> +  @param[in]  PreviousFeature    CPU feature current before the find one.
> +  @param[in]  CurrentFeature     Cpu feature need to adjust.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +  @retval   TRUE   means the current feature dependence has been adjusted.
> +
> +  @retval   FALSE  means the previous feature dependence has been adjusted.
> +                   or previous feature has no dependence with the find one.
> +
> +**/
> +BOOLEAN
> +AdjustFeaturesDependence (
> +  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
> +  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
> +  IN     BOOLEAN                    Before
> +  )
> +{
> +  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
> +  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
> +
> +  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
> +  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
> +
> +  //
> +  // If previous feature has no dependence with the find featue.
> +  // return FALSE.
> +  //
> +  if (PreDependType == NoneDepType) {
> +    return FALSE;
> +  }
> +
> +  //
> +  // If both feature have dependence, keep the one which needs use more
> +  // processors and clear the dependence for the other one.
> +  //
> +  if (PreDependType >= CurrentDependType) {
> +    ClearFeatureScope (CurrentFeature, Before);
> +    return TRUE;
> +  } else {
> +    ClearFeatureScope (PreviousFeature, Before);
> +    return FALSE;
> +  }
> +}
> +
> +/**
> +  Base on dependence relationship to asjust feature order.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  FindEntry          The entry this feature depend on.
> +  @param[in]  CurrentEntry       The entry for this feature.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +**/
> +VOID
> +AdjustEntry (
> +  IN      LIST_ENTRY                *FeatureList,
> +  IN OUT  LIST_ENTRY                *FindEntry,
> +  IN OUT  LIST_ENTRY                *CurrentEntry,
> +  IN      BOOLEAN                   Before
> +  )
> +{
> +  LIST_ENTRY                *PreviousEntry;
> +  CPU_FEATURES_ENTRY        *PreviousFeature;
> +  CPU_FEATURES_ENTRY        *CurrentFeature;
> +
> +  //
> +  // For CPU feature which has core or package type dependence, later code need to insert
> +  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
> +  // So if driver finds both feature A and B need to execute before feature C, driver will
> +  // base on dependence type of feature A and B to update the logic here.
> +  // For example, feature A has package type dependence and feature B has core type dependence,
> +  // because package type dependence need to wait for more processors which has strong dependence
> +  // than core type dependence. So driver will adjust the feature order to B -> A -> C. and driver
> +  // will remove the feature dependence in feature B.
> +  // Driver just needs to make sure before feature C been executed, feature A has finished its task
> +  // in all all thread. Feature A finished in all threads also means feature B have finshed in all
> +  // threads.
> +  //
> +  if (Before) {
> +    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
> +  } else {
> 
> +    PreviousEntry = GetNextNode (FeatureList, FindEntry);
> +  }
> +
> +  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> +  RemoveEntryList (CurrentEntry);
> +
> +  if (IsNull (FeatureList, PreviousEntry)) {
> +    //
> +    // If not exist the previous or next entry, just insert the current entry.
> +    //
> +    if (Before) {
> +      InsertTailList (FindEntry, CurrentEntry);
> +    } else {
> +      InsertHeadList (FindEntry, CurrentEntry);
> +    }
> +  } else {
> +    //
> +    // If exist the previous or next entry, need to check it before insert curent entry.
> +    //
> +    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
> +
> +    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature, Before)) {
> +      //
> +      // Return TRUE means current feature dependence has been cleared and the previous
> +      // feature dependence has been kept and used. So insert current feature before (or after)
> +      // the previous feature.
> +      //
> +      if (Before) {
> +        InsertTailList (PreviousEntry, CurrentEntry);
> +      } else {
> +        InsertHeadList (PreviousEntry, CurrentEntry);
> +      }
> +    } else {
> +      if (Before) {
> +        InsertTailList (FindEntry, CurrentEntry);
> +      } else {
> +        InsertHeadList (FindEntry, CurrentEntry);
> +      }
> +    }
> +  }
> +}
> 
> +
> +/**
> +  Checks and adjusts current CPU features per dependency relationship.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> +  @param[in]  FeatureMask        The feature bit mask.
> +
> +  @retval     return Swapped info.
> +**/
> +BOOLEAN
> +InsertToBeforeEntry (
> +  IN LIST_ENTRY              *FeatureList,
> +  IN LIST_ENTRY              *CurrentEntry,
> +  IN UINT8                   *FeatureMask
> +  )
> +{
> +  LIST_ENTRY                 *CheckEntry;
> +  CPU_FEATURES_ENTRY         *CheckFeature;
> +  BOOLEAN                    Swapped;
> +
> +  Swapped = FALSE;
> +
> +  //
> +  // Check all features dispatched before this entry
> +  //
> +  CheckEntry = GetFirstNode (FeatureList);
> +  while (CheckEntry != CurrentEntry) {
> +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
> +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
> +      Swapped = TRUE;
> +      break;
> +    }
> +    CheckEntry = CheckEntry->ForwardLink;
> +  }
> +
> +  return Swapped;
> +}
> +
> +/**
> +  Checks and adjusts current CPU features per dependency relationship.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> +  @param[in]  FeatureMask        The feature bit mask.
> +
> +  @retval     return Swapped info.
> +**/
> +BOOLEAN
> +InsertToAfterEntry (
> +  IN LIST_ENTRY              *FeatureList,
> +  IN LIST_ENTRY              *CurrentEntry,
> +  IN UINT8                   *FeatureMask
> +  )
> +{
> +  LIST_ENTRY                 *CheckEntry;
> +  CPU_FEATURES_ENTRY         *CheckFeature;
> +  BOOLEAN                    Swapped;
> +
> +  Swapped = FALSE;
> +
> +  //
> +  // Check all features dispatched after this entry
> +  //
> +  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> +  while (!IsNull (FeatureList, CheckEntry)) {
> +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
> +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
> +      Swapped = TRUE;
> +      break;
> +    }
> +    CheckEntry = CheckEntry->ForwardLink;
> +  }
> +
> +  return Swapped;
> +}
> +
>   /**
>     Checks and adjusts CPU features order per dependency relationship.
>   
> @@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
>     CPU_FEATURES_ENTRY         *CheckFeature;
>     BOOLEAN                    Swapped;
>     LIST_ENTRY                 *TempEntry;
> +  LIST_ENTRY                 *NextEntry;
>   
>     CurrentEntry = GetFirstNode (FeatureList);
>     while (!IsNull (FeatureList, CurrentEntry)) {
>       Swapped = FALSE;
>       CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> +    NextEntry = CurrentEntry->ForwardLink;
>       if (CpuFeature->BeforeAll) {
>         //
>         // Check all features dispatched before this entry
> @@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
>           CheckEntry = CheckEntry->ForwardLink;
>         }
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
> @@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
>           CheckEntry = CheckEntry->ForwardLink;
>         }
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
>   
>       if (CpuFeature->BeforeFeatureBitMask != NULL) {
> -      //
> -      // Check all features dispatched before this entry
> -      //
> -      CheckEntry = GetFirstNode (FeatureList);
> -      while (CheckEntry != CurrentEntry) {
> -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->BeforeFeatureBitMask)) {
> -          //
> -          // If there is dependency, swap them
> -          //
> -          RemoveEntryList (CurrentEntry);
> -          InsertTailList (CheckEntry, CurrentEntry);
> -          Swapped = TRUE;
> -          break;
> -        }
> -        CheckEntry = CheckEntry->ForwardLink;
> -      }
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->BeforeFeatureBitMask);
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
>   
>       if (CpuFeature->AfterFeatureBitMask != NULL) {
> -      //
> -      // Check all features dispatched after this entry
> -      //
> -      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> -      while (!IsNull (FeatureList, CheckEntry)) {
> -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->AfterFeatureBitMask)) {
> -          //
> -          // If there is dependency, swap them
> -          //
> -          TempEntry = GetNextNode (FeatureList, CurrentEntry);
> -          RemoveEntryList (CurrentEntry);
> -          InsertHeadList (CheckEntry, CurrentEntry);
> -          CurrentEntry = TempEntry;
> -          Swapped = TRUE;
> -          break;
> -        }
> -        CheckEntry = CheckEntry->ForwardLink;
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->AfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->CoreBeforeFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
>         }
> +    }
> +
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->CoreAfterFeatureBitMask);
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
> -    //
> -    // No swap happened, check the next feature
> -    //
> +
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->PackageBeforeFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->PackageAfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
>       CurrentEntry = CurrentEntry->ForwardLink;
>     }
>   }
> @@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
>     CpuFeaturesData = GetCpuFeaturesData ();
>     if (CpuFeaturesData->FeaturesCount == 0) {
>       InitializeListHead (&CpuFeaturesData->FeatureList);
> -    InitializeSpinLock (&CpuFeaturesData->MsrLock);
> -    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
>       CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
>     }
>     ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
> @@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
>         }
>         CpuFeatureEntry->AfterFeatureBitMask = CpuFeature->AfterFeatureBitMask;
>       }
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
> +      }
> +      CpuFeatureEntry->CoreBeforeFeatureBitMask = CpuFeature->CoreBeforeFeatureBitMask;
> +    }
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
> +      }
> +      CpuFeatureEntry->CoreAfterFeatureBitMask = CpuFeature->CoreAfterFeatureBitMask;
> +    }
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
> +      }
> +      CpuFeatureEntry->PackageBeforeFeatureBitMask = CpuFeature->PackageBeforeFeatureBitMask;
> +    }
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
> +      }
> +      CpuFeatureEntry->PackageAfterFeatureBitMask = CpuFeature->PackageAfterFeatureBitMask;
> +    }
> +
>       CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
>       CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
>   
> @@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
>     @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is not
>                                       supported due to a circular dependency between
>                                       BEFORE and AFTER features.
> +  @retval  RETURN_NOT_READY         CPU feature PCD PcdCpuFeaturesUserConfiguration
> +                                    not updated by Platform driver yet.
>   
>     @note This service could be called by BSP only.
>   **/
> @@ -431,12 +755,20 @@ RegisterCpuFeature (
>     UINT8                      *FeatureMask;
>     UINT8                      *BeforeFeatureBitMask;
>     UINT8                      *AfterFeatureBitMask;
> +  UINT8                      *CoreBeforeFeatureBitMask;
> +  UINT8                      *CoreAfterFeatureBitMask;
> +  UINT8                      *PackageBeforeFeatureBitMask;
> +  UINT8                      *PackageAfterFeatureBitMask;
>     BOOLEAN                    BeforeAll;
>     BOOLEAN                    AfterAll;
>   
> -  FeatureMask          = NULL;
> -  BeforeFeatureBitMask = NULL;
> -  AfterFeatureBitMask  = NULL;
> +  FeatureMask                 = NULL;
> +  BeforeFeatureBitMask        = NULL;
> +  AfterFeatureBitMask         = NULL;
> +  CoreBeforeFeatureBitMask    = NULL;
> +  CoreAfterFeatureBitMask     = NULL;
> +  PackageBeforeFeatureBitMask  = NULL;
> +  PackageAfterFeatureBitMask   = NULL;

Try to align the "=" for PackageBeforeFeatureBitMask and 
PackageAfterFeatureBitMask.

>     BeforeAll            = FALSE;
>     AfterAll             = FALSE;
>   
> @@ -449,6 +781,10 @@ RegisterCpuFeature (
>                       != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
>       ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL))
>                       != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));
> +    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER))
> +                    != (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER));
> +    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER))
> +                    != (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER));
>       if (Feature < CPU_FEATURE_BEFORE) {
>         BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE : FALSE;
>         AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
> @@ -459,6 +795,14 @@ RegisterCpuFeature (
>         SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature & ~CPU_FEATURE_BEFORE, BitMaskSize);
>       } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
>         SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature & ~CPU_FEATURE_AFTER, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
> +      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature & ~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
> +      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature & ~CPU_FEATURE_CORE_AFTER, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
> +      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
> +      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
>       }
>       Feature = VA_ARG (Marker, UINT32);
>     }
> @@ -466,15 +810,19 @@ RegisterCpuFeature (
>   
>     CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
>     ASSERT (CpuFeature != NULL);
> -  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
> -  CpuFeature->FeatureMask          = FeatureMask;
> -  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
> -  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
> -  CpuFeature->BeforeAll            = BeforeAll;
> -  CpuFeature->AfterAll             = AfterAll;
> -  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
> -  CpuFeature->SupportFunc          = SupportFunc;
> -  CpuFeature->InitializeFunc       = InitializeFunc;
> +  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
> +  CpuFeature->FeatureMask                 = FeatureMask;
> +  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
> +  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
> +  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
> +  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
> +  CpuFeature->PackageBeforeFeatureBitMask = PackageBeforeFeatureBitMask;
> +  CpuFeature->PackageAfterFeatureBitMask  = PackageAfterFeatureBitMask;
> +  CpuFeature->BeforeAll                   = BeforeAll;
> +  CpuFeature->AfterAll                    = AfterAll;
> +  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
> +  CpuFeature->SupportFunc                 = SupportFunc;
> +  CpuFeature->InitializeFunc              = InitializeFunc;
>     if (FeatureName != NULL) {
>       CpuFeature->FeatureName          = AllocatePool (CPU_FEATURE_NAME_SIZE);
>       ASSERT (CpuFeature->FeatureName != NULL);
> @@ -489,13 +837,12 @@ RegisterCpuFeature (
>   }
>   
>   /**
> -  Allocates boot service data to save ACPI_CPU_DATA.
> +  Return ACPI_CPU_DATA data.
>   
> -  @return  Pointer to allocated ACPI_CPU_DATA.
> +  @return  Pointer to ACPI_CPU_DATA data.
>   **/
> -STATIC
>   ACPI_CPU_DATA *
> -AllocateAcpiCpuData (
> +GetAcpiCpuData (
>     VOID
>     )
>   {
> @@ -508,9 +855,20 @@ AllocateAcpiCpuData (
>     UINTN                                Index;
>     EFI_PROCESSOR_INFORMATION            ProcessorInfoBuffer;
>   
> +  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
> +  if (AcpiCpuData != NULL) {
> +    return AcpiCpuData;
> +  }
> +
>     AcpiCpuData  = AllocatePages (EFI_SIZE_TO_PAGES (sizeof (ACPI_CPU_DATA)));
>     ASSERT (AcpiCpuData != NULL);
>   
> +  //
> +  // Set PcdCpuS3DataAddress to the base address of the ACPI_CPU_DATA structure
> +  //
> +  Status = PcdSet64S (PcdCpuS3DataAddress, (UINT64)(UINTN)AcpiCpuData);
> +  ASSERT_EFI_ERROR (Status);
> +
>     GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
>     AcpiCpuData->NumberOfCpus = (UINT32)NumberOfCpus;
>   
> @@ -606,7 +964,6 @@ CpuRegisterTableWriteWorker (
>     IN UINT64                  Value
>     )
>   {
> -  EFI_STATUS               Status;
>     CPU_FEATURES_DATA        *CpuFeaturesData;
>     ACPI_CPU_DATA            *AcpiCpuData;
>     CPU_REGISTER_TABLE       *RegisterTable;
> @@ -614,17 +971,8 @@ CpuRegisterTableWriteWorker (
>   
>     CpuFeaturesData = GetCpuFeaturesData ();
>     if (CpuFeaturesData->RegisterTable == NULL) {
> -    AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
> -    if (AcpiCpuData == NULL) {
> -      AcpiCpuData = AllocateAcpiCpuData ();
> -      ASSERT (AcpiCpuData != NULL);
> -      //
> -      // Set PcdCpuS3DataAddress to the base address of the ACPI_CPU_DATA structure
> -      //
> -      Status = PcdSet64S (PcdCpuS3DataAddress, (UINT64)(UINTN)AcpiCpuData);
> -      ASSERT_EFI_ERROR (Status);
> -    }
> -    ASSERT (AcpiCpuData->RegisterTable != 0);
> +    AcpiCpuData = GetAcpiCpuData ();
> +    ASSERT ((AcpiCpuData != NULL) && (AcpiCpuData->RegisterTable != 0));
>       CpuFeaturesData->RegisterTable = (CPU_REGISTER_TABLE *) (UINTN) AcpiCpuData->RegisterTable;
>       CpuFeaturesData->PreSmmRegisterTable = (CPU_REGISTER_TABLE *) (UINTN) AcpiCpuData->PreSmmInitRegisterTable;
>     }
> 


-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 4/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type.
  2018-10-17  2:16 ` [Patch v2 4/6] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
@ 2018-10-18  5:54   ` Ni, Ruiyu
  0 siblings, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  5:54 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/17/2018 10:16 AM, Eric Dong wrote:
> Because this driver needs to set MSRs saved in normal boot phase, sync semaphore
> logic from RegisterCpuFeaturesLib code which used for normal boot phase.
> 
> Detail see below change for RegisterCpuFeaturesLib:
>    UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c          | 406 +++++++++++++++++++----------
>   UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c      |   3 -
>   UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h |   3 +-
>   3 files changed, 268 insertions(+), 144 deletions(-)
> 
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> index 52ff9679d5..42a889f08f 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> @@ -38,9 +38,12 @@ typedef struct {
>   } MP_ASSEMBLY_ADDRESS_MAP;
>   
>   //
> -// Spin lock used to serialize MemoryMapped operation
> +// Flags used when program the register.
>   //
> -SPIN_LOCK                *mMemoryMappedLock = NULL;
> +typedef struct {
> +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
> +  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
> +} PROGRAM_CPU_REGISTER_FLAGS;
>   
>   //
>   // Signal that SMM BASE relocation is complete.
> @@ -62,13 +65,11 @@ AsmGetAddressMap (
>   #define LEGACY_REGION_SIZE    (2 * 0x1000)
>   #define LEGACY_REGION_BASE    (0xA0000 - LEGACY_REGION_SIZE)
>   
> +PROGRAM_CPU_REGISTER_FLAGS   mCpuFlags;
>   ACPI_CPU_DATA                mAcpiCpuData;
>   volatile UINT32              mNumberToFinish;
>   MP_CPU_EXCHANGE_INFO         *mExchangeInfo;
>   BOOLEAN                      mRestoreSmmConfigurationInS3 = FALSE;
> -MP_MSR_LOCK                  *mMsrSpinLocks = NULL;
> -UINTN                        mMsrSpinLockCount;
> -UINTN                        mMsrCount = 0;
>   
>   //
>   // S3 boot flag
> @@ -91,89 +92,6 @@ UINT8                        mApHltLoopCodeTemplate[] = {
>                                  0xEB, 0xFC               // jmp $-2
>                                  };
>   
> -/**
> -  Get MSR spin lock by MSR index.
> -
> -  @param  MsrIndex       MSR index value.
> -
> -  @return Pointer to MSR spin lock.
> -
> -**/
> -SPIN_LOCK *
> -GetMsrSpinLockByIndex (
> -  IN UINT32      MsrIndex
> -  )
> -{
> -  UINTN     Index;
> -  for (Index = 0; Index < mMsrCount; Index++) {
> -    if (MsrIndex == mMsrSpinLocks[Index].MsrIndex) {
> -      return mMsrSpinLocks[Index].SpinLock;
> -    }
> -  }
> -  return NULL;
> -}
> -
> -/**
> -  Initialize MSR spin lock by MSR index.
> -
> -  @param  MsrIndex       MSR index value.
> -
> -**/
> -VOID
> -InitMsrSpinLockByIndex (
> -  IN UINT32      MsrIndex
> -  )
> -{
> -  UINTN    MsrSpinLockCount;
> -  UINTN    NewMsrSpinLockCount;
> -  UINTN    Index;
> -  UINTN    AddedSize;
> -
> -  if (mMsrSpinLocks == NULL) {
> -    MsrSpinLockCount = mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter;
> -    mMsrSpinLocks = (MP_MSR_LOCK *) AllocatePool (sizeof (MP_MSR_LOCK) * MsrSpinLockCount);
> -    ASSERT (mMsrSpinLocks != NULL);
> -    for (Index = 0; Index < MsrSpinLockCount; Index++) {
> -      mMsrSpinLocks[Index].SpinLock =
> -       (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr + Index * mSemaphoreSize);
> -      mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> -    }
> -    mMsrSpinLockCount = MsrSpinLockCount;
> -    mSmmCpuSemaphores.SemaphoreMsr.AvailableCounter = 0;
> -  }
> -  if (GetMsrSpinLockByIndex (MsrIndex) == NULL) {
> -    //
> -    // Initialize spin lock for MSR programming
> -    //
> -    mMsrSpinLocks[mMsrCount].MsrIndex = MsrIndex;
> -    InitializeSpinLock (mMsrSpinLocks[mMsrCount].SpinLock);
> -    mMsrCount ++;
> -    if (mMsrCount == mMsrSpinLockCount) {
> -      //
> -      // If MSR spin lock buffer is full, enlarge it
> -      //
> -      AddedSize = SIZE_4KB;
> -      mSmmCpuSemaphores.SemaphoreMsr.Msr =
> -                        AllocatePages (EFI_SIZE_TO_PAGES(AddedSize));
> -      ASSERT (mSmmCpuSemaphores.SemaphoreMsr.Msr != NULL);
> -      NewMsrSpinLockCount = mMsrSpinLockCount + AddedSize / mSemaphoreSize;
> -      mMsrSpinLocks = ReallocatePool (
> -                        sizeof (MP_MSR_LOCK) * mMsrSpinLockCount,
> -                        sizeof (MP_MSR_LOCK) * NewMsrSpinLockCount,
> -                        mMsrSpinLocks
> -                        );
> -      ASSERT (mMsrSpinLocks != NULL);
> -      mMsrSpinLockCount = NewMsrSpinLockCount;
> -      for (Index = mMsrCount; Index < mMsrSpinLockCount; Index++) {
> -        mMsrSpinLocks[Index].SpinLock =
> -                 (SPIN_LOCK *)((UINTN)mSmmCpuSemaphores.SemaphoreMsr.Msr +
> -                 (Index - mMsrCount)  * mSemaphoreSize);
> -        mMsrSpinLocks[Index].MsrIndex = (UINT32)-1;
> -      }
> -    }
> -  }
> -}
> -
>   /**
>     Sync up the MTRR values for all processors.
>   
> @@ -204,42 +122,131 @@ Returns:
>   }
>   
>   /**
> -  Programs registers for the calling processor.
> +  Increment semaphore by 1.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +S3ReleaseSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  InterlockedIncrement (Sem);
> +}
> +
> +/**
> +  Decrement the semaphore by 1 if it is not zero.
>   
> -  This function programs registers for the calling processor.
> +  Performs an atomic decrement operation for semaphore.
> +  The compare exchange operation must be performed using
> +  MP safe mechanisms.
>   
> -  @param  RegisterTables        Pointer to register table of the running processor.
> -  @param  RegisterTableCount    Register table count.
> +  @param      Sem            IN:  32-bit unsigned integer
>   
>   **/
>   VOID
> -SetProcessorRegister (
> -  IN CPU_REGISTER_TABLE        *RegisterTables,
> -  IN UINTN                     RegisterTableCount
> +S3WaitForSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  UINT32  Value;
> +
> +  do {
> +    Value = *Sem;
> +  } while (Value == 0 ||
> +           InterlockedCompareExchange32 (
> +             Sem,
> +             Value,
> +             Value - 1
> +             ) != Value);
> +}
> +
> +/**
> +  Get the string value for the input register type.
> +
> +  @param[in]  RegisterType  Input the register type.
> +
> +  @retval     Return the register type string.
> +**/
> +CHAR16 *
> +GetRegisterTypeString (
> +  IN REGISTER_TYPE       RegisterType
> +  )
> +{
> +  switch (RegisterType) {
> +  case Msr:
> +    return L"Msr";
> +
> +  case ControlRegister:
> +    return L"ControlRegister";
> +
> +  case MemoryMapped:
> +    return L"MemoryMapped";
> +
> +  case CacheControl:
> +    return L"CacheControl";
> +
> +  case Semaphore:
> +    return L"Semaphore";
> +
> +  default:
> +    ASSERT (FALSE);
> +    return L"NoneType";
> +  }
> +}
> +
> +/**
> +  Initialize the CPU registers from a register table.
> +
> +  @param[in]  RegisterTable         The register table for this AP.
> +  @param[in]  ApLocation            AP location info for this ap.
> +  @param[in]  CpuStatus             CPU status info for this CPU.
> +  @param[in]  CpuFlags              Flags data structure used when program the register.
> +
> +  @note This service could be called by BSP/APs.
> +**/
> +VOID
> +ProgramProcessorRegister (
> +  IN CPU_REGISTER_TABLE           *RegisterTable,
> +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
>     )
>   {
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
>     UINTN                     Index;
>     UINTN                     Value;
> -  SPIN_LOCK                 *MsrSpinLock;
> -  UINT32                    InitApicId;
> -  CPU_REGISTER_TABLE        *RegisterTable;
> -
> -  InitApicId = GetInitialApicId ();
> -  RegisterTable = NULL;
> -  for (Index = 0; Index < RegisterTableCount; Index++) {
> -    if (RegisterTables[Index].InitialApicId == InitApicId) {
> -      RegisterTable =  &RegisterTables[Index];
> -      break;
> -    }
> -  }
> -  ASSERT (RegisterTable != NULL);
> +  CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> +  volatile UINT32           *SemaphorePtr;
> +  UINT32                    FirstThread;
> +  UINT32                    PackageThreadsCount;
> +  UINT32                    CurrentThread;
> +  UINTN                     ProcessorIndex;
> +  UINTN                     ThreadIndex;
> +  UINTN                     ValidThreadCount;
> +  UINT32                    *ValidCorePerPackage;
> +
> +  ThreadIndex = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount +
> +              ApLocation->Core * CpuStatus->MaxThreadCount +
> +              ApLocation->Thread;
>   
>     //
>     // Traverse Register Table of this logical processor
>     //
> -  RegisterTableEntry = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
> -  for (Index = 0; Index < RegisterTable->TableLength; Index++, RegisterTableEntry++) {
> +  RegisterTableEntryHead = (CPU_REGISTER_TABLE_ENTRY *) (UINTN) RegisterTable->RegisterTableEntry;
> +
> +  for (Index = 0; Index < RegisterTable->TableLength; Index++) {
> +
> +    RegisterTableEntry = &RegisterTableEntryHead[Index];
> +    DEBUG ((
> +      DEBUG_INFO,
> +      "Processor = %lu, Entry Index %lu, Type = %s!\n",
> +      (UINT64)ThreadIndex,
> +      (UINT64)Index,
> +      GetRegisterTypeString((REGISTER_TYPE)RegisterTableEntry->RegisterType)
> +      ));
> +
>       //
>       // Check the type of specified register
>       //
> @@ -310,12 +317,6 @@ SetProcessorRegister (
>             RegisterTableEntry->Value
>             );
>         } else {
> -        //
> -        // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
> -        // to make sure MSR read/write operation is atomic.
> -        //
> -        MsrSpinLock = GetMsrSpinLockByIndex (RegisterTableEntry->Index);
> -        AcquireSpinLock (MsrSpinLock);
>           //
>           // Set the bit section according to bit start and length
>           //
> @@ -325,21 +326,20 @@ SetProcessorRegister (
>             RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>             RegisterTableEntry->Value
>             );
> -        ReleaseSpinLock (MsrSpinLock);
>         }
>         break;
>       //
>       // MemoryMapped operations
>       //
>       case MemoryMapped:
> -      AcquireSpinLock (mMemoryMappedLock);
> +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
>         MmioBitFieldWrite32 (
>           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
>           RegisterTableEntry->ValidBitStart,
>           RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>           (UINT32)RegisterTableEntry->Value
>           );
> -      ReleaseSpinLock (mMemoryMappedLock);
> +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
>         break;
>       //
>       // Enable or disable cache
> @@ -355,12 +355,126 @@ SetProcessorRegister (
>         }
>         break;
>   
> +    case Semaphore:
> +      // Semaphore works logic like below:
> +      //
> +      //  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
> +      //  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
> +      //
> +      //  All threads (T0...Tn) waits in P() line and continues running
> +      //  together.
> +      //
> +      //
> +      //  T0             T1            ...           Tn
> +      //
> +      //  V(0...n)       V(0...n)      ...           V(0...n)
> +      //  n * P(0)       n * P(1)      ...           n * P(n)
> +      //
> +      SemaphorePtr = CpuFlags->SemaphoreCount;
> +      switch (RegisterTableEntry->Value) {
> +      case CoreDepType:
> +        //
> +        // Get Offset info for the first thread in the core which current thread belongs to.
> +        //
> +        FirstThread = (ApLocation->Package * CpuStatus->MaxCoreCount + ApLocation->Core) * CpuStatus->MaxThreadCount;
> +        CurrentThread = FirstThread + ApLocation->Thread;
> +        //
> +        // First Notify all threads in current Core that this thread has ready.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
> +          S3ReleaseSemaphore (&SemaphorePtr[FirstThread + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether all valid threads in current core have ready.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; ProcessorIndex ++) {
> +          S3WaitForSemaphore (&SemaphorePtr[CurrentThread]);
> +        }
> +        break;
> +
> +      case PackageDepType:
> +        ValidCorePerPackage = (UINT32 *)(UINTN)CpuStatus->ValidCoresPerPackages;
> +        //
> +        // Get Offset info for the first thread in the package which current thread belongs to.
> +        //
> +        FirstThread = ApLocation->Package * CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount;
> +        //
> +        // Get the possible threads count for current package.
> +        //
> +        PackageThreadsCount = CpuStatus->MaxThreadCount * CpuStatus->MaxCoreCount;
> +        CurrentThread = FirstThread + CpuStatus->MaxThreadCount * ApLocation->Core + ApLocation->Thread;
> +        //
> +        // Get the valid thread count for current package.
> +        //
> +        ValidThreadCount = CpuStatus->MaxThreadCount * ValidCorePerPackage[ApLocation->Package];
> +        //
> +        // First Notify all threads in current package that this thread has ready.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
> +          S3ReleaseSemaphore (&SemaphorePtr[FirstThread + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether all valid threads in current package have ready.
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < ValidThreadCount; ProcessorIndex ++) {
> +          S3WaitForSemaphore (&SemaphorePtr[CurrentThread]);
> +        }
> +        break;
> +
> +      default:
> +        break;
> +      }
> +      break;
> +
>       default:
>         break;
>       }
>     }
>   }
>   
> +/**
> +
> +  Set Processor register for one AP.
> +
> +  @param     PreSmmRegisterTable     Use pre Smm register table or register table.
> +
> +**/
> +VOID
> +SetRegister (
> +  IN BOOLEAN                 PreSmmRegisterTable
> +  )
> +{
> +  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE        *RegisterTables;
> +  UINT32                    InitApicId;
> +  UINTN                     ProcIndex;
> +  UINTN                     Index;
> +
> +  if (PreSmmRegisterTable) {
> +    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.PreSmmInitRegisterTable;
> +  } else {
> +    RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)mAcpiCpuData.RegisterTable;
> +  }
> +
> +  InitApicId = GetInitialApicId ();
> +  RegisterTable = NULL;
> +  for (Index = 0; Index < mAcpiCpuData.NumberOfCpus; Index++) {
> +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> +      RegisterTable = &RegisterTables[Index];
> +      ProcIndex = Index;
> +      break;
> +    }
> +  }
> +  ASSERT (RegisterTable != NULL);
> +
> +  ProgramProcessorRegister (
> +    RegisterTable,
> +    (EFI_CPU_PHYSICAL_LOCATION *)(UINTN)mAcpiCpuData.ApLocation + ProcIndex,
> +    &mAcpiCpuData.CpuStatus,
> +    &mCpuFlags
> +    );
> +}
> +
>   /**
>     AP initialization before then after SMBASE relocation in the S3 boot path.
>   **/
> @@ -374,7 +488,7 @@ InitializeAp (
>   
>     LoadMtrrData (mAcpiCpuData.MtrrTable);
>   
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (TRUE);
>   
>     //
>     // Count down the number with lock mechanism.
> @@ -391,7 +505,7 @@ InitializeAp (
>     ProgramVirtualWireMode ();
>     DisableLvtInterrupts ();
>   
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (FALSE);
>   
>     //
>     // Place AP into the safe code, count down the number with lock mechanism in the safe code.
> @@ -466,7 +580,7 @@ InitializeCpuBeforeRebase (
>   {
>     LoadMtrrData (mAcpiCpuData.MtrrTable);
>   
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.PreSmmInitRegisterTable, mAcpiCpuData.NumberOfCpus);
> +  SetRegister (TRUE);
>   
>     ProgramVirtualWireMode ();
>   
> @@ -502,15 +616,24 @@ InitializeCpuAfterRebase (
>     VOID
>     )
>   {
> -  SetProcessorRegister ((CPU_REGISTER_TABLE *) (UINTN) mAcpiCpuData.RegisterTable, mAcpiCpuData.NumberOfCpus);
> -
>     mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
>   
>     //
> -  // Signal that SMM base relocation is complete and to continue initialization.
> +  // Signal that SMM base relocation is complete and to continue initialization for all APs.
>     //
>     mInitApsAfterSmmBaseReloc = TRUE;
>   
> +  //
> +  // Must begin set register after all APs have continue their initialization.
> +  // This is a requirement to support semaphore mechanism in register table.
> +  // Because if semaphore's dependence type is package type, semaphore will wait
> +  // for all Aps in one package finishing their tasks before set next register
> +  // for all APs. If the Aps not begin its task during BSP doing its task, the
> +  // BSP thread will hang because it is waiting for other Aps in the same
> +  // package finishing their task.
> +  //
> +  SetRegister (FALSE);
> +
>     while (mNumberToFinish > 0) {
>       CpuPause ();
>     }
> @@ -574,8 +697,6 @@ SmmRestoreCpu (
>   
>     mSmmS3Flag = TRUE;
>   
> -  InitializeSpinLock (mMemoryMappedLock);
> -
>     //
>     // See if there is enough context to resume PEI Phase
>     //
> @@ -790,7 +911,6 @@ CopyRegisterTable (
>     )
>   {
>     UINTN                      Index;
> -  UINTN                      Index1;
>     CPU_REGISTER_TABLE_ENTRY   *RegisterTableEntry;
>   
>     CopyMem (DestinationRegisterTableList, SourceRegisterTableList, NumberOfCpus * sizeof (CPU_REGISTER_TABLE));
> @@ -802,17 +922,6 @@ CopyRegisterTable (
>           );
>         ASSERT (RegisterTableEntry != NULL);
>         DestinationRegisterTableList[Index].RegisterTableEntry = (EFI_PHYSICAL_ADDRESS)(UINTN)RegisterTableEntry;
> -      //
> -      // Go though all MSRs in register table to initialize MSR spin lock
> -      //
> -      for (Index1 = 0; Index1 < DestinationRegisterTableList[Index].TableLength; Index1++, RegisterTableEntry++) {
> -        if ((RegisterTableEntry->RegisterType == Msr) && (RegisterTableEntry->ValidBitLength < 64)) {
> -          //
> -          // Initialize MSR spin lock only for those MSRs need bit field writing
> -          //
> -          InitMsrSpinLockByIndex (RegisterTableEntry->Index);
> -        }
> -      }
>       }
>     }
>   }
> @@ -832,6 +941,7 @@ GetAcpiCpuData (
>     VOID                       *GdtForAp;
>     VOID                       *IdtForAp;
>     VOID                       *MachineCheckHandlerForAp;
> +  CPU_STATUS_INFORMATION     *CpuStatus;
>   
>     if (!mAcpiS3Enable) {
>       return;
> @@ -906,6 +1016,24 @@ GetAcpiCpuData (
>     Gdtr->Base = (UINTN)GdtForAp;
>     Idtr->Base = (UINTN)IdtForAp;
>     mAcpiCpuData.ApMachineCheckHandlerBase = (EFI_PHYSICAL_ADDRESS)(UINTN)MachineCheckHandlerForAp;
> +
> +  CpuStatus = &mAcpiCpuData.CpuStatus;
> +  CopyMem (CpuStatus, &AcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
> +  CpuStatus->ValidCoresPerPackages = (EFI_PHYSICAL_ADDRESS)(UINTN)AllocateCopyPool (
> +                                       sizeof (UINT32) * CpuStatus->PackageCount,
> +                                       (UINT32 *)(UINTN)AcpiCpuData->CpuStatus.ValidCoresPerPackages
> +                                       );
> +  ASSERT (CpuStatus->ValidCoresPerPackages != 0);
> +  mAcpiCpuData.ApLocation = (EFI_PHYSICAL_ADDRESS)(UINTN)AllocateCopyPool (
> +                              mAcpiCpuData.NumberOfCpus * sizeof (EFI_CPU_PHYSICAL_LOCATION),
> +                              (EFI_CPU_PHYSICAL_LOCATION *)(UINTN)AcpiCpuData->ApLocation
> +                              );
> +  ASSERT (mAcpiCpuData.ApLocation != 0);
> +  InitializeSpinLock((SPIN_LOCK*) &mCpuFlags.MemoryMappedLock);
> +  mCpuFlags.SemaphoreCount = AllocateZeroPool (
> +                               sizeof (UINT32) * CpuStatus->PackageCount *
> +                               CpuStatus->MaxCoreCount * CpuStatus->MaxThreadCount);
> +  ASSERT (mCpuFlags.SemaphoreCount != NULL);
>   }
>   
>   /**
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> index 9cf508a5c7..42b040531e 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
> @@ -1303,8 +1303,6 @@ InitializeSmmCpuSemaphores (
>     mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock
>                                                     = (SPIN_LOCK *)SemaphoreAddr;
>     SemaphoreAddr += SemaphoreSize;
> -  mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock
> -                                                  = (SPIN_LOCK *)SemaphoreAddr;
>   
>     SemaphoreAddr = (UINTN)SemaphoreBlock + GlobalSemaphoresSize;
>     mSmmCpuSemaphores.SemaphoreCpu.Busy    = (SPIN_LOCK *)SemaphoreAddr;
> @@ -1321,7 +1319,6 @@ InitializeSmmCpuSemaphores (
>   
>     mPFLock                       = mSmmCpuSemaphores.SemaphoreGlobal.PFLock;
>     mConfigSmmCodeAccessCheckLock = mSmmCpuSemaphores.SemaphoreGlobal.CodeAccessCheckLock;
> -  mMemoryMappedLock             = mSmmCpuSemaphores.SemaphoreGlobal.MemoryMappedLock;
>   
>     mSemaphoreSize = SemaphoreSize;
>   }
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> index 8c7f4996d1..e2970308fe 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> @@ -53,6 +53,7 @@ WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>   #include <Library/ReportStatusCodeLib.h>
>   #include <Library/SmmCpuFeaturesLib.h>
>   #include <Library/PeCoffGetEntryPointLib.h>
> +#include <Library/RegisterCpuFeaturesLib.h>
>   
>   #include <AcpiCpuData.h>
>   #include <CpuHotPlugData.h>
> @@ -364,7 +365,6 @@ typedef struct {
>     volatile BOOLEAN     *AllCpusInSync;
>     SPIN_LOCK            *PFLock;
>     SPIN_LOCK            *CodeAccessCheckLock;
> -  SPIN_LOCK            *MemoryMappedLock;
>   } SMM_CPU_SEMAPHORE_GLOBAL;
>   
>   ///
> @@ -409,7 +409,6 @@ extern SMM_CPU_SEMAPHORES                  mSmmCpuSemaphores;
>   extern UINTN                               mSemaphoreSize;
>   extern SPIN_LOCK                           *mPFLock;
>   extern SPIN_LOCK                           *mConfigSmmCodeAccessCheckLock;
> -extern SPIN_LOCK                           *mMemoryMappedLock;
>   extern EFI_SMRAM_DESCRIPTOR                *mSmmCpuSmramRanges;
>   extern UINTN                               mSmmCpuSmramRangeCount;
>   extern UINT8                               mPhysicalAddressBits;
> 

As the code change is similar to patch 3/6, please refer to the comments 
for 3/6.

-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 5/6] UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed.
  2018-10-17  2:16 ` [Patch v2 5/6] UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed Eric Dong
@ 2018-10-18  5:57   ` Ni, Ruiyu
  0 siblings, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  5:57 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/17/2018 10:16 AM, Eric Dong wrote:
> AcpiCpuData add new fields, keep these fields if old data already existed.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c b/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c
> index ef98239844..1b847e453a 100644
> --- a/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c
> +++ b/UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c
> @@ -259,6 +259,8 @@ CpuS3DataInitialize (
>     if (OldAcpiCpuData != NULL) {
>       AcpiCpuData->RegisterTable           = OldAcpiCpuData->RegisterTable;
>       AcpiCpuData->PreSmmInitRegisterTable = OldAcpiCpuData->PreSmmInitRegisterTable;
> +    AcpiCpuData->ApLocation              = OldAcpiCpuData->ApLocation;
> +    CopyMem (&AcpiCpuData->CpuStatus, &OldAcpiCpuData->CpuStatus, sizeof (CPU_STATUS_INFORMATION));
>     } else {
>       //
>       // Allocate buffer for empty RegisterTable and PreSmmInitRegisterTable for all CPUs
> 
Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>

-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 6/6] UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info.
  2018-10-17  2:16 ` [Patch v2 6/6] UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info Eric Dong
@ 2018-10-18  6:01   ` Ni, Ruiyu
  0 siblings, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-18  6:01 UTC (permalink / raw)
  To: Eric Dong, edk2-devel; +Cc: Laszlo Ersek

On 10/17/2018 10:16 AM, Eric Dong wrote:
> Because MSR has scope attribute, driver has no needs to set
> MSR for all APs if MSR scope is core or package type. This patch
> updates code to base on the MSR scope value to add MSR to the register
> table.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c      |  8 +++++
>   UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c     | 12 +++++++
>   .../Library/CpuCommonFeaturesLib/ExecuteDisable.c  | 10 ++++++
>   .../Library/CpuCommonFeaturesLib/FastStrings.c     | 12 +++++++
>   .../Library/CpuCommonFeaturesLib/FeatureControl.c  | 38 ++++++++++++++++++++++
>   .../CpuCommonFeaturesLib/LimitCpuIdMaxval.c        | 14 ++++++++
>   .../Library/CpuCommonFeaturesLib/MachineCheck.c    | 38 ++++++++++++++++++++++
>   .../Library/CpuCommonFeaturesLib/MonitorMwait.c    | 15 +++++++++
>   .../Library/CpuCommonFeaturesLib/PendingBreak.c    | 11 +++++++
>   UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c     | 11 +++++++
>   .../Library/CpuCommonFeaturesLib/ProcTrace.c       | 11 +++++++
>   UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c   | 10 ++++++
>   12 files changed, 190 insertions(+)
> 
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c
> index 47116355a8..1beaebe69c 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c
> @@ -67,6 +67,14 @@ C1eInitialize (
>     IN BOOLEAN                           State
>     )
>   {
> +  //
> +  // The scope of C1EEnable bit in the MSR_NEHALEM_POWER_CTL is Package, only program
> +  // MSR_FEATURE_CONFIG for thread 0 core 0 in each package.
> +  //
> +  if ((CpuInfo->ProcessorInfo.Location.Thread != 0) || (CpuInfo->ProcessorInfo.Location.Core != 0)) {
> +  return RETURN_SUCCESS;
> +  }
> +
>     CPU_REGISTER_TABLE_WRITE_FIELD (
>       ProcessorNumber,
>       Msr,
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c
> index 2038171a14..f30117d2c5 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c
> @@ -69,6 +69,18 @@ EistInitialize (
>     IN BOOLEAN                           State
>     )
>   {
> +  //
> +  // The scope of the MSR_IA32_MISC_ENABLE is core for below processor type, only program
> +  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
> +  //
> +  if (IS_ATOM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_CORE2_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     CPU_REGISTER_TABLE_WRITE_FIELD (
>       ProcessorNumber,
>       Msr,
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c
> index 921656a1e8..ff06cb9b60 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ExecuteDisable.c
> @@ -79,6 +79,16 @@ ExecuteDisableInitialize (
>     IN BOOLEAN                           State
>     )
>   {
> +  //
> +  // The scope of the MSR_IA32_EFER is core for below processor type, only program
> +  // MSR_IA32_EFER for thread 0 in each core.
> +  //
> +  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     CPU_REGISTER_TABLE_WRITE_FIELD (
>       ProcessorNumber,
>       Msr,
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c
> index 029bcf87b3..2682093c23 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FastStrings.c
> @@ -40,6 +40,18 @@ FastStringsInitialize (
>     IN BOOLEAN                           State
>     )
>   {
> +  //
> +  // The scope of FastStrings bit in the MSR_IA32_MISC_ENABLE is core for below processor type, only program
> +  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
> +  //
> +  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     CPU_REGISTER_TABLE_WRITE_FIELD (
>       ProcessorNumber,
>       Msr,
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c
> index d28c4ec51a..8c1eb5eb4f 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c
> @@ -96,6 +96,19 @@ VmxInitialize (
>   {
>     MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
>   
> +  //
> +  // The scope of EnableVmxOutsideSmx bit in the MSR_IA32_FEATURE_CONTROL is core for
> +  // below processor type, only program MSR_IA32_FEATURE_CONTROL for thread 0 in each
> +  // core.
> +  //
> +  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     ASSERT (ConfigData != NULL);
>     MsrRegister = (MSR_IA32_FEATURE_CONTROL_REGISTER *) ConfigData;
>     if (MsrRegister[ProcessorNumber].Bits.Lock == 0) {
> @@ -171,6 +184,19 @@ LockFeatureControlRegisterInitialize (
>   {
>     MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
>   
> +  //
> +  // The scope of Lock bit in the MSR_IA32_FEATURE_CONTROL is core for
> +  // below processor type, only program MSR_IA32_FEATURE_CONTROL for thread 0 in each
> +  // core.
> +  //
> +  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     ASSERT (ConfigData != NULL);
>     MsrRegister = (MSR_IA32_FEATURE_CONTROL_REGISTER *) ConfigData;
>     if (MsrRegister[ProcessorNumber].Bits.Lock == 0) {
> @@ -248,6 +274,18 @@ SmxInitialize (
>     MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
>     RETURN_STATUS                        Status;
>   
> +  //
> +  // The scope of Lock bit in the MSR_IA32_FEATURE_CONTROL is core for
> +  // below processor type, only program MSR_IA32_FEATURE_CONTROL for thread 0 in each
> +  // core.
> +  //
> +  if (IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     Status = RETURN_SUCCESS;
>   
>     if (State && (!IsCpuFeatureInSetting (CPU_FEATURE_VMX))) {
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c
> index 3d41efe9e9..eab1fb538c 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/LimitCpuIdMaxval.c
> @@ -70,6 +70,20 @@ LimitCpuidMaxvalInitialize (
>     IN BOOLEAN                           State
>     )
>   {
> +  //
> +  // The scope of LimitCpuidMaxval bit in the MSR_IA32_MISC_ENABLE is core for below
> +  // processor type, only program MSR_IA32_MISC_ENABLE for thread 0 in each core.
> +  //
> +  if (IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> 
> +      IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_CORE2_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     CPU_REGISTER_TABLE_WRITE_FIELD (
>       ProcessorNumber,
>       Msr,
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c
> index c4eca062fd..f8bee53819 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MachineCheck.c
> @@ -140,6 +140,32 @@ McaInitialize (
>     MSR_IA32_MCG_CAP_REGISTER  McgCap;
>     UINT32                     BankIndex;
>   
> +  //
> +  // The scope of MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS is core for below processor type, only program
> +  // MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS for thread 0 in each core.
> +  //
> +  if (IS_ATOM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_SANDY_BRIDGE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_SKYLAKE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_XEON_PHI_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
> +  //
> +  // The scope of MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS is package for below processor type, only program
> +  // MSR_IA32_MC*_CTL/MSR_IA32_MC*_STATUS for thread 0 core 0 in each package.
> +  //
> +  if (IS_NEHALEM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if ((CpuInfo->ProcessorInfo.Location.Thread != 0) || (CpuInfo->ProcessorInfo.Location.Core != 0)) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     if (State) {
>       McgCap.Uint64 = AsmReadMsr64 (MSR_IA32_MCG_CAP);
>       for (BankIndex = 0; BankIndex < (UINT32) McgCap.Bits.Count; BankIndex++) {
> @@ -301,6 +327,18 @@ LmceInitialize (
>   {
>     MSR_IA32_FEATURE_CONTROL_REGISTER    *MsrRegister;
>   
> +  //
> +  // The scope of FastStrings bit in the MSR_IA32_MISC_ENABLE is core for below processor type, only program
> +  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
> +  //
> +  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     ASSERT (ConfigData != NULL);
>     MsrRegister = (MSR_IA32_FEATURE_CONTROL_REGISTER *) ConfigData;
>     if (MsrRegister[ProcessorNumber].Bits.Lock == 0) {
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c
> index 1d43bd128a..530748bf46 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/MonitorMwait.c
> @@ -67,6 +67,21 @@ MonitorMwaitInitialize (
>     IN BOOLEAN                           State
>     )
>   {
> +  //
> +  // The scope of the MSR_IA32_MISC_ENABLE is core for below processor type, only program
> +  // MSR_IA32_MISC_ENABLE for thread 0 in each core.
> +  //
> +  if (IS_CORE2_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_ATOM_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_PENTIUM_4_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_CORE_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     CPU_REGISTER_TABLE_WRITE_FIELD (
>       ProcessorNumber,
>       Msr,
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c
> index 8cafba4f4a..2e0d2bdeca 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/PendingBreak.c
> @@ -74,6 +74,17 @@ PendingBreakInitialize (
>     IN BOOLEAN                           State
>     )
>   {
> +  //
> +  // The scope of the MSR_ATOM_IA32_MISC_ENABLE is core for below processor type, only program
> +  // MSR_ATOM_IA32_MISC_ENABLE for thread 0 in each core.
> +  //
> +  // Support function has check the processer type for this feature, no need to check again
> +  // here.
> +  //
> +  if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +    return RETURN_SUCCESS;
> +  }
> +
>     //
>     // ATOM, CORE2, CORE, PENTIUM_4 and IS_PENTIUM_M_PROCESSOR have the same MSR index,
>     // Simply use MSR_ATOM_IA32_MISC_ENABLE here
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c
> index 721470cdfe..d6219f4f3f 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c
> @@ -101,6 +101,17 @@ PpinInitialize (
>       return MsrPpinCtrl.Bits.Enable_PPIN == State ? RETURN_SUCCESS : RETURN_DEVICE_ERROR;
>     }
>   
> +  //
> +  // Support function already check the processor which support PPIN feature, so this function not need
> +  // to check the processor again.
> +  //
> +  // The scope of the MSR_IVY_BRIDGE_PPIN_CTL is package level, only program MSR_IVY_BRIDGE_PPIN_CTL for
> +  // thread 0 core 0 in each package.
> +  //
> +  if ((CpuInfo->ProcessorInfo.Location.Thread != 0) || (CpuInfo->ProcessorInfo.Location.Core != 0)) {
> +    return RETURN_SUCCESS;
> +  }
> +
>     CPU_REGISTER_TABLE_WRITE_FIELD (
>       ProcessorNumber,
>       Msr,
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c
> index 98490c6777..cf34ad4d1f 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/ProcTrace.c
> @@ -191,6 +191,17 @@ ProcTraceInitialize (
>     MSR_IA32_RTIT_OUTPUT_MASK_PTRS_REGISTER  OutputMaskPtrsReg;
>     RTIT_TOPA_TABLE_ENTRY                *TopaEntryPtr;
>   
> +  //
> +  // The scope of the MSR_IA32_RTIT_* is core for below processor type, only program
> +  // MSR_IA32_RTIT_* for thread 0 in each core.
> +  //
> +  if (IS_GOLDMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel) ||
> +      IS_GOLDMONT_PLUS_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     ProcTraceData = (PROC_TRACE_DATA *) ConfigData;
>     ASSERT (ProcTraceData != NULL);
>   
> diff --git a/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c b/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c
> index b4a453c352..342b45f25b 100644
> --- a/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c
> +++ b/UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c
> @@ -102,6 +102,16 @@ X2ApicInitialize (
>   {
>     BOOLEAN                            *X2ApicEnabled;
>   
> +  //
> +  // The scope of the MSR_IA32_APIC_BASE is core for below processor type, only program
> +  // MSR_IA32_APIC_BASE for thread 0 in each core.
> +  //
> +  if (IS_SILVERMONT_PROCESSOR (CpuInfo->DisplayFamily, CpuInfo->DisplayModel)) {
> +    if (CpuInfo->ProcessorInfo.Location.Thread != 0) {
> +      return RETURN_SUCCESS;
> +    }
> +  }
> +
>     ASSERT (ConfigData != NULL);
>     X2ApicEnabled = (BOOLEAN *) ConfigData;
>     if (X2ApicEnabled[ProcessorNumber]) {
> 
Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>

-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Patch v2 0/6] Fix performance issue caused by Set MSR task.
  2018-10-17 17:33 ` [Patch v2 0/6] Fix performance issue caused by Set MSR task Laszlo Ersek
@ 2018-10-18  7:36   ` Dong, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Dong, Eric @ 2018-10-18  7:36 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Ni, Ruiyu, edk2-devel-01

Hi Laszlo,

Thanks for your notes. I have updated V2 changes based on Ray's comments. Please help to check the V3 changes.

Thanks,
Eric

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, October 18, 2018 1:34 AM
> To: Dong, Eric <eric.dong@intel.com>
> Cc: Ni, Ruiyu <ruiyu.ni@intel.com>; edk2-devel-01 <edk2-devel@lists.01.org>
> Subject: Re: [edk2] [Patch v2 0/6] Fix performance issue caused by Set MSR
> task.
> 
> Hi Eric,
> 
> On 10/17/18 04:16, Eric Dong wrote:
> > V2 changes include:
> > 1. Include the change for CpuCommonFeaturesLib which used to set MSR
> base on its scope info.
> > 2. Include the change for CpuS3DataDxe driver which also handle the
> AcpiCpuData data.
> > 3. Update code base on feedback for V1 changes.
> 
> reviewing this version of the series is on my TODO list. I was out-of-office
> yesterday (somewhat unexpectedly) and this morning I had
> 130 emails just in my inbox (not counting other list traffic).
> 
> I've more or less managed to get to the bottom of that mail dump by now,
> but as a result, it's too late to start reviewing your v2 still today.
> Hopefully I can cover it tomorrow.
> 
> (I might fetch another email batch this evening, and handle small items.
> Thus, you could see further emails from me on the list tonight.)
> 
> If Ray reviewed your v2 today (while I was busy unburying myself from my
> inbox), and you are ready to post a v3 based on Ray's comments, please do
> so. Then I'll skip v2 and catch up with v3.
> 
> Thanks!
> Laszlo

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-10-18  7:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-17  2:16 [Patch v2 0/6] Fix performance issue caused by Set MSR task Eric Dong
2018-10-17  2:16 ` [Patch v2 1/6] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
2018-10-18  3:04   ` Ni, Ruiyu
2018-10-18  3:10   ` Ni, Ruiyu
2018-10-17  2:16 ` [Patch v2 2/6] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
2018-10-18  3:31   ` Ni, Ruiyu
2018-10-17  2:16 ` [Patch v2 3/6] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
2018-10-18  5:46   ` Ni, Ruiyu
2018-10-17  2:16 ` [Patch v2 4/6] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
2018-10-18  5:54   ` Ni, Ruiyu
2018-10-17  2:16 ` [Patch v2 5/6] UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed Eric Dong
2018-10-18  5:57   ` Ni, Ruiyu
2018-10-17  2:16 ` [Patch v2 6/6] UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info Eric Dong
2018-10-18  6:01   ` Ni, Ruiyu
2018-10-17 17:33 ` [Patch v2 0/6] Fix performance issue caused by Set MSR task Laszlo Ersek
2018-10-18  7:36   ` Dong, Eric
2018-10-18  2:12 ` Ni, Ruiyu
2018-10-18  2:35   ` Dong, Eric

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox