public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Ni, Ruiyu" <ruiyu.ni@Intel.com>
To: Eric Dong <eric.dong@intel.com>, edk2-devel@lists.01.org
Cc: Laszlo Ersek <lersek@redhat.com>
Subject: Re: [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.
Date: Tue, 16 Oct 2018 11:05:09 +0800	[thread overview]
Message-ID: <ccee537f-6fbc-1505-78af-c52c4cc77e27@Intel.com> (raw)
In-Reply-To: <20181015024948.228-4-eric.dong@intel.com>

On 10/15/2018 10:49 AM, Eric Dong wrote:
> In a system which has multiple cores, current set register value task costs huge times.
> After investigation, current set MSR task costs most of the times. Current logic uses
> SpinLock to let set MSR task as an single thread task for all cores. Because MSR has
> scope attribute which may cause GP fault if multiple APs set MSR at the same time,
> current logic use an easiest solution (use SpinLock) to avoid this issue, but it will
> cost huge times.
> 
> In order to fix this performance issue, new solution will set MSRs base on their scope
> attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised
> which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A
> must been set after MSR B has been set. Also MSR B is package scope level and MSR A is
> thread scope level. If system has multiple threads, Thread 1 needs to set the thread level
> MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1
> and thread 2 like below:
> 
>              Thread 1                 Thread 2
> MSR B          N                        Y
> MSR A          Y                        Y
> 
> If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but
> at this time, MSR B not been executed yet by thread 2. system may trig exception at this
> time.
> 
> In order to fix the above issue, driver introduces semaphore logic to control the MSR
> execute sequence. For the above case, a semaphore will be add between MSR A and B for
> all threads. Semaphore has scope info for it. The possible scope value is core or package.
> For each thread, when it meets a semaphore during it set registers, it will 1) release
> semaphore (+1) for each threads in this core or package(based on the scope info for this
> semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based
> on the scope info for this semaphore). With these two steps, driver can control MSR
> sequence. Sample code logic like below:
> 
>    //
>    // First increase semaphore count by 1 for processors in this package.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
>      LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
>    }
>    //
>    // Second, check whether the count has reach the check number.
>    //
>    for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
>      LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
>    }
> 
> Platform Requirement:
> 1. This change requires register MSR setting base on MSR scope info. If still register MSR
>     for all threads, exception may raised.
> 
> Known limitation:
> 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic
>     requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature
>     PEI instance not works after this change. We plan to support async mode for PEI in phase
>     2 for this task.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Eric Dong <eric.dong@intel.com>
> ---
>   .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++---
>   .../DxeRegisterCpuFeaturesLib.c                    |  71 +++-
>   .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
>   .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
>   .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
>   .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
>   .../RegisterCpuFeaturesLib.c                       | 452 ++++++++++++++++++---
>   7 files changed, 840 insertions(+), 117 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> index ba3fb3250f..f820b4fed7 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> @@ -145,6 +145,20 @@ CpuInitDataInitialize (
>     CPU_FEATURES_INIT_ORDER              *InitOrder;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
>     LIST_ENTRY                           *Entry;
> +  UINT32                               Core;
> +  UINT32                               Package;
> +  UINT32                               Thread;
> +  EFI_CPU_PHYSICAL_LOCATION            *Location;
> +  UINT32                               *CoreArray;
> +  UINTN                                Index;
> +  UINT32                               ValidCount;
> +  UINTN                                CoreIndex;
> +  ACPI_CPU_DATA                        *AcpiCpuData;
> +  CPU_STATUS_INFORMATION               *CpuStatus;
> +
> +  Core    = 0;
> +  Package = 0;
> +  Thread  = 0;
>   
>     CpuFeaturesData = GetCpuFeaturesData ();
>     CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof (CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
> @@ -163,6 +177,16 @@ CpuInitDataInitialize (
>       Entry = Entry->ForwardLink;
>     }
>   
> +  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
> +
> +  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
> +  ASSERT (AcpiCpuData != NULL);
> +  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
> +
> +  CpuStatus = &AcpiCpuData->CpuStatus;
> +  AcpiCpuData->ApLocation = AllocateZeroPool (sizeof (EFI_CPU_PHYSICAL_LOCATION) * NumberOfCpus);
> +  ASSERT (AcpiCpuData->ApLocation != NULL);
> +
>     for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
>       InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
>       InitOrder->FeaturesSupportedMask = AllocateZeroPool (CpuFeaturesData->BitMaskSize);
> @@ -175,7 +199,59 @@ CpuInitDataInitialize (
>         &ProcessorInfoBuffer,
>         sizeof (EFI_PROCESSOR_INFORMATION)
>         );
> +    CopyMem (
> +      AcpiCpuData->ApLocation + ProcessorNumber,
> +      &ProcessorInfoBuffer.Location,
> +      sizeof (EFI_CPU_PHYSICAL_LOCATION)
> +      );
> +

Please add more comments here to describe what the below code tries to 
do and why.

> +    if (Package < ProcessorInfoBuffer.Location.Package) {
> +      Package = ProcessorInfoBuffer.Location.Package;
> +    }
> +    if (Core < ProcessorInfoBuffer.Location.Core) {
> +      Core = ProcessorInfoBuffer.Location.Core;
> +    }
> +    if (Thread < ProcessorInfoBuffer.Location.Thread) {
> +      Thread = ProcessorInfoBuffer.Location.Thread;
> +    }
> +  }
> +  CpuStatus->PackageCount = Package + 1;
> +  CpuStatus->CoreCount    = Core + 1;
> +  CpuStatus->ThreadCount  = Thread + 1;


> +  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, Core : %d, Thread: %d\n",
> +         CpuStatus->PackageCount,
> +         CpuStatus->CoreCount,
> +         CpuStatus->ThreadCount));

Please use MaxCore and MaxThread in debug message. Otherwise it's confusing.

> +
> +  //
> +  // Collect valid core count in each package because not all cores are valid.
> +  //
> +  CpuStatus->ValidCoresInPackages = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount);
> +  ASSERT (CpuStatus->ValidCoresInPackages != NULL);

Please add comments to describe the purpose of CoreArray.
CoreArray is not a good name IMO. How about:
CoreVisited - AllocatePool (sizeof (BOOLEAN) * CpuStatus->MaxCoreCount);

> +  CoreArray = AllocatePool (sizeof (UINT32) * CpuStatus->CoreCount);
> +  ASSERT (CoreArray != NULL);
> +
> +  for (Index = 0; Index <= Package; Index ++ ) {

Please stop using Package/Core/Thread. Use the field in CpuStatus 
structure instead. It makes the code more readable.

> +    ZeroMem (CoreArray, sizeof (UINT32) * (Core + 1));
> +    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) {
> +      Location = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
> +      if (Location->Package == Index) {
> +        CoreArray[Location->Core] = 1;
> +      }

The above if-clause can be:
          if ((Location->Package == Index) &&
              !CoreVisited[Location->Core])) {
            CpuStatus->ValidCoreCountPerPackage[Index]++;
            CoreVisited[Location->Core] = TRUE;
          }

The for-loop below can be removed.

> +    }
> +    for (CoreIndex = 0, ValidCount = 0; CoreIndex <= Core; CoreIndex ++) {
> +      ValidCount += CoreArray[CoreIndex];
> +    }
> +    CpuStatus->ValidCoresInPackages[Index] = ValidCount;
>     }
> +  FreePool (CoreArray);
> +  for (Index = 0; Index <= Package; Index++) {
> +    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index, CpuStatus->ValidCoresInPackages[Index]));
> +  }
> +
> +  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) * CpuStatus->PackageCount * CpuStatus->CoreCount* CpuStatus->ThreadCount);
> +  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
> +
>     //
>     // Get support and configuration PCDs
>     //
> @@ -310,7 +386,7 @@ CollectProcessorData (
>     LIST_ENTRY                           *Entry;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
>   
> -  CpuFeaturesData = GetCpuFeaturesData ();
> +  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;

Is the above change more proper in a separate patch?

>     ProcessorNumber = GetProcessorIndex ();
>     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
>     //
> @@ -416,6 +492,15 @@ DumpRegisterTableOnProcessor (
>           RegisterTableEntry->Value
>           ));
>         break;
> +    case Semaphore:
> +      DEBUG ((
> +        DebugPrintErrorLevel,
> +        "Processor: %d: Semaphore: Scope Value: %d\r\n",

How about print the Scope value in string? This makes the debug message 
more meaningful.

> +        ProcessorNumber,
> +        RegisterTableEntry->Value
> +        ));
> +      break;
> +
>       default:
>         break;
>       }
> @@ -441,6 +526,11 @@ AnalysisProcessorFeatures (
>     REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
>     LIST_ENTRY                           *Entry;
>     CPU_FEATURES_DATA                    *CpuFeaturesData;
> +  LIST_ENTRY                           *NextEntry;
> +  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
> +  BOOLEAN                              Success;
> +  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
> +  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
>   
>     CpuFeaturesData = GetCpuFeaturesData ();
>     CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData->BitMaskSize);
> @@ -517,8 +607,14 @@ AnalysisProcessorFeatures (
>       //
>       CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
>       Entry = GetFirstNode (&CpuInitOrder->OrderList);
> +    NextEntry = Entry->ForwardLink;
>       while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
>         CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
> +      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
> +        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (NextEntry);
> +      } else {
> +        NextCpuFeatureInOrder = NULL;
> +      }
>         if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask, CpuFeaturesData->SettingPcd)) {
>           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, TRUE);
>           if (EFI_ERROR (Status)) {
> @@ -532,6 +628,8 @@ AnalysisProcessorFeatures (
>               DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask = "));
>               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
>             }
> +        } else {
> +          Success = TRUE;
>           }
>         } else {
>           Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, CpuFeatureInOrder->ConfigData, FALSE);
> @@ -542,9 +640,36 @@ AnalysisProcessorFeatures (
>               DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: Mask = "));
>               DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
>             }
> +        } else {
> +          Success = TRUE;
>           }
>         }
> -      Entry = Entry->ForwardLink;
> +
> +      if (Success) {
> +        //
> +        // If feature has dependence with the next feature (ONLY care core/package dependency).
> +        // and feature initialize succeed, add sync semaphere here.
> +        //
> +        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
> +        if (NextCpuFeatureInOrder != NULL) {
> +          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
> +        } else {
> +          AfterDep = NoneDepType;
> +        }
> +        //
> +        // Assume only one of the depend is valid.
> +        //
> +        ASSERT (!(BeforeDep > ThreadDepType && AfterDep > ThreadDepType));
> +        if (BeforeDep > ThreadDepType) {
> +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, BeforeDep);
> +        }
> +        if (AfterDep > ThreadDepType) {
> +          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, AfterDep);
> +        }
> +      }
> +
> +      Entry     = Entry->ForwardLink;
> +      NextEntry = Entry->ForwardLink;
>       }
>   
>       //
> @@ -561,27 +686,79 @@ AnalysisProcessorFeatures (
>     }
>   }
>   
> +/**
> +  Increment semaphore by 1.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +LibReleaseSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  InterlockedIncrement (Sem);
> +}
> +
> +/**
> +  Decrement the semaphore by 1 if it is not zero.
> +
> +  Performs an atomic decrement operation for semaphore.
> +  The compare exchange operation must be performed using
> +  MP safe mechanisms.
> +
> +  @param      Sem            IN:  32-bit unsigned integer
> +
> +**/
> +VOID
> +LibWaitForSemaphore (
> +  IN OUT  volatile UINT32           *Sem
> +  )
> +{
> +  UINT32  Value;
> +
> +  do {
> +    Value = *Sem;
> +  } while (Value == 0);
> +
> +  InterlockedDecrement (Sem);
> +}
> +
>   /**
>     Initialize the CPU registers from a register table.
>   
> -  @param[in]  ProcessorNumber  The index of the CPU executing this function.
> +  @param[in]  RegisterTable         The register table for this AP.
> +  @param[in]  ApLocation            AP location info for this ap.
> +  @param[in]  CpuStatus             CPU status info for this CPU.
> +  @param[in]  CpuFlags              Flags data structure used when program the register.
>   
>     @note This service could be called by BSP/APs.
>   **/
>   VOID
> +EFIAPI
>   ProgramProcessorRegister (
> -  IN UINTN  ProcessorNumber
> +  IN CPU_REGISTER_TABLE           *RegisterTable,
> +  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
> +  IN CPU_STATUS_INFORMATION       *CpuStatus,
> +  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
>     )
>   {
> -  CPU_FEATURES_DATA         *CpuFeaturesData;
> -  CPU_REGISTER_TABLE        *RegisterTable;
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
>     UINTN                     Index;
>     UINTN                     Value;
>     CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
> -
> -  CpuFeaturesData = GetCpuFeaturesData ();
> -  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
> +  volatile UINT32           *SemaphorePtr;
> +  UINT32                    CoreOffset;
> +  UINT32                    PackageOffset;
> +  UINT32                    PackageThreadsCount;
> +  UINT32                    ApOffset;
> +  UINTN                     ProcessorIndex;
> +  UINTN                     ApIndex;
> +  UINTN                     ValidApCount;
> +
> +  ApIndex = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount \
> +            + ApLocation->Core * CpuStatus->ThreadCount \
> +            + ApLocation->Thread;
>   
>     //
>     // Traverse Register Table of this logical processor
> @@ -591,6 +768,7 @@ ProgramProcessorRegister (
>     for (Index = 0; Index < RegisterTable->TableLength; Index++) {
>   
>       RegisterTableEntry = &RegisterTableEntryHead[Index];
> +    DEBUG ((DEBUG_INFO, "Processor = %d, Entry Index %d, Type = %d!\n", ApIndex, Index, RegisterTableEntry->RegisterType));
How about print the register type in string?

>   
>       //
>       // Check the type of specified register
> @@ -654,10 +832,6 @@ ProgramProcessorRegister (
>       // The specified register is Model Specific Register
>       //
>       case Msr:
> -      //
> -      // Get lock to avoid Package/Core scope MSRs programming issue in parallel execution mode
> -      //
> -      AcquireSpinLock (&CpuFeaturesData->MsrLock);
>         if (RegisterTableEntry->ValidBitLength >= 64) {
>           //
>           // If length is not less than 64 bits, then directly write without reading
> @@ -677,20 +851,19 @@ ProgramProcessorRegister (
>             RegisterTableEntry->Value
>             );
>         }
> -      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
>         break;
>       //
>       // MemoryMapped operations
>       //
>       case MemoryMapped:
> -      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
>         MmioBitFieldWrite32 (
>           (UINTN)(RegisterTableEntry->Index | LShiftU64 (RegisterTableEntry->HighIndex, 32)),
>           RegisterTableEntry->ValidBitStart,
>           RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength - 1,
>           (UINT32)RegisterTableEntry->Value
>           );
> -      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
>         break;
>       //
>       // Enable or disable cache
> @@ -706,6 +879,50 @@ ProgramProcessorRegister (
>         }
>         break;
>   
> +    case Semaphore:
> +      SemaphorePtr = CpuFlags->SemaphoreCount;
> +      switch (RegisterTableEntry->Value) {
> +      case CoreDepType:
> +        CoreOffset = (ApLocation->Package * CpuStatus->CoreCount + ApLocation->Core) * CpuStatus->ThreadCount > +        ApOffset = CoreOffset + ApLocation->Thread;

How about FirstThread and CurrentThread?

> +        //
> +        // First increase semaphore count by 1 for processors in this core.
This comment might not be helpful for reviewer to understand.
How about "Notify all threads in current Core"?

> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[CoreOffset + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
How about "Wait for all threads in current Core"?

Below diagram is also helpful
//
//  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
//  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
//
//  All threads (T0...Tn) waits in P() line and continues running
//  together.
//
//
//  T0             T1            ...           Tn
//
//  V(0...n)       V(0...n)      ...           V(0...n)
//  n * P(0)       n * P(1)      ...           n * P(n)
//

> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->ThreadCount; ProcessorIndex ++) {
> +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      case PackageDepType:
> +        PackageOffset = ApLocation->Package * CpuStatus->CoreCount * CpuStatus->ThreadCount;

FirstThread?

> +        PackageThreadsCount = CpuStatus->ThreadCount * CpuStatus->CoreCount;
ThreadCount?

> +        ApOffset = PackageOffset + CpuStatus->ThreadCount * ApLocation->Core + ApLocation->Thread;
CurrentThread?

> +        ValidApCount = CpuStatus->ThreadCount * CpuStatus->ValidCoresInPackages[ApLocation->Package];
ValidThreadCount?

> +        //
> +        // First increase semaphore count by 1 for processors in this package.
How about "Notify all threads in current Package"?
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) {
> +          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]);
> +        }
> +        //
> +        // Second, check whether the count has reach the check number.
How about "Wait for all threads in current Package"?
> +        //
> +        for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
> +          LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
> +        }
> +        break;
> +
> +      default:
> +        break;
> +      }
> +      break;
> +
>       default:
>         break;
>       }
> @@ -724,10 +941,36 @@ SetProcessorRegister (
>     IN OUT VOID            *Buffer
>     )
>   {
> -  UINTN                  ProcessorNumber;
> +  CPU_FEATURES_DATA         *CpuFeaturesData;
> +  CPU_REGISTER_TABLE        *RegisterTable;
> +  CPU_REGISTER_TABLE        *RegisterTables;
> +  UINT32                    InitApicId;
> +  UINTN                     ProcIndex;
> +  UINTN                     Index;
> +  ACPI_CPU_DATA             *AcpiCpuData;
>   
> -  ProcessorNumber = GetProcessorIndex ();
> -  ProgramProcessorRegister (ProcessorNumber);
> +  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
> +  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
> +
> +  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData->RegisterTable;
> +
> +  InitApicId = GetInitialApicId ();
> +  RegisterTable = NULL;
> +  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
> +    if (RegisterTables[Index].InitialApicId == InitApicId) {
> +      RegisterTable =  &RegisterTables[Index];
> +      ProcIndex = Index;
> +      break;
> +    }
> +  }
> +  ASSERT (RegisterTable != NULL);
> +
> +  ProgramProcessorRegister (
> +    RegisterTable,
> +    AcpiCpuData->ApLocation + ProcIndex,
> +    &AcpiCpuData->CpuStatus,
> +    &CpuFeaturesData->CpuFlags
> +    );
>   }
>   
>   /**
> @@ -746,6 +989,9 @@ CpuFeaturesDetect (
>   {
>     UINTN                  NumberOfCpus;
>     UINTN                  NumberOfEnabledProcessors;
> +  CPU_FEATURES_DATA      *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData();
>   
>     GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
>   
> @@ -754,49 +1000,13 @@ CpuFeaturesDetect (
>     //
>     // Wakeup all APs for data collection.
>     //
> -  StartupAPsWorker (CollectProcessorData);
> +  StartupAPsWorker (CollectProcessorData, NULL);
>   
>     //
>     // Collect data on BSP
>     //
> -  CollectProcessorData (NULL);
> +  CollectProcessorData (CpuFeaturesData);
>   
>     AnalysisProcessorFeatures (NumberOfCpus);
>   }
>   
> -/**
> -  Performs CPU features Initialization.
> -
> -  This service will invoke MP service to perform CPU features
> -  initialization on BSP/APs per user configuration.
> -
> -  @note This service could be called by BSP only.
> -**/
> -VOID
> -EFIAPI
> -CpuFeaturesInitialize (
> -  VOID
> -  )
> -{
> -  CPU_FEATURES_DATA      *CpuFeaturesData;
> -  UINTN                  OldBspNumber;
> -
> -  CpuFeaturesData = GetCpuFeaturesData ();
> -
> -  OldBspNumber = GetProcessorIndex();
> -  CpuFeaturesData->BspNumber = OldBspNumber;
> -  //
> -  // Wakeup all APs for programming.
> -  //
> -  StartupAPsWorker (SetProcessorRegister);
> -  //
> -  // Programming BSP
> -  //
> -  SetProcessorRegister (NULL);
> -  //
> -  // Switch to new BSP if required
> -  //
> -  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> -    SwitchNewBsp (CpuFeaturesData->BspNumber);
> -  }
> -}
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> index 1f34a3f489..8346f7004f 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
> @@ -15,6 +15,7 @@
>   #include <PiDxe.h>
>   
>   #include <Library/UefiBootServicesTableLib.h>
> +#include <Library/UefiLib.h>
>   
>   #include "RegisterCpuFeatures.h"
>   
> @@ -115,14 +116,20 @@ GetProcessorInformation (
>   
>     @param[in]  Procedure               A pointer to the function to be run on
>                                         enabled APs of the system.
> +  @param[in]  MpEvent                 A pointer to the event to be used later
> +                                      to check whether procedure has done.
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     )
>   {
>     EFI_STATUS                           Status;
>     EFI_MP_SERVICES_PROTOCOL             *MpServices;
> +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
>   
>     MpServices = GetMpProtocol ();
>     //
> @@ -132,9 +139,9 @@ StartupAPsWorker (
>                    MpServices,
>                    Procedure,
>                    FALSE,
> -                 NULL,
> +                 (EFI_EVENT)MpEvent,
>                    0,
> -                 NULL,
> +                 CpuFeaturesData,
>                    NULL
>                    );
>     ASSERT_EFI_ERROR (Status);
> @@ -197,3 +204,61 @@ GetNumberOfProcessor (
>     ASSERT_EFI_ERROR (Status);
>   }
>   
> +/**
> +  Performs CPU features Initialization.
> +
> +  This service will invoke MP service to perform CPU features
> +  initialization on BSP/APs per user configuration.
> +
> +  @note This service could be called by BSP only.
> +**/
> +VOID
> +EFIAPI
> +CpuFeaturesInitialize (
> +  VOID
> +  )
> +{
> +  CPU_FEATURES_DATA          *CpuFeaturesData;
> +  UINTN                      OldBspNumber;
> +  EFI_EVENT                  MpEvent;
> +  EFI_STATUS                 Status;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
> +
> +  OldBspNumber = GetProcessorIndex();
> +  CpuFeaturesData->BspNumber = OldBspNumber;
> +
> +  Status = gBS->CreateEvent (
> +                  EVT_NOTIFY_WAIT,
> +                  TPL_CALLBACK,
> +                  EfiEventEmptyFunction,
> +                  NULL,
> +                  &MpEvent
> +                  );
> +  ASSERT_EFI_ERROR (Status);
> +
> +  //
> +  // Wakeup all APs for programming.
> +  //
> +  StartupAPsWorker (SetProcessorRegister, MpEvent);
> +  //
> +  // Programming BSP
> +  //
> +  SetProcessorRegister (CpuFeaturesData);
> +
> +  //
> +  // Wait all processors to finish the task.
> +  //
> +  do {
> +    Status = gBS->CheckEvent (MpEvent);
> +  } while (Status == EFI_NOT_READY);
> +  ASSERT_EFI_ERROR (Status);
> +
> +  //
> +  // Switch to new BSP if required
> +  //
> +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> +  }
> +}
> +
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> index f0f317c945..6693bae575 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
> @@ -47,6 +47,9 @@
>     SynchronizationLib
>     UefiBootServicesTableLib
>     IoLib
> +  UefiBootServicesTableLib
> +  UefiLib
> +  LocalApicLib
>   
>   [Protocols]
>     gEfiMpServiceProtocolGuid                                            ## CONSUMES
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> index 82fe268812..799864a136 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
> @@ -149,11 +149,15 @@ GetProcessorInformation (
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     )
>   {
>     EFI_STATUS                           Status;
>     EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
> +  CPU_FEATURES_DATA                    *CpuFeaturesData;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
>   
>     //
>     // Get MP Services Protocol
> @@ -175,7 +179,7 @@ StartupAPsWorker (
>                    Procedure,
>                    FALSE,
>                    0,
> -                 NULL
> +                 CpuFeaturesData
>                    );
>     ASSERT_EFI_ERROR (Status);
>   }
> @@ -257,3 +261,50 @@ GetNumberOfProcessor (
>                            );
>     ASSERT_EFI_ERROR (Status);
>   }
> +
> +/**
> +  Performs CPU features Initialization.
> +
> +  This service will invoke MP service to perform CPU features
> +  initialization on BSP/APs per user configuration.
> +
> +  @note This service could be called by BSP only.
> +**/
> +VOID
> +EFIAPI
> +CpuFeaturesInitialize (
> +  VOID
> +  )
> +{
> +  CPU_FEATURES_DATA          *CpuFeaturesData;
> +  UINTN                      OldBspNumber;
> +
> +  CpuFeaturesData = GetCpuFeaturesData ();
> +
> +  OldBspNumber = GetProcessorIndex();
> +  CpuFeaturesData->BspNumber = OldBspNumber;
> +
> +  //
> +  // Known limitation: In PEI phase, CpuFeatures driver not
> +  // support async mode execute tasks. So semaphore type
> +  // register can't been used for this instance, must use
> +  // DXE type instance.
> +  //
> +
> +  //
> +  // Wakeup all APs for programming.
> +  //
> +  StartupAPsWorker (SetProcessorRegister, NULL);
> +  //
> +  // Programming BSP
> +  //
> +  SetProcessorRegister (CpuFeaturesData);
> +
> +  //
> +  // Switch to new BSP if required
> +  //
> +  if (CpuFeaturesData->BspNumber != OldBspNumber) {
> +    SwitchNewBsp (CpuFeaturesData->BspNumber);
> +  }
> +}
> +
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> index fdfef98293..e95f01df0b 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
> @@ -49,6 +49,7 @@
>     PeiServicesLib
>     PeiServicesTablePointerLib
>     IoLib
> +  LocalApicLib
>   
>   [Ppis]
>     gEfiPeiMpServicesPpiGuid                                             ## CONSUMES
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> index edd266934f..39457e9730 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
> @@ -23,6 +23,7 @@
>   #include <Library/MemoryAllocationLib.h>
>   #include <Library/SynchronizationLib.h>
>   #include <Library/IoLib.h>
> +#include <Library/LocalApicLib.h>
>   
>   #include <AcpiCpuData.h>
>   
> @@ -46,16 +47,26 @@ typedef struct {
>     CPU_FEATURE_INITIALIZE       InitializeFunc;
>     UINT8                        *BeforeFeatureBitMask;
>     UINT8                        *AfterFeatureBitMask;
> +  UINT8                        *CoreBeforeFeatureBitMask;
> +  UINT8                        *CoreAfterFeatureBitMask;
> +  UINT8                        *PackageBeforeFeatureBitMask;
> +  UINT8                        *PackageAfterFeatureBitMask;
>     VOID                         *ConfigData;
>     BOOLEAN                      BeforeAll;
>     BOOLEAN                      AfterAll;
>   } CPU_FEATURES_ENTRY;
>   
> +//
> +// Flags used when program the register.
> +//
> +typedef struct {
> +  volatile UINTN           MemoryMappedLock;     // Spinlock used to program mmio
> +  volatile UINT32          *SemaphoreCount;      // Semaphore used to program semaphore.
> +} PROGRAM_CPU_REGISTER_FLAGS;
> +
>   typedef struct {
>     UINTN                    FeaturesCount;
>     UINT32                   BitMaskSize;
> -  SPIN_LOCK                MsrLock;
> -  SPIN_LOCK                MemoryMappedLock;
>     LIST_ENTRY               FeatureList;
>   
>     CPU_FEATURES_INIT_ORDER  *InitOrder;
> @@ -64,9 +75,14 @@ typedef struct {
>     UINT8                    *ConfigurationPcd;
>     UINT8                    *SettingPcd;
>   
> +  UINT32                   NumberOfCpus;
> +  ACPI_CPU_DATA            *AcpiCpuData;
> +
>     CPU_REGISTER_TABLE       *RegisterTable;
>     CPU_REGISTER_TABLE       *PreSmmRegisterTable;
>     UINTN                    BspNumber;
> +
> +  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
>   } CPU_FEATURES_DATA;
>   
>   #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
> @@ -118,10 +134,13 @@ GetProcessorInformation (
>   
>     @param[in]  Procedure               A pointer to the function to be run on
>                                         enabled APs of the system.
> +  @param[in]  MpEvent                 A pointer to the event to be used later
> +                                      to check whether procedure has done.
>   **/
>   VOID
>   StartupAPsWorker (
> -  IN  EFI_AP_PROCEDURE                 Procedure
> +  IN  EFI_AP_PROCEDURE                 Procedure,
> +  IN  VOID                             *MpEvent
>     );
>   
>   /**
> @@ -170,4 +189,30 @@ DumpCpuFeature (
>     IN CPU_FEATURES_ENTRY  *CpuFeature
>     );
>   
> +/**
> +  Return feature dependence result.
> +
> +  @param[in]  CpuFeature        Pointer to CPU feature.
> +  @param[in]  Before            Check before dependence or after.
> +
> +  @retval     return the dependence result.
> +**/
> +CPU_FEATURE_DEPENDENCE_TYPE
> +DetectFeatureScope (
> +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> +  IN BOOLEAN                    Before
> +  );
> +
> +/**
> +  Programs registers for the calling processor.
> +
> +  @param[in,out] Buffer  The pointer to private data buffer.
> +
> +**/
> +VOID
> +EFIAPI
> +SetProcessorRegister (
> +  IN OUT VOID            *Buffer
> +  );
> +
>   #endif
> diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> index fa7e107e39..f9e3178dc1 100644
> --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
> @@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
>     return FALSE;
>   }
>   
> +/**
> +  Return feature dependence result.
> +
> +  @param[in]  CpuFeature        Pointer to CPU feature.
> +  @param[in]  Before            Check before dependence or after.
> +
> +  @retval     return the dependence result.
> +**/
> +CPU_FEATURE_DEPENDENCE_TYPE
> +DetectFeatureScope (
> +  IN CPU_FEATURES_ENTRY         *CpuFeature,
> +  IN BOOLEAN                    Before
> +  )
> +{
> +  if (Before) {
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      return PackageDepType;
> +    }
> +
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      return CoreDepType;
> +    }
> +
> +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> +      return ThreadDepType;
> +    }
> +
> +    return NoneDepType;
> +  }
> +
> +  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +    return PackageDepType;
> +  }
> +
> +  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +    return CoreDepType;
> +  }
> +
> +  if (CpuFeature->AfterFeatureBitMask != NULL) {
> +    return ThreadDepType;
> +  }
> +
> +  return NoneDepType;
> +}
> +
> +/**
> +  Clear dependence for the specified type.
> +
> +  @param[in]  CurrentFeature     Cpu feature need to clear.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +**/
> +VOID
> +ClearFeatureScope (
> +  IN CPU_FEATURES_ENTRY           *CpuFeature,
> +  IN BOOLEAN                      Before
> +  )
> +{
> +  if (Before) {
> +    if (CpuFeature->BeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->BeforeFeatureBitMask);
> +      CpuFeature->BeforeFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
> +      CpuFeature->CoreBeforeFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
> +      CpuFeature->PackageBeforeFeatureBitMask = NULL;
> +    }
> +  } else {
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->PackageAfterFeatureBitMask);
> +      CpuFeature->PackageAfterFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->CoreAfterFeatureBitMask);
> +      CpuFeature->CoreAfterFeatureBitMask = NULL;
> +    }
> +    if (CpuFeature->AfterFeatureBitMask != NULL) {
> +      FreePool (CpuFeature->AfterFeatureBitMask);
> +      CpuFeature->AfterFeatureBitMask = NULL;
> +    }
> +  }
> +}
> +
> +/**
> +  Base on dependence relationship to asjust feature dependence.
> +
> +  ONLY when the feature before(or after) the find feature also has
> +  dependence with the find feature. In this case, driver need to base
> +  on dependce relationship to decide how to insert current feature and
> +  adjust the feature dependence.
> +
> +  @param[in]  PreviousFeature    CPU feature current before the find one.
> +  @param[in]  CurrentFeature     Cpu feature need to adjust.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +  @retval   TRUE   means the current feature dependence has been adjusted.
> +
> +  @retval   FALSE  means the previous feature dependence has been adjusted.
> +                   or previous feature has no dependence with the find one.
> +
> +**/
> +BOOLEAN
> +AdjustFeaturesDependence (
> +  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
> +  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
> +  IN     BOOLEAN                    Before
> +  )
> +{
> +  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
> +  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
> +
> +  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
> +  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
> +
> +  //
> +  // If previous feature has no dependence with the find featue.
> +  // return FALSE.
> +  //
> +  if (PreDependType == NoneDepType) {
> +    return FALSE;
> +  }
> +
> +  //
> +  // If both feature have dependence, keep the one which needs use more
> +  // processors and clear the dependence for the other one.
> +  //
> +  if (PreDependType >= CurrentDependType) {
> +    ClearFeatureScope (CurrentFeature, Before);
> +    return TRUE;
> +  } else {
> +    ClearFeatureScope (PreviousFeature, Before);
> +    return FALSE;
> +  }
> +}
> +
> +/**
> +  Base on dependence relationship to asjust feature order.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  FindEntry          The entry this feature depend on.
> +  @param[in]  CurrentEntry       The entry for this feature.
> +  @param[in]  Before             Before or after dependence relationship.
> +
> +**/
> +VOID
> +AdjustEntry (
> +  IN      LIST_ENTRY                *FeatureList,
> +  IN OUT  LIST_ENTRY                *FindEntry,
> +  IN OUT  LIST_ENTRY                *CurrentEntry,
> +  IN      BOOLEAN                   Before
> +  )
> +{
> +  LIST_ENTRY                *PreviousEntry;
> +  CPU_FEATURES_ENTRY        *PreviousFeature;
> +  CPU_FEATURES_ENTRY        *CurrentFeature;
> +
> +  //
> +  // For CPU feature which has core or package type dependence, later code need to insert
> +  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
> +  // So if driver finds both feature A and B need to execute before feature C, driver will
> +  // base on dependence type of feature A and B to update the logic here.
> +  // For example, feature A has package type dependence and feature B has core type dependence,
> +  // because package type dependence need to wait for more processors which has strong dependence
> +  // than core type dependence. So driver will adjust the feature order to B -> A -> C. and driver
> +  // will remove the feature dependence in feature B.
> +  // Driver just needs to make sure before feature C been executed, feature A has finished its task
> +  // in all all thread. Feature A finished in all threads also means feature B have finshed in all
> +  // threads.
> +  //
> +  if (Before) {
> +    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
> +  } else {
> 
> +    PreviousEntry = GetNextNode (FeatureList, FindEntry);
> +  }
> +
> +  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> +  RemoveEntryList (CurrentEntry);
> +
> +  if (IsNull (FeatureList, PreviousEntry)) {
> +    //
> +    // If not exist the previous or next entry, just insert the current entry.
> +    //
> +    if (Before) {
> +      InsertTailList (FindEntry, CurrentEntry);
> +    } else {
> +      InsertHeadList (FindEntry, CurrentEntry);
> +    }
> +  } else {
> +    //
> +    // If exist the previous or next entry, need to check it before insert curent entry.
> +    //
> +    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
> +
> +    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature, Before)) {
> +      //
> +      // Return TRUE means current feature dependence has been cleared and the previous
> +      // feature dependence has been kept and used. So insert current feature before (or after)
> +      // the previous feature.
> +      //
> +      if (Before) {
> +        InsertTailList (PreviousEntry, CurrentEntry);
> +      } else {
> +        InsertHeadList (PreviousEntry, CurrentEntry);
> +      }
> +    } else {
> +      if (Before) {
> +        InsertTailList (FindEntry, CurrentEntry);
> +      } else {
> +        InsertHeadList (FindEntry, CurrentEntry);
> +      }
> +    }
> +  }
> +}
> 
> +
> +/**
> +  Checks and adjusts current CPU features per dependency relationship.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> +  @param[in]  FeatureMask        The feature bit mask.
> +
> +  @retval     return Swapped info.
> +**/
> +BOOLEAN
> +InsertToBeforeEntry (
> +  IN LIST_ENTRY              *FeatureList,
> +  IN LIST_ENTRY              *CurrentEntry,
> +  IN UINT8                   *FeatureMask
> +  )
> +{
> +  LIST_ENTRY                 *CheckEntry;
> +  CPU_FEATURES_ENTRY         *CheckFeature;
> +  BOOLEAN                    Swapped;
> +
> +  Swapped = FALSE;
> +
> +  //
> +  // Check all features dispatched before this entry
> +  //
> +  CheckEntry = GetFirstNode (FeatureList);
> +  while (CheckEntry != CurrentEntry) {
> +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
> +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
> +      Swapped = TRUE;
> +      break;
> +    }
> +    CheckEntry = CheckEntry->ForwardLink;
> +  }
> +
> +  return Swapped;
> +}
> +
> +/**
> +  Checks and adjusts current CPU features per dependency relationship.
> +
> +  @param[in]  FeatureList        Pointer to CPU feature list
> +  @param[in]  CurrentEntry       Pointer to current checked CPU feature
> +  @param[in]  FeatureMask        The feature bit mask.
> +
> +  @retval     return Swapped info.
> +**/
> +BOOLEAN
> +InsertToAfterEntry (
> +  IN LIST_ENTRY              *FeatureList,
> +  IN LIST_ENTRY              *CurrentEntry,
> +  IN UINT8                   *FeatureMask
> +  )
> +{
> +  LIST_ENTRY                 *CheckEntry;
> +  CPU_FEATURES_ENTRY         *CheckFeature;
> +  BOOLEAN                    Swapped;
> +
> +  Swapped = FALSE;
> +
> +  //
> +  // Check all features dispatched after this entry
> +  //
> +  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> +  while (!IsNull (FeatureList, CheckEntry)) {
> +    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> +    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
> +      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
> +      Swapped = TRUE;
> +      break;
> +    }
> +    CheckEntry = CheckEntry->ForwardLink;
> +  }
> +
> +  return Swapped;
> +}
> +
>   /**
>     Checks and adjusts CPU features order per dependency relationship.
>   
> @@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
>     CPU_FEATURES_ENTRY         *CheckFeature;
>     BOOLEAN                    Swapped;
>     LIST_ENTRY                 *TempEntry;
> +  LIST_ENTRY                 *NextEntry;
>   
>     CurrentEntry = GetFirstNode (FeatureList);
>     while (!IsNull (FeatureList, CurrentEntry)) {
>       Swapped = FALSE;
>       CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
> +    NextEntry = CurrentEntry->ForwardLink;
>       if (CpuFeature->BeforeAll) {
>         //
>         // Check all features dispatched before this entry
> @@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
>           CheckEntry = CheckEntry->ForwardLink;
>         }
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
> @@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
>           CheckEntry = CheckEntry->ForwardLink;
>         }
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
>   
>       if (CpuFeature->BeforeFeatureBitMask != NULL) {
> -      //
> -      // Check all features dispatched before this entry
> -      //
> -      CheckEntry = GetFirstNode (FeatureList);
> -      while (CheckEntry != CurrentEntry) {
> -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->BeforeFeatureBitMask)) {
> -          //
> -          // If there is dependency, swap them
> -          //
> -          RemoveEntryList (CurrentEntry);
> -          InsertTailList (CheckEntry, CurrentEntry);
> -          Swapped = TRUE;
> -          break;
> -        }
> -        CheckEntry = CheckEntry->ForwardLink;
> -      }
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->BeforeFeatureBitMask);
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
>   
>       if (CpuFeature->AfterFeatureBitMask != NULL) {
> -      //
> -      // Check all features dispatched after this entry
> -      //
> -      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
> -      while (!IsNull (FeatureList, CheckEntry)) {
> -        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
> -        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, CpuFeature->AfterFeatureBitMask)) {
> -          //
> -          // If there is dependency, swap them
> -          //
> -          TempEntry = GetNextNode (FeatureList, CurrentEntry);
> -          RemoveEntryList (CurrentEntry);
> -          InsertHeadList (CheckEntry, CurrentEntry);
> -          CurrentEntry = TempEntry;
> -          Swapped = TRUE;
> -          break;
> -        }
> -        CheckEntry = CheckEntry->ForwardLink;
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->AfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
>         }
> +    }
> +
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->CoreBeforeFeatureBitMask);
>         if (Swapped) {
> +        CurrentEntry = NextEntry;
>           continue;
>         }
>       }
> -    //
> -    // No swap happened, check the next feature
> -    //
> +
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->CoreAfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, CpuFeature->PackageBeforeFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, CpuFeature->PackageAfterFeatureBitMask);
> +      if (Swapped) {
> +        CurrentEntry = NextEntry;
> +        continue;
> +      }
> +    }
> +
>       CurrentEntry = CurrentEntry->ForwardLink;
>     }
>   }
> @@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
>     CpuFeaturesData = GetCpuFeaturesData ();
>     if (CpuFeaturesData->FeaturesCount == 0) {
>       InitializeListHead (&CpuFeaturesData->FeatureList);
> -    InitializeSpinLock (&CpuFeaturesData->MsrLock);
> -    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
> +    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
>       CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
>     }
>     ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
> @@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
>         }
>         CpuFeatureEntry->AfterFeatureBitMask = CpuFeature->AfterFeatureBitMask;
>       }
> +    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
> +      }
> +      CpuFeatureEntry->CoreBeforeFeatureBitMask = CpuFeature->CoreBeforeFeatureBitMask;
> +    }
> +    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
> +      }
> +      CpuFeatureEntry->CoreAfterFeatureBitMask = CpuFeature->CoreAfterFeatureBitMask;
> +    }
> +    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
> +      }
> +      CpuFeatureEntry->PackageBeforeFeatureBitMask = CpuFeature->PackageBeforeFeatureBitMask;
> +    }
> +    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
> +      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
> +        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
> +      }
> +      CpuFeatureEntry->PackageAfterFeatureBitMask = CpuFeature->PackageAfterFeatureBitMask;
> +    }
> +
>       CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
>       CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
>   
> @@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
>     @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is not
>                                       supported due to a circular dependency between
>                                       BEFORE and AFTER features.
> +  @retval  RETURN_NOT_READY         CPU feature PCD PcdCpuFeaturesUserConfiguration
> +                                    not updated by Platform driver yet.
>   
>     @note This service could be called by BSP only.
>   **/
> @@ -431,12 +755,20 @@ RegisterCpuFeature (
>     UINT8                      *FeatureMask;
>     UINT8                      *BeforeFeatureBitMask;
>     UINT8                      *AfterFeatureBitMask;
> +  UINT8                      *CoreBeforeFeatureBitMask;
> +  UINT8                      *CoreAfterFeatureBitMask;
> +  UINT8                      *PackageBeforeFeatureBitMask;
> +  UINT8                      *PackageAfterFeatureBitMask;
>     BOOLEAN                    BeforeAll;
>     BOOLEAN                    AfterAll;
>   
> -  FeatureMask          = NULL;
> -  BeforeFeatureBitMask = NULL;
> -  AfterFeatureBitMask  = NULL;
> +  FeatureMask                 = NULL;
> +  BeforeFeatureBitMask        = NULL;

How about renaming BeforeFeatureBitMask to ThreadBeforeFeatureBitMask?
I think the renaming together with redefining the macro 
CPU_FEATURE_BEFORE as CPU_FEATURE_THREAD_BEFORE can be in a separate patch.

> +  AfterFeatureBitMask         = NULL;
> +  CoreBeforeFeatureBitMask    = NULL;
> +  CoreAfterFeatureBitMask     = NULL;
> +  PackageBeforeFeatureBitMask  = NULL;
> +  PackageAfterFeatureBitMask   = NULL;
>     BeforeAll            = FALSE;
>     AfterAll             = FALSE;
>   
> @@ -449,6 +781,10 @@ RegisterCpuFeature (
>                       != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
>       ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL))
>                       != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));

Implementation can avoid using CPU_FEATURE_BEFORE and CPU_FEATURE_AFTER.
Use CPU_FEATURE_THREAD_BEFORE and CPU_FEATURE_THREAD_AFTER.

> +    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER))
> +                    != (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER));
> +    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER))
> +                    != (CPU_FEATURE_PACKAGE_BEFORE | CPU_FEATURE_PACKAGE_AFTER));
>       if (Feature < CPU_FEATURE_BEFORE) {
>         BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE : FALSE;
>         AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
> @@ -459,6 +795,14 @@ RegisterCpuFeature (
>         SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature & ~CPU_FEATURE_BEFORE, BitMaskSize);
>       } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
>         SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature & ~CPU_FEATURE_AFTER, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
> +      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature & ~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
> +      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature & ~CPU_FEATURE_CORE_AFTER, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
> +      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
> +    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
> +      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature & ~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
>       }
>       Feature = VA_ARG (Marker, UINT32);
>     }
> @@ -466,15 +810,19 @@ RegisterCpuFeature (
>   
>     CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
>     ASSERT (CpuFeature != NULL);
> -  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
> -  CpuFeature->FeatureMask          = FeatureMask;
> -  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
> -  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
> -  CpuFeature->BeforeAll            = BeforeAll;
> -  CpuFeature->AfterAll             = AfterAll;
> -  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
> -  CpuFeature->SupportFunc          = SupportFunc;
> -  CpuFeature->InitializeFunc       = InitializeFunc;
> +  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
> +  CpuFeature->FeatureMask                 = FeatureMask;
> +  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
> +  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
> +  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
> +  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
> +  CpuFeature->PackageBeforeFeatureBitMask = PackageBeforeFeatureBitMask;
> +  CpuFeature->PackageAfterFeatureBitMask  = PackageAfterFeatureBitMask;
> +  CpuFeature->BeforeAll                   = BeforeAll;
> +  CpuFeature->AfterAll                    = AfterAll;
> +  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
> +  CpuFeature->SupportFunc                 = SupportFunc;
> +  CpuFeature->InitializeFunc              = InitializeFunc;
>     if (FeatureName != NULL) {
>       CpuFeature->FeatureName          = AllocatePool (CPU_FEATURE_NAME_SIZE);
>       ASSERT (CpuFeature->FeatureName != NULL);
> 


-- 
Thanks,
Ray


  reply	other threads:[~2018-10-16  3:04 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-15  2:49 [Patch 0/4] Fix performance issue caused by Set MSR task Eric Dong
2018-10-15  2:49 ` [Patch 1/4] UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information Eric Dong
2018-10-15 16:02   ` Laszlo Ersek
2018-10-16  3:43     ` Dong, Eric
2018-10-16  2:27   ` Ni, Ruiyu
2018-10-16  5:25     ` Dong, Eric
2018-10-15  2:49 ` [Patch 2/4] UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types Eric Dong
2018-10-15  2:49 ` [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type Eric Dong
2018-10-16  3:05   ` Ni, Ruiyu [this message]
2018-10-16  7:43     ` Dong, Eric
2018-10-15  2:49 ` [Patch 4/4] UefiCpuPkg/PiSmmCpuDxeSmm: " Eric Dong
2018-10-15 17:13   ` Laszlo Ersek
2018-10-16 14:44     ` Dong, Eric
2018-10-16  3:16   ` Ni, Ruiyu
2018-10-16 23:52     ` Dong, Eric
2018-10-15 15:51 ` [Patch 0/4] Fix performance issue caused by Set MSR task Laszlo Ersek
2018-10-16  1:39   ` Dong, Eric
2018-10-17 11:42     ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ccee537f-6fbc-1505-78af-c52c4cc77e27@Intel.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox