Re: [edk2-devel] [PATCH v3 3/6] UefiCpuPkg: Implements SmmCpuSyncLib library instance

public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed

From: "Wu, Jiaxin" <jiaxin.wu@intel.com>
To: Laszlo Ersek <lersek@redhat.com>,
	"devel@edk2.groups.io" <devel@edk2.groups.io>
Cc: "Dong, Eric" <eric.dong@intel.com>, "Ni, Ray" <ray.ni@intel.com>,
	"Zeng, Star" <star.zeng@intel.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	"Kumar, Rahul R" <rahul.r.kumar@intel.com>
Subject: Re: [edk2-devel] [PATCH v3 3/6] UefiCpuPkg: Implements SmmCpuSyncLib library instance
Date: Thu, 14 Dec 2023 11:11:52 +0000	[thread overview]
Message-ID: <MN0PR11MB6158504B83C892BD70B1D76DFE8CA@MN0PR11MB6158.namprd11.prod.outlook.com> (raw)
In-Reply-To: <3b31f213-f102-acd4-4af6-d2b77fbde9a9@redhat.com>

Hi Laszlo,

Really appreciate your comments! I checked one by one and feedback as below, thank you & Ray again & again for patch refinement!!!!


> 
> (1) If / when you update the documentation in patch#2, please update
> this one as well.
> 

Yes, I will do the alignment.

> (2) Please sort the #include list alphabetically.
> 
> (The idea is that the [LibraryClasses] section in the INF file should be
> sorted as well, and then we can easily verify whether those two lists
> match each other -- modulo <Library/SmmCpuSyncLib.h>, of course.)
> 

Agree.

> (3) We can improve this, as follows:
> 
>   typedef volatile UINT32 SMM_CPU_SYNC_SEMAPHORE;

Good comment. Agree.


> 
>   typedef struct {
>     SMM_CPU_SYNC_SEMAPHORE *Counter;
>   } SMM_CPU_SYNC_SEMAPHORE_GLOBAL;
> 
>   typedef struct {
>     SMM_CPU_SYNC_SEMAPHORE *Run;
>   } SMM_CPU_SYNC_SEMAPHORE_CPU;
> 
> Because, while it *indeed* makes some sense to introduce these separate
> wrapper structures, we should still ensure that the internals are
> identical. This will come handy later.
> 

After check with Ray, I convinced we don't need wrapper the "Counter" into the SMM_CPU_SYNC_SEMAPHORE_GLOBAL. He thinks it's overdesigned since currently we only have one GLOBAL semaphore. "Counter" defines in the SMM_CPU_SYNC_CONTEXT directly can also make "Counter" has its own CPU cache lines, and don't need consider the future extension. So, I agree to move Counter into SMM_CPU_SYNC_CONTEXT. What's your opinion?

But for the *Run*, he is ok to wrap into the structure since it's for each CPU, and it can benefit & simply the coding logic, which can easily help us direct to the different CPU with index and meet the semaphore size alignment requirement.

> 
> (4) This is too complicated, in my opinion.
> 
> (4.1) First of all, please add a *conspicuous* comment to the
> SMM_CPU_SYNC_CONTEXT here, explaining that the whole idea is to place
> the Counter and Run semaphores on different CPU cache lines, for good
> performance. That's the *core* principle of this whole structure --
> that's why we have an array of pointers to semaphores, rather than an
> array of semaphores directly.
> 
> You didn't document that principle, and I had to spend a lot of time
> deducing that fact from the SmmCpuSyncContextInit() function.

Sorry about that, I will document for the SMM_CPU_SYNC_CONTEXT definition.


> 
> (4.2) The structure should go like this:
> 
> struct SMM_CPU_SYNC_CONTEXT  {
>   UINTN                            NumberOfCpus;
>   VOID                             *SemBuffer;
>   UINTN                            SemBufferPages;
>   SMM_CPU_SYNC_SEMAPHORE_GLOBAL    GlobalSem;
>   SMM_CPU_SYNC_SEMAPHORE_CPU       CpuSem[];
> };
> 
> Details:
> 
> - move NumberOfCpus to the top
> 
> - change the type of SemBuffer from (UINTN*) to (VOID*)
> 
> - replace SemBufferSize with SemBufferPages

Oh? Lazlo, could you explain why it's better to define it as "SemBufferPages" instead of "SemBufferSize" in bytes?


> 
> - move GlobalSem and CpuSem to the end
> 
> - We need exactly one SMM_CPU_SYNC_SEMAPHORE_GLOBAL, therefore
> embed
> GlobalSem directly as a field (it should not be a pointer)
> 
> - We can much simplify the code by turning CpuSem into a *flexible array
> member* (this is a C99 feature that is already widely used in edk2).
> This is why we move CpuSem to the end (and then we keep GlobalSem
> nearby, for clarity).

Agree! The same comment from Ray! Thank you and Ray, both!!!

>

Really great comment here:

Based on my above explanation, I propose to below definition:

///
/// Shall place each semaphore on exclusive cache line for good performance.
///
typedef volatile UINT32 SMM_CPU_SYNC_SEMAPHORE;

typedef struct {
  ///
  /// Used for control each CPU continue run or wait for signal
  ///
  SMM_CPU_SYNC_SEMAPHORE    *Run;
} SMM_CPU_SYNC_SEMAPHORE_FOR_EACH_CPU;

struct SMM_CPU_SYNC_CONTEXT  {
  ///
  /// Indicate all CPUs in the system.
  ///
  UINTN                                  NumberOfCpus;
  ///
  /// Address of semaphores
  ///
  VOID                                   *SemBuffer;
  ///
  /// Size in bytes of semaphores
  ///
  UINTN                                  SemBufferSize;      ----> I can change to pages based on your feedback:).
  ///
  /// Indicate CPUs entered SMM.
  ///
  SMM_CPU_SYNC_SEMAPHORE                 *CpuCount;
  ///
  /// Define an array of structure for each CPU semaphore due to the size alignment
  /// requirement. With the array of structure for each CPU semaphore, it's easy to
  /// reach the specific CPU with CPU Index for its own semaphore access: CpuSem[CpuIndex].
  ///
  SMM_CPU_SYNC_SEMAPHORE_FOR_EACH_CPU    CpuSem[];
};


> 
> (5) Please make these Internal functions STATIC.
> 
> Better yet, please make *all* functions that are not EFIAPI, STATIC.

Agree.


> > +  ASSERT (SmmCpuSyncCtx != NULL);
> 
> (6) This assert is unnecessary and wrong; we perform correct error
> checking already.

Agree, I will remove the check, but keep assert.

> 
> So, several comments on this section:
> 
> (7) the separate NULL assignment to (*SmmCpuSyncCtx) is superfluous, we
> overwrite the object immediately after.
> 

Agree.

> (8) the ASSERT() is superfluous and wrong; we already check for -- and
> report -- allocation failure correctly.

Agree, will remove the assert here.


> 
> (9) *page* allocation is useless / wasteful here; the main sync context
> structure can be allocated from *pool*
> 

Agree! The same comment from Ray! Again, thank you and Ray, both!!!



> (10) SafeIntLib APIs return RETURN_STATUS (and Status already has type
> RETURN_STATUS), so we should use RETURN_ERROR() rather than
> EFI_ERROR()
> -- it's more idiomatic

Agree.

> 
> (11) Referring back to (4), SMM_CPU_SYNC_SEMAPHORE_GLOBAL should
> not be
> counted (added) separately, because it is embedded in
> SMM_CPU_SYNC_CONTEXT directly.
> 
> > +
> > +  (*SmmCpuSyncCtx)->GlobalSem    =
> (SMM_CPU_SYNC_SEMAPHORE_GLOBAL *)((UINT8 *)(*SmmCpuSyncCtx) +
> sizeof (SMM_CPU_SYNC_CONTEXT));
> > +  (*SmmCpuSyncCtx)->CpuSem       = (SMM_CPU_SYNC_SEMAPHORE_CPU
> *)((UINT8 *)(*SmmCpuSyncCtx) + sizeof (SMM_CPU_SYNC_CONTEXT) +
> sizeof (SMM_CPU_SYNC_SEMAPHORE_GLOBAL));
> 
> (12) And then these two assignments should be dropped.
> 

Yes, agree, with my proposed definition, I will refine it.


> > +  (*SmmCpuSyncCtx)->NumberOfCpus = NumberOfCpus;
> > +
> > +  //
> > +  // Count the TotalSemSize
> > +  //
> > +  OneSemSize = GetSpinLockProperties ();
> > +
> > +  Status = SafeUintnMult (OneSemSize, sizeof
> (SMM_CPU_SYNC_SEMAPHORE_GLOBAL) / sizeof (VOID *), &GlobalSemSize);
> > +  if (EFI_ERROR (Status)) {
> > +    goto ON_ERROR;
> > +  }
> > +
> > +  Status = SafeUintnMult (OneSemSize, sizeof
> (SMM_CPU_SYNC_SEMAPHORE_CPU) / sizeof (VOID *), &OneCpuSemSize);
> > +  if (EFI_ERROR (Status)) {
> > +    goto ON_ERROR;
> > +  }
> 
> (13) I find this obscure and misleading. How about this one instead:
> 
>   UINTN  CacheLineSize;
> 
>   CacheLineSize = GetSpinLockProperties ();
>   OneSemSize    = ALIGN_VALUE (sizeof (SMM_CPU_SYNC_SEMAPHORE),
> CacheLineSize);
> 
> and then eliminate GlobalSemSize and OneCpuSemSize altogether.
> 
> The above construct will ensure that
> 
> (a) OneSemSize is just large enough for placing semaphores on different
> cache lines, and that
> 
> (b) OneSemSize is suitable for *both*
> SMM_CPU_SYNC_SEMAPHORE_GLOBAL and
> SMM_CPU_SYNC_SEMAPHORE_CPU. This is where we rely on the common
> internal
> type SMM_CPU_SYNC_SEMAPHORE.
> 
> > +
> > +  Status = SafeUintnMult (NumberOfCpus, OneCpuSemSize, &CpuSemSize);
> > +  if (EFI_ERROR (Status)) {
> > +    goto ON_ERROR;
> > +  }
> > +
> > +  Status = SafeUintnAdd (GlobalSemSize, CpuSemSize, &TotalSemSize);
> > +  if (EFI_ERROR (Status)) {
> > +    goto ON_ERROR;
> > +  }
> 

Agree!


> (14) This is probably better written as
> 
>   UINTN  NumSem;
> 
>   Status = SafeUintnAdd (1, NumberOfCpus, &NumSem);
>   if (RETURN_ERROR (Status)) {
>     goto ON_ERROR;
>   }
> 
>   Status = SafeUintnMult (NumSem, OneSemSize, &TotalSemSize);
>   if (RETURN_ERROR (Status)) {
>     goto ON_ERROR;
>   }
> 
> and remove the variable CpuSemSize as well.
> 
> > +
> > +  DEBUG ((DEBUG_INFO, "[%a] - One Semaphore Size    = 0x%x\n",
> __func__, OneSemSize));
> > +  DEBUG ((DEBUG_INFO, "[%a] - Total Semaphores Size = 0x%x\n",
> __func__, TotalSemSize));
> 
> (15) These are useful, but %x is not suitable for formatting UINTN.
> Instead, use %Lx, and cast the values to UINT64:
> 
>   DEBUG ((DEBUG_INFO, "[%a] - One Semaphore Size    = 0x%Lx\n", __func__,
> (UINT64)OneSemSize));
>   DEBUG ((DEBUG_INFO, "[%a] - Total Semaphores Size = 0x%Lx\n", __func__,
> (UINT64)TotalSemSize));
> 

Agree!

> > +
> > +  //
> > +  // Allocate for Semaphores in the *SmmCpuSyncCtx
> > +  //
> > +  (*SmmCpuSyncCtx)->SemBufferSize = TotalSemSize;
> > +  (*SmmCpuSyncCtx)->SemBuffer     = AllocatePages (EFI_SIZE_TO_PAGES
> ((*SmmCpuSyncCtx)->SemBufferSize));
> 
> (16) I suggest reworking this as follows (will be beneficial later), in
> accordance with (4):
> 
>   (*SmmCpuSyncCtx)->SemBufferPages = EFI_SIZE_TO_PAGES (TotalSemSize);
>   (*SmmCpuSyncCtx)->SemBuffer      = AllocatePages (
>                                        (*SmmCpuSyncCtx)->SemBufferPages
>                                        );
> 
> > +  ASSERT ((*SmmCpuSyncCtx)->SemBuffer != NULL);
> 
> (17) Bogus assert; same reason as in point (8).


Agree!

> 
> > +  if ((*SmmCpuSyncCtx)->SemBuffer == NULL) {
> > +    Status = RETURN_OUT_OF_RESOURCES;
> > +    goto ON_ERROR;
> > +  }
> > +
> > +  ZeroMem ((*SmmCpuSyncCtx)->SemBuffer, TotalSemSize);
> 
> (18) First approach: simplify the code by calling AllocateZeroPages()
> instead. (It may zero more bytes than strictly necessary, but it's not a
> big deal, and the code simplification is worth it.)
> 
> (19) Second approach: even better, just drop this call. There is no need
> for zeroing the semaphore buffer at all, as we are going to manually set
> both the Counter and the individual Run elements, below!
> 
> (20) With ZeroMem() gone, evaluate if we still depend on the
> BaseMemoryLib class (#include and [LibraryClasses]).

Agree, I will drop the zeromem!


> 
> > +
> > +  //
> > +  // Assign Global Semaphore pointer
> > +  //
> > +  SemAddr                               = (UINTN)(*SmmCpuSyncCtx)->SemBuffer;
> > +  (*SmmCpuSyncCtx)->GlobalSem->Counter  = (UINT32 *)SemAddr;
> 
> (21) Side comment (the compiler will catch it for you anyway): it's not
> "GlobalSem->Counter" but "GlobalSem.Counter", after point (4).
> 

I will rename the Counter --> CpuCount, and move to SmmCpuSyncCtx directly based on the comment from Ray.


> (22) The explicit (UINT32 *) cast is ugly. We should cast to
> (SMM_CPU_SYNC_SEMAPHORE *).
> 

Agree!

> > +  *(*SmmCpuSyncCtx)->GlobalSem->Counter = 0;
> > +  DEBUG ((DEBUG_INFO, "[%a] - (*SmmCpuSyncCtx)->GlobalSem->Counter
> Address: 0x%08x\n", __func__, (UINTN)(*SmmCpuSyncCtx)->GlobalSem-
> >Counter));
> 
> (23) problems with this DEBUG line:
> 
> (23.1) needlessly verbose,
> 
> (23.2) prints UINTN with %x,
> 
> (23.3) pads to 8 nibbles even though UINTN can be 64-bit
> 
> How about:
> 
>   DEBUG ((
>     DEBUG_INFO,
>     "[%a] - GlobalSem.Counter @ 0x%016Lx\n",
>     __func__,
>     (UINT64)SemAddr
>     ));
> 
> > +
> > +  SemAddr += GlobalSemSize;
> 
> (24) Should be "+= OneSemSize".

Agree, even move the Counter in context, it should be ok with "+= OneSemSize".

> 
> > +
> > +  //
> > +  // Assign CPU Semaphore pointer
> > +  //
> > +  for (CpuIndex = 0; CpuIndex < NumberOfCpus; CpuIndex++) {
> > +    (*SmmCpuSyncCtx)->CpuSem[CpuIndex].Run  = (UINT32 *)(SemAddr +
> (CpuSemSize / NumberOfCpus) * CpuIndex);
> > +    *(*SmmCpuSyncCtx)->CpuSem[CpuIndex].Run = 0;
> > +    DEBUG ((DEBUG_INFO, "[%a] - (*SmmCpuSyncCtx)->CpuSem[%d].Run
> Address: 0x%08x\n", __func__, CpuIndex, (UINTN)(*SmmCpuSyncCtx)-
> >CpuSem[CpuIndex].Run));
> > +  }
> 
> (25) Extremely over-complicated.
> 

Yes, agree, now I have realized that. Thank you again!


> (25.1) The quotient (CpuSemSize / NumberOfCpus) is just OneCpuSemSize,
> from the previous SafeUintnMult() call.
> 

Yes.

> (25.2) Using SemAddr as a base address, and then performing a separate
> multiplication, is wasteful -- not just computationally, but
> semantically. We can simply advance SemAddr here!
> 

Agree.

> (25.3) the expression
> 
>   (*SmmCpuSyncCtx)->CpuSem[CpuIndex].Run
> 
> is tiresome to read, and so we shouldn't repat it multiple times!

Agree

> 
> (25.4) the usual problems with the DEBUG line:
> 
> (25.4.1) needlessly verbose
> 
> (25.4.2) uses %d for formatting CpuIndex (which is UINTN)
> 
> (25.4.3) uses %x for formatting (UINTN)Run
> 
> (25.4.4) padds to 8 nibbles even though the Run address can be 64-bit
> 
> So:
> 
>   SMM_CPU_SYNC_SEMAPHORE_CPU *CpuSem;
> 
>   CpuSem = (*SmmCpuSyncCtx)->CpuSem;
>   for (CpuIndex = 0; CpuIndex < NumberOfCpus; CpuIndex++) {
>     CpuSem->Run  = (SMM_CPU_SYNC_SEMAPHORE *)SemAddr;
>     *CpuSem->Run = 0;
> 
>     DEBUG ((
>       DEBUG_INFO,
>       "[%a] - CpuSem[%Lu].Run @ 0x%016Lx\n",
>       __func__,
>       (UINT64)CpuIndex,
>       (UINT64)SemAddr
>       ));
> 
>     CpuSem++;
>     SemAddr += OneSemSize;
>   }
> 

Really good, thanks comments. Totally agree!


> > +
> > +  return RETURN_SUCCESS;
> > +
> > +ON_ERROR:
> > +  FreePages (*SmmCpuSyncCtx, EFI_SIZE_TO_PAGES (CtxSize));
> 
> (26) And then this can be
> 
>   FreePool (*SmmCpuSyncCtx);
> 

Yes, agree.

> per comment (9).
> 
> > +  return Status;
> > +}
> > +
> > +/**
> > +  Deinit an allocated SMM CPU Sync context.
> > +
> > +  SmmCpuSyncContextDeinit() function is to deinitialize SMM CPU Sync
> context, the resources allocated in
> > +  SmmCpuSyncContextInit() will be freed.
> > +
> > +  Note: This function only can be called after SmmCpuSyncContextInit()
> return success.
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object to be deinitialized.
> > +
> > +  @retval RETURN_SUCCESS            The SMM CPU Sync context was
> successful deinitialized.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
> > +  @retval RETURN_UNSUPPORTED        Unsupported operation.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncContextDeinit (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx
> > +  )
> > +{
> > +  UINTN  SmmCpuSyncCtxSize;
> > +
> > +  ASSERT (SmmCpuSyncCtx != NULL);
> 
> (27) bogus ASSERT
> 

According our original define, I will refine the interface.

> > +  if (SmmCpuSyncCtx == NULL) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> > +
> > +  SmmCpuSyncCtxSize = sizeof (SMM_CPU_SYNC_CONTEXT) + sizeof
> (SMM_CPU_SYNC_SEMAPHORE_GLOBAL) + sizeof
> (SMM_CPU_SYNC_SEMAPHORE_CPU) * (SmmCpuSyncCtx->NumberOfCpus);
> > +
> > +  FreePages (SmmCpuSyncCtx->SemBuffer, EFI_SIZE_TO_PAGES
> (SmmCpuSyncCtx->SemBufferSize));
> 
> (28) Per comment (16), this can be simplified as:
> 
>   FreePages (SmmCpuSyncCtx->SemBuffer, SmmCpuSyncCtx-
> >SemBufferPages);
> 
> > +
> > +  FreePages (SmmCpuSyncCtx, EFI_SIZE_TO_PAGES (SmmCpuSyncCtxSize));
> 
> (29) Per comments (9) and (26), this should be just
> 
>   FreePool (SmmCpuSyncCtx);
> 
> (and the variable "SmmCpuSyncCtxSize" should be removed).
> 

Yes, agree!


> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Reset SMM CPU Sync context.
> > +
> > +  SmmCpuSyncContextReset() function is to reset SMM CPU Sync context
> to the initialized state.
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object to be reset.
> > +
> > +  @retval RETURN_SUCCESS            The SMM CPU Sync context was
> successful reset.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncContextReset (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL);
> 
> (30) bogus assert
> 

Will refine the implementation based on the latest interface.

> > +  if (SmmCpuSyncCtx == NULL) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> > +
> > +  *SmmCpuSyncCtx->GlobalSem->Counter = 0;
> > +
> > +  return RETURN_SUCCESS;
> > +}
> 
> (31) Is there anything to do about the Run semaphores here?
> 

No, Run semaphores will naturally reset if all cpus ready to exit if the caller follow the API requirement calling.


> > +
> > +/**
> > +  Get current number of arrived CPU in SMI.
> > +
> > +  For traditional CPU synchronization method, BSP might need to know the
> current number of arrived CPU in
> > +  SMI to make sure all APs in SMI. This API can be for that purpose.
> > +
> > +  @param[in]      SmmCpuSyncCtx     Pointer to the SMM CPU Sync context
> object.
> > +  @param[in,out]  CpuCount          Current count of arrived CPU in SMI.
> > +
> > +  @retval RETURN_SUCCESS            Get current number of arrived CPU in SMI
> successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx or CpuCount is
> NULL.
> > +  @retval RETURN_UNSUPPORTED        Unsupported operation.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncGetArrivedCpuCount (
> > +  IN     SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN OUT UINTN                 *CpuCount
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL && CpuCount != NULL);
> 
> (32) bogus assert
> 
> > +  if ((SmmCpuSyncCtx == NULL) || (CpuCount == NULL)) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> > +
> > +  if (*SmmCpuSyncCtx->GlobalSem->Counter < 0) {
> 
> (33) The type of Counter is
> 
>   volatile UINT32
> 
> therefore this condition will never evaluate to true.
> 
> If you want to check for the door being locked, then I suggest
> 
>   *SmmCpuSyncCtx->GlobalSem.Counter == (UINT32)-1
> 
> or
> 
>   *SmmCpuSyncCtx->GlobalSem.Counter == MAX_UINT32
> 

Yes, I do need check the locked or not, I will fix this!!!


For this API, I will rewrite the implementation according the interface we aligned.



> > +    return RETURN_UNSUPPORTED;
> > +  }
> > +
> > +  *CpuCount = *SmmCpuSyncCtx->GlobalSem->Counter;
> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Performs an atomic operation to check in CPU.
> > +
> > +  When SMI happens, all processors including BSP enter to SMM mode by
> calling SmmCpuSyncCheckInCpu().
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object.
> > +  @param[in]      CpuIndex          Check in CPU index.
> > +
> > +  @retval RETURN_SUCCESS            Check in CPU (CpuIndex) successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
> > +  @retval RETURN_ABORTED            Check in CPU failed due to
> SmmCpuSyncLockDoor() has been called by one elected CPU.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncCheckInCpu (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN     UINTN                 CpuIndex
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL);
> 
> (34) bogus ASSERT
> 
> > +  if (SmmCpuSyncCtx == NULL) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }

Agree, I will refine!

> > +
> > +  //
> > +  // Check to return if Counter has already been locked.
> > +  //
> > +  if ((INT32)InternalReleaseSemaphore (SmmCpuSyncCtx->GlobalSem-
> >Counter) <= 0) {
> 
> (35) The cast and the comparison are bogus.
> 
> InternalReleaseSemaphore():
> 
> - returns 0, and leaves the semaphore unchanged, if the current value of
> the semaphore is MAX_UINT32,
> 
> - increments the semaphore, and returns the incremented -- hence:
> strictly positive -- UINT32 value, otherwise.
> 
> So the condition for
> 
>   semaphore unchanged because door has been locked
> 
> is:
> 
>   InternalReleaseSemaphore (SmmCpuSyncCtx->GlobalSem->Counter) == 0
> 
> No INT32 cast, and no "<".
> 

Agree! I will do the fix. 


> 
> > +    return RETURN_ABORTED;
> > +  }
> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Performs an atomic operation to check out CPU.
> > +
> > +  CheckOutCpu() can be called in error handling flow for the CPU who calls
> CheckInCpu() earlier.
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object.
> > +  @param[in]      CpuIndex          Check out CPU index.
> > +
> > +  @retval RETURN_SUCCESS            Check out CPU (CpuIndex) successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
> > +  @retval RETURN_NOT_READY          The CPU is not checked-in.
> > +  @retval RETURN_UNSUPPORTED        Unsupported operation.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncCheckOutCpu (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN     UINTN                 CpuIndex
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL);
> 
> (36) bogus assert
> 
> 
> > +  if (SmmCpuSyncCtx == NULL) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> > +

The same, I will refine it.





> > +  if (*SmmCpuSyncCtx->GlobalSem->Counter == 0) {
> > +    return RETURN_NOT_READY;
> > +  }
> 
> (37) This preliminary check is not particularly useful.
> 
> Assume that Counter is currently 1, but -- due to a programming error
> somewhere -- there are two APs executing SmmCpuSyncCheckOutCpu() in
> parallel. Both may pass this check (due to Counter being 1), and then
> one of the APs will consume the semaphore and return, and the other AP
> will hang forever.
> 
> So this check is "best effort". It's fine -- some programming errors
> just inevitably lead to undefined behavior; not all bad usage can be
> explicitly caught.
> 
> Maybe add a comment?
> 


Yes, I will remove the RETURN_NOT_READY, and add the comment for the API:

"The caller shall make sure the CPU specified by CpuIndex has already checked-in."



> > +  if ((INT32)InternalWaitForSemaphore (SmmCpuSyncCtx->GlobalSem-
> >Counter) < 0) {
> > +    return RETURN_UNSUPPORTED;
> > +  }
> 
> (38) This doesn't look right. InternalWaitForSemaphore() blocks for as
> long as the semaphore is zero. When the semaphore is nonzero,
> InternalWaitForSemaphore() decrements it, and returns the decremented
> value. Thus, InternalWaitForSemaphore() cannot return negative values
> (it returns UINT32), and it also cannot return MAX_UINT32.
> 
> So I simply don't understand the purpose of this code.

The Counter can be locked by calling SmmCpuSyncLockDoor(), after that, Counter will be (UINT32)-1, in such a case,  InternalWaitForSemaphore will continue decrease it to (UINT32)-2, this is to catch the lock case. but according we aligned for the interface below:
/**
  Performs an atomic operation to check out CPU.

  This function can be called in error handling flow for the CPU who calls CheckInCpu() earlier.
  The caller shall make sure the CPU specified by CpuIndex has already checked-in.

  If Context is NULL, then ASSERT().
  If CpuIndex exceeds the range of all CPUs in the system, then ASSERT().

  @param[in,out]  Context           Pointer to the SMM CPU Sync context object.
  @param[in]      CpuIndex          Check out CPU index.

  @retval RETURN_SUCCESS            Check out CPU (CpuIndex) successfully.
  @retval RETURN_ABORTED            Check out CPU failed due to SmmCpuSyncLockDoor() has been called by one elected CPU.

**/
RETURN_STATUS
EFIAPI
SmmCpuSyncCheckOutCpu (
  IN OUT SMM_CPU_SYNC_CONTEXT  *Context,
  IN     UINTN                 CpuIndex
  );

The code will be changed to:

  if ((INT32)InternalWaitForSemaphore (Context->CpuCount) < 0) {
    return RETURN_ABORTED;
  }


> 
> As written, this condition could only fire if InternalWaitForSemaphore()
> successfully decremented the semaphore, and the *new* value of the
> semaphore were >=0x8000_0000. Because in that case, the INT32 cast (=
> implementation-defined behavior) would produce a negative value. But for
> that, we'd first have to increase Counter to 0x8000_0001 at least, and
> that could never happen in practice, IMO.
> 
> So this is basically dead code. What is the intent?


It can happen when locked the Counter.


> 
> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Performs an atomic operation lock door for CPU checkin or checkout.
> > +
> > +  After this function, CPU can not check in via SmmCpuSyncCheckInCpu().
> > +
> > +  The CPU specified by CpuIndex is elected to lock door.
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object.
> > +  @param[in]      CpuIndex          Indicate which CPU to lock door.
> > +  @param[in,out]  CpuCount          Number of arrived CPU in SMI after look
> door.
> > +
> > +  @retval RETURN_SUCCESS            Lock door for CPU successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx or CpuCount is
> NULL.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncLockDoor (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN     UINTN                 CpuIndex,
> > +  IN OUT UINTN                 *CpuCount
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL && CpuCount != NULL);
> 
> (39) bogus assert
> 
> > +  if ((SmmCpuSyncCtx == NULL) || (CpuCount == NULL)) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> > +
> > +  *CpuCount = InternalLockdownSemaphore (SmmCpuSyncCtx-
> >GlobalSem->Counter);
> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Used by the BSP to wait for APs.
> > +
> > +  The number of APs need to be waited is specified by NumberOfAPs. The
> BSP is specified by BspIndex.
> > +
> > +  Note: This function is blocking mode, and it will return only after the
> number of APs released by
> > +  calling SmmCpuSyncReleaseBsp():
> > +  BSP: WaitForAPs    <--  AP: ReleaseBsp
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object.
> > +  @param[in]      NumberOfAPs       Number of APs need to be waited by
> BSP.
> > +  @param[in]      BspIndex          The BSP Index to wait for APs.
> > +
> > +  @retval RETURN_SUCCESS            BSP to wait for APs successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
> NumberOfAPs > total number of processors in system.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncWaitForAPs (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN     UINTN                 NumberOfAPs,
> > +  IN     UINTN                 BspIndex
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL && NumberOfAPs <= SmmCpuSyncCtx-
> >NumberOfCpus);
> 
> (40) bogus assert
> 
> > +  if ((SmmCpuSyncCtx == NULL) || (NumberOfAPs > SmmCpuSyncCtx-
> >NumberOfCpus)) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> 
> (41) Question for both the library instance and the library class (i.e.,
> API documentation):
> 
> Is it ever valid to call this function with (NumberOfAPs ==
> SmmCpuSyncCtx->NumberOfCpus)?
> 
> I would think not. NumberOfCpus is supposed to include the BSP and the
> APs. Therefore the highest permitted NumberOfAPs value, on input, is
> (SmmCpuSyncCtx->NumberOfCpus - 1).
> 
> So I think we should modify the lib class and the lib instance both.
> RETURN_INVALID_PARAMETER applies to "NumberOfAPs *>=* total number
> of
> processors in system".
> 

Good catch!!! I will fix it.

> > +
> > +  while (NumberOfAPs-- > 0) {
> > +    InternalWaitForSemaphore (SmmCpuSyncCtx->CpuSem[BspIndex].Run);
> > +  }
> 
> (42) In my opinion, this is an ugly pattern.
> 
> First, after the loop, NumberOfAPs will be MAX_UINTN.
> 
> Second, modifying input parameters is also an anti-pattern. Assume you
> debug a problem, and fetch a backtrace where the two innermost frames
> are SmmCpuSyncWaitForAPs() and InternalWaitForSemaphore(). If you look
> at the stack frame that belongs to SmmCpuSyncWaitForAPs(), you may be
> led to think that the function was *invoked* with a *low* NumberOfAPs
> value. Whereas in fact NumberOfAPs may have been a larger value at the
> time of call, only the function decremented NumberOfAPs by the time the
> stack trace was fetched.
> 
> So, please add a new helper variable, and write a proper "for" loop.
> 
>   UINTN  Arrived;
> 
>   for (Arrived = 0; Arrived < NumberOfAPs; Arrived++) {
>     InternalWaitForSemaphore (SmmCpuSyncCtx->CpuSem[BspIndex].Run);
>   }
> 

Agree. Thank you!


> (43) I mentioned this while reviewing the lib class header (patch#2), so
> let me repeat it here:
> 
> BspIndex is used for indexing the CpuSem array, but we perform no range
> checking, against "SmmCpuSyncCtx->NumberOfCpus".
> 
> That error should be documented (in the lib class header), and
> caught/reported here.
> 

Yes, I will refine the patch based on the lib class header we aligned.


> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Used by the BSP to release one AP.
> > +
> > +  The AP is specified by CpuIndex. The BSP is specified by BspIndex.
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object.
> > +  @param[in]      CpuIndex          Indicate which AP need to be released.
> > +  @param[in]      BspIndex          The BSP Index to release AP.
> > +
> > +  @retval RETURN_SUCCESS            BSP to release one AP successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
> CpuIndex is same as BspIndex.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncReleaseOneAp   (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN     UINTN                 CpuIndex,
> > +  IN     UINTN                 BspIndex
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL && BspIndex != CpuIndex);
> 
> (44) bogus assert
> 
> > +  if ((SmmCpuSyncCtx == NULL) || (BspIndex == CpuIndex)) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> 
> (45) range checks for BspIndex and CpuIndex missing (in both lib class
> and lib instance)
> 

The same, I will align the code with we aligned lib class.


> > +
> > +  InternalReleaseSemaphore (SmmCpuSyncCtx->CpuSem[CpuIndex].Run);
> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Used by the AP to wait BSP.
> > +
> > +  The AP is specified by CpuIndex. The BSP is specified by BspIndex.
> > +
> > +  Note: This function is blocking mode, and it will return only after the AP
> released by
> > +  calling SmmCpuSyncReleaseOneAp():
> > +  BSP: ReleaseOneAp  -->  AP: WaitForBsp
> > +
> > +  @param[in,out]  SmmCpuSyncCtx    Pointer to the SMM CPU Sync context
> object.
> > +  @param[in]      CpuIndex         Indicate which AP wait BSP.
> > +  @param[in]      BspIndex         The BSP Index to be waited.
> > +
> > +  @retval RETURN_SUCCESS            AP to wait BSP successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
> CpuIndex is same as BspIndex.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncWaitForBsp (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN     UINTN                 CpuIndex,
> > +  IN     UINTN                 BspIndex
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL && BspIndex != CpuIndex);
> 
> (46) bogus assert
> 
> > +  if ((SmmCpuSyncCtx == NULL) || (BspIndex == CpuIndex)) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> 
> (47) range checks missing (lib class and instance)
> 


The same, I will align the code with we aligned lib class.

> > +
> > +  InternalWaitForSemaphore (SmmCpuSyncCtx->CpuSem[CpuIndex].Run);
> > +
> > +  return RETURN_SUCCESS;
> > +}
> > +
> > +/**
> > +  Used by the AP to release BSP.
> > +
> > +  The AP is specified by CpuIndex. The BSP is specified by BspIndex.
> > +
> > +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
> context object.
> > +  @param[in]      CpuIndex          Indicate which AP release BSP.
> > +  @param[in]      BspIndex          The BSP Index to be released.
> > +
> > +  @retval RETURN_SUCCESS            AP to release BSP successfully.
> > +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
> CpuIndex is same as BspIndex.
> > +
> > +**/
> > +RETURN_STATUS
> > +EFIAPI
> > +SmmCpuSyncReleaseBsp (
> > +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
> > +  IN     UINTN                 CpuIndex,
> > +  IN     UINTN                 BspIndex
> > +  )
> > +{
> > +  ASSERT (SmmCpuSyncCtx != NULL && BspIndex != CpuIndex);
> 
> (48) bogus assert
> 
> > +  if ((SmmCpuSyncCtx == NULL) || (BspIndex == CpuIndex)) {
> > +    return RETURN_INVALID_PARAMETER;
> > +  }
> 
> (49) range checks missing (lib class and instance)
> 

The same, I will align the code with we aligned lib class.


> > +
> > +  InternalReleaseSemaphore (SmmCpuSyncCtx->CpuSem[BspIndex].Run);
> > +
> > +  return RETURN_SUCCESS;
> > +}
> > diff --git a/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.inf
> b/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.inf
> > new file mode 100644
> > index 0000000000..6bb1895577
> > --- /dev/null
> > +++ b/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.inf
> > @@ -0,0 +1,39 @@
> > +## @file
> > +# SMM CPU Synchronization lib.
> > +#
> > +# This is SMM CPU Synchronization lib used for SMM CPU sync operations.
> > +#
> > +# Copyright (c) 2023, Intel Corporation. All rights reserved.<BR>
> > +# SPDX-License-Identifier: BSD-2-Clause-Patent
> > +#
> > +##
> > +
> > +[Defines]
> > +  INF_VERSION                    = 0x00010005
> > +  BASE_NAME                      = SmmCpuSyncLib
> > +  FILE_GUID                      = 1ca1bc1a-16a4-46ef-956a-ca500fd3381f
> > +  MODULE_TYPE                    = DXE_SMM_DRIVER
> > +  LIBRARY_CLASS                  = SmmCpuSyncLib|DXE_SMM_DRIVER
> > +
> > +[Sources]
> > +  SmmCpuSyncLib.c
> > +
> > +[Packages]
> > +  MdePkg/MdePkg.dec
> > +  MdeModulePkg/MdeModulePkg.dec
> > +  UefiCpuPkg/UefiCpuPkg.dec
> > +
> > +[LibraryClasses]
> > +  UefiLib
> > +  BaseLib
> > +  DebugLib
> > +  PrintLib
> > +  SafeIntLib
> > +  SynchronizationLib
> > +  BaseMemoryLib
> > +  SmmServicesTableLib
> > +  MemoryAllocationLib
> 
> (50) Please sort this list alphabetically (cf. comment (2)).
> 

Ok, will refine it.

> > +
> > +[Pcd]
> > +
> > +[Protocols]
> 
> (51) Useless empty INF file sections; please remove them.
> 
> > diff --git a/UefiCpuPkg/UefiCpuPkg.dsc b/UefiCpuPkg/UefiCpuPkg.dsc
> > index 074fd77461..f264031c77 100644
> > --- a/UefiCpuPkg/UefiCpuPkg.dsc
> > +++ b/UefiCpuPkg/UefiCpuPkg.dsc
> > @@ -23,10 +23,11 @@
> >  #
> >
> >  !include MdePkg/MdeLibs.dsc.inc
> >
> >  [LibraryClasses]
> > +  SafeIntLib|MdePkg/Library/BaseSafeIntLib/BaseSafeIntLib.inf
> >    BaseLib|MdePkg/Library/BaseLib/BaseLib.inf
> >    BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf
> >    CpuLib|MdePkg/Library/BaseCpuLib/BaseCpuLib.inf
> >    DebugLib|MdePkg/Library/BaseDebugLibNull/BaseDebugLibNull.inf
> >
> SerialPortLib|MdePkg/Library/BaseSerialPortLibNull/BaseSerialPortLibNull.inf
> 
> (52) Just from the context visible here, this list seems alphabetically
> sorted pre-patch; if that's the case, please stick with it (don't break
> the sort order).


How about adding the SafeIntLib in the MdePkg/MdeLibs.dsc.inc? if agree, I will do that with separated patch.


Thanks,
Jiaxin



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#112520): https://edk2.groups.io/g/devel/message/112520
Mute This Topic: https://groups.io/mt/103010165/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-

next prev parent reply	other threads:[~2023-12-14 11:12 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-06 10:01 [edk2-devel] [PATCH v3 0/6] Refine SMM CPU Sync flow and abstract SmmCpuSyncLib Wu, Jiaxin
2023-12-06 10:01 ` [edk2-devel] [PATCH v3 1/6] UefiCpuPkg/PiSmmCpuDxeSmm: Optimize Semaphore Sync between BSP and AP Wu, Jiaxin
2023-12-12 19:27   ` Laszlo Ersek
2023-12-06 10:01 ` [edk2-devel] [PATCH v3 2/6] UefiCpuPkg: Adds SmmCpuSyncLib library class Wu, Jiaxin
2023-12-07  9:07   ` Ni, Ray
2023-12-12 20:18   ` Laszlo Ersek
2023-12-13  4:23     ` Wu, Jiaxin
2023-12-13 15:02       ` Laszlo Ersek
2023-12-06 10:01 ` [edk2-devel] [PATCH v3 3/6] UefiCpuPkg: Implements SmmCpuSyncLib library instance Wu, Jiaxin
2023-12-13 14:34   ` Laszlo Ersek
2023-12-14 11:11     ` Wu, Jiaxin [this message]
2023-12-14 13:48       ` Laszlo Ersek
2023-12-14 15:34         ` Wu, Jiaxin
2023-12-14 15:54         ` Wu, Jiaxin
2023-12-15  6:41         ` Wu, Jiaxin
2023-12-15  6:44         ` Wu, Jiaxin
2023-12-06 10:01 ` [edk2-devel] [PATCH v3 4/6] OvmfPkg: Specifies SmmCpuSyncLib instance Wu, Jiaxin
2023-12-13 16:52   ` Laszlo Ersek
2023-12-14 13:43     ` Wu, Jiaxin
2023-12-15  0:21     ` Ni, Ray
2023-12-06 10:01 ` [edk2-devel] [PATCH v3 5/6] UefiPayloadPkg: " Wu, Jiaxin
2023-12-06 10:01 ` [edk2-devel] [PATCH v3 6/6] UefiCpuPkg/PiSmmCpuDxeSmm: Consume SmmCpuSyncLib Wu, Jiaxin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN0PR11MB6158504B83C892BD70B1D76DFE8CA@MN0PR11MB6158.namprd11.prod.outlook.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox