From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bounce+27952+112529+7686176+12367111@groups.io>
Received: from mail02.groups.io (mail02.groups.io [66.175.222.108])
	by spool.mail.gandi.net (Postfix) with ESMTPS id BCD527803E3
	for <rebecca@openfw.io>; Thu, 14 Dec 2023 13:48:51 +0000 (UTC)
DKIM-Signature: a=rsa-sha256; bh=KxVlolaUF6b3ieg68vzuUjptDP+2tVia1L21Uc0qTfM=;
 c=relaxed/simple; d=groups.io;
 h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:In-Reply-To:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Language:Content-Type:Content-Transfer-Encoding;
 s=20140610; t=1702561730; v=1;
 b=R6oRWhWgnLGFVUTu1Ot9k024mImbg6cGi0+5oc04tl8y5gDjarAqsEyH3KEyqkG5YhQJy/hi
 KJ5wIZ+h9JoRH8FQ+rUC9RWZIpVDtZbA0YtURCKDd3JZc7U3Ghr7gTqvih1QgqIylURnQtDG7XE
 xUsEiGT6FOD672+8UoqgBFi0=
X-Received: by 127.0.0.2 with SMTP id LAgaYY7687511xI2xHubTqFd; Thu, 14 Dec 2023 05:48:50 -0800
X-Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by mx.groups.io with SMTP id smtpd.web11.22410.1702561729683973822
 for <devel@edk2.groups.io>;
 Thu, 14 Dec 2023 05:48:49 -0800
X-Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73])
 by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3,
 cipher=TLS_AES_256_GCM_SHA384) id us-mta-381-aGXkRqXUMLaAbiiE_wYpQQ-1; Thu,
 14 Dec 2023 08:48:45 -0500
X-MC-Unique: aGXkRqXUMLaAbiiE_wYpQQ-1
X-Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E9A8E28040B1;
	Thu, 14 Dec 2023 13:48:44 +0000 (UTC)
X-Received: from [10.39.193.43] (unknown [10.39.193.43])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 3A4212166B31;
	Thu, 14 Dec 2023 13:48:43 +0000 (UTC)
Message-ID: <4a2911fb-aa15-efb0-f2e4-1c5bac8aab8d@redhat.com>
Date: Thu, 14 Dec 2023 14:48:42 +0100
MIME-Version: 1.0
Subject: Re: [edk2-devel] [PATCH v3 3/6] UefiCpuPkg: Implements SmmCpuSyncLib library instance
To: "Wu, Jiaxin" <jiaxin.wu@intel.com>,
 "devel@edk2.groups.io" <devel@edk2.groups.io>
Cc: "Dong, Eric" <eric.dong@intel.com>, "Ni, Ray" <ray.ni@intel.com>,
 "Zeng, Star" <star.zeng@intel.com>, Gerd Hoffmann <kraxel@redhat.com>,
 "Kumar, Rahul R" <rahul.r.kumar@intel.com>
References: <20231206100122.8028-1-jiaxin.wu@intel.com>
 <20231206100122.8028-4-jiaxin.wu@intel.com>
 <3b31f213-f102-acd4-4af6-d2b77fbde9a9@redhat.com>
 <MN0PR11MB6158504B83C892BD70B1D76DFE8CA@MN0PR11MB6158.namprd11.prod.outlook.com>
From: "Laszlo Ersek" <lersek@redhat.com>
In-Reply-To: <MN0PR11MB6158504B83C892BD70B1D76DFE8CA@MN0PR11MB6158.namprd11.prod.outlook.com>
X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Precedence: Bulk
List-Subscribe: <mailto:devel+subscribe@edk2.groups.io>
List-Help: <mailto:devel+help@edk2.groups.io>
Sender: devel@edk2.groups.io
List-Id: <devel.edk2.groups.io>
Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io
Reply-To: devel@edk2.groups.io,lersek@redhat.com
List-Unsubscribe-Post: List-Unsubscribe=One-Click
List-Unsubscribe: <https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/plugh>
X-Gm-Message-State: nJ6wNrZbi1jQ7Bg0h7oLYOtGx7686176AA=
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-GND-Status: LEGIT
Authentication-Results: spool.mail.gandi.net;
	dkim=pass header.d=groups.io header.s=20140610 header.b=R6oRWhWg;
	dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=redhat.com (policy=none);
	spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io

On 12/14/23 12:11, Wu, Jiaxin wrote:
> Hi Laszlo,
>=20
> Really appreciate your comments! I checked one by one and feedback as bel=
ow, thank you & Ray again & again for patch refinement!!!!
>=20
>=20
>>
>> (1) If / when you update the documentation in patch#2, please update
>> this one as well.
>>
>=20
> Yes, I will do the alignment.
>=20
>> (2) Please sort the #include list alphabetically.
>>
>> (The idea is that the [LibraryClasses] section in the INF file should be
>> sorted as well, and then we can easily verify whether those two lists
>> match each other -- modulo <Library/SmmCpuSyncLib.h>, of course.)
>>
>=20
> Agree.
>=20
>> (3) We can improve this, as follows:
>>
>>   typedef volatile UINT32 SMM_CPU_SYNC_SEMAPHORE;
>=20
> Good comment. Agree.
>=20
>=20
>>
>>   typedef struct {
>>     SMM_CPU_SYNC_SEMAPHORE *Counter;
>>   } SMM_CPU_SYNC_SEMAPHORE_GLOBAL;
>>
>>   typedef struct {
>>     SMM_CPU_SYNC_SEMAPHORE *Run;
>>   } SMM_CPU_SYNC_SEMAPHORE_CPU;
>>
>> Because, while it *indeed* makes some sense to introduce these separate
>> wrapper structures, we should still ensure that the internals are
>> identical. This will come handy later.
>>
>=20
> After check with Ray, I convinced we don't need wrapper the "Counter" int=
o the SMM_CPU_SYNC_SEMAPHORE_GLOBAL. He thinks it's overdesigned since curr=
ently we only have one GLOBAL semaphore. "Counter" defines in the SMM_CPU_S=
YNC_CONTEXT directly can also make "Counter" has its own CPU cache lines, a=
nd don't need consider the future extension. So, I agree to move Counter in=
to SMM_CPU_SYNC_CONTEXT. What's your opinion?
>=20
> But for the *Run*, he is ok to wrap into the structure since it's for eac=
h CPU, and it can benefit & simply the coding logic, which can easily help =
us direct to the different CPU with index and meet the semaphore size align=
ment requirement.

My actual opinion is that *both* wrapper structures are unnecessary. You
can just embed the Counter pointer into the outer sync structure, and
you can have a Run *array of pointers* in the outer sync structure as well.

However, many people find double indirection confusing, and therefore
wrap one level into thin structures. That's fine with me. It has zero
runtime cost, and if it makes the code more manageable to the submitter,
I don't mind. So, up to you.


>=20
>>
>> (4) This is too complicated, in my opinion.
>>
>> (4.1) First of all, please add a *conspicuous* comment to the
>> SMM_CPU_SYNC_CONTEXT here, explaining that the whole idea is to place
>> the Counter and Run semaphores on different CPU cache lines, for good
>> performance. That's the *core* principle of this whole structure --
>> that's why we have an array of pointers to semaphores, rather than an
>> array of semaphores directly.
>>
>> You didn't document that principle, and I had to spend a lot of time
>> deducing that fact from the SmmCpuSyncContextInit() function.
>=20
> Sorry about that, I will document for the SMM_CPU_SYNC_CONTEXT definition=
.
>=20
>=20
>>
>> (4.2) The structure should go like this:
>>
>> struct SMM_CPU_SYNC_CONTEXT  {
>>   UINTN                            NumberOfCpus;
>>   VOID                             *SemBuffer;
>>   UINTN                            SemBufferPages;
>>   SMM_CPU_SYNC_SEMAPHORE_GLOBAL    GlobalSem;
>>   SMM_CPU_SYNC_SEMAPHORE_CPU       CpuSem[];
>> };
>>
>> Details:
>>
>> - move NumberOfCpus to the top
>>
>> - change the type of SemBuffer from (UINTN*) to (VOID*)
>>
>> - replace SemBufferSize with SemBufferPages
>=20
> Oh? Lazlo, could you explain why it's better to define it as "SemBufferPa=
ges" instead of "SemBufferSize" in bytes?

Semantically, there is no difference, or even SemBufferSize is more
flexible. My recommendation is basically a "forward declaration" here:
using "SemBufferPages" will simplify the code later on. Because, you
never need to *remember* the precise byte count. What you need to
*remember* is the page count only. So that way the declaration matches
the usage better, and you can save some code logic later on.

>=20
>=20
>>
>> - move GlobalSem and CpuSem to the end
>>
>> - We need exactly one SMM_CPU_SYNC_SEMAPHORE_GLOBAL, therefore
>> embed
>> GlobalSem directly as a field (it should not be a pointer)
>>
>> - We can much simplify the code by turning CpuSem into a *flexible array
>> member* (this is a C99 feature that is already widely used in edk2).
>> This is why we move CpuSem to the end (and then we keep GlobalSem
>> nearby, for clarity).
>=20
> Agree! The same comment from Ray! Thank you and Ray, both!!!
>=20
>>
>=20
> Really great comment here:
>=20
> Based on my above explanation, I propose to below definition:
>=20
> ///
> /// Shall place each semaphore on exclusive cache line for good performan=
ce.
> ///
> typedef volatile UINT32 SMM_CPU_SYNC_SEMAPHORE;
>=20
> typedef struct {
>   ///
>   /// Used for control each CPU continue run or wait for signal
>   ///
>   SMM_CPU_SYNC_SEMAPHORE    *Run;
> } SMM_CPU_SYNC_SEMAPHORE_FOR_EACH_CPU;
>=20
> struct SMM_CPU_SYNC_CONTEXT  {
>   ///
>   /// Indicate all CPUs in the system.
>   ///
>   UINTN                                  NumberOfCpus;
>   ///
>   /// Address of semaphores
>   ///
>   VOID                                   *SemBuffer;
>   ///
>   /// Size in bytes of semaphores
>   ///
>   UINTN                                  SemBufferSize;      ----> I can =
change to pages based on your feedback:).
>   ///
>   /// Indicate CPUs entered SMM.
>   ///
>   SMM_CPU_SYNC_SEMAPHORE                 *CpuCount;
>   ///
>   /// Define an array of structure for each CPU semaphore due to the size=
 alignment
>   /// requirement. With the array of structure for each CPU semaphore, it=
's easy to
>   /// reach the specific CPU with CPU Index for its own semaphore access:=
 CpuSem[CpuIndex].
>   ///
>   SMM_CPU_SYNC_SEMAPHORE_FOR_EACH_CPU    CpuSem[];
> };

Yes. This works too. In earnest, as I state above, I would prefer

  SMM_CPU_SYNC_SEMAPHORE *CpuSem[];

but again, some people dislike an array-of-pointers. It's a matter of taste=
.

>=20
>=20
>>
>> (5) Please make these Internal functions STATIC.
>>
>> Better yet, please make *all* functions that are not EFIAPI, STATIC.
>=20
> Agree.
>=20
>=20
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL);
>>
>> (6) This assert is unnecessary and wrong; we perform correct error
>> checking already.
>=20
> Agree, I will remove the check, but keep assert.

Right, per our previous discussion, that works too, as long as the API
documents the requirement for the caller.

>=20
>>
>> So, several comments on this section:
>>
>> (7) the separate NULL assignment to (*SmmCpuSyncCtx) is superfluous, we
>> overwrite the object immediately after.
>>
>=20
> Agree.
>=20
>> (8) the ASSERT() is superfluous and wrong; we already check for -- and
>> report -- allocation failure correctly.
>=20
> Agree, will remove the assert here.
>=20
>=20
>>
>> (9) *page* allocation is useless / wasteful here; the main sync context
>> structure can be allocated from *pool*
>>
>=20
> Agree! The same comment from Ray! Again, thank you and Ray, both!!!
>=20
>=20
>=20
>> (10) SafeIntLib APIs return RETURN_STATUS (and Status already has type
>> RETURN_STATUS), so we should use RETURN_ERROR() rather than
>> EFI_ERROR()
>> -- it's more idiomatic
>=20
> Agree.
>=20
>>
>> (11) Referring back to (4), SMM_CPU_SYNC_SEMAPHORE_GLOBAL should
>> not be
>> counted (added) separately, because it is embedded in
>> SMM_CPU_SYNC_CONTEXT directly.
>>
>>> +
>>> +  (*SmmCpuSyncCtx)->GlobalSem    =3D
>> (SMM_CPU_SYNC_SEMAPHORE_GLOBAL *)((UINT8 *)(*SmmCpuSyncCtx) +
>> sizeof (SMM_CPU_SYNC_CONTEXT));
>>> +  (*SmmCpuSyncCtx)->CpuSem       =3D (SMM_CPU_SYNC_SEMAPHORE_CPU
>> *)((UINT8 *)(*SmmCpuSyncCtx) + sizeof (SMM_CPU_SYNC_CONTEXT) +
>> sizeof (SMM_CPU_SYNC_SEMAPHORE_GLOBAL));
>>
>> (12) And then these two assignments should be dropped.
>>
>=20
> Yes, agree, with my proposed definition, I will refine it.
>=20
>=20
>>> +  (*SmmCpuSyncCtx)->NumberOfCpus =3D NumberOfCpus;
>>> +
>>> +  //
>>> +  // Count the TotalSemSize
>>> +  //
>>> +  OneSemSize =3D GetSpinLockProperties ();
>>> +
>>> +  Status =3D SafeUintnMult (OneSemSize, sizeof
>> (SMM_CPU_SYNC_SEMAPHORE_GLOBAL) / sizeof (VOID *), &GlobalSemSize);
>>> +  if (EFI_ERROR (Status)) {
>>> +    goto ON_ERROR;
>>> +  }
>>> +
>>> +  Status =3D SafeUintnMult (OneSemSize, sizeof
>> (SMM_CPU_SYNC_SEMAPHORE_CPU) / sizeof (VOID *), &OneCpuSemSize);
>>> +  if (EFI_ERROR (Status)) {
>>> +    goto ON_ERROR;
>>> +  }
>>
>> (13) I find this obscure and misleading. How about this one instead:
>>
>>   UINTN  CacheLineSize;
>>
>>   CacheLineSize =3D GetSpinLockProperties ();
>>   OneSemSize    =3D ALIGN_VALUE (sizeof (SMM_CPU_SYNC_SEMAPHORE),
>> CacheLineSize);
>>
>> and then eliminate GlobalSemSize and OneCpuSemSize altogether.
>>
>> The above construct will ensure that
>>
>> (a) OneSemSize is just large enough for placing semaphores on different
>> cache lines, and that
>>
>> (b) OneSemSize is suitable for *both*
>> SMM_CPU_SYNC_SEMAPHORE_GLOBAL and
>> SMM_CPU_SYNC_SEMAPHORE_CPU. This is where we rely on the common
>> internal
>> type SMM_CPU_SYNC_SEMAPHORE.
>>
>>> +
>>> +  Status =3D SafeUintnMult (NumberOfCpus, OneCpuSemSize, &CpuSemSize);
>>> +  if (EFI_ERROR (Status)) {
>>> +    goto ON_ERROR;
>>> +  }
>>> +
>>> +  Status =3D SafeUintnAdd (GlobalSemSize, CpuSemSize, &TotalSemSize);
>>> +  if (EFI_ERROR (Status)) {
>>> +    goto ON_ERROR;
>>> +  }
>>
>=20
> Agree!
>=20
>=20
>> (14) This is probably better written as
>>
>>   UINTN  NumSem;
>>
>>   Status =3D SafeUintnAdd (1, NumberOfCpus, &NumSem);
>>   if (RETURN_ERROR (Status)) {
>>     goto ON_ERROR;
>>   }
>>
>>   Status =3D SafeUintnMult (NumSem, OneSemSize, &TotalSemSize);
>>   if (RETURN_ERROR (Status)) {
>>     goto ON_ERROR;
>>   }
>>
>> and remove the variable CpuSemSize as well.
>>
>>> +
>>> +  DEBUG ((DEBUG_INFO, "[%a] - One Semaphore Size    =3D 0x%x\n",
>> __func__, OneSemSize));
>>> +  DEBUG ((DEBUG_INFO, "[%a] - Total Semaphores Size =3D 0x%x\n",
>> __func__, TotalSemSize));
>>
>> (15) These are useful, but %x is not suitable for formatting UINTN.
>> Instead, use %Lx, and cast the values to UINT64:
>>
>>   DEBUG ((DEBUG_INFO, "[%a] - One Semaphore Size    =3D 0x%Lx\n", __func=
__,
>> (UINT64)OneSemSize));
>>   DEBUG ((DEBUG_INFO, "[%a] - Total Semaphores Size =3D 0x%Lx\n", __func=
__,
>> (UINT64)TotalSemSize));
>>
>=20
> Agree!
>=20
>>> +
>>> +  //
>>> +  // Allocate for Semaphores in the *SmmCpuSyncCtx
>>> +  //
>>> +  (*SmmCpuSyncCtx)->SemBufferSize =3D TotalSemSize;
>>> +  (*SmmCpuSyncCtx)->SemBuffer     =3D AllocatePages (EFI_SIZE_TO_PAGES
>> ((*SmmCpuSyncCtx)->SemBufferSize));
>>
>> (16) I suggest reworking this as follows (will be beneficial later), in
>> accordance with (4):
>>
>>   (*SmmCpuSyncCtx)->SemBufferPages =3D EFI_SIZE_TO_PAGES (TotalSemSize);
>>   (*SmmCpuSyncCtx)->SemBuffer      =3D AllocatePages (
>>                                        (*SmmCpuSyncCtx)->SemBufferPages
>>                                        );
>>
>>> +  ASSERT ((*SmmCpuSyncCtx)->SemBuffer !=3D NULL);
>>
>> (17) Bogus assert; same reason as in point (8).
>=20
>=20
> Agree!
>=20
>>
>>> +  if ((*SmmCpuSyncCtx)->SemBuffer =3D=3D NULL) {
>>> +    Status =3D RETURN_OUT_OF_RESOURCES;
>>> +    goto ON_ERROR;
>>> +  }
>>> +
>>> +  ZeroMem ((*SmmCpuSyncCtx)->SemBuffer, TotalSemSize);
>>
>> (18) First approach: simplify the code by calling AllocateZeroPages()
>> instead. (It may zero more bytes than strictly necessary, but it's not a
>> big deal, and the code simplification is worth it.)
>>
>> (19) Second approach: even better, just drop this call. There is no need
>> for zeroing the semaphore buffer at all, as we are going to manually set
>> both the Counter and the individual Run elements, below!
>>
>> (20) With ZeroMem() gone, evaluate if we still depend on the
>> BaseMemoryLib class (#include and [LibraryClasses]).
>=20
> Agree, I will drop the zeromem!
>=20
>=20
>>
>>> +
>>> +  //
>>> +  // Assign Global Semaphore pointer
>>> +  //
>>> +  SemAddr                               =3D (UINTN)(*SmmCpuSyncCtx)->S=
emBuffer;
>>> +  (*SmmCpuSyncCtx)->GlobalSem->Counter  =3D (UINT32 *)SemAddr;
>>
>> (21) Side comment (the compiler will catch it for you anyway): it's not
>> "GlobalSem->Counter" but "GlobalSem.Counter", after point (4).
>>
>=20
> I will rename the Counter --> CpuCount, and move to SmmCpuSyncCtx directl=
y based on the comment from Ray.
>=20
>=20
>> (22) The explicit (UINT32 *) cast is ugly. We should cast to
>> (SMM_CPU_SYNC_SEMAPHORE *).
>>
>=20
> Agree!
>=20
>>> +  *(*SmmCpuSyncCtx)->GlobalSem->Counter =3D 0;
>>> +  DEBUG ((DEBUG_INFO, "[%a] - (*SmmCpuSyncCtx)->GlobalSem->Counter
>> Address: 0x%08x\n", __func__, (UINTN)(*SmmCpuSyncCtx)->GlobalSem-
>>> Counter));
>>
>> (23) problems with this DEBUG line:
>>
>> (23.1) needlessly verbose,
>>
>> (23.2) prints UINTN with %x,
>>
>> (23.3) pads to 8 nibbles even though UINTN can be 64-bit
>>
>> How about:
>>
>>   DEBUG ((
>>     DEBUG_INFO,
>>     "[%a] - GlobalSem.Counter @ 0x%016Lx\n",
>>     __func__,
>>     (UINT64)SemAddr
>>     ));
>>
>>> +
>>> +  SemAddr +=3D GlobalSemSize;
>>
>> (24) Should be "+=3D OneSemSize".
>=20
> Agree, even move the Counter in context, it should be ok with "+=3D OneSe=
mSize".
>=20
>>
>>> +
>>> +  //
>>> +  // Assign CPU Semaphore pointer
>>> +  //
>>> +  for (CpuIndex =3D 0; CpuIndex < NumberOfCpus; CpuIndex++) {
>>> +    (*SmmCpuSyncCtx)->CpuSem[CpuIndex].Run  =3D (UINT32 *)(SemAddr +
>> (CpuSemSize / NumberOfCpus) * CpuIndex);
>>> +    *(*SmmCpuSyncCtx)->CpuSem[CpuIndex].Run =3D 0;
>>> +    DEBUG ((DEBUG_INFO, "[%a] - (*SmmCpuSyncCtx)->CpuSem[%d].Run
>> Address: 0x%08x\n", __func__, CpuIndex, (UINTN)(*SmmCpuSyncCtx)-
>>> CpuSem[CpuIndex].Run));
>>> +  }
>>
>> (25) Extremely over-complicated.
>>
>=20
> Yes, agree, now I have realized that. Thank you again!
>=20
>=20
>> (25.1) The quotient (CpuSemSize / NumberOfCpus) is just OneCpuSemSize,
>> from the previous SafeUintnMult() call.
>>
>=20
> Yes.
>=20
>> (25.2) Using SemAddr as a base address, and then performing a separate
>> multiplication, is wasteful -- not just computationally, but
>> semantically. We can simply advance SemAddr here!
>>
>=20
> Agree.
>=20
>> (25.3) the expression
>>
>>   (*SmmCpuSyncCtx)->CpuSem[CpuIndex].Run
>>
>> is tiresome to read, and so we shouldn't repat it multiple times!
>=20
> Agree
>=20
>>
>> (25.4) the usual problems with the DEBUG line:
>>
>> (25.4.1) needlessly verbose
>>
>> (25.4.2) uses %d for formatting CpuIndex (which is UINTN)
>>
>> (25.4.3) uses %x for formatting (UINTN)Run
>>
>> (25.4.4) padds to 8 nibbles even though the Run address can be 64-bit
>>
>> So:
>>
>>   SMM_CPU_SYNC_SEMAPHORE_CPU *CpuSem;
>>
>>   CpuSem =3D (*SmmCpuSyncCtx)->CpuSem;
>>   for (CpuIndex =3D 0; CpuIndex < NumberOfCpus; CpuIndex++) {
>>     CpuSem->Run  =3D (SMM_CPU_SYNC_SEMAPHORE *)SemAddr;
>>     *CpuSem->Run =3D 0;
>>
>>     DEBUG ((
>>       DEBUG_INFO,
>>       "[%a] - CpuSem[%Lu].Run @ 0x%016Lx\n",
>>       __func__,
>>       (UINT64)CpuIndex,
>>       (UINT64)SemAddr
>>       ));
>>
>>     CpuSem++;
>>     SemAddr +=3D OneSemSize;
>>   }
>>
>=20
> Really good, thanks comments. Totally agree!
>=20
>=20
>>> +
>>> +  return RETURN_SUCCESS;
>>> +
>>> +ON_ERROR:
>>> +  FreePages (*SmmCpuSyncCtx, EFI_SIZE_TO_PAGES (CtxSize));
>>
>> (26) And then this can be
>>
>>   FreePool (*SmmCpuSyncCtx);
>>
>=20
> Yes, agree.
>=20
>> per comment (9).
>>
>>> +  return Status;
>>> +}
>>> +
>>> +/**
>>> +  Deinit an allocated SMM CPU Sync context.
>>> +
>>> +  SmmCpuSyncContextDeinit() function is to deinitialize SMM CPU Sync
>> context, the resources allocated in
>>> +  SmmCpuSyncContextInit() will be freed.
>>> +
>>> +  Note: This function only can be called after SmmCpuSyncContextInit()
>> return success.
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object to be deinitialized.
>>> +
>>> +  @retval RETURN_SUCCESS            The SMM CPU Sync context was
>> successful deinitialized.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
>>> +  @retval RETURN_UNSUPPORTED        Unsupported operation.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncContextDeinit (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx
>>> +  )
>>> +{
>>> +  UINTN  SmmCpuSyncCtxSize;
>>> +
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL);
>>
>> (27) bogus ASSERT
>>
>=20
> According our original define, I will refine the interface.
>=20
>>> +  if (SmmCpuSyncCtx =3D=3D NULL) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>> +
>>> +  SmmCpuSyncCtxSize =3D sizeof (SMM_CPU_SYNC_CONTEXT) + sizeof
>> (SMM_CPU_SYNC_SEMAPHORE_GLOBAL) + sizeof
>> (SMM_CPU_SYNC_SEMAPHORE_CPU) * (SmmCpuSyncCtx->NumberOfCpus);
>>> +
>>> +  FreePages (SmmCpuSyncCtx->SemBuffer, EFI_SIZE_TO_PAGES
>> (SmmCpuSyncCtx->SemBufferSize));
>>
>> (28) Per comment (16), this can be simplified as:
>>
>>   FreePages (SmmCpuSyncCtx->SemBuffer, SmmCpuSyncCtx-
>>> SemBufferPages);
>>
>>> +
>>> +  FreePages (SmmCpuSyncCtx, EFI_SIZE_TO_PAGES (SmmCpuSyncCtxSize));
>>
>> (29) Per comments (9) and (26), this should be just
>>
>>   FreePool (SmmCpuSyncCtx);
>>
>> (and the variable "SmmCpuSyncCtxSize" should be removed).
>>
>=20
> Yes, agree!
>=20
>=20
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Reset SMM CPU Sync context.
>>> +
>>> +  SmmCpuSyncContextReset() function is to reset SMM CPU Sync context
>> to the initialized state.
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object to be reset.
>>> +
>>> +  @retval RETURN_SUCCESS            The SMM CPU Sync context was
>> successful reset.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncContextReset (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL);
>>
>> (30) bogus assert
>>
>=20
> Will refine the implementation based on the latest interface.
>=20
>>> +  if (SmmCpuSyncCtx =3D=3D NULL) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>> +
>>> +  *SmmCpuSyncCtx->GlobalSem->Counter =3D 0;
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>
>> (31) Is there anything to do about the Run semaphores here?
>>
>=20
> No, Run semaphores will naturally reset if all cpus ready to exit if the =
caller follow the API requirement calling.

OK, thanks!

>=20
>=20
>>> +
>>> +/**
>>> +  Get current number of arrived CPU in SMI.
>>> +
>>> +  For traditional CPU synchronization method, BSP might need to know t=
he
>> current number of arrived CPU in
>>> +  SMI to make sure all APs in SMI. This API can be for that purpose.
>>> +
>>> +  @param[in]      SmmCpuSyncCtx     Pointer to the SMM CPU Sync contex=
t
>> object.
>>> +  @param[in,out]  CpuCount          Current count of arrived CPU in SM=
I.
>>> +
>>> +  @retval RETURN_SUCCESS            Get current number of arrived CPU =
in SMI
>> successfully.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx or CpuCount is
>> NULL.
>>> +  @retval RETURN_UNSUPPORTED        Unsupported operation.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncGetArrivedCpuCount (
>>> +  IN     SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN OUT UINTN                 *CpuCount
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL && CpuCount !=3D NULL);
>>
>> (32) bogus assert
>>
>>> +  if ((SmmCpuSyncCtx =3D=3D NULL) || (CpuCount =3D=3D NULL)) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>> +
>>> +  if (*SmmCpuSyncCtx->GlobalSem->Counter < 0) {
>>
>> (33) The type of Counter is
>>
>>   volatile UINT32
>>
>> therefore this condition will never evaluate to true.
>>
>> If you want to check for the door being locked, then I suggest
>>
>>   *SmmCpuSyncCtx->GlobalSem.Counter =3D=3D (UINT32)-1
>>
>> or
>>
>>   *SmmCpuSyncCtx->GlobalSem.Counter =3D=3D MAX_UINT32
>>
>=20
> Yes, I do need check the locked or not, I will fix this!!!
>=20
>=20
> For this API, I will rewrite the implementation according the interface w=
e aligned.
>=20
>=20
>=20
>>> +    return RETURN_UNSUPPORTED;
>>> +  }
>>> +
>>> +  *CpuCount =3D *SmmCpuSyncCtx->GlobalSem->Counter;
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Performs an atomic operation to check in CPU.
>>> +
>>> +  When SMI happens, all processors including BSP enter to SMM mode by
>> calling SmmCpuSyncCheckInCpu().
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object.
>>> +  @param[in]      CpuIndex          Check in CPU index.
>>> +
>>> +  @retval RETURN_SUCCESS            Check in CPU (CpuIndex) successful=
ly.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
>>> +  @retval RETURN_ABORTED            Check in CPU failed due to
>> SmmCpuSyncLockDoor() has been called by one elected CPU.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncCheckInCpu (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN     UINTN                 CpuIndex
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL);
>>
>> (34) bogus ASSERT
>>
>>> +  if (SmmCpuSyncCtx =3D=3D NULL) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>=20
> Agree, I will refine!
>=20
>>> +
>>> +  //
>>> +  // Check to return if Counter has already been locked.
>>> +  //
>>> +  if ((INT32)InternalReleaseSemaphore (SmmCpuSyncCtx->GlobalSem-
>>> Counter) <=3D 0) {
>>
>> (35) The cast and the comparison are bogus.
>>
>> InternalReleaseSemaphore():
>>
>> - returns 0, and leaves the semaphore unchanged, if the current value of
>> the semaphore is MAX_UINT32,
>>
>> - increments the semaphore, and returns the incremented -- hence:
>> strictly positive -- UINT32 value, otherwise.
>>
>> So the condition for
>>
>>   semaphore unchanged because door has been locked
>>
>> is:
>>
>>   InternalReleaseSemaphore (SmmCpuSyncCtx->GlobalSem->Counter) =3D=3D 0
>>
>> No INT32 cast, and no "<".
>>
>=20
> Agree! I will do the fix.=20
>=20
>=20
>>
>>> +    return RETURN_ABORTED;
>>> +  }
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Performs an atomic operation to check out CPU.
>>> +
>>> +  CheckOutCpu() can be called in error handling flow for the CPU who c=
alls
>> CheckInCpu() earlier.
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object.
>>> +  @param[in]      CpuIndex          Check out CPU index.
>>> +
>>> +  @retval RETURN_SUCCESS            Check out CPU (CpuIndex) successfu=
lly.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL.
>>> +  @retval RETURN_NOT_READY          The CPU is not checked-in.
>>> +  @retval RETURN_UNSUPPORTED        Unsupported operation.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncCheckOutCpu (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN     UINTN                 CpuIndex
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL);
>>
>> (36) bogus assert
>>
>>
>>> +  if (SmmCpuSyncCtx =3D=3D NULL) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>> +
>=20
> The same, I will refine it.
>=20
>=20
>=20
>=20
>=20
>>> +  if (*SmmCpuSyncCtx->GlobalSem->Counter =3D=3D 0) {
>>> +    return RETURN_NOT_READY;
>>> +  }
>>
>> (37) This preliminary check is not particularly useful.
>>
>> Assume that Counter is currently 1, but -- due to a programming error
>> somewhere -- there are two APs executing SmmCpuSyncCheckOutCpu() in
>> parallel. Both may pass this check (due to Counter being 1), and then
>> one of the APs will consume the semaphore and return, and the other AP
>> will hang forever.
>>
>> So this check is "best effort". It's fine -- some programming errors
>> just inevitably lead to undefined behavior; not all bad usage can be
>> explicitly caught.
>>
>> Maybe add a comment?
>>
>=20
>=20
> Yes, I will remove the RETURN_NOT_READY, and add the comment for the API:
>=20
> "The caller shall make sure the CPU specified by CpuIndex has already che=
cked-in."

OK, thanks.

>=20
>=20
>=20
>>> +  if ((INT32)InternalWaitForSemaphore (SmmCpuSyncCtx->GlobalSem-
>>> Counter) < 0) {
>>> +    return RETURN_UNSUPPORTED;
>>> +  }
>>
>> (38) This doesn't look right. InternalWaitForSemaphore() blocks for as
>> long as the semaphore is zero. When the semaphore is nonzero,
>> InternalWaitForSemaphore() decrements it, and returns the decremented
>> value. Thus, InternalWaitForSemaphore() cannot return negative values
>> (it returns UINT32), and it also cannot return MAX_UINT32.
>>
>> So I simply don't understand the purpose of this code.
>=20
> The Counter can be locked by calling SmmCpuSyncLockDoor(), after that, Co=
unter will be (UINT32)-1, in such a case,  InternalWaitForSemaphore will co=
ntinue decrease it to (UINT32)-2, this is to catch the lock case. but accor=
ding we aligned for the interface below:
> /**
>   Performs an atomic operation to check out CPU.
>=20
>   This function can be called in error handling flow for the CPU who call=
s CheckInCpu() earlier.
>   The caller shall make sure the CPU specified by CpuIndex has already ch=
ecked-in.
>=20
>   If Context is NULL, then ASSERT().
>   If CpuIndex exceeds the range of all CPUs in the system, then ASSERT().
>=20
>   @param[in,out]  Context           Pointer to the SMM CPU Sync context o=
bject.
>   @param[in]      CpuIndex          Check out CPU index.
>=20
>   @retval RETURN_SUCCESS            Check out CPU (CpuIndex) successfully=
.
>   @retval RETURN_ABORTED            Check out CPU failed due to SmmCpuSyn=
cLockDoor() has been called by one elected CPU.
>=20
> **/
> RETURN_STATUS
> EFIAPI
> SmmCpuSyncCheckOutCpu (
>   IN OUT SMM_CPU_SYNC_CONTEXT  *Context,
>   IN     UINTN                 CpuIndex
>   );
>=20
> The code will be changed to:
>=20
>   if ((INT32)InternalWaitForSemaphore (Context->CpuCount) < 0) {
>     return RETURN_ABORTED;
>   }

I find this quite ugly. In the "semaphore post" operation, we already
have code that prevents incrementing if the semaphore is "locked". Can
we perhaps create a "semaphore pend" operation that does the same?

How about this:

diff --git a/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.c
b/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.c
index 3c2835f8def6..5d7fc58ef23f 100644
--- a/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.c
+++ b/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.c
@@ -91,35 +91,38 @@ UINT32
 InternalWaitForSemaphore (
   IN OUT  volatile UINT32  *Sem
   )
 {
   UINT32  Value;

   for ( ; ;) {
     Value =3D *Sem;
+    if (Value =3D=3D MAX_UINT32) {
+      return Value;
+    }
     if ((Value !=3D 0) &&
         (InterlockedCompareExchange32 (
            (UINT32 *)Sem,
            Value,
            Value - 1
            ) =3D=3D Value))
     {
       break;
     }

     CpuPause ();
   }

   return Value - 1;
 }

Note, I'm just brainstorming here, I've not thought it through. Just to
illustrate the direction I'm thinking of.

This change should be mostly OK. InternalWaitForSemaphore() returns the
decremented value. So, for InternalWaitForSemaphore() to return
MAX_UINT32 *without* this update, the function would have to decrement
the semaphore when the semaphore is zero. But in that case, the function
*blocks*. Thus, a return value of MAX_UINT32 is not possible without
this extension; ergo, if MAX_UINT32 is returned (with this extension),
we know the door was locked earlier (and the semaphore is not changed).

At the same time, we might want to update InternalReleaseSemaphore() as
well, so that it cannot validly increment the semaphore value to MAX_UINT32=
.


>=20
>=20
>>
>> As written, this condition could only fire if InternalWaitForSemaphore()
>> successfully decremented the semaphore, and the *new* value of the
>> semaphore were >=3D0x8000_0000. Because in that case, the INT32 cast (=
=3D
>> implementation-defined behavior) would produce a negative value. But for
>> that, we'd first have to increase Counter to 0x8000_0001 at least, and
>> that could never happen in practice, IMO.
>>
>> So this is basically dead code. What is the intent?
>=20
>=20
> It can happen when locked the Counter.
>=20
>=20
>>
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Performs an atomic operation lock door for CPU checkin or checkout.
>>> +
>>> +  After this function, CPU can not check in via SmmCpuSyncCheckInCpu()=
.
>>> +
>>> +  The CPU specified by CpuIndex is elected to lock door.
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object.
>>> +  @param[in]      CpuIndex          Indicate which CPU to lock door.
>>> +  @param[in,out]  CpuCount          Number of arrived CPU in SMI after=
 look
>> door.
>>> +
>>> +  @retval RETURN_SUCCESS            Lock door for CPU successfully.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx or CpuCount is
>> NULL.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncLockDoor (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN     UINTN                 CpuIndex,
>>> +  IN OUT UINTN                 *CpuCount
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL && CpuCount !=3D NULL);
>>
>> (39) bogus assert
>>
>>> +  if ((SmmCpuSyncCtx =3D=3D NULL) || (CpuCount =3D=3D NULL)) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>> +
>>> +  *CpuCount =3D InternalLockdownSemaphore (SmmCpuSyncCtx-
>>> GlobalSem->Counter);
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Used by the BSP to wait for APs.
>>> +
>>> +  The number of APs need to be waited is specified by NumberOfAPs. The
>> BSP is specified by BspIndex.
>>> +
>>> +  Note: This function is blocking mode, and it will return only after =
the
>> number of APs released by
>>> +  calling SmmCpuSyncReleaseBsp():
>>> +  BSP: WaitForAPs    <--  AP: ReleaseBsp
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object.
>>> +  @param[in]      NumberOfAPs       Number of APs need to be waited by
>> BSP.
>>> +  @param[in]      BspIndex          The BSP Index to wait for APs.
>>> +
>>> +  @retval RETURN_SUCCESS            BSP to wait for APs successfully.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
>> NumberOfAPs > total number of processors in system.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncWaitForAPs (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN     UINTN                 NumberOfAPs,
>>> +  IN     UINTN                 BspIndex
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL && NumberOfAPs <=3D SmmCpuSyncCtx-
>>> NumberOfCpus);
>>
>> (40) bogus assert
>>
>>> +  if ((SmmCpuSyncCtx =3D=3D NULL) || (NumberOfAPs > SmmCpuSyncCtx-
>>> NumberOfCpus)) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>
>> (41) Question for both the library instance and the library class (i.e.,
>> API documentation):
>>
>> Is it ever valid to call this function with (NumberOfAPs =3D=3D
>> SmmCpuSyncCtx->NumberOfCpus)?
>>
>> I would think not. NumberOfCpus is supposed to include the BSP and the
>> APs. Therefore the highest permitted NumberOfAPs value, on input, is
>> (SmmCpuSyncCtx->NumberOfCpus - 1).
>>
>> So I think we should modify the lib class and the lib instance both.
>> RETURN_INVALID_PARAMETER applies to "NumberOfAPs *>=3D* total number
>> of
>> processors in system".
>>
>=20
> Good catch!!! I will fix it.
>=20
>>> +
>>> +  while (NumberOfAPs-- > 0) {
>>> +    InternalWaitForSemaphore (SmmCpuSyncCtx->CpuSem[BspIndex].Run);
>>> +  }
>>
>> (42) In my opinion, this is an ugly pattern.
>>
>> First, after the loop, NumberOfAPs will be MAX_UINTN.
>>
>> Second, modifying input parameters is also an anti-pattern. Assume you
>> debug a problem, and fetch a backtrace where the two innermost frames
>> are SmmCpuSyncWaitForAPs() and InternalWaitForSemaphore(). If you look
>> at the stack frame that belongs to SmmCpuSyncWaitForAPs(), you may be
>> led to think that the function was *invoked* with a *low* NumberOfAPs
>> value. Whereas in fact NumberOfAPs may have been a larger value at the
>> time of call, only the function decremented NumberOfAPs by the time the
>> stack trace was fetched.
>>
>> So, please add a new helper variable, and write a proper "for" loop.
>>
>>   UINTN  Arrived;
>>
>>   for (Arrived =3D 0; Arrived < NumberOfAPs; Arrived++) {
>>     InternalWaitForSemaphore (SmmCpuSyncCtx->CpuSem[BspIndex].Run);
>>   }
>>
>=20
> Agree. Thank you!
>=20
>=20
>> (43) I mentioned this while reviewing the lib class header (patch#2), so
>> let me repeat it here:
>>
>> BspIndex is used for indexing the CpuSem array, but we perform no range
>> checking, against "SmmCpuSyncCtx->NumberOfCpus".
>>
>> That error should be documented (in the lib class header), and
>> caught/reported here.
>>
>=20
> Yes, I will refine the patch based on the lib class header we aligned.
>=20
>=20
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Used by the BSP to release one AP.
>>> +
>>> +  The AP is specified by CpuIndex. The BSP is specified by BspIndex.
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object.
>>> +  @param[in]      CpuIndex          Indicate which AP need to be relea=
sed.
>>> +  @param[in]      BspIndex          The BSP Index to release AP.
>>> +
>>> +  @retval RETURN_SUCCESS            BSP to release one AP successfully=
.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
>> CpuIndex is same as BspIndex.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncReleaseOneAp   (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN     UINTN                 CpuIndex,
>>> +  IN     UINTN                 BspIndex
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL && BspIndex !=3D CpuIndex);
>>
>> (44) bogus assert
>>
>>> +  if ((SmmCpuSyncCtx =3D=3D NULL) || (BspIndex =3D=3D CpuIndex)) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>
>> (45) range checks for BspIndex and CpuIndex missing (in both lib class
>> and lib instance)
>>
>=20
> The same, I will align the code with we aligned lib class.
>=20
>=20
>>> +
>>> +  InternalReleaseSemaphore (SmmCpuSyncCtx->CpuSem[CpuIndex].Run);
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Used by the AP to wait BSP.
>>> +
>>> +  The AP is specified by CpuIndex. The BSP is specified by BspIndex.
>>> +
>>> +  Note: This function is blocking mode, and it will return only after =
the AP
>> released by
>>> +  calling SmmCpuSyncReleaseOneAp():
>>> +  BSP: ReleaseOneAp  -->  AP: WaitForBsp
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx    Pointer to the SMM CPU Sync context
>> object.
>>> +  @param[in]      CpuIndex         Indicate which AP wait BSP.
>>> +  @param[in]      BspIndex         The BSP Index to be waited.
>>> +
>>> +  @retval RETURN_SUCCESS            AP to wait BSP successfully.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
>> CpuIndex is same as BspIndex.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncWaitForBsp (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN     UINTN                 CpuIndex,
>>> +  IN     UINTN                 BspIndex
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL && BspIndex !=3D CpuIndex);
>>
>> (46) bogus assert
>>
>>> +  if ((SmmCpuSyncCtx =3D=3D NULL) || (BspIndex =3D=3D CpuIndex)) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>
>> (47) range checks missing (lib class and instance)
>>
>=20
>=20
> The same, I will align the code with we aligned lib class.
>=20
>>> +
>>> +  InternalWaitForSemaphore (SmmCpuSyncCtx->CpuSem[CpuIndex].Run);
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> +  Used by the AP to release BSP.
>>> +
>>> +  The AP is specified by CpuIndex. The BSP is specified by BspIndex.
>>> +
>>> +  @param[in,out]  SmmCpuSyncCtx     Pointer to the SMM CPU Sync
>> context object.
>>> +  @param[in]      CpuIndex          Indicate which AP release BSP.
>>> +  @param[in]      BspIndex          The BSP Index to be released.
>>> +
>>> +  @retval RETURN_SUCCESS            AP to release BSP successfully.
>>> +  @retval RETURN_INVALID_PARAMETER  SmmCpuSyncCtx is NULL or
>> CpuIndex is same as BspIndex.
>>> +
>>> +**/
>>> +RETURN_STATUS
>>> +EFIAPI
>>> +SmmCpuSyncReleaseBsp (
>>> +  IN OUT SMM_CPU_SYNC_CONTEXT  *SmmCpuSyncCtx,
>>> +  IN     UINTN                 CpuIndex,
>>> +  IN     UINTN                 BspIndex
>>> +  )
>>> +{
>>> +  ASSERT (SmmCpuSyncCtx !=3D NULL && BspIndex !=3D CpuIndex);
>>
>> (48) bogus assert
>>
>>> +  if ((SmmCpuSyncCtx =3D=3D NULL) || (BspIndex =3D=3D CpuIndex)) {
>>> +    return RETURN_INVALID_PARAMETER;
>>> +  }
>>
>> (49) range checks missing (lib class and instance)
>>
>=20
> The same, I will align the code with we aligned lib class.
>=20
>=20
>>> +
>>> +  InternalReleaseSemaphore (SmmCpuSyncCtx->CpuSem[BspIndex].Run);
>>> +
>>> +  return RETURN_SUCCESS;
>>> +}
>>> diff --git a/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.inf
>> b/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.inf
>>> new file mode 100644
>>> index 0000000000..6bb1895577
>>> --- /dev/null
>>> +++ b/UefiCpuPkg/Library/SmmCpuSyncLib/SmmCpuSyncLib.inf
>>> @@ -0,0 +1,39 @@
>>> +## @file
>>> +# SMM CPU Synchronization lib.
>>> +#
>>> +# This is SMM CPU Synchronization lib used for SMM CPU sync operations=
.
>>> +#
>>> +# Copyright (c) 2023, Intel Corporation. All rights reserved.<BR>
>>> +# SPDX-License-Identifier: BSD-2-Clause-Patent
>>> +#
>>> +##
>>> +
>>> +[Defines]
>>> +  INF_VERSION                    =3D 0x00010005
>>> +  BASE_NAME                      =3D SmmCpuSyncLib
>>> +  FILE_GUID                      =3D 1ca1bc1a-16a4-46ef-956a-ca500fd33=
81f
>>> +  MODULE_TYPE                    =3D DXE_SMM_DRIVER
>>> +  LIBRARY_CLASS                  =3D SmmCpuSyncLib|DXE_SMM_DRIVER
>>> +
>>> +[Sources]
>>> +  SmmCpuSyncLib.c
>>> +
>>> +[Packages]
>>> +  MdePkg/MdePkg.dec
>>> +  MdeModulePkg/MdeModulePkg.dec
>>> +  UefiCpuPkg/UefiCpuPkg.dec
>>> +
>>> +[LibraryClasses]
>>> +  UefiLib
>>> +  BaseLib
>>> +  DebugLib
>>> +  PrintLib
>>> +  SafeIntLib
>>> +  SynchronizationLib
>>> +  BaseMemoryLib
>>> +  SmmServicesTableLib
>>> +  MemoryAllocationLib
>>
>> (50) Please sort this list alphabetically (cf. comment (2)).
>>
>=20
> Ok, will refine it.
>=20
>>> +
>>> +[Pcd]
>>> +
>>> +[Protocols]
>>
>> (51) Useless empty INF file sections; please remove them.
>>
>>> diff --git a/UefiCpuPkg/UefiCpuPkg.dsc b/UefiCpuPkg/UefiCpuPkg.dsc
>>> index 074fd77461..f264031c77 100644
>>> --- a/UefiCpuPkg/UefiCpuPkg.dsc
>>> +++ b/UefiCpuPkg/UefiCpuPkg.dsc
>>> @@ -23,10 +23,11 @@
>>>  #
>>>
>>>  !include MdePkg/MdeLibs.dsc.inc
>>>
>>>  [LibraryClasses]
>>> +  SafeIntLib|MdePkg/Library/BaseSafeIntLib/BaseSafeIntLib.inf
>>>    BaseLib|MdePkg/Library/BaseLib/BaseLib.inf
>>>    BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf
>>>    CpuLib|MdePkg/Library/BaseCpuLib/BaseCpuLib.inf
>>>    DebugLib|MdePkg/Library/BaseDebugLibNull/BaseDebugLibNull.inf
>>>
>> SerialPortLib|MdePkg/Library/BaseSerialPortLibNull/BaseSerialPortLibNull=
.inf
>>
>> (52) Just from the context visible here, this list seems alphabetically
>> sorted pre-patch; if that's the case, please stick with it (don't break
>> the sort order).
>=20
>=20
> How about adding the SafeIntLib in the MdePkg/MdeLibs.dsc.inc? if agree, =
I will do that with separated patch.

Sounds like a good idea -- there shouldn't ever be further instances of
SafeIntLib, and SafeIntLib should be used wherever possible (integer
overflow may be a risk anywhere at all). So a central lib class ->
instance resolution sounds useful.

Thanks!
Laszlo


-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#112529): https://edk2.groups.io/g/devel/message/112529
Mute This Topic: https://groups.io/mt/103010165/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/19134562=
12/xyzzy [rebecca@openfw.io]
-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-