* [edk2-devel] [PATCH v1 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Remove one round of Busy spinlock
2024-03-06 12:11 [edk2-devel] [PATCH v1 0/2] SMM MP service performance Improvement Wu, Jiaxin
@ 2024-03-06 12:11 ` Wu, Jiaxin
2024-03-06 12:11 ` [edk2-devel] [PATCH v1 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Make RunningApCount on exclusive cacheline Wu, Jiaxin
1 sibling, 0 replies; 3+ messages in thread
From: Wu, Jiaxin @ 2024-03-06 12:11 UTC (permalink / raw)
To: devel
Cc: Ray Ni, Laszlo Ersek, Eric Dong, Zeng Star, Gerd Hoffmann,
Rahul Kumar
For both non blocking and blocking:
Remove unnecessary a pair of AcquireSpinLock and ReleaseSpinLock
for each AP. It's target to improve SmmStartupAllAPs performance.
No functional impact, see below analysis:
During first acquire spinLock, InternalSmmStartupAllAPs will
return EFI_NOT_READY if AcquireSpinLockOrFail return false:
if (!AcquireSpinLockOrFail (mSmmMpSyncData->CpuData[Index].Busy)) {
return EFI_NOT_READY;
}
which means the AcquireSpinLockOrFail () must return TRUE if
InternalSmmStartupAllAPs () can continue the second round of Busy
spinlock. since the Busy has already acquired the spinlock, no need
to release and acquire again.
Cc: Ray Ni <ray.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Zeng Star <star.zeng@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Signed-off-by: Jiaxin Wu <jiaxin.wu@intel.com>
---
UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 15 ---------------
1 file changed, 15 deletions(-)
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 081f0c1501..9790b4f888 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -1257,12 +1257,10 @@ InternalSmmStartupAllAPs (
}
if (!AcquireSpinLockOrFail (mSmmMpSyncData->CpuData[Index].Busy)) {
return EFI_NOT_READY;
}
-
- ReleaseSpinLock (mSmmMpSyncData->CpuData[Index].Busy);
}
}
if (CpuCount == 0) {
return EFI_NOT_STARTED;
@@ -1273,23 +1271,10 @@ InternalSmmStartupAllAPs (
*Token = (MM_COMPLETION)ProcToken->SpinLock;
} else {
ProcToken = NULL;
}
- //
- // Make sure all BUSY should be acquired.
- //
- // Because former code already check mSmmMpSyncData->CpuData[***].Busy for each AP.
- // Here code always use AcquireSpinLock instead of AcquireSpinLockOrFail for not
- // block mode.
- //
- for (Index = 0; Index < mMaxNumberOfCpus; Index++) {
- if (IsPresentAp (Index)) {
- AcquireSpinLock (mSmmMpSyncData->CpuData[Index].Busy);
- }
- }
-
for (Index = 0; Index < mMaxNumberOfCpus; Index++) {
if (IsPresentAp (Index)) {
mSmmMpSyncData->CpuData[Index].Procedure = (EFI_AP_PROCEDURE2)Procedure;
mSmmMpSyncData->CpuData[Index].Parameter = ProcedureArguments;
if (ProcToken != NULL) {
--
2.16.2.windows.1
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116434): https://edk2.groups.io/g/devel/message/116434
Mute This Topic: https://groups.io/mt/104764104/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [edk2-devel] [PATCH v1 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Make RunningApCount on exclusive cacheline
2024-03-06 12:11 [edk2-devel] [PATCH v1 0/2] SMM MP service performance Improvement Wu, Jiaxin
2024-03-06 12:11 ` [edk2-devel] [PATCH v1 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Remove one round of Busy spinlock Wu, Jiaxin
@ 2024-03-06 12:11 ` Wu, Jiaxin
1 sibling, 0 replies; 3+ messages in thread
From: Wu, Jiaxin @ 2024-03-06 12:11 UTC (permalink / raw)
To: devel
Cc: Ray Ni, Laszlo Ersek, Eric Dong, Zeng Star, Gerd Hoffmann,
Rahul Kumar
For non blocking mode during SmmMpBroadcastProcedure, multiple APs might
contended access the RunningApCount in the PROCEDURE_TOKEN:
Step1: RunningApCount is initialized to the mMaxNumberOfCpus
(See GetFreeToken).
Step2: Decrease RunningApCount if the AP is not present
(See InterlockedDecrement in InternalSmmStartupAllAPs).
Step3: multiple APs are contended to decrease RunningApCount in the
same token (See ReleaseToken in APHandler).
So, Contended lock case happen during Step3. For multiple APs access
the shared memory (RunningApCount), we shall use exclusive cache line
with WB attribute for SMM Performance Tuning.
This patch makes RunningApCount on exclusive cacheline.
Cc: Ray Ni <ray.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Zeng Star <star.zeng@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Signed-off-by: Jiaxin Wu <jiaxin.wu@intel.com>
---
UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 35 +++++++++++++++++++-----------
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
2 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 9790b4f888..05fa6854fe 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -421,11 +421,11 @@ ReleaseToken (
{
PROCEDURE_TOKEN *Token;
Token = mSmmMpSyncData->CpuData[CpuIndex].Token;
- if (InterlockedDecrement (&Token->RunningApCount) == 0) {
+ if (InterlockedDecrement (Token->RunningApCount) == 0) {
ReleaseSpinLock (Token->SpinLock);
}
mSmmMpSyncData->CpuData[CpuIndex].Token = NULL;
}
@@ -970,12 +970,12 @@ AllocateTokenBuffer (
)
{
UINTN SpinLockSize;
UINT32 TokenCountPerChunk;
UINTN Index;
- SPIN_LOCK *SpinLock;
- UINT8 *SpinLockBuffer;
+ UINTN BufferAddr;
+ VOID *Buffer;
PROCEDURE_TOKEN *ProcTokens;
SpinLockSize = GetSpinLockProperties ();
TokenCountPerChunk = FixedPcdGet32 (PcdCpuSmmMpTokenCountPerChunk);
@@ -986,25 +986,34 @@ AllocateTokenBuffer (
}
DEBUG ((DEBUG_INFO, "CpuSmm: SpinLock Size = 0x%x, PcdCpuSmmMpTokenCountPerChunk = 0x%x\n", SpinLockSize, TokenCountPerChunk));
//
- // Separate the Spin_lock and Proc_token because the alignment requires by Spin_Lock.
+ // Allocate the buffer for SpinLock and RunningApCount to meet the alignment requirement.
//
- SpinLockBuffer = AllocatePool (SpinLockSize * TokenCountPerChunk);
- ASSERT (SpinLockBuffer != NULL);
+ Buffer = AllocatePages (EFI_SIZE_TO_PAGES (SpinLockSize * TokenCountPerChunk * 2));
+ if (Buffer == NULL) {
+ DEBUG ((DEBUG_ERROR, "AllocateTokenBuffer: Failed to allocate the buffer for SpinLock and RunningApCount!\n"));
+ CpuDeadLoop ();
+ }
ProcTokens = AllocatePool (sizeof (PROCEDURE_TOKEN) * TokenCountPerChunk);
ASSERT (ProcTokens != NULL);
+ BufferAddr = (UINTN)Buffer;
for (Index = 0; Index < TokenCountPerChunk; Index++) {
- SpinLock = (SPIN_LOCK *)(SpinLockBuffer + SpinLockSize * Index);
- InitializeSpinLock (SpinLock);
+ ProcTokens[Index].Signature = PROCEDURE_TOKEN_SIGNATURE;
+
+ ProcTokens[Index].SpinLock = (SPIN_LOCK *)BufferAddr;
+ InitializeSpinLock (ProcTokens[Index].SpinLock);
+
+ BufferAddr += SpinLockSize;
+
+ ProcTokens[Index].RunningApCount = (volatile UINT32 *)BufferAddr;
+ *ProcTokens[Index].RunningApCount = 0;
- ProcTokens[Index].Signature = PROCEDURE_TOKEN_SIGNATURE;
- ProcTokens[Index].SpinLock = SpinLock;
- ProcTokens[Index].RunningApCount = 0;
+ BufferAddr += SpinLockSize;
InsertTailList (&gSmmCpuPrivate->TokenList, &ProcTokens[Index].Link);
}
return &ProcTokens[0].Link;
@@ -1036,11 +1045,11 @@ GetFreeToken (
}
NewToken = PROCEDURE_TOKEN_FROM_LINK (gSmmCpuPrivate->FirstFreeToken);
gSmmCpuPrivate->FirstFreeToken = GetNextNode (&gSmmCpuPrivate->TokenList, gSmmCpuPrivate->FirstFreeToken);
- NewToken->RunningApCount = RunningApsCount;
+ *NewToken->RunningApCount = RunningApsCount;
AcquireSpinLock (NewToken->SpinLock);
return NewToken;
}
@@ -1298,11 +1307,11 @@ InternalSmmStartupAllAPs (
//
// Decrease the count to mark this processor(AP or BSP) as finished.
//
if (ProcToken != NULL) {
- InterlockedDecrement (&ProcToken->RunningApCount);
+ InterlockedDecrement (ProcToken->RunningApCount);
}
}
}
ReleaseAllAPs ();
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 7f244ea803..07473208fd 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -213,11 +213,11 @@ typedef struct {
typedef struct {
UINTN Signature;
LIST_ENTRY Link;
SPIN_LOCK *SpinLock;
- volatile UINT32 RunningApCount;
+ volatile UINT32 *RunningApCount;
} PROCEDURE_TOKEN;
#define PROCEDURE_TOKEN_FROM_LINK(a) CR (a, PROCEDURE_TOKEN, Link, PROCEDURE_TOKEN_SIGNATURE)
#define TOKEN_BUFFER_SIGNATURE SIGNATURE_32 ('T', 'K', 'B', 'S')
--
2.16.2.windows.1
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116435): https://edk2.groups.io/g/devel/message/116435
Mute This Topic: https://groups.io/mt/104764105/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
^ permalink raw reply related [flat|nested] 3+ messages in thread