public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH 0/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
@ 2016-11-23 14:01 Jeff Fan
  2016-11-23 14:01 ` [PATCH 1/3] UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable Jeff Fan
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jeff Fan @ 2016-11-23 14:01 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Feng Tian, Michael D Kinney

Allocate safe AP stack under 4GB and make sure BSP wait till all APs running in
safe code.  

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>

Jeff Fan (3):
  UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable
  UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
  UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code

 UefiCpuPkg/Library/MpInitLib/DxeMpLib.c        | 29 ++++++++++++++++++++++----
 UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 11 ++++++++--
 UefiCpuPkg/Library/MpInitLib/MpLib.h           |  4 +++-
 UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  5 +++++
 4 files changed, 42 insertions(+), 7 deletions(-)

-- 
2.9.3.windows.2



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable
  2016-11-23 14:01 [PATCH 0/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
@ 2016-11-23 14:01 ` Jeff Fan
  2016-11-23 17:08   ` Laszlo Ersek
  2016-11-23 14:01 ` [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
  2016-11-23 14:01 ` [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code Jeff Fan
  2 siblings, 1 reply; 12+ messages in thread
From: Jeff Fan @ 2016-11-23 14:01 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Feng Tian, Michael D Kinney

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/DxeMpLib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
index 7f3900b..a0d5eeb 100644
--- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
@@ -244,7 +244,7 @@ RelocateApLoop (
 
   CpuMpData    = GetCpuMpData ();
   MwaitSupport = IsMwaitSupport ();
-  AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) Buffer;
+  AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) mReservedApLoopFunc;
   AsmRelocateApLoopFunc (MwaitSupport, CpuMpData->ApTargetCState, CpuMpData->PmCodeSegment);
   //
   // It should never reach here
@@ -273,7 +273,7 @@ MpInitChangeApLoopCallback (
   CpuMpData->SaveRestoreFlag = TRUE;
   CpuMpData->PmCodeSegment = GetProtectedModeCS ();
   CpuMpData->ApLoopMode = PcdGet8 (PcdCpuApLoopMode);
-  WakeUpAP (CpuMpData, TRUE, 0, RelocateApLoop, mReservedApLoopFunc);
+  WakeUpAP (CpuMpData, TRUE, 0, RelocateApLoop, NULL);
   DEBUG ((DEBUG_INFO, "%a() done!\n", __FUNCTION__));
 }
 
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
  2016-11-23 14:01 [PATCH 0/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
  2016-11-23 14:01 ` [PATCH 1/3] UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable Jeff Fan
@ 2016-11-23 14:01 ` Jeff Fan
  2016-11-23 18:31   ` Laszlo Ersek
  2016-11-23 14:01 ` [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code Jeff Fan
  2 siblings, 1 reply; 12+ messages in thread
From: Jeff Fan @ 2016-11-23 14:01 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Feng Tian, Michael D Kinney

For long mode DXE, we will disable paging on AP to protected mode to execute AP
safe loop code in ACPI NVS range under 4GB. But we forget to allocate stack for
AP under 4GB and AP still are using original AP stack. If original AP stack is
larger than 4GB, it cannot be used after AP is transferred to protected mode.

Moreover, even though AP stack is always under 4GB on protected mode DXE, AP
stack(in BootServiceData) maybe crashed by OS after Exit Boot Service event.

This fix is to allocate ACPI NVS range under 4GB together with AP safe loop
code. APs will switch to new stack at beginning of safe loop code.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/DxeMpLib.c        | 19 +++++++++++++++++--
 UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  9 +++++++--
 UefiCpuPkg/Library/MpInitLib/MpLib.h           |  3 ++-
 UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  3 +++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
index a0d5eeb..d0f9f7e 100644
--- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
@@ -18,6 +18,7 @@
 #include <Library/UefiBootServicesTableLib.h>
 
 #define  AP_CHECK_INTERVAL     (EFI_TIMER_PERIOD_MILLISECONDS (100))
+#define  AP_SAFT_STACK_SIZE    128
 
 CPU_MP_DATA      *mCpuMpData = NULL;
 EFI_EVENT        mCheckAllApsEvent = NULL;
@@ -25,6 +26,7 @@ EFI_EVENT        mMpInitExitBootServicesEvent = NULL;
 EFI_EVENT        mLegacyBootEvent = NULL;
 volatile BOOLEAN mStopCheckAllApsStatus = TRUE;
 VOID             *mReservedApLoopFunc = NULL;
+UINTN            mReservedTopOfApStack;
 
 /**
   Get the pointer to CPU MP Data structure.
@@ -241,11 +243,18 @@ RelocateApLoop (
   CPU_MP_DATA            *CpuMpData;
   BOOLEAN                MwaitSupport;
   ASM_RELOCATE_AP_LOOP   AsmRelocateApLoopFunc;
+  UINTN                  ProcessorNumber;
 
+  MpInitLibWhoAmI (&ProcessorNumber); 
   CpuMpData    = GetCpuMpData ();
   MwaitSupport = IsMwaitSupport ();
   AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) mReservedApLoopFunc;
-  AsmRelocateApLoopFunc (MwaitSupport, CpuMpData->ApTargetCState, CpuMpData->PmCodeSegment);
+  AsmRelocateApLoopFunc (
+    MwaitSupport,
+    CpuMpData->ApTargetCState,
+    CpuMpData->PmCodeSegment,
+    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE
+    );
   //
   // It should never reach here
   //
@@ -289,6 +298,7 @@ InitMpGlobalData (
 {
   EFI_STATUS                 Status;
   EFI_PHYSICAL_ADDRESS       Address;
+  UINTN                      ApSafeBufferSize;
 
   SaveCpuMpData (CpuMpData);
 
@@ -307,16 +317,21 @@ InitMpGlobalData (
   // Allocating it in advance since memory services are not available in
   // Exit Boot Services callback function.
   //
+  ApSafeBufferSize  = CpuMpData->AddressMap.RelocateApLoopFuncSize;
+  ApSafeBufferSize += CpuMpData->CpuCount * AP_SAFT_STACK_SIZE;
+
   Address = BASE_4GB - 1;
   Status  = gBS->AllocatePages (
                    AllocateMaxAddress,
                    EfiReservedMemoryType,
-                   EFI_SIZE_TO_PAGES (sizeof (CpuMpData->AddressMap.RelocateApLoopFuncSize)),
+                   EFI_SIZE_TO_PAGES (ApSafeBufferSize),
                    &Address
                    );
   ASSERT_EFI_ERROR (Status);
   mReservedApLoopFunc = (VOID *) (UINTN) Address;
   ASSERT (mReservedApLoopFunc != NULL);
+  mReservedTopOfApStack = (UINTN) Address + EFI_SIZE_TO_PAGES (ApSafeBufferSize) * SIZE_4KB;
+  ASSERT ((mReservedTopOfApStack & (UINTN)(CPU_STACK_ALIGNMENT - 1)) == 0);
   CopyMem (
     mReservedApLoopFunc,
     CpuMpData->AddressMap.RelocateApLoopFuncAddress,
diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
index 64e51d8..4e55760 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
@@ -217,14 +217,19 @@ RendezvousFunnelProcEnd:
 global ASM_PFX(AsmRelocateApLoop)
 ASM_PFX(AsmRelocateApLoop):
 AsmRelocateApLoopStart:
-    cmp        byte [esp + 4], 1
+    push       ebp
+    mov        ebp, esp
+    mov        ecx, [ebp + 8]      ; MwaitSupport
+    mov        ebx, [ebp + 12]     ; ApTargetCState
+    mov        esp, [ebp + 20]     ; TopOfApStack
+    cmp        cl,  1              ; Check mwait-monitor support
     jnz        HltLoop
 MwaitLoop:
     mov        eax, esp
     xor        ecx, ecx
     xor        edx, edx
     monitor
-    mov        eax, [esp + 8]    ; Mwait Cx, Target C-State per eax[7:4]
+    mov        eax, ebx            ; Mwait Cx, Target C-State per eax[7:4]
     shl        eax, 4
     mwait
     jmp        MwaitLoop
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
index f73a469..e6dea18 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
@@ -250,7 +250,8 @@ VOID
 (EFIAPI * ASM_RELOCATE_AP_LOOP) (
   IN BOOLEAN                 MwaitSupport,
   IN UINTN                   ApTargetCState,
-  IN UINTN                   PmCodeSegment
+  IN UINTN                   PmCodeSegment,
+  IN UINTN                   TopOfApStack
   );
 
 /**
diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
index aaabb50..7c8fa45 100644
--- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
@@ -224,6 +224,9 @@ RendezvousFunnelProcEnd:
 global ASM_PFX(AsmRelocateApLoop)
 ASM_PFX(AsmRelocateApLoop):
 AsmRelocateApLoopStart:
+    push       rbp
+    mov        rbp, rsp
+    mov        rsp, r9
     push       rcx
     push       rdx
 
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code
  2016-11-23 14:01 [PATCH 0/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
  2016-11-23 14:01 ` [PATCH 1/3] UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable Jeff Fan
  2016-11-23 14:01 ` [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
@ 2016-11-23 14:01 ` Jeff Fan
  2016-11-23 18:52   ` Laszlo Ersek
  2 siblings, 1 reply; 12+ messages in thread
From: Jeff Fan @ 2016-11-23 14:01 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Feng Tian, Michael D Kinney

Add one semaphore to make sure BSP to wait till all APs run in AP safe loop
code.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/Library/MpInitLib/DxeMpLib.c        | 8 +++++++-
 UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 2 ++
 UefiCpuPkg/Library/MpInitLib/MpLib.h           | 3 ++-
 UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  | 2 ++
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
index d0f9f7e..64c0c52 100644
--- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
@@ -27,6 +27,7 @@ EFI_EVENT        mLegacyBootEvent = NULL;
 volatile BOOLEAN mStopCheckAllApsStatus = TRUE;
 VOID             *mReservedApLoopFunc = NULL;
 UINTN            mReservedTopOfApStack;
+volatile UINT32  mNumberToFinish = 0;
 
 /**
   Get the pointer to CPU MP Data structure.
@@ -253,7 +254,8 @@ RelocateApLoop (
     MwaitSupport,
     CpuMpData->ApTargetCState,
     CpuMpData->PmCodeSegment,
-    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE
+    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE,
+    (UINTN) &mNumberToFinish
     );
   //
   // It should never reach here
@@ -282,8 +284,12 @@ MpInitChangeApLoopCallback (
   CpuMpData->SaveRestoreFlag = TRUE;
   CpuMpData->PmCodeSegment = GetProtectedModeCS ();
   CpuMpData->ApLoopMode = PcdGet8 (PcdCpuApLoopMode);
+  mNumberToFinish = CpuMpData->CpuCount - 1;
   WakeUpAP (CpuMpData, TRUE, 0, RelocateApLoop, NULL);
   DEBUG ((DEBUG_INFO, "%a() done!\n", __FUNCTION__));
+  while (mNumberToFinish > 0) {
+    CpuPause ();
+  }
 }
 
 /**
diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
index 4e55760..6235748 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
@@ -222,6 +222,8 @@ AsmRelocateApLoopStart:
     mov        ecx, [ebp + 8]      ; MwaitSupport
     mov        ebx, [ebp + 12]     ; ApTargetCState
     mov        esp, [ebp + 20]     ; TopOfApStack
+    mov        eax, [ebp + 24]     ; CountTofinish
+    lock dec   dword [eax]         ; CountTofinish--
     cmp        cl,  1              ; Check mwait-monitor support
     jnz        HltLoop
 MwaitLoop:
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
index e6dea18..49305ad 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
@@ -251,7 +251,8 @@ VOID
   IN BOOLEAN                 MwaitSupport,
   IN UINTN                   ApTargetCState,
   IN UINTN                   PmCodeSegment,
-  IN UINTN                   TopOfApStack
+  IN UINTN                   TopOfApStack,
+  IN UINTN                   NumberToFinish
   );
 
 /**
diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
index 7c8fa45..deaee9e 100644
--- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
@@ -227,6 +227,8 @@ AsmRelocateApLoopStart:
     push       rbp
     mov        rbp, rsp
     mov        rsp, r9
+    mov        rax, [rbp + 48]     ; CountTofinish
+    lock dec   dword [rax]         ; CountTofinish--
     push       rcx
     push       rdx
 
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable
  2016-11-23 14:01 ` [PATCH 1/3] UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable Jeff Fan
@ 2016-11-23 17:08   ` Laszlo Ersek
  0 siblings, 0 replies; 12+ messages in thread
From: Laszlo Ersek @ 2016-11-23 17:08 UTC (permalink / raw)
  To: Jeff Fan, edk2-devel; +Cc: Feng Tian, Michael D Kinney

On 11/23/16 15:01, Jeff Fan wrote:
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/DxeMpLib.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> index 7f3900b..a0d5eeb 100644
> --- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> @@ -244,7 +244,7 @@ RelocateApLoop (
>  
>    CpuMpData    = GetCpuMpData ();
>    MwaitSupport = IsMwaitSupport ();
> -  AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) Buffer;
> +  AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) mReservedApLoopFunc;
>    AsmRelocateApLoopFunc (MwaitSupport, CpuMpData->ApTargetCState, CpuMpData->PmCodeSegment);
>    //
>    // It should never reach here
> @@ -273,7 +273,7 @@ MpInitChangeApLoopCallback (
>    CpuMpData->SaveRestoreFlag = TRUE;
>    CpuMpData->PmCodeSegment = GetProtectedModeCS ();
>    CpuMpData->ApLoopMode = PcdGet8 (PcdCpuApLoopMode);
> -  WakeUpAP (CpuMpData, TRUE, 0, RelocateApLoop, mReservedApLoopFunc);
> +  WakeUpAP (CpuMpData, TRUE, 0, RelocateApLoop, NULL);
>    DEBUG ((DEBUG_INFO, "%a() done!\n", __FUNCTION__));
>  }
>  
> 

This patch looks okay (there are no other references to the
RelocateApLoop() function), but can you please modify the commit message
to explain why this change is a good thing?

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
  2016-11-23 14:01 ` [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
@ 2016-11-23 18:31   ` Laszlo Ersek
  2016-11-24  2:23     ` Fan, Jeff
  0 siblings, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2016-11-23 18:31 UTC (permalink / raw)
  To: Jeff Fan, edk2-devel; +Cc: Feng Tian, Michael D Kinney

On 11/23/16 15:01, Jeff Fan wrote:
> For long mode DXE, we will disable paging on AP to protected mode to execute AP
> safe loop code in ACPI NVS range under 4GB. But we forget to allocate stack for
> AP under 4GB and AP still are using original AP stack. If original AP stack is
> larger than 4GB, it cannot be used after AP is transferred to protected mode.
> 
> Moreover, even though AP stack is always under 4GB on protected mode DXE, AP
> stack(in BootServiceData) maybe crashed by OS after Exit Boot Service event.
> 
> This fix is to allocate ACPI NVS range under 4GB together with AP safe loop
> code. APs will switch to new stack at beginning of safe loop code.

(1) We are actually allocating EfiReservedMemoryType -- please update
the commit message.

(2) For a minute I was confused about using the stack *at all* in the
HLT loop. But looking farther down, and reading up on the MONITOR
statement, I see that the MWAIT loops in both the Ia32 and X64 source
files use the stack inherentily. So, the stack usage is unavoidable in
the MwaitSupport==TRUE case.

Can you mention in the commit message that the stack usage is only
really necessary for MwaitSupport==TRUE, and in the HLT loop mode, it
could be technically avoided?

(I'm not suggesting that we implement a separate solution for the HLT
loop mode, but we should be clear about the true incentive for this patch.)

> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/DxeMpLib.c        | 19 +++++++++++++++++--
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  9 +++++++--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h           |  3 ++-
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  3 +++
>  4 files changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> index a0d5eeb..d0f9f7e 100644
> --- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> @@ -18,6 +18,7 @@
>  #include <Library/UefiBootServicesTableLib.h>
>  
>  #define  AP_CHECK_INTERVAL     (EFI_TIMER_PERIOD_MILLISECONDS (100))
> +#define  AP_SAFT_STACK_SIZE    128

(3) Is "SAFT" a typo? Did you mean "SAFE"?

>  
>  CPU_MP_DATA      *mCpuMpData = NULL;
>  EFI_EVENT        mCheckAllApsEvent = NULL;
> @@ -25,6 +26,7 @@ EFI_EVENT        mMpInitExitBootServicesEvent = NULL;
>  EFI_EVENT        mLegacyBootEvent = NULL;
>  volatile BOOLEAN mStopCheckAllApsStatus = TRUE;
>  VOID             *mReservedApLoopFunc = NULL;
> +UINTN            mReservedTopOfApStack;
>  
>  /**
>    Get the pointer to CPU MP Data structure.
> @@ -241,11 +243,18 @@ RelocateApLoop (
>    CPU_MP_DATA            *CpuMpData;
>    BOOLEAN                MwaitSupport;
>    ASM_RELOCATE_AP_LOOP   AsmRelocateApLoopFunc;
> +  UINTN                  ProcessorNumber;
>  
> +  MpInitLibWhoAmI (&ProcessorNumber); 
>    CpuMpData    = GetCpuMpData ();
>    MwaitSupport = IsMwaitSupport ();
>    AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) mReservedApLoopFunc;
> -  AsmRelocateApLoopFunc (MwaitSupport, CpuMpData->ApTargetCState, CpuMpData->PmCodeSegment);
> +  AsmRelocateApLoopFunc (
> +    MwaitSupport,
> +    CpuMpData->ApTargetCState,
> +    CpuMpData->PmCodeSegment,
> +    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE
> +    );
>    //
>    // It should never reach here
>    //
> @@ -289,6 +298,7 @@ InitMpGlobalData (
>  {
>    EFI_STATUS                 Status;
>    EFI_PHYSICAL_ADDRESS       Address;
> +  UINTN                      ApSafeBufferSize;
>  
>    SaveCpuMpData (CpuMpData);
>  
> @@ -307,16 +317,21 @@ InitMpGlobalData (
>    // Allocating it in advance since memory services are not available in
>    // Exit Boot Services callback function.
>    //
> +  ApSafeBufferSize  = CpuMpData->AddressMap.RelocateApLoopFuncSize;
> +  ApSafeBufferSize += CpuMpData->CpuCount * AP_SAFT_STACK_SIZE;
> +

(So, I think this could be theoretically avoided for the HLT loop case,
but I'm fine if we do it for both loop styles, for simplicity.)

>    Address = BASE_4GB - 1;
>    Status  = gBS->AllocatePages (
>                     AllocateMaxAddress,
>                     EfiReservedMemoryType,
> -                   EFI_SIZE_TO_PAGES (sizeof (CpuMpData->AddressMap.RelocateApLoopFuncSize)),
> +                   EFI_SIZE_TO_PAGES (ApSafeBufferSize),
>                     &Address
>                     );
>    ASSERT_EFI_ERROR (Status);
>    mReservedApLoopFunc = (VOID *) (UINTN) Address;
>    ASSERT (mReservedApLoopFunc != NULL);
> +  mReservedTopOfApStack = (UINTN) Address + EFI_SIZE_TO_PAGES (ApSafeBufferSize) * SIZE_4KB;

Ah, here you are moving the stacks to the end of the allocated area, so
if there's any gap left after the assembly code and the stacks, the gap
is now moved into the middle. Looks okay.

(4) I propose to replace the multiplication with the more idiomatic

  EFI_PAGES_TO_SIZE (EFI_SIZE_TO_PAGES (ApSafeBufferSize))

or else

  ALIGN_VALUE (ApSafeBufferSize, EFI_PAGE_SIZE)

Although the current code is correct too.

> +  ASSERT ((mReservedTopOfApStack & (UINTN)(CPU_STACK_ALIGNMENT - 1)) == 0);
>    CopyMem (
>      mReservedApLoopFunc,
>      CpuMpData->AddressMap.RelocateApLoopFuncAddress,
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 64e51d8..4e55760 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -217,14 +217,19 @@ RendezvousFunnelProcEnd:
>  global ASM_PFX(AsmRelocateApLoop)
>  ASM_PFX(AsmRelocateApLoop):
>  AsmRelocateApLoopStart:
> -    cmp        byte [esp + 4], 1
> +    push       ebp

(5) Why do you save EBP on the stack? (Sorry if it is trivial, my
assembly is not that great.) And, it is saved on the original AP stack.

> +    mov        ebp, esp
> +    mov        ecx, [ebp + 8]      ; MwaitSupport
> +    mov        ebx, [ebp + 12]     ; ApTargetCState
> +    mov        esp, [ebp + 20]     ; TopOfApStack
> +    cmp        cl,  1              ; Check mwait-monitor support
>      jnz        HltLoop
>  MwaitLoop:
>      mov        eax, esp
>      xor        ecx, ecx
>      xor        edx, edx
>      monitor
> -    mov        eax, [esp + 8]    ; Mwait Cx, Target C-State per eax[7:4]
> +    mov        eax, ebx            ; Mwait Cx, Target C-State per eax[7:4]
>      shl        eax, 4
>      mwait
>      jmp        MwaitLoop

(6) These code changes look okay to me, but are they necessary in the
32-bit case as well? The original AP stack is under 4GB too.

Oh wait, I understand. Our purpose is dual here. The stack space used by
MONITOR should be *both* under 4GB *and* in reserved type memory.

So, the code is fine, but can you please modify the following sentence
in the commit message:

    Moreover, even though AP stack is always under 4GB on protected
    mode DXE, ...

to:

    Moreover, even though AP stack is always under 4GB (a) in Ia32 DXE
    and (b) with this patch, after transfering to protected mode from
    X64 DXE, ...

It should be clear from the commit message that for Ia32, we solve
problem #2 (memory type for the MONITOR area), while for X64, we solve
problem #1 (address range) and problem #2 (memory type) *both*.

> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> index f73a469..e6dea18 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> @@ -250,7 +250,8 @@ VOID
>  (EFIAPI * ASM_RELOCATE_AP_LOOP) (
>    IN BOOLEAN                 MwaitSupport,
>    IN UINTN                   ApTargetCState,
> -  IN UINTN                   PmCodeSegment
> +  IN UINTN                   PmCodeSegment,
> +  IN UINTN                   TopOfApStack
>    );
>  
>  /**
> diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> index aaabb50..7c8fa45 100644
> --- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> @@ -224,6 +224,9 @@ RendezvousFunnelProcEnd:
>  global ASM_PFX(AsmRelocateApLoop)
>  ASM_PFX(AsmRelocateApLoop):
>  AsmRelocateApLoopStart:
> +    push       rbp

(7) Again, not sure why we're saving RBP (on the current AP stack).

> +    mov        rbp, rsp
> +    mov        rsp, r9
>      push       rcx
>      push       rdx
>  
> 

This looks okay.


I have some more questions about the preexistent code:

(8) The MONITOR statement seems to set up an address *range* for
monitoring with MWAIT. EAX provides the base address of the range, and
we point it to our new stack. However, what determines the *size* of the
address range? Obviously, it must fit in our new stack.

(9) In the original (pre-patch) X64 code, I see this:

MwaitLoop:
    mov        eax, esp           ; Set Monitor Address
    xor        ecx, ecx           ; ecx = 0
    xor        edx, edx           ; edx = 0
    monitor
    shl        ebx, 4
    mov        eax, ebx           ; Mwait Cx, Target C-State per eax[7:4]
    mwait
    jmp        MwaitLoop

I think this is wrong: EBX is supposed to contain ApTargetCState. In
order to pass it to MWAIT, it has to be shifted left four bit positions,
and moved to EAX, yes.

But, *unlike* in the Ia32 case, in the X64 code you shift EBX itself,
and then you move the result from EBX to EAX. (In the Ia32 case, you
move EBX to EAX first, and then shift EAX). This means that on the
second iteration, EBX will contain garbage, and MWAIT won't be
configured correctly.

EBX must be treated read-only in the loop, in my opinion. It should be
fixed in a separate patch.

(10) The X64 HLT loop goes like this:

HltLoop:
    cli
    hlt
    jmp        HltLoop
    ret

Can we remove the "ret" (in a separate patch)?

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code
  2016-11-23 14:01 ` [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code Jeff Fan
@ 2016-11-23 18:52   ` Laszlo Ersek
  2016-11-24  2:37     ` Fan, Jeff
  0 siblings, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2016-11-23 18:52 UTC (permalink / raw)
  To: Jeff Fan, edk2-devel; +Cc: Feng Tian, Michael D Kinney

On 11/23/16 15:01, Jeff Fan wrote:
> Add one semaphore to make sure BSP to wait till all APs run in AP safe loop
> code.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/DxeMpLib.c        | 8 +++++++-
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 2 ++
>  UefiCpuPkg/Library/MpInitLib/MpLib.h           | 3 ++-
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  | 2 ++
>  4 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> index d0f9f7e..64c0c52 100644
> --- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> @@ -27,6 +27,7 @@ EFI_EVENT        mLegacyBootEvent = NULL;
>  volatile BOOLEAN mStopCheckAllApsStatus = TRUE;
>  VOID             *mReservedApLoopFunc = NULL;
>  UINTN            mReservedTopOfApStack;
> +volatile UINT32  mNumberToFinish = 0;
>  
>  /**
>    Get the pointer to CPU MP Data structure.
> @@ -253,7 +254,8 @@ RelocateApLoop (
>      MwaitSupport,
>      CpuMpData->ApTargetCState,
>      CpuMpData->PmCodeSegment,
> -    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE
> +    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE,
> +    (UINTN) &mNumberToFinish
>      );
>    //
>    // It should never reach here
> @@ -282,8 +284,12 @@ MpInitChangeApLoopCallback (
>    CpuMpData->SaveRestoreFlag = TRUE;
>    CpuMpData->PmCodeSegment = GetProtectedModeCS ();
>    CpuMpData->ApLoopMode = PcdGet8 (PcdCpuApLoopMode);
> +  mNumberToFinish = CpuMpData->CpuCount - 1;
>    WakeUpAP (CpuMpData, TRUE, 0, RelocateApLoop, NULL);
>    DEBUG ((DEBUG_INFO, "%a() done!\n", __FUNCTION__));
> +  while (mNumberToFinish > 0) {
> +    CpuPause ();
> +  }
>  }

(1) Can you hoist the counting above the "done" message?

>  
>  /**
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 4e55760..6235748 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -222,6 +222,8 @@ AsmRelocateApLoopStart:
>      mov        ecx, [ebp + 8]      ; MwaitSupport
>      mov        ebx, [ebp + 12]     ; ApTargetCState
>      mov        esp, [ebp + 20]     ; TopOfApStack
> +    mov        eax, [ebp + 24]     ; CountTofinish
> +    lock dec   dword [eax]         ; CountTofinish--

(2) The comments should say

  ; NumberToFinish
  ; (*NumberToFinish)--

>      cmp        cl,  1              ; Check mwait-monitor support
>      jnz        HltLoop
>  MwaitLoop:
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> index e6dea18..49305ad 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> @@ -251,7 +251,8 @@ VOID
>    IN BOOLEAN                 MwaitSupport,
>    IN UINTN                   ApTargetCState,
>    IN UINTN                   PmCodeSegment,
> -  IN UINTN                   TopOfApStack
> +  IN UINTN                   TopOfApStack,
> +  IN UINTN                   NumberToFinish
>    );
>  
>  /**
> diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> index 7c8fa45..deaee9e 100644
> --- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> @@ -227,6 +227,8 @@ AsmRelocateApLoopStart:
>      push       rbp
>      mov        rbp, rsp
>      mov        rsp, r9
> +    mov        rax, [rbp + 48]     ; CountTofinish
> +    lock dec   dword [rax]         ; CountTofinish--
>      push       rcx
>      push       rdx
>  
> 

Hmmm, in the X64 case, this decrement seems to be pretty far from the
actual loop; a window still remains. On the other hand, if we wanted to
do the decrement from protected mode, then the semaphore would have to
be moved into the specially allocated area too.

Anyway, the point is that we are on the new stack, and will make no
further accesses to BootServicesData type memory.

(3) The comments should say

  ; NumberToFinish
  ; (*NumberToFinish)--


I think I'll postpone testing until the next version (if you decide to
post v3).

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
  2016-11-23 18:31   ` Laszlo Ersek
@ 2016-11-24  2:23     ` Fan, Jeff
  2016-11-24  9:20       ` Laszlo Ersek
  0 siblings, 1 reply; 12+ messages in thread
From: Fan, Jeff @ 2016-11-24  2:23 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@ml01.01.org; +Cc: Tian, Feng, Kinney, Michael D

Laszlo,

Thanks your comments.  I update the feedback as below in [Jeff]

Jeff

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Thursday, November 24, 2016 2:31 AM
To: Fan, Jeff; edk2-devel@ml01.01.org
Cc: Tian, Feng; Kinney, Michael D
Subject: Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB

On 11/23/16 15:01, Jeff Fan wrote:
> For long mode DXE, we will disable paging on AP to protected mode to 
> execute AP safe loop code in ACPI NVS range under 4GB. But we forget 
> to allocate stack for AP under 4GB and AP still are using original AP 
> stack. If original AP stack is larger than 4GB, it cannot be used after AP is transferred to protected mode.
> 
> Moreover, even though AP stack is always under 4GB on protected mode 
> DXE, AP stack(in BootServiceData) maybe crashed by OS after Exit Boot Service event.
> 
> This fix is to allocate ACPI NVS range under 4GB together with AP safe 
> loop code. APs will switch to new stack at beginning of safe loop code.

(1) We are actually allocating EfiReservedMemoryType -- please update the commit message.
[Jeff] OK

(2) For a minute I was confused about using the stack *at all* in the HLT loop. But looking farther down, and reading up on the MONITOR statement, I see that the MWAIT loops in both the Ia32 and X64 source files use the stack inherentily. So, the stack usage is unavoidable in the MwaitSupport==TRUE case.

Can you mention in the commit message that the stack usage is only really necessary for MwaitSupport==TRUE, and in the HLT loop mode, it could be technically avoided?
[Jeff] OK

(I'm not suggesting that we implement a separate solution for the HLT loop mode, but we should be clear about the true incentive for this patch.)

> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/DxeMpLib.c        | 19 +++++++++++++++++--
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm |  9 +++++++--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h           |  3 ++-
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  |  3 +++
>  4 files changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c 
> b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> index a0d5eeb..d0f9f7e 100644
> --- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> @@ -18,6 +18,7 @@
>  #include <Library/UefiBootServicesTableLib.h>
>  
>  #define  AP_CHECK_INTERVAL     (EFI_TIMER_PERIOD_MILLISECONDS (100))
> +#define  AP_SAFT_STACK_SIZE    128

(3) Is "SAFT" a typo? Did you mean "SAFE"?
[Jeff] Yes. Will correct it.

>  
>  CPU_MP_DATA      *mCpuMpData = NULL;
>  EFI_EVENT        mCheckAllApsEvent = NULL;
> @@ -25,6 +26,7 @@ EFI_EVENT        mMpInitExitBootServicesEvent = NULL;
>  EFI_EVENT        mLegacyBootEvent = NULL;
>  volatile BOOLEAN mStopCheckAllApsStatus = TRUE;
>  VOID             *mReservedApLoopFunc = NULL;
> +UINTN            mReservedTopOfApStack;
>  
>  /**
>    Get the pointer to CPU MP Data structure.
> @@ -241,11 +243,18 @@ RelocateApLoop (
>    CPU_MP_DATA            *CpuMpData;
>    BOOLEAN                MwaitSupport;
>    ASM_RELOCATE_AP_LOOP   AsmRelocateApLoopFunc;
> +  UINTN                  ProcessorNumber;
>  
> +  MpInitLibWhoAmI (&ProcessorNumber);
>    CpuMpData    = GetCpuMpData ();
>    MwaitSupport = IsMwaitSupport ();
>    AsmRelocateApLoopFunc = (ASM_RELOCATE_AP_LOOP) (UINTN) 
> mReservedApLoopFunc;
> -  AsmRelocateApLoopFunc (MwaitSupport, CpuMpData->ApTargetCState, 
> CpuMpData->PmCodeSegment);
> +  AsmRelocateApLoopFunc (
> +    MwaitSupport,
> +    CpuMpData->ApTargetCState,
> +    CpuMpData->PmCodeSegment,
> +    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE
> +    );
>    //
>    // It should never reach here
>    //
> @@ -289,6 +298,7 @@ InitMpGlobalData (  {
>    EFI_STATUS                 Status;
>    EFI_PHYSICAL_ADDRESS       Address;
> +  UINTN                      ApSafeBufferSize;
>  
>    SaveCpuMpData (CpuMpData);
>  
> @@ -307,16 +317,21 @@ InitMpGlobalData (
>    // Allocating it in advance since memory services are not available in
>    // Exit Boot Services callback function.
>    //
> +  ApSafeBufferSize  = CpuMpData->AddressMap.RelocateApLoopFuncSize;
> +  ApSafeBufferSize += CpuMpData->CpuCount * AP_SAFT_STACK_SIZE;
> +

(So, I think this could be theoretically avoided for the HLT loop case, but I'm fine if we do it for both loop styles, for simplicity.)

>    Address = BASE_4GB - 1;
>    Status  = gBS->AllocatePages (
>                     AllocateMaxAddress,
>                     EfiReservedMemoryType,
> -                   EFI_SIZE_TO_PAGES (sizeof (CpuMpData->AddressMap.RelocateApLoopFuncSize)),
> +                   EFI_SIZE_TO_PAGES (ApSafeBufferSize),
>                     &Address
>                     );
>    ASSERT_EFI_ERROR (Status);
>    mReservedApLoopFunc = (VOID *) (UINTN) Address;
>    ASSERT (mReservedApLoopFunc != NULL);
> +  mReservedTopOfApStack = (UINTN) Address + EFI_SIZE_TO_PAGES 
> + (ApSafeBufferSize) * SIZE_4KB;

Ah, here you are moving the stacks to the end of the allocated area, so if there's any gap left after the assembly code and the stacks, the gap is now moved into the middle. Looks okay.

(4) I propose to replace the multiplication with the more idiomatic

  EFI_PAGES_TO_SIZE (EFI_SIZE_TO_PAGES (ApSafeBufferSize))

[Jeff] Will correct it.

or else

  ALIGN_VALUE (ApSafeBufferSize, EFI_PAGE_SIZE)

Although the current code is correct too.

> +  ASSERT ((mReservedTopOfApStack & (UINTN)(CPU_STACK_ALIGNMENT - 1)) 
> + == 0);
>    CopyMem (
>      mReservedApLoopFunc,
>      CpuMpData->AddressMap.RelocateApLoopFuncAddress,
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 64e51d8..4e55760 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -217,14 +217,19 @@ RendezvousFunnelProcEnd:
>  global ASM_PFX(AsmRelocateApLoop)
>  ASM_PFX(AsmRelocateApLoop):
>  AsmRelocateApLoopStart:
> -    cmp        byte [esp + 4], 1
> +    push       ebp

(5) Why do you save EBP on the stack? (Sorry if it is trivial, my assembly is not that great.) And, it is saved on the original AP stack.
[Jeff] Good point. It could be saved in new stack and used for stack trace. I will fix it.

> +    mov        ebp, esp
> +    mov        ecx, [ebp + 8]      ; MwaitSupport
> +    mov        ebx, [ebp + 12]     ; ApTargetCState
> +    mov        esp, [ebp + 20]     ; TopOfApStack
> +    cmp        cl,  1              ; Check mwait-monitor support
>      jnz        HltLoop
>  MwaitLoop:
>      mov        eax, esp
>      xor        ecx, ecx
>      xor        edx, edx
>      monitor
> -    mov        eax, [esp + 8]    ; Mwait Cx, Target C-State per eax[7:4]
> +    mov        eax, ebx            ; Mwait Cx, Target C-State per eax[7:4]
>      shl        eax, 4
>      mwait
>      jmp        MwaitLoop

(6) These code changes look okay to me, but are they necessary in the 32-bit case as well? The original AP stack is under 4GB too.

Oh wait, I understand. Our purpose is dual here. The stack space used by MONITOR should be *both* under 4GB *and* in reserved type memory.

So, the code is fine, but can you please modify the following sentence in the commit message:

    Moreover, even though AP stack is always under 4GB on protected
    mode DXE, ...

to:

    Moreover, even though AP stack is always under 4GB (a) in Ia32 DXE
    and (b) with this patch, after transfering to protected mode from
    X64 DXE, ...

It should be clear from the commit message that for Ia32, we solve problem #2 (memory type for the MONITOR area), while for X64, we solve problem #1 (address range) and problem #2 (memory type) *both*.
[Jeff] OK.
  
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h 
> b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> index f73a469..e6dea18 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> @@ -250,7 +250,8 @@ VOID
>  (EFIAPI * ASM_RELOCATE_AP_LOOP) (
>    IN BOOLEAN                 MwaitSupport,
>    IN UINTN                   ApTargetCState,
> -  IN UINTN                   PmCodeSegment
> +  IN UINTN                   PmCodeSegment,
> +  IN UINTN                   TopOfApStack
>    );
>  
>  /**
> diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm 
> b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> index aaabb50..7c8fa45 100644
> --- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> @@ -224,6 +224,9 @@ RendezvousFunnelProcEnd:
>  global ASM_PFX(AsmRelocateApLoop)
>  ASM_PFX(AsmRelocateApLoop):
>  AsmRelocateApLoopStart:
> +    push       rbp

(7) Again, not sure why we're saving RBP (on the current AP stack).
[Jeff] For x64, saving RBP is not necessary. It cannot used to do stack trace. I will remove it.

> +    mov        rbp, rsp
> +    mov        rsp, r9
>      push       rcx
>      push       rdx
>  
> 

This looks okay.


I have some more questions about the preexistent code:

(8) The MONITOR statement seems to set up an address *range* for monitoring with MWAIT. EAX provides the base address of the range, and we point it to our new stack. However, what determines the *size* of the address range? Obviously, it must fit in our new stack.
[Jeff] Yes. Monitor has size requirement to avoid fault wakeup. But this case, fault wakeup has no any real impact. It has rare case to write the memory. Even though fault wakeup happens, safe mwait-loop could make sure AP enter into C-state
 again.

(9) In the original (pre-patch) X64 code, I see this:

MwaitLoop:
    mov        eax, esp           ; Set Monitor Address
    xor        ecx, ecx           ; ecx = 0
    xor        edx, edx           ; edx = 0
    monitor
    shl        ebx, 4
    mov        eax, ebx           ; Mwait Cx, Target C-State per eax[7:4]
    mwait
    jmp        MwaitLoop

I think this is wrong: EBX is supposed to contain ApTargetCState. In order to pass it to MWAIT, it has to be shifted left four bit positions, and moved to EAX, yes.

But, *unlike* in the Ia32 case, in the X64 code you shift EBX itself, and then you move the result from EBX to EAX. (In the Ia32 case, you move EBX to EAX first, and then shift EAX). This means that on the second iteration, EBX will contain garbage, and MWAIT won't be configured correctly.

EBX must be treated read-only in the loop, in my opinion. It should be fixed in a separate patch.
[Jeff] Good catch. Will fix it.

(10) The X64 HLT loop goes like this:

HltLoop:
    cli
    hlt
    jmp        HltLoop
    ret

Can we remove the "ret" (in a separate patch)?
[Jeff] OK

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code
  2016-11-23 18:52   ` Laszlo Ersek
@ 2016-11-24  2:37     ` Fan, Jeff
  0 siblings, 0 replies; 12+ messages in thread
From: Fan, Jeff @ 2016-11-24  2:37 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@ml01.01.org; +Cc: Tian, Feng, Kinney, Michael D

Laszlo,

I added my feedback in [Jeff].

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Thursday, November 24, 2016 2:52 AM
To: Fan, Jeff; edk2-devel@ml01.01.org
Cc: Tian, Feng; Kinney, Michael D
Subject: Re: [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code

On 11/23/16 15:01, Jeff Fan wrote:
> Add one semaphore to make sure BSP to wait till all APs run in AP safe 
> loop code.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/Library/MpInitLib/DxeMpLib.c        | 8 +++++++-
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm | 2 ++
>  UefiCpuPkg/Library/MpInitLib/MpLib.h           | 3 ++-
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm  | 2 ++
>  4 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c 
> b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> index d0f9f7e..64c0c52 100644
> --- a/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/DxeMpLib.c
> @@ -27,6 +27,7 @@ EFI_EVENT        mLegacyBootEvent = NULL;
>  volatile BOOLEAN mStopCheckAllApsStatus = TRUE;
>  VOID             *mReservedApLoopFunc = NULL;
>  UINTN            mReservedTopOfApStack;
> +volatile UINT32  mNumberToFinish = 0;
>  
>  /**
>    Get the pointer to CPU MP Data structure.
> @@ -253,7 +254,8 @@ RelocateApLoop (
>      MwaitSupport,
>      CpuMpData->ApTargetCState,
>      CpuMpData->PmCodeSegment,
> -    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE
> +    mReservedTopOfApStack - ProcessorNumber * AP_SAFT_STACK_SIZE,
> +    (UINTN) &mNumberToFinish
>      );
>    //
>    // It should never reach here
> @@ -282,8 +284,12 @@ MpInitChangeApLoopCallback (
>    CpuMpData->SaveRestoreFlag = TRUE;
>    CpuMpData->PmCodeSegment = GetProtectedModeCS ();
>    CpuMpData->ApLoopMode = PcdGet8 (PcdCpuApLoopMode);
> +  mNumberToFinish = CpuMpData->CpuCount - 1;
>    WakeUpAP (CpuMpData, TRUE, 0, RelocateApLoop, NULL);
>    DEBUG ((DEBUG_INFO, "%a() done!\n", __FUNCTION__));
> +  while (mNumberToFinish > 0) {
> +    CpuPause ();
> +  }
>  }

(1) Can you hoist the counting above the "done" message?
[Jeff] Agree.

>  
>  /**
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 4e55760..6235748 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -222,6 +222,8 @@ AsmRelocateApLoopStart:
>      mov        ecx, [ebp + 8]      ; MwaitSupport
>      mov        ebx, [ebp + 12]     ; ApTargetCState
>      mov        esp, [ebp + 20]     ; TopOfApStack
> +    mov        eax, [ebp + 24]     ; CountTofinish
> +    lock dec   dword [eax]         ; CountTofinish--

(2) The comments should say

  ; NumberToFinish
  ; (*NumberToFinish)--

[Jeff] Agree.

>      cmp        cl,  1              ; Check mwait-monitor support
>      jnz        HltLoop
>  MwaitLoop:
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h 
> b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> index e6dea18..49305ad 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
> @@ -251,7 +251,8 @@ VOID
>    IN BOOLEAN                 MwaitSupport,
>    IN UINTN                   ApTargetCState,
>    IN UINTN                   PmCodeSegment,
> -  IN UINTN                   TopOfApStack
> +  IN UINTN                   TopOfApStack,
> +  IN UINTN                   NumberToFinish
>    );
>  
>  /**
> diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm 
> b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> index 7c8fa45..deaee9e 100644
> --- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
> @@ -227,6 +227,8 @@ AsmRelocateApLoopStart:
>      push       rbp
>      mov        rbp, rsp
>      mov        rsp, r9
> +    mov        rax, [rbp + 48]     ; CountTofinish
> +    lock dec   dword [rax]         ; CountTofinish--
>      push       rcx
>      push       rdx
>  
> 

Hmmm, in the X64 case, this decrement seems to be pretty far from the actual loop; a window still remains. On the other hand, if we wanted to do the decrement from protected mode, then the semaphore would have to be moved into the specially allocated area too.
[Jeff] There is no window on this case. After "move rsp, r9" all AP code will be updated in safe code / safe stack. And it is in long mode.

Anyway, the point is that we are on the new stack, and will make no further accesses to BootServicesData type memory.

(3) The comments should say

  ; NumberToFinish
  ; (*NumberToFinish)--
[Jeff] Agree.

I think I'll postpone testing until the next version (if you decide to post v3).

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
  2016-11-24  2:23     ` Fan, Jeff
@ 2016-11-24  9:20       ` Laszlo Ersek
  2016-11-24 13:48         ` Fan, Jeff
  0 siblings, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2016-11-24  9:20 UTC (permalink / raw)
  To: Fan, Jeff, edk2-devel@ml01.01.org; +Cc: Tian, Feng, Kinney, Michael D

On 11/24/16 03:23, Fan, Jeff wrote:
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com] 
> Sent: Thursday, November 24, 2016 2:31 AM
> To: Fan, Jeff; edk2-devel@ml01.01.org
> Cc: Tian, Feng; Kinney, Michael D
> Subject: Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB

> I have some more questions about the preexistent code:
> 
> (8) The MONITOR statement seems to set up an address *range* for
> monitoring with MWAIT. EAX provides the base address of the range,
> and we point it to our new stack. However, what determines the *size*
> of the address range? Obviously, it must fit in our new stack.

> [Jeff] Yes. Monitor has size requirement to avoid fault wakeup. But
> this case, fault wakeup has no any real impact. It has rare case to
> write the memory. Even though fault wakeup happens, safe mwait-loop
> could make sure AP enter into C-state again.

I'm sorry, I think my question was not clear enough. I'm actually
interested in three things here.

* The first question is just for my general education, because I'm not
familiar with the MONITOR configuration. The question is independent of
the code, it is just for me to learn about MONITOR.

So, how does the programmer configure the size of the monitored area?
The base address is taken from EAX. Where does the size come from?

* My second note is that whatever size we configure for the monitor, it
should fit into our new stack. I see that you agree (although spurious
wakeups don't matter), so this part is done then.

* And third, I thought of something else last night. Please consider the
Ia32 case:

> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 64e51d8..4e55760 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -217,14 +217,19 @@ RendezvousFunnelProcEnd:
>  global ASM_PFX(AsmRelocateApLoop)
>  ASM_PFX(AsmRelocateApLoop):
>  AsmRelocateApLoopStart:
> -    cmp        byte [esp + 4], 1
> +    push       ebp
> +    mov        ebp, esp
> +    mov        ecx, [ebp + 8]      ; MwaitSupport
> +    mov        ebx, [ebp + 12]     ; ApTargetCState
> +    mov        esp, [ebp + 20]     ; TopOfApStack

We set ESP precisely to TopOfApStack here.

> +    cmp        cl,  1              ; Check mwait-monitor support
>      jnz        HltLoop
>  MwaitLoop:
>      mov        eax, esp

Here we move the same value to EAX -- so EAX equals TopOfApStack.

>      xor        ecx, ecx
>      xor        edx, edx
>      monitor

And here we point the monitor to TopOfApStack exactly. Note that nothing
has been pushed.

> -    mov        eax, [esp + 8]    ; Mwait Cx, Target C-State per eax[7:4]
> +    mov        eax, ebx            ; Mwait Cx, Target C-State per eax[7:4]
>      shl        eax, 4
>      mwait
>      jmp        MwaitLoop

The stack grows down. Whenever we push something, the stack pointer is
decremented first, and then the value being pushed is written to the
location pointed-to by the new (decremented) stack pointer. (I verified
this order of operations in the Intel SDM).

"mReservedTopOfApStack" points to the byte *right past* the allocated
area. In turn, for ProcessorNumber = 0, "TopOfApStack" is set exactly to
"mReservedTopOfApStack".

This is valid if we push something, because then the pointer is
decremented first, and the value will be written into the allocated
area.

However, with MONITOR, the area to watch extends *upward* from EAX. This
means that for ProcessorNumber = 0, we will be monitoring an area that
is strictly outside of the allocated range, regardless of the size of
the monitored range. And, for ProcessorNumber (N+1), the monitored area
will actually be located in the stack slice of ProcessorNumber (N).

So, my point is, we should determine the size of the monitored area
exactly, and push that much empty space (== decrement ESP manually)
before invoking MONITOR.

... Hm, okay, here's what the Intel SDM tells us about MONITOR: "the
address range that the monitoring hardware checks for store operations
can be determined by using CPUID".

We should consider this address range in both the assembly code, and in
the original allocation (so replace AP_SAFE_STACK_SIZE with a dynamic
value, or at least make sure that AP_SAFT_STACK_SIZE is larger than any
imaginable monitor area size).

The same issue could apply to the X64 code, but that's more complex so I
didn't check it.

Thank you,
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
  2016-11-24  9:20       ` Laszlo Ersek
@ 2016-11-24 13:48         ` Fan, Jeff
  2016-11-24 21:13           ` Laszlo Ersek
  0 siblings, 1 reply; 12+ messages in thread
From: Fan, Jeff @ 2016-11-24 13:48 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@ml01.01.org; +Cc: Tian, Feng, Kinney, Michael D

Laszlo,

I added my feedback for your three questions.

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Thursday, November 24, 2016 5:21 PM
To: Fan, Jeff; edk2-devel@ml01.01.org
Cc: Tian, Feng; Kinney, Michael D
Subject: Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB

On 11/24/16 03:23, Fan, Jeff wrote:
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Thursday, November 24, 2016 2:31 AM
> To: Fan, Jeff; edk2-devel@ml01.01.org
> Cc: Tian, Feng; Kinney, Michael D
> Subject: Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack 
> < 4GB

> I have some more questions about the preexistent code:
> 
> (8) The MONITOR statement seems to set up an address *range* for 
> monitoring with MWAIT. EAX provides the base address of the range, and 
> we point it to our new stack. However, what determines the *size* of 
> the address range? Obviously, it must fit in our new stack.

> [Jeff] Yes. Monitor has size requirement to avoid fault wakeup. But 
> this case, fault wakeup has no any real impact. It has rare case to 
> write the memory. Even though fault wakeup happens, safe mwait-loop 
> could make sure AP enter into C-state again.

I'm sorry, I think my question was not clear enough. I'm actually interested in three things here.

* The first question is just for my general education, because I'm not familiar with the MONITOR configuration. The question is independent of the code, it is just for me to learn about MONITOR.

So, how does the programmer configure the size of the monitored area?
The base address is taken from EAX. Where does the size come from?
[Jeff] Monitor size is from CPUID. I observed it equal to the cache line size. But we needn't to set size of Monitor and only set base address of Monitor buffer when executing MONITOR instruction.

* My second note is that whatever size we configure for the monitor, it should fit into our new stack. I see that you agree (although spurious wakeups don't matter), so this part is done then.
[Jeff] Yes. It should be in new stack.

* And third, I thought of something else last night. Please consider the
Ia32 case:

> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 64e51d8..4e55760 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -217,14 +217,19 @@ RendezvousFunnelProcEnd:
>  global ASM_PFX(AsmRelocateApLoop)
>  ASM_PFX(AsmRelocateApLoop):
>  AsmRelocateApLoopStart:
> -    cmp        byte [esp + 4], 1
> +    push       ebp
> +    mov        ebp, esp
> +    mov        ecx, [ebp + 8]      ; MwaitSupport
> +    mov        ebx, [ebp + 12]     ; ApTargetCState
> +    mov        esp, [ebp + 20]     ; TopOfApStack

We set ESP precisely to TopOfApStack here.

> +    cmp        cl,  1              ; Check mwait-monitor support
>      jnz        HltLoop
>  MwaitLoop:
>      mov        eax, esp

Here we move the same value to EAX -- so EAX equals TopOfApStack.

>      xor        ecx, ecx
>      xor        edx, edx
>      monitor

And here we point the monitor to TopOfApStack exactly. Note that nothing has been pushed.

> -    mov        eax, [esp + 8]    ; Mwait Cx, Target C-State per eax[7:4]
> +    mov        eax, ebx            ; Mwait Cx, Target C-State per eax[7:4]
>      shl        eax, 4
>      mwait
>      jmp        MwaitLoop

The stack grows down. Whenever we push something, the stack pointer is decremented first, and then the value being pushed is written to the location pointed-to by the new (decremented) stack pointer. (I verified this order of operations in the Intel SDM).

"mReservedTopOfApStack" points to the byte *right past* the allocated area. In turn, for ProcessorNumber = 0, "TopOfApStack" is set exactly to "mReservedTopOfApStack".

This is valid if we push something, because then the pointer is decremented first, and the value will be written into the allocated area.

However, with MONITOR, the area to watch extends *upward* from EAX. This means that for ProcessorNumber = 0, we will be monitoring an area that is strictly outside of the allocated range, regardless of the size of the monitored range. And, for ProcessorNumber (N+1), the monitored area will actually be located in the stack slice of ProcessorNumber (N).

So, my point is, we should determine the size of the monitored area exactly, and push that much empty space (== decrement ESP manually) before invoking MONITOR.

... Hm, okay, here's what the Intel SDM tells us about MONITOR: "the address range that the monitoring hardware checks for store operations can be determined by using CPUID".

We should consider this address range in both the assembly code, and in the original allocation (so replace AP_SAFE_STACK_SIZE with a dynamic value, or at least make sure that AP_SAFT_STACK_SIZE is larger than any imaginable monitor area size).

The same issue could apply to the X64 code, but that's more complex so I didn't check it.

[Jeff] On normal case, we DO should check monitor size and make sure we write the correct monitor buffer to wakeup Aps. But for this case, we do not want to wake up Aps at all. So, we just have one mwait deadloop here. Once we found AP is wake up by any reason (SMI, Interrupt, or monitor buffer writing), we will go on executing mwait and place Aps into C-State again.
As long as this mwait loop does not touch any unsafe code/stack, it should be OK. This could simply the implementation.


Thank you,
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
  2016-11-24 13:48         ` Fan, Jeff
@ 2016-11-24 21:13           ` Laszlo Ersek
  0 siblings, 0 replies; 12+ messages in thread
From: Laszlo Ersek @ 2016-11-24 21:13 UTC (permalink / raw)
  To: Fan, Jeff, edk2-devel@ml01.01.org; +Cc: Tian, Feng, Kinney, Michael D

On 11/24/16 14:48, Fan, Jeff wrote:
> Laszlo,
> 
> I added my feedback for your three questions.
> 
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com] 
> Sent: Thursday, November 24, 2016 5:21 PM
> To: Fan, Jeff; edk2-devel@ml01.01.org
> Cc: Tian, Feng; Kinney, Michael D
> Subject: Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB
> 
> On 11/24/16 03:23, Fan, Jeff wrote:
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>> Sent: Thursday, November 24, 2016 2:31 AM
>> To: Fan, Jeff; edk2-devel@ml01.01.org
>> Cc: Tian, Feng; Kinney, Michael D
>> Subject: Re: [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack 
>> < 4GB
> 
>> I have some more questions about the preexistent code:
>>
>> (8) The MONITOR statement seems to set up an address *range* for 
>> monitoring with MWAIT. EAX provides the base address of the range, and 
>> we point it to our new stack. However, what determines the *size* of 
>> the address range? Obviously, it must fit in our new stack.
> 
>> [Jeff] Yes. Monitor has size requirement to avoid fault wakeup. But 
>> this case, fault wakeup has no any real impact. It has rare case to 
>> write the memory. Even though fault wakeup happens, safe mwait-loop 
>> could make sure AP enter into C-state again.
> 
> I'm sorry, I think my question was not clear enough. I'm actually interested in three things here.
> 
> * The first question is just for my general education, because I'm not familiar with the MONITOR configuration. The question is independent of the code, it is just for me to learn about MONITOR.
> 
> So, how does the programmer configure the size of the monitored area?
> The base address is taken from EAX. Where does the size come from?
> [Jeff] Monitor size is from CPUID. I observed it equal to the cache line size. But we needn't to set size of Monitor and only set base address of Monitor buffer when executing MONITOR instruction.
> 
> * My second note is that whatever size we configure for the monitor, it should fit into our new stack. I see that you agree (although spurious wakeups don't matter), so this part is done then.
> [Jeff] Yes. It should be in new stack.
> 
> * And third, I thought of something else last night. Please consider the
> Ia32 case:
> 
>> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
>> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
>> index 64e51d8..4e55760 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
>> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
>> @@ -217,14 +217,19 @@ RendezvousFunnelProcEnd:
>>  global ASM_PFX(AsmRelocateApLoop)
>>  ASM_PFX(AsmRelocateApLoop):
>>  AsmRelocateApLoopStart:
>> -    cmp        byte [esp + 4], 1
>> +    push       ebp
>> +    mov        ebp, esp
>> +    mov        ecx, [ebp + 8]      ; MwaitSupport
>> +    mov        ebx, [ebp + 12]     ; ApTargetCState
>> +    mov        esp, [ebp + 20]     ; TopOfApStack
> 
> We set ESP precisely to TopOfApStack here.
> 
>> +    cmp        cl,  1              ; Check mwait-monitor support
>>      jnz        HltLoop
>>  MwaitLoop:
>>      mov        eax, esp
> 
> Here we move the same value to EAX -- so EAX equals TopOfApStack.
> 
>>      xor        ecx, ecx
>>      xor        edx, edx
>>      monitor
> 
> And here we point the monitor to TopOfApStack exactly. Note that nothing has been pushed.
> 
>> -    mov        eax, [esp + 8]    ; Mwait Cx, Target C-State per eax[7:4]
>> +    mov        eax, ebx            ; Mwait Cx, Target C-State per eax[7:4]
>>      shl        eax, 4
>>      mwait
>>      jmp        MwaitLoop
> 
> The stack grows down. Whenever we push something, the stack pointer is decremented first, and then the value being pushed is written to the location pointed-to by the new (decremented) stack pointer. (I verified this order of operations in the Intel SDM).
> 
> "mReservedTopOfApStack" points to the byte *right past* the allocated area. In turn, for ProcessorNumber = 0, "TopOfApStack" is set exactly to "mReservedTopOfApStack".
> 
> This is valid if we push something, because then the pointer is decremented first, and the value will be written into the allocated area.
> 
> However, with MONITOR, the area to watch extends *upward* from EAX. This means that for ProcessorNumber = 0, we will be monitoring an area that is strictly outside of the allocated range, regardless of the size of the monitored range. And, for ProcessorNumber (N+1), the monitored area will actually be located in the stack slice of ProcessorNumber (N).
> 
> So, my point is, we should determine the size of the monitored area exactly, and push that much empty space (== decrement ESP manually) before invoking MONITOR.
> 
> ... Hm, okay, here's what the Intel SDM tells us about MONITOR: "the address range that the monitoring hardware checks for store operations can be determined by using CPUID".
> 
> We should consider this address range in both the assembly code, and in the original allocation (so replace AP_SAFE_STACK_SIZE with a dynamic value, or at least make sure that AP_SAFT_STACK_SIZE is larger than any imaginable monitor area size).
> 
> The same issue could apply to the X64 code, but that's more complex so I didn't check it.
> 
> [Jeff] On normal case, we DO should check monitor size and make sure we write the correct monitor buffer to wakeup Aps. But for this case, we do not want to wake up Aps at all. So, we just have one mwait deadloop here. Once we found AP is wake up by any reason (SMI, Interrupt, or monitor buffer writing), we will go on executing mwait and place Aps into C-State again.
> As long as this mwait loop does not touch any unsafe code/stack, it should be OK. This could simply the implementation.

But, if we don't actually care where the monitor / mwait are looking
(and we only want to make sure that the base address is in accessible
address space), then what do we need the AP stacks for at all? As far as
I recall from last night when I was looking at the code, the HLT loop
doesn't seem to need a stack at all, and the MONITOR / MWAIT loops could
share a single address between all of the APs. No need to allocate an
array of stacks.

... I'm not trying to block this series, just to understand why the code
is designed like this.

Anyway, I think I have no more open questions, so if you wish, you could
post v2, and I'd try to test it soon.

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-11-24 21:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-23 14:01 [PATCH 0/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
2016-11-23 14:01 ` [PATCH 1/3] UefiCpuPkg/DxeMpLib: Get safe AP loop handler from global variable Jeff Fan
2016-11-23 17:08   ` Laszlo Ersek
2016-11-23 14:01 ` [PATCH 2/3] UefiCpuPkg/DxeMpLib: Allocate new safe stack < 4GB Jeff Fan
2016-11-23 18:31   ` Laszlo Ersek
2016-11-24  2:23     ` Fan, Jeff
2016-11-24  9:20       ` Laszlo Ersek
2016-11-24 13:48         ` Fan, Jeff
2016-11-24 21:13           ` Laszlo Ersek
2016-11-23 14:01 ` [PATCH 3/3] UefiCpuPkg/DxeMpLib: Make sure APs in safe loop code Jeff Fan
2016-11-23 18:52   ` Laszlo Ersek
2016-11-24  2:37     ` Fan, Jeff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox