public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
@ 2016-11-10  6:07 Jeff Fan
  2016-11-10  6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jeff Fan @ 2016-11-10  6:07 UTC (permalink / raw)
  To: edk2-devel

On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
driver. In case, one NMI or SMI happens, APs may exit from hlt state and
execute the instruction after HLT instruction.

But APs are not running on safe code, it leads OVMF S3 boot unstable.

https://bugzilla.tianocore.org/show_bug.cgi?id=216

I tested real platform with 64bit DXE.

Jeff Fan (2):
  UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
  UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path

 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 +++++++++++++++++++++++++++
 4 files changed, 128 insertions(+)

-- 
2.9.3.windows.2



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
  2016-11-10  6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan
@ 2016-11-10  6:07 ` Jeff Fan
  2016-11-10  8:50   ` Laszlo Ersek
  2016-11-10  6:07 ` [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jeff Fan @ 2016-11-10  6:07 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Paolo Bonzini, Jiewen Yao, Michael D Kinney

On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
driver. However, we place AP in hlt-loop under 1MB space borrowed after CPU
restoring CPU contexts.
In case, one NMI or SMI happens, APs may exit from hlt state and execute the
instruction after HLT instruction. But the code under 1MB is no longer safe at
that time.

This fix is to allocate one ACPI NVS range to place the AP hlt-loop code. When
CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range.

https://bugzilla.tianocore.org/show_bug.cgi?id=216

Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 +++++++++++++++++++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 +++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 26 ++++++++++++++++++++++
 4 files changed, 95 insertions(+)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
index 6a798ef..6f290b8 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
@@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE          *mSmmS3ResumeState = NULL;
 
 BOOLEAN                      mAcpiS3Enable = TRUE;
 
+UINT8                        *mApHltLoopCode = NULL;
+UINT8                        mApHltLoopCodeTemplate[] = {
+                               0xFA,        // cli
+                               0xF4,        // hlt
+                               0xEB, 0xFC   // jmp $-2
+                               };
+
 /**
   Get MSR spin lock by MSR index.
 
@@ -376,6 +383,8 @@ MPRendezvousProcedure (
   CPU_REGISTER_TABLE         *RegisterTableList;
   UINT32                     InitApicId;
   UINTN                      Index;
+  UINT32                     TopOfStack;
+  UINT8                      Stack[128];
 
   ProgramVirtualWireMode ();
   DisableLvtInterrupts ();
@@ -393,6 +402,13 @@ MPRendezvousProcedure (
   // Count down the number with lock mechanism.
   //
   InterlockedDecrement (&mNumberToFinish);
+
+  //
+  // Place AP into the safe code
+  //
+  TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack);
+  CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, sizeof (mApHltLoopCodeTemplate));
+  TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack);
 }
 
 /**
@@ -731,6 +747,8 @@ InitSmmS3ResumeState (
   VOID                       *GuidHob;
   EFI_SMRAM_DESCRIPTOR       *SmramDescriptor;
   SMM_S3_RESUME_STATE        *SmmS3ResumeState;
+  EFI_PHYSICAL_ADDRESS       Address;
+  EFI_STATUS                 Status;
 
   if (!mAcpiS3Enable) {
     return;
@@ -773,6 +791,19 @@ InitSmmS3ResumeState (
   // Patch SmmS3ResumeState->SmmS3Cr3
   //
   InitSmmS3Cr3 ();
+
+  //
+  // Allocate safe memory in ACPI NVS for AP to execute hlt loop on S3 path
+  //
+  Address = BASE_4GB - 1;
+  Status  = gBS->AllocatePages (
+                   AllocateMaxAddress,
+                   EfiACPIMemoryNVS,
+                   EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)),
+                   &Address
+                   );
+  ASSERT_EFI_ERROR (Status);
+  mApHltLoopCode = (UINT8 *) (UINTN) Address;
 }
 
 /**
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
index 545b534..3e99aab 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
@@ -94,3 +94,28 @@ InitGdt (
   *GdtStepSize = GdtTableStepSize;
   return GdtTssTables;
 }
+
+/**
+  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
+  
+  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
+  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+
+**/
+VOID
+TransferApToSafeState (
+  IN UINT32             ApHltLoopCode,
+  IN UINT32             TopOfStack
+  )
+{
+  SwitchStack (
+    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
+    NULL,
+    NULL,
+    (VOID *) (UINTN) TopOfStack
+    );
+  //
+  // It should never reach here
+  //
+  ASSERT (FALSE);
+}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 9b119c8..82fa5a6 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -825,4 +825,17 @@ GetAcpiS3EnableFlag (
   VOID
   );
 
+/**
+  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
+  
+  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
+  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+
+**/
+VOID
+TransferApToSafeState (
+  IN UINT32             ApHltLoopCode,
+  IN UINT32             TopOfStack
+  );
+
 #endif
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index b53aa45..c05dec7 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -68,3 +68,29 @@ InitGdt (
   *GdtStepSize = GdtTableStepSize;
   return GdtTssTables;
 }
+/**
+  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
+  
+  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
+  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+
+**/
+VOID
+TransferApToSafeState (
+  IN UINT32             ApHltLoopCode,
+  IN UINT32             TopOfStack
+  )
+{
+  SwitchStack (
+    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
+    NULL,
+    NULL,
+    (VOID *) (UINTN) TopOfStack
+    );
+  //
+  // It should never reach here
+  //
+  ASSERT (FALSE);
+}
+
+
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
  2016-11-10  6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan
  2016-11-10  6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
@ 2016-11-10  6:07 ` Jeff Fan
  2016-11-10  8:56 ` [PATCH 0/2] Put AP into safe hlt-loop code " Laszlo Ersek
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Jeff Fan @ 2016-11-10  6:07 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Paolo Bonzini, Jiewen Yao, Michael D Kinney

On S3 path, we may transfer to long mode (if DXE is long mode) to restore CPU
contexts with CR3 = SmmS3Cr3 (in SMM). AP will execute hlt-loop after CPU
contexts restoration. Once one NMI or SMI happens, APs may exit from hlt state
and execute the instruction after HLT instruction. If APs are running on long
mode, page table is required to fetch the instruction. However, CR3 pointer to
page table in SMM. APs will crash.

This fix is to disable long mode on APs and transfer to 32bit protected mode to
execute hlt-loop. Then CR3 and page table will no longer be required.

https://bugzilla.tianocore.org/show_bug.cgi?id=216

Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 43 ++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index c05dec7..1db0a6e 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -68,6 +68,38 @@ InitGdt (
   *GdtStepSize = GdtTableStepSize;
   return GdtTssTables;
 }
+
+/**
+  Get Protected mode code segment from current GDT table.
+
+  @return  Protected mode code segment value.
+**/
+UINT16
+GetProtectedModeCS (
+  VOID
+  )
+{
+  IA32_DESCRIPTOR          GdtrDesc;
+  IA32_SEGMENT_DESCRIPTOR  *GdtEntry;
+  UINTN                    GdtEntryCount;
+  UINT16                   Index;
+
+  Index = (UINT16) -1;
+  AsmReadGdtr (&GdtrDesc);
+  GdtEntryCount = (GdtrDesc.Limit + 1) / sizeof (IA32_SEGMENT_DESCRIPTOR);
+  GdtEntry = (IA32_SEGMENT_DESCRIPTOR *) GdtrDesc.Base;
+  for (Index = 0; Index < GdtEntryCount; Index++) {
+    if (GdtEntry->Bits.L == 0) {
+      if (GdtEntry->Bits.Type > 8 && GdtEntry->Bits.L == 0) {
+        break;
+      }
+    }
+    GdtEntry++;
+  }
+  ASSERT (Index != -1);
+  return Index * 8;
+}
+
 /**
   Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
   
@@ -81,11 +113,12 @@ TransferApToSafeState (
   IN UINT32             TopOfStack
   )
 {
-  SwitchStack (
-    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
-    NULL,
-    NULL,
-    (VOID *) (UINTN) TopOfStack
+  AsmDisablePaging64 (
+    GetProtectedModeCS (),
+    (UINT32) (UINTN) ApHltLoopCode,
+    0,
+    0,
+    TopOfStack
     );
   //
   // It should never reach here
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
  2016-11-10  6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
@ 2016-11-10  8:50   ` Laszlo Ersek
  2016-11-10  9:00     ` Fan, Jeff
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2016-11-10  8:50 UTC (permalink / raw)
  To: Jeff Fan, edk2-devel; +Cc: Paolo Bonzini, Jiewen Yao, Michael D Kinney

Before I test this, I have one question / suggestion:

On 11/10/16 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
> driver. However, we place AP in hlt-loop under 1MB space borrowed after CPU
> restoring CPU contexts.
> In case, one NMI or SMI happens, APs may exit from hlt state and execute the
> instruction after HLT instruction. But the code under 1MB is no longer safe at
> that time.
> 
> This fix is to allocate one ACPI NVS range to place the AP hlt-loop code. When
> CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> Reported-by: Laszlo Ersek <lersek@redhat.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 +++++++++++++++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 +++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 26 ++++++++++++++++++++++
>  4 files changed, 95 insertions(+)
> 
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> index 6a798ef..6f290b8 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> @@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE          *mSmmS3ResumeState = NULL;
>  
>  BOOLEAN                      mAcpiS3Enable = TRUE;
>  
> +UINT8                        *mApHltLoopCode = NULL;
> +UINT8                        mApHltLoopCodeTemplate[] = {
> +                               0xFA,        // cli
> +                               0xF4,        // hlt
> +                               0xEB, 0xFC   // jmp $-2
> +                               };
> +
>  /**
>    Get MSR spin lock by MSR index.
>  
> @@ -376,6 +383,8 @@ MPRendezvousProcedure (
>    CPU_REGISTER_TABLE         *RegisterTableList;
>    UINT32                     InitApicId;
>    UINTN                      Index;
> +  UINT32                     TopOfStack;
> +  UINT8                      Stack[128];
>  
>    ProgramVirtualWireMode ();
>    DisableLvtInterrupts ();
> @@ -393,6 +402,13 @@ MPRendezvousProcedure (
>    // Count down the number with lock mechanism.
>    //
>    InterlockedDecrement (&mNumberToFinish);
> +
> +  //
> +  // Place AP into the safe code
> +  //
> +  TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack);

Here I suggest to append the following assignment:

  TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1);

This will clear the least significant bits of TopOfStack, wasting at
most (CPU_STACK_ALIGNMENT - 1) bytes from the Stack array. However, the
stack will be aligned correctly.

Not sure if it's necessary, but the patch reminds me of commit
f98f5ec304ec ("S3Resume2Pei: align return stacks explicitly").

What do you think?

Thanks!
Laszlo

> +  CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, sizeof (mApHltLoopCodeTemplate));
> +  TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack);
>  }
>  
>  /**
> @@ -731,6 +747,8 @@ InitSmmS3ResumeState (
>    VOID                       *GuidHob;
>    EFI_SMRAM_DESCRIPTOR       *SmramDescriptor;
>    SMM_S3_RESUME_STATE        *SmmS3ResumeState;
> +  EFI_PHYSICAL_ADDRESS       Address;
> +  EFI_STATUS                 Status;
>  
>    if (!mAcpiS3Enable) {
>      return;
> @@ -773,6 +791,19 @@ InitSmmS3ResumeState (
>    // Patch SmmS3ResumeState->SmmS3Cr3
>    //
>    InitSmmS3Cr3 ();
> +
> +  //
> +  // Allocate safe memory in ACPI NVS for AP to execute hlt loop on S3 path
> +  //
> +  Address = BASE_4GB - 1;
> +  Status  = gBS->AllocatePages (
> +                   AllocateMaxAddress,
> +                   EfiACPIMemoryNVS,
> +                   EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)),
> +                   &Address
> +                   );
> +  ASSERT_EFI_ERROR (Status);
> +  mApHltLoopCode = (UINT8 *) (UINTN) Address;
>  }
>  
>  /**
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> index 545b534..3e99aab 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> @@ -94,3 +94,28 @@ InitGdt (
>    *GdtStepSize = GdtTableStepSize;
>    return GdtTssTables;
>  }
> +
> +/**
> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
> +  
> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +
> +**/
> +VOID
> +TransferApToSafeState (
> +  IN UINT32             ApHltLoopCode,
> +  IN UINT32             TopOfStack
> +  )
> +{
> +  SwitchStack (
> +    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
> +    NULL,
> +    NULL,
> +    (VOID *) (UINTN) TopOfStack
> +    );
> +  //
> +  // It should never reach here
> +  //
> +  ASSERT (FALSE);
> +}
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> index 9b119c8..82fa5a6 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> @@ -825,4 +825,17 @@ GetAcpiS3EnableFlag (
>    VOID
>    );
>  
> +/**
> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
> +  
> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +
> +**/
> +VOID
> +TransferApToSafeState (
> +  IN UINT32             ApHltLoopCode,
> +  IN UINT32             TopOfStack
> +  );
> +
>  #endif
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> index b53aa45..c05dec7 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> @@ -68,3 +68,29 @@ InitGdt (
>    *GdtStepSize = GdtTableStepSize;
>    return GdtTssTables;
>  }
> +/**
> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
> +  
> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +
> +**/
> +VOID
> +TransferApToSafeState (
> +  IN UINT32             ApHltLoopCode,
> +  IN UINT32             TopOfStack
> +  )
> +{
> +  SwitchStack (
> +    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
> +    NULL,
> +    NULL,
> +    (VOID *) (UINTN) TopOfStack
> +    );
> +  //
> +  // It should never reach here
> +  //
> +  ASSERT (FALSE);
> +}
> +
> +
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10  6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan
  2016-11-10  6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
  2016-11-10  6:07 ` [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan
@ 2016-11-10  8:56 ` Laszlo Ersek
  2016-11-10  9:59 ` Paolo Bonzini
  2016-11-10 10:41 ` Laszlo Ersek
  4 siblings, 0 replies; 15+ messages in thread
From: Laszlo Ersek @ 2016-11-10  8:56 UTC (permalink / raw)
  To: Jeff Fan, edk2-devel; +Cc: Paolo Bonzini

On 11/10/16 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
> execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> Jeff Fan (2):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 +++++++++++++++++++++++++++
>  4 files changed, 128 insertions(+)
> 

Can we please append, to both commit messages:

Analyzed-by: Paolo Bonzini <pbonzini@redhat.com>

I think we should give credit to Paolo for his work in this thread.

I'll report back with test results (with my suggested rounding-down of
TopOfStack in patch #1).

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
  2016-11-10  8:50   ` Laszlo Ersek
@ 2016-11-10  9:00     ` Fan, Jeff
  2016-11-10  9:30       ` Laszlo Ersek
  0 siblings, 1 reply; 15+ messages in thread
From: Fan, Jeff @ 2016-11-10  9:00 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel@ml01.01.org
  Cc: Kinney, Michael D, Paolo Bonzini, Yao, Jiewen

I agree with your proposal.  Great suggestion. I will create v2 later.

-----Original Message-----
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Thursday, November 10, 2016 4:51 PM
To: Fan, Jeff; edk2-devel@ml01.01.org
Cc: Kinney, Michael D; Paolo Bonzini; Yao, Jiewen
Subject: Re: [edk2] [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path

Before I test this, I have one question / suggestion:

On 11/10/16 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in 
> PiSmmCpuDxeSmm driver. However, we place AP in hlt-loop under 1MB 
> space borrowed after CPU restoring CPU contexts.
> In case, one NMI or SMI happens, APs may exit from hlt state and 
> execute the instruction after HLT instruction. But the code under 1MB 
> is no longer safe at that time.
> 
> This fix is to allocate one ACPI NVS range to place the AP hlt-loop 
> code. When CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> Reported-by: Laszlo Ersek <lersek@redhat.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 +++++++++++++++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 +++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 26 
> ++++++++++++++++++++++
>  4 files changed, 95 insertions(+)
> 
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c 
> b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> index 6a798ef..6f290b8 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> @@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE          *mSmmS3ResumeState = NULL;
>  
>  BOOLEAN                      mAcpiS3Enable = TRUE;
>  
> +UINT8                        *mApHltLoopCode = NULL;
> +UINT8                        mApHltLoopCodeTemplate[] = {
> +                               0xFA,        // cli
> +                               0xF4,        // hlt
> +                               0xEB, 0xFC   // jmp $-2
> +                               };
> +
>  /**
>    Get MSR spin lock by MSR index.
>  
> @@ -376,6 +383,8 @@ MPRendezvousProcedure (
>    CPU_REGISTER_TABLE         *RegisterTableList;
>    UINT32                     InitApicId;
>    UINTN                      Index;
> +  UINT32                     TopOfStack;
> +  UINT8                      Stack[128];
>  
>    ProgramVirtualWireMode ();
>    DisableLvtInterrupts ();
> @@ -393,6 +402,13 @@ MPRendezvousProcedure (
>    // Count down the number with lock mechanism.
>    //
>    InterlockedDecrement (&mNumberToFinish);
> +
> +  //
> +  // Place AP into the safe code
> +  //
> +  TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack);

Here I suggest to append the following assignment:

  TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1);

This will clear the least significant bits of TopOfStack, wasting at most (CPU_STACK_ALIGNMENT - 1) bytes from the Stack array. However, the stack will be aligned correctly.

Not sure if it's necessary, but the patch reminds me of commit f98f5ec304ec ("S3Resume2Pei: align return stacks explicitly").

What do you think?

Thanks!
Laszlo

> +  CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, 
> + sizeof (mApHltLoopCodeTemplate));  TransferApToSafeState ((UINT32) 
> + (UINTN) mApHltLoopCode, TopOfStack);
>  }
>  
>  /**
> @@ -731,6 +747,8 @@ InitSmmS3ResumeState (
>    VOID                       *GuidHob;
>    EFI_SMRAM_DESCRIPTOR       *SmramDescriptor;
>    SMM_S3_RESUME_STATE        *SmmS3ResumeState;
> +  EFI_PHYSICAL_ADDRESS       Address;
> +  EFI_STATUS                 Status;
>  
>    if (!mAcpiS3Enable) {
>      return;
> @@ -773,6 +791,19 @@ InitSmmS3ResumeState (
>    // Patch SmmS3ResumeState->SmmS3Cr3
>    //
>    InitSmmS3Cr3 ();
> +
> +  //
> +  // Allocate safe memory in ACPI NVS for AP to execute hlt loop on 
> + S3 path  //  Address = BASE_4GB - 1;  Status  = gBS->AllocatePages (
> +                   AllocateMaxAddress,
> +                   EfiACPIMemoryNVS,
> +                   EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)),
> +                   &Address
> +                   );
> +  ASSERT_EFI_ERROR (Status);
> +  mApHltLoopCode = (UINT8 *) (UINTN) Address;
>  }
>  
>  /**
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c 
> b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> index 545b534..3e99aab 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> @@ -94,3 +94,28 @@ InitGdt (
>    *GdtStepSize = GdtTableStepSize;
>    return GdtTssTables;
>  }
> +
> +/**
> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
> +  
> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +
> +**/
> +VOID
> +TransferApToSafeState (
> +  IN UINT32             ApHltLoopCode,
> +  IN UINT32             TopOfStack
> +  )
> +{
> +  SwitchStack (
> +    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
> +    NULL,
> +    NULL,
> +    (VOID *) (UINTN) TopOfStack
> +    );
> +  //
> +  // It should never reach here
> +  //
> +  ASSERT (FALSE);
> +}
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h 
> b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> index 9b119c8..82fa5a6 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> @@ -825,4 +825,17 @@ GetAcpiS3EnableFlag (
>    VOID
>    );
>  
> +/**
> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
> +  
> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +
> +**/
> +VOID
> +TransferApToSafeState (
> +  IN UINT32             ApHltLoopCode,
> +  IN UINT32             TopOfStack
> +  );
> +
>  #endif
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c 
> b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> index b53aa45..c05dec7 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> @@ -68,3 +68,29 @@ InitGdt (
>    *GdtStepSize = GdtTableStepSize;
>    return GdtTssTables;
>  }
> +/**
> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
> +  
> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +
> +**/
> +VOID
> +TransferApToSafeState (
> +  IN UINT32             ApHltLoopCode,
> +  IN UINT32             TopOfStack
> +  )
> +{
> +  SwitchStack (
> +    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
> +    NULL,
> +    NULL,
> +    (VOID *) (UINTN) TopOfStack
> +    );
> +  //
> +  // It should never reach here
> +  //
> +  ASSERT (FALSE);
> +}
> +
> +
> 

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
  2016-11-10  9:00     ` Fan, Jeff
@ 2016-11-10  9:30       ` Laszlo Ersek
  0 siblings, 0 replies; 15+ messages in thread
From: Laszlo Ersek @ 2016-11-10  9:30 UTC (permalink / raw)
  To: Fan, Jeff, edk2-devel@ml01.01.org
  Cc: Kinney, Michael D, Paolo Bonzini, Yao, Jiewen

On 11/10/16 10:00, Fan, Jeff wrote:
> I agree with your proposal.  Great suggestion. I will create v2 later.

Thank you.

Since you're going to send a v2: while applying this patch for testing,
git-am warned me that there were 7 new lines with trailing whitespace.
Can you please trim those?

Thanks
Laszlo

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
> Sent: Thursday, November 10, 2016 4:51 PM
> To: Fan, Jeff; edk2-devel@ml01.01.org
> Cc: Kinney, Michael D; Paolo Bonzini; Yao, Jiewen
> Subject: Re: [edk2] [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
> 
> Before I test this, I have one question / suggestion:
> 
> On 11/10/16 07:07, Jeff Fan wrote:
>> On S3 path, we will wake up APs to restore CPU context in 
>> PiSmmCpuDxeSmm driver. However, we place AP in hlt-loop under 1MB 
>> space borrowed after CPU restoring CPU contexts.
>> In case, one NMI or SMI happens, APs may exit from hlt state and 
>> execute the instruction after HLT instruction. But the code under 1MB 
>> is no longer safe at that time.
>>
>> This fix is to allocate one ACPI NVS range to place the AP hlt-loop 
>> code. When CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range.
>>
>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>>
>> Reported-by: Laszlo Ersek <lersek@redhat.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Jiewen Yao <jiewen.yao@intel.com>
>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
>> ---
>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 +++++++++++++++++++++++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 +++++++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 26 
>> ++++++++++++++++++++++
>>  4 files changed, 95 insertions(+)
>>
>> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c 
>> b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
>> index 6a798ef..6f290b8 100644
>> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
>> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
>> @@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE          *mSmmS3ResumeState = NULL;
>>  
>>  BOOLEAN                      mAcpiS3Enable = TRUE;
>>  
>> +UINT8                        *mApHltLoopCode = NULL;
>> +UINT8                        mApHltLoopCodeTemplate[] = {
>> +                               0xFA,        // cli
>> +                               0xF4,        // hlt
>> +                               0xEB, 0xFC   // jmp $-2
>> +                               };
>> +
>>  /**
>>    Get MSR spin lock by MSR index.
>>  
>> @@ -376,6 +383,8 @@ MPRendezvousProcedure (
>>    CPU_REGISTER_TABLE         *RegisterTableList;
>>    UINT32                     InitApicId;
>>    UINTN                      Index;
>> +  UINT32                     TopOfStack;
>> +  UINT8                      Stack[128];
>>  
>>    ProgramVirtualWireMode ();
>>    DisableLvtInterrupts ();
>> @@ -393,6 +402,13 @@ MPRendezvousProcedure (
>>    // Count down the number with lock mechanism.
>>    //
>>    InterlockedDecrement (&mNumberToFinish);
>> +
>> +  //
>> +  // Place AP into the safe code
>> +  //
>> +  TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack);
> 
> Here I suggest to append the following assignment:
> 
>   TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1);
> 
> This will clear the least significant bits of TopOfStack, wasting at most (CPU_STACK_ALIGNMENT - 1) bytes from the Stack array. However, the stack will be aligned correctly.
> 
> Not sure if it's necessary, but the patch reminds me of commit f98f5ec304ec ("S3Resume2Pei: align return stacks explicitly").
> 
> What do you think?
> 
> Thanks!
> Laszlo
> 
>> +  CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, 
>> + sizeof (mApHltLoopCodeTemplate));  TransferApToSafeState ((UINT32) 
>> + (UINTN) mApHltLoopCode, TopOfStack);
>>  }
>>  
>>  /**
>> @@ -731,6 +747,8 @@ InitSmmS3ResumeState (
>>    VOID                       *GuidHob;
>>    EFI_SMRAM_DESCRIPTOR       *SmramDescriptor;
>>    SMM_S3_RESUME_STATE        *SmmS3ResumeState;
>> +  EFI_PHYSICAL_ADDRESS       Address;
>> +  EFI_STATUS                 Status;
>>  
>>    if (!mAcpiS3Enable) {
>>      return;
>> @@ -773,6 +791,19 @@ InitSmmS3ResumeState (
>>    // Patch SmmS3ResumeState->SmmS3Cr3
>>    //
>>    InitSmmS3Cr3 ();
>> +
>> +  //
>> +  // Allocate safe memory in ACPI NVS for AP to execute hlt loop on 
>> + S3 path  //  Address = BASE_4GB - 1;  Status  = gBS->AllocatePages (
>> +                   AllocateMaxAddress,
>> +                   EfiACPIMemoryNVS,
>> +                   EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)),
>> +                   &Address
>> +                   );
>> +  ASSERT_EFI_ERROR (Status);
>> +  mApHltLoopCode = (UINT8 *) (UINTN) Address;
>>  }
>>  
>>  /**
>> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c 
>> b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
>> index 545b534..3e99aab 100644
>> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
>> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
>> @@ -94,3 +94,28 @@ InitGdt (
>>    *GdtStepSize = GdtTableStepSize;
>>    return GdtTssTables;
>>  }
>> +
>> +/**
>> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
>> +  
>> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
>> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
>> +
>> +**/
>> +VOID
>> +TransferApToSafeState (
>> +  IN UINT32             ApHltLoopCode,
>> +  IN UINT32             TopOfStack
>> +  )
>> +{
>> +  SwitchStack (
>> +    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
>> +    NULL,
>> +    NULL,
>> +    (VOID *) (UINTN) TopOfStack
>> +    );
>> +  //
>> +  // It should never reach here
>> +  //
>> +  ASSERT (FALSE);
>> +}
>> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h 
>> b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
>> index 9b119c8..82fa5a6 100644
>> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
>> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
>> @@ -825,4 +825,17 @@ GetAcpiS3EnableFlag (
>>    VOID
>>    );
>>  
>> +/**
>> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
>> +  
>> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
>> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
>> +
>> +**/
>> +VOID
>> +TransferApToSafeState (
>> +  IN UINT32             ApHltLoopCode,
>> +  IN UINT32             TopOfStack
>> +  );
>> +
>>  #endif
>> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c 
>> b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
>> index b53aa45..c05dec7 100644
>> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
>> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
>> @@ -68,3 +68,29 @@ InitGdt (
>>    *GdtStepSize = GdtTableStepSize;
>>    return GdtTssTables;
>>  }
>> +/**
>> +  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
>> +  
>> +  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function 
>> +  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
>> +
>> +**/
>> +VOID
>> +TransferApToSafeState (
>> +  IN UINT32             ApHltLoopCode,
>> +  IN UINT32             TopOfStack
>> +  )
>> +{
>> +  SwitchStack (
>> +    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
>> +    NULL,
>> +    NULL,
>> +    (VOID *) (UINTN) TopOfStack
>> +    );
>> +  //
>> +  // It should never reach here
>> +  //
>> +  ASSERT (FALSE);
>> +}
>> +
>> +
>>
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10  6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan
                   ` (2 preceding siblings ...)
  2016-11-10  8:56 ` [PATCH 0/2] Put AP into safe hlt-loop code " Laszlo Ersek
@ 2016-11-10  9:59 ` Paolo Bonzini
  2016-11-11  6:32   ` Fan, Jeff
  2016-11-10 10:41 ` Laszlo Ersek
  4 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2016-11-10  9:59 UTC (permalink / raw)
  To: Jeff Fan, edk2-devel



On 10/11/2016 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
> execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> Jeff Fan (2):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 +++++++++++++++++++++++++++
>  4 files changed, 128 insertions(+)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

It would be slightly more robust to do the "InterlockedDecrement
(&mNumberToFinish);" while in safe state, but the race window is really
really small.

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10  6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan
                   ` (3 preceding siblings ...)
  2016-11-10  9:59 ` Paolo Bonzini
@ 2016-11-10 10:41 ` Laszlo Ersek
  2016-11-10 11:17   ` Yao, Jiewen
  2016-11-10 12:26   ` Paolo Bonzini
  4 siblings, 2 replies; 15+ messages in thread
From: Laszlo Ersek @ 2016-11-10 10:41 UTC (permalink / raw)
  To: Jeff Fan; +Cc: edk2-devel, Jiewen Yao, Paolo Bonzini

On 11/10/16 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
> execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> Jeff Fan (2):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 +++++++++++++++++++++++++++
>  4 files changed, 128 insertions(+)
> 

I applied this on top of Jiewen's v2, for testing.

This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot.

The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS:

# this works, quickly
taskset -c 0 efibootmgr 

# this fails
taskset -c 1 efibootmgr
taskset: failed to set pid 0's affinity: Invalid argument

# these work again, albeit more slowly (as expected)
taskset -c 2 efibootmgr
taskset -c 3 efibootmgr

I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied).

If I run the "info cpus" QEMU command, I get:

* CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745
  CPU #1: pc=0x00000000fffffff0 thread_id=22746
  CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747
  CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748

The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)?

The gust kernel dmesg contains the following messages:

> [   55.805153] PM: Restoring platform NVS memory
> [   55.805153] Enabling non-boot CPUs ...
> [   55.805153] x86: Booting SMP configuration:
> [   55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [   65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE
> [   65.816738] Error taking CPU1 up: -5
> [   65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2
> [   65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock
> [   65.817029] kvm: enabling virtualization on CPU2
> [   65.832296] KVM setup async PF for cpu 2
> [   65.832607] kvm-stealtime: cpu 2, msr 17fd0e100
> [   65.833031] CPU2 is up
> [   65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [   65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock
> [   65.833229] kvm: enabling virtualization on CPU3
> [   65.848594] KVM setup async PF for cpu 3
> [   65.848940] kvm-stealtime: cpu 3, msr 17fd8e100
> [   65.849393] CPU3 is up
> [   65.849722] ACPI: Waking up from system sleep state S3

Note the 10 second gap where I put the marker (and the error message itself, too).

Here's an excerpt from the KVM trace:

>  CPU-23509 [002]  8406.908787: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
>  CPU-23509 [002]  8406.908836: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>  CPU-23510 [003]  8406.908850: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
>  CPU-23510 [003]  8406.908881: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>  CPU-23511 [001]  8406.908908: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
>  CPU-23511 [001]  8406.908941: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>  CPU-23508 [005]  8406.908951: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
>  CPU-23508 [005]  8406.908989: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>  CPU-23511 [001]  8406.920215: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
>  CPU-23509 [002]  8406.920225: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
>  CPU-23510 [003]  8406.920225: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
>  CPU-23508 [005]  8406.920227: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>  CPU-23508 [005]  8406.920262: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>  CPU-23511 [001]  8406.920263: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>  CPU-23508 [005]  8407.020292: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>  CPU-23509 [006]  8407.020338: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>  CPU-23510 [003]  8407.020338: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>  CPU-23508 [005]  8407.020338: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000

It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM.

So this series is a clear improvement, but something else remains amiss.

If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist:
- With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume,
- With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10 10:41 ` Laszlo Ersek
@ 2016-11-10 11:17   ` Yao, Jiewen
  2016-11-10 12:08     ` Laszlo Ersek
  2016-11-10 12:26   ` Paolo Bonzini
  1 sibling, 1 reply; 15+ messages in thread
From: Yao, Jiewen @ 2016-11-10 11:17 UTC (permalink / raw)
  To: Laszlo Ersek, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Paolo Bonzini

Hi Laszlo
Thanks to test for us.

Are you saying Jeff's patch introduces a new issue?
Or is this a previous issue but just not fixed by Jeff's patch?


Thank you
Yao Jiewen

From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Thursday, November 10, 2016 6:41 PM
To: Fan, Jeff <jeff.fan@intel.com>
Cc: edk2-devel@ml01.01.org; Yao, Jiewen <jiewen.yao@intel.com>; Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path

On 11/10/16 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
> execute the instruction after HLT instruction.
>
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>
> I tested real platform with 64bit DXE.
>
> Jeff Fan (2):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 +++++++++++++++++++++++++++
>  4 files changed, 128 insertions(+)
>

I applied this on top of Jiewen's v2, for testing.

This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot.

The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS:

# this works, quickly
taskset -c 0 efibootmgr

# this fails
taskset -c 1 efibootmgr
taskset: failed to set pid 0's affinity: Invalid argument

# these work again, albeit more slowly (as expected)
taskset -c 2 efibootmgr
taskset -c 3 efibootmgr

I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied).

If I run the "info cpus" QEMU command, I get:

* CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745
  CPU #1: pc=0x00000000fffffff0 thread_id=22746
  CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747
  CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748

The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)?

The gust kernel dmesg contains the following messages:

> [   55.805153] PM: Restoring platform NVS memory
> [   55.805153] Enabling non-boot CPUs ...
> [   55.805153] x86: Booting SMP configuration:
> [   55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [   65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE
> [   65.816738] Error taking CPU1 up: -5
> [   65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2
> [   65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock
> [   65.817029] kvm: enabling virtualization on CPU2
> [   65.832296] KVM setup async PF for cpu 2
> [   65.832607] kvm-stealtime: cpu 2, msr 17fd0e100
> [   65.833031] CPU2 is up
> [   65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [   65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock
> [   65.833229] kvm: enabling virtualization on CPU3
> [   65.848594] KVM setup async PF for cpu 3
> [   65.848940] kvm-stealtime: cpu 3, msr 17fd8e100
> [   65.849393] CPU3 is up
> [   65.849722] ACPI: Waking up from system sleep state S3

Note the 10 second gap where I put the marker (and the error message itself, too).

Here's an excerpt from the KVM trace:

>  CPU-23509 [002]  8406.908787: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
>  CPU-23509 [002]  8406.908836: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>  CPU-23510 [003]  8406.908850: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
>  CPU-23510 [003]  8406.908881: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>  CPU-23511 [001]  8406.908908: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
>  CPU-23511 [001]  8406.908941: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>  CPU-23508 [005]  8406.908951: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
>  CPU-23508 [005]  8406.908989: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>  CPU-23511 [001]  8406.920215: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
>  CPU-23509 [002]  8406.920225: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
>  CPU-23510 [003]  8406.920225: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
>  CPU-23508 [005]  8406.920227: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>  CPU-23508 [005]  8406.920262: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>  CPU-23511 [001]  8406.920263: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>  CPU-23508 [005]  8407.020292: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>  CPU-23509 [006]  8407.020338: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>  CPU-23510 [003]  8407.020338: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>  CPU-23508 [005]  8407.020338: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000

It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM.

So this series is a clear improvement, but something else remains amiss.

If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist:
- With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume,
- With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10 11:17   ` Yao, Jiewen
@ 2016-11-10 12:08     ` Laszlo Ersek
  2016-11-10 20:45       ` Laszlo Ersek
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2016-11-10 12:08 UTC (permalink / raw)
  To: Yao, Jiewen, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Paolo Bonzini

On 11/10/16 12:17, Yao, Jiewen wrote:
> Hi Laszlo
> 
> Thanks to test for us.
> 
>  
> 
> Are you saying Jeff’s patch introduces a new issue?
> 
> Or is this a previous issue but just not fixed by Jeff’s patch?

With your v2 series applied, Jeff's patches replace the crash /
emulation failure symptoms during S3 resume with less intrusive
symptoms, namely that some of the APs cannot be brought up by the OS,
occasionally.

Without your v2 series applied, Jeff's patches seem to present the same
symptoms (OS cannot bring up some APs), although much less frequently.
However, I cannot say definitively whether or not this exact same issues
exists, on Ia32X64, with none of the patch sets applied. I haven't seen
it before (on Ia32X64), but maybe I just haven't tried hard enough.

I guess I should try harder and see if the "lost AP" issue exists
without either patch set applied.

Thanks
Laszlo


> 
>  
> 
>  
> 
> Thank you
> 
> Yao Jiewen
> 
>  
> 
> *From:*Laszlo Ersek [mailto:lersek@redhat.com]
> *Sent:* Thursday, November 10, 2016 6:41 PM
> *To:* Fan, Jeff <jeff.fan@intel.com>
> *Cc:* edk2-devel@ml01.01.org; Yao, Jiewen <jiewen.yao@intel.com>; Paolo
> Bonzini <pbonzini@redhat.com>
> *Subject:* Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
> 
>  
> 
> On 11/10/16 07:07, Jeff Fan wrote:
>> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
>> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
>> execute the instruction after HLT instruction.
>> 
>> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>> 
>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>> 
>> I tested real platform with 64bit DXE.
>> 
>> Jeff Fan (2):
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>> 
>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 +++++++++++++++++++++++++++
>>  4 files changed, 128 insertions(+)
>> 
> 
> I applied this on top of Jiewen's v2, for testing.
> 
> This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot.
> 
> The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS:
> 
> # this works, quickly
> taskset -c 0 efibootmgr 
> 
> # this fails
> taskset -c 1 efibootmgr
> taskset: failed to set pid 0's affinity: Invalid argument
> 
> # these work again, albeit more slowly (as expected)
> taskset -c 2 efibootmgr
> taskset -c 3 efibootmgr
> 
> I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied).
> 
> If I run the "info cpus" QEMU command, I get:
> 
> * CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745
>   CPU #1: pc=0x00000000fffffff0 thread_id=22746
>   CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747
>   CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748
> 
> The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)?
> 
> The gust kernel dmesg contains the following messages:
> 
>> [   55.805153] PM: Restoring platform NVS memory
>> [   55.805153] Enabling non-boot CPUs ...
>> [   55.805153] x86: Booting SMP configuration:
>> [   55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1
>> [   65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE
>> [   65.816738] Error taking CPU1 up: -5
>> [   65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2
>> [   65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock
>> [   65.817029] kvm: enabling virtualization on CPU2
>> [   65.832296] KVM setup async PF for cpu 2
>> [   65.832607] kvm-stealtime: cpu 2, msr 17fd0e100
>> [   65.833031] CPU2 is up
>> [   65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3
>> [   65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock
>> [   65.833229] kvm: enabling virtualization on CPU3
>> [   65.848594] KVM setup async PF for cpu 3
>> [   65.848940] kvm-stealtime: cpu 3, msr 17fd8e100
>> [   65.849393] CPU3 is up
>> [   65.849722] ACPI: Waking up from system sleep state S3
> 
> Note the 10 second gap where I put the marker (and the error message itself, too).
> 
> Here's an excerpt from the KVM trace:
> 
>>  CPU-23509 [002]  8406.908787: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
>>  CPU-23509 [002]  8406.908836: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>  CPU-23510 [003]  8406.908850: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
>>  CPU-23510 [003]  8406.908881: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>  CPU-23511 [001]  8406.908908: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
>>  CPU-23511 [001]  8406.908941: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>  CPU-23508 [005]  8406.908951: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
>>  CPU-23508 [005]  8406.908989: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>  CPU-23511 [001]  8406.920215: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
>>  CPU-23509 [002]  8406.920225: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
>>  CPU-23510 [003]  8406.920225: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
>>  CPU-23508 [005]  8406.920227: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>  CPU-23508 [005]  8406.920262: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>  CPU-23511 [001]  8406.920263: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>  CPU-23508 [005]  8407.020292: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>  CPU-23509 [006]  8407.020338: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>  CPU-23510 [003]  8407.020338: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>  CPU-23508 [005]  8407.020338: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
> 
> It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM.
> 
> So this series is a clear improvement, but something else remains amiss.
> 
> If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist:
> - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume,
> - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume.
> 
> Thanks
> Laszlo
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10 10:41 ` Laszlo Ersek
  2016-11-10 11:17   ` Yao, Jiewen
@ 2016-11-10 12:26   ` Paolo Bonzini
  2016-11-10 13:33     ` Laszlo Ersek
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2016-11-10 12:26 UTC (permalink / raw)
  To: Laszlo Ersek, Jeff Fan; +Cc: edk2-devel, Jiewen Yao



On 10/11/2016 11:41, Laszlo Ersek wrote:
> Here's an excerpt from the KVM trace:
> 
>>  CPU-23509 [002]  8406.908787: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
>>  CPU-23509 [002]  8406.908836: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>  CPU-23510 [003]  8406.908850: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
>>  CPU-23510 [003]  8406.908881: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>  CPU-23511 [001]  8406.908908: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
>>  CPU-23511 [001]  8406.908941: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>  CPU-23508 [005]  8406.908951: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
>>  CPU-23508 [005]  8406.908989: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>  CPU-23511 [001]  8406.920215: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
>>  CPU-23509 [002]  8406.920225: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
>>  CPU-23510 [003]  8406.920225: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
>>  CPU-23508 [005]  8406.920227: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>  CPU-23508 [005]  8406.920262: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>  CPU-23511 [001]  8406.920263: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>  CPU-23508 [005]  8407.020292: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>  CPU-23509 [006]  8407.020338: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>  CPU-23510 [003]  8407.020338: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>  CPU-23508 [005]  8407.020338: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
> 
> It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM.
> 
> So this series is a clear improvement, but something else remains amiss.
> 
> If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist:
> - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume,
> - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume.

Any trace I can look at?  What about case 14, with
PcdCpuSmmStaticPageTable=TRUE?

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10 12:26   ` Paolo Bonzini
@ 2016-11-10 13:33     ` Laszlo Ersek
  0 siblings, 0 replies; 15+ messages in thread
From: Laszlo Ersek @ 2016-11-10 13:33 UTC (permalink / raw)
  To: Paolo Bonzini, Jeff Fan; +Cc: edk2-devel, Jiewen Yao

On 11/10/16 13:26, Paolo Bonzini wrote:
> 
> 
> On 10/11/2016 11:41, Laszlo Ersek wrote:
>> Here's an excerpt from the KVM trace:
>>
>>>  CPU-23509 [002]  8406.908787: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
>>>  CPU-23509 [002]  8406.908836: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>>  CPU-23510 [003]  8406.908850: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
>>>  CPU-23510 [003]  8406.908881: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>>  CPU-23511 [001]  8406.908908: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
>>>  CPU-23511 [001]  8406.908941: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>>  CPU-23508 [005]  8406.908951: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
>>>  CPU-23508 [005]  8406.908989: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>>  CPU-23511 [001]  8406.920215: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
>>>  CPU-23509 [002]  8406.920225: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
>>>  CPU-23510 [003]  8406.920225: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
>>>  CPU-23508 [005]  8406.920227: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>>  CPU-23508 [005]  8406.920262: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>>  CPU-23511 [001]  8406.920263: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>>  CPU-23508 [005]  8407.020292: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>>  CPU-23509 [006]  8407.020338: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>>  CPU-23510 [003]  8407.020338: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>>  CPU-23508 [005]  8407.020338: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>
>> It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM.
>>
>> So this series is a clear improvement, but something else remains amiss.
>>
>> If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist:
>> - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume,
>> - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume.
> 
> Any trace I can look at?

Thank you for asking / offering, I did stash the full trace :), I just
didn't want to foist it upon you without you offering first :)

http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/case-13-jiewen-v2-jeff-fixup-trace.txt.bz2

> What about case 14, with
> PcdCpuSmmStaticPageTable=TRUE?

I didn't test case 14, because I consider it a more demanding test case
than case 13. According to Jiewen's earlier explanation,
PcdCpuSmmStaticPageTable=FALSE implements a subset of the protections
that PcdCpuSmmStaticPageTable=TRUE implements.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10 12:08     ` Laszlo Ersek
@ 2016-11-10 20:45       ` Laszlo Ersek
  0 siblings, 0 replies; 15+ messages in thread
From: Laszlo Ersek @ 2016-11-10 20:45 UTC (permalink / raw)
  To: Yao, Jiewen, Fan, Jeff; +Cc: Paolo Bonzini, edk2-devel@ml01.01.org

On 11/10/16 13:08, Laszlo Ersek wrote:
> On 11/10/16 12:17, Yao, Jiewen wrote:
>> Hi Laszlo
>>
>> Thanks to test for us.
>>
>>  
>>
>> Are you saying Jeff’s patch introduces a new issue?
>>
>> Or is this a previous issue but just not fixed by Jeff’s patch?
> 
> With your v2 series applied, Jeff's patches replace the crash /
> emulation failure symptoms during S3 resume with less intrusive
> symptoms, namely that some of the APs cannot be brought up by the OS,
> occasionally.
> 
> Without your v2 series applied, Jeff's patches seem to present the same
> symptoms (OS cannot bring up some APs), although much less frequently.
> However, I cannot say definitively whether or not this exact same issues
> exists, on Ia32X64, with none of the patch sets applied. I haven't seen
> it before (on Ia32X64), but maybe I just haven't tried hard enough.
> 
> I guess I should try harder and see if the "lost AP" issue exists
> without either patch set applied.

With none of the patch sets applied, the "lost AP" issue doesn't exist;
instead, the emulation failure is experienced.

So, for Ia32X64,

                         Jeff's not applied           Jeff's applied
                         ---------------------------  -----------------
Jiewen's v2 not applied  emulation failure, rare      AP lost, rare
Jiewen's v2 applied      emulation failure, frequent  AP lost, frequent

Thanks
Laszlo

>>
>>  
>>
>>  
>>
>> Thank you
>>
>> Yao Jiewen
>>
>>  
>>
>> *From:*Laszlo Ersek [mailto:lersek@redhat.com]
>> *Sent:* Thursday, November 10, 2016 6:41 PM
>> *To:* Fan, Jeff <jeff.fan@intel.com>
>> *Cc:* edk2-devel@ml01.01.org; Yao, Jiewen <jiewen.yao@intel.com>; Paolo
>> Bonzini <pbonzini@redhat.com>
>> *Subject:* Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
>>
>>  
>>
>> On 11/10/16 07:07, Jeff Fan wrote:
>>> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
>>> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
>>> execute the instruction after HLT instruction.
>>>
>>> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>>>
>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>>>
>>> I tested real platform with 64bit DXE.
>>>
>>> Jeff Fan (2):
>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>>>
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 +++++++++++++++++++++++++++
>>>  4 files changed, 128 insertions(+)
>>>
>>
>> I applied this on top of Jiewen's v2, for testing.
>>
>> This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot.
>>
>> The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS:
>>
>> # this works, quickly
>> taskset -c 0 efibootmgr 
>>
>> # this fails
>> taskset -c 1 efibootmgr
>> taskset: failed to set pid 0's affinity: Invalid argument
>>
>> # these work again, albeit more slowly (as expected)
>> taskset -c 2 efibootmgr
>> taskset -c 3 efibootmgr
>>
>> I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied).
>>
>> If I run the "info cpus" QEMU command, I get:
>>
>> * CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745
>>   CPU #1: pc=0x00000000fffffff0 thread_id=22746
>>   CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747
>>   CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748
>>
>> The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)?
>>
>> The gust kernel dmesg contains the following messages:
>>
>>> [   55.805153] PM: Restoring platform NVS memory
>>> [   55.805153] Enabling non-boot CPUs ...
>>> [   55.805153] x86: Booting SMP configuration:
>>> [   55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1
>>> [   65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE
>>> [   65.816738] Error taking CPU1 up: -5
>>> [   65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2
>>> [   65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock
>>> [   65.817029] kvm: enabling virtualization on CPU2
>>> [   65.832296] KVM setup async PF for cpu 2
>>> [   65.832607] kvm-stealtime: cpu 2, msr 17fd0e100
>>> [   65.833031] CPU2 is up
>>> [   65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3
>>> [   65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock
>>> [   65.833229] kvm: enabling virtualization on CPU3
>>> [   65.848594] KVM setup async PF for cpu 3
>>> [   65.848940] kvm-stealtime: cpu 3, msr 17fd8e100
>>> [   65.849393] CPU3 is up
>>> [   65.849722] ACPI: Waking up from system sleep state S3
>>
>> Note the 10 second gap where I put the marker (and the error message itself, too).
>>
>> Here's an excerpt from the KVM trace:
>>
>>>  CPU-23509 [002]  8406.908787: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x30000
>>>  CPU-23509 [002]  8406.908836: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>>  CPU-23510 [003]  8406.908850: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x30000
>>>  CPU-23510 [003]  8406.908881: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>>  CPU-23511 [001]  8406.908908: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x30000
>>>  CPU-23511 [001]  8406.908941: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>>  CPU-23508 [005]  8406.908951: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x30000
>>>  CPU-23508 [005]  8406.908989: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>>  CPU-23511 [001]  8406.920215: kvm_enter_smm:        vcpu 3: entering SMM, smbase 0x7ffb7000
>>>  CPU-23509 [002]  8406.920225: kvm_enter_smm:        vcpu 1: entering SMM, smbase 0x7ffb3000
>>>  CPU-23510 [003]  8406.920225: kvm_enter_smm:        vcpu 2: entering SMM, smbase 0x7ffb5000
>>>  CPU-23508 [005]  8406.920227: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>>  CPU-23508 [005]  8406.920262: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>>  CPU-23511 [001]  8406.920263: kvm_enter_smm:        vcpu 3: leaving SMM, smbase 0x7ffb7000
>>>  CPU-23508 [005]  8407.020292: kvm_enter_smm:        vcpu 0: entering SMM, smbase 0x7ffb1000
>>>  CPU-23509 [006]  8407.020338: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffb3000
>>>  CPU-23510 [003]  8407.020338: kvm_enter_smm:        vcpu 2: leaving SMM, smbase 0x7ffb5000
>>>  CPU-23508 [005]  8407.020338: kvm_enter_smm:        vcpu 0: leaving SMM, smbase 0x7ffb1000
>>
>> It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM.
>>
>> So this series is a clear improvement, but something else remains amiss.
>>
>> If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist:
>> - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume,
>> - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume.
>>
>> Thanks
>> Laszlo
>>
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path
  2016-11-10  9:59 ` Paolo Bonzini
@ 2016-11-11  6:32   ` Fan, Jeff
  0 siblings, 0 replies; 15+ messages in thread
From: Fan, Jeff @ 2016-11-11  6:32 UTC (permalink / raw)
  To: Paolo Bonzini, edk2-devel@lists.01.org

Paolo,

I added patch #3 in v2 to do InterlockedDecrement (&mNumberToFinish) in the safe code. 
This is very good comment to eliminate this gap.

Thanks!
Jeff

-----Original Message-----
From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
Sent: Thursday, November 10, 2016 5:59 PM
To: Fan, Jeff; edk2-devel@lists.01.org
Subject: Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path



On 10/11/2016 07:07, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in 
> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit 
> from hlt state and execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> Jeff Fan (2):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 
> path
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 31 ++++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 ++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 59 
> +++++++++++++++++++++++++++
>  4 files changed, 128 insertions(+)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

It would be slightly more robust to do the "InterlockedDecrement (&mNumberToFinish);" while in safe state, but the race window is really really small.

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-11-11  6:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-10  6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan
2016-11-10  6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
2016-11-10  8:50   ` Laszlo Ersek
2016-11-10  9:00     ` Fan, Jeff
2016-11-10  9:30       ` Laszlo Ersek
2016-11-10  6:07 ` [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan
2016-11-10  8:56 ` [PATCH 0/2] Put AP into safe hlt-loop code " Laszlo Ersek
2016-11-10  9:59 ` Paolo Bonzini
2016-11-11  6:32   ` Fan, Jeff
2016-11-10 10:41 ` Laszlo Ersek
2016-11-10 11:17   ` Yao, Jiewen
2016-11-10 12:08     ` Laszlo Ersek
2016-11-10 20:45       ` Laszlo Ersek
2016-11-10 12:26   ` Paolo Bonzini
2016-11-10 13:33     ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox