public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
@ 2016-11-11  5:45 Jeff Fan
  2016-11-11  5:45 ` [PATCH v2 1/3] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Jeff Fan @ 2016-11-11  5:45 UTC (permalink / raw)
  To: edk2-devel

On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
driver. In case, one NMI or SMI happens, APs may exit from hlt state and
execute the instruction after HLT instruction.

But APs are not running on safe code, it leads OVMF S3 boot unstable.

https://bugzilla.tianocore.org/show_bug.cgi?id=216

I tested real platform with 64bit DXE.

v2:
  1. Make stack alignment per Laszlo's comment.
  2. Trim whitespace at end of end per Laszlo's comment.
  3. Update year mark in file header.
  4. Enhancement on InterlockedDecrement() per Paolo's comment.

Jeff Fan (3):
  UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
  UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
  UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code

 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 ++++++++++++++++++++++++++-
 4 files changed, 136 insertions(+), 4 deletions(-)

-- 
2.9.3.windows.2



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/3] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
  2016-11-11  5:45 [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Jeff Fan
@ 2016-11-11  5:45 ` Jeff Fan
  2016-11-11  5:45 ` [PATCH v2 2/3] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Jeff Fan @ 2016-11-11  5:45 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Paolo Bonzini, Jiewen Yao, Michael D Kinney

On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
driver. However, we place AP in hlt-loop under 1MB space borrowed after CPU
restoring CPU contexts.
In case, one NMI or SMI happens, APs may exit from hlt state and execute the
instruction after HLT instruction. But the code under 1MB is no longer safe at
that time.

This fix is to allocate one ACPI NVS range to place the AP hlt-loop code. When
CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range.

https://bugzilla.tianocore.org/show_bug.cgi?id=216

v2:
  1. Make stack alignment per Laszlo's comment.
  2. Trim whitespace at end of end.
  3. Update year mark in file header.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Analyzed-by: Paolo Bonzini <pbonzini@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 32 +++++++++++++++++++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 27 +++++++++++++++++++++-
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 13 +++++++++++
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 29 +++++++++++++++++++++++-
 4 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
index 6a798ef..e53e096 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
@@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE          *mSmmS3ResumeState = NULL;
 
 BOOLEAN                      mAcpiS3Enable = TRUE;
 
+UINT8                        *mApHltLoopCode = NULL;
+UINT8                        mApHltLoopCodeTemplate[] = {
+                               0xFA,        // cli
+                               0xF4,        // hlt
+                               0xEB, 0xFC   // jmp $-2
+                               };
+
 /**
   Get MSR spin lock by MSR index.
 
@@ -376,6 +383,8 @@ MPRendezvousProcedure (
   CPU_REGISTER_TABLE         *RegisterTableList;
   UINT32                     InitApicId;
   UINTN                      Index;
+  UINT32                     TopOfStack;
+  UINT8                      Stack[128];
 
   ProgramVirtualWireMode ();
   DisableLvtInterrupts ();
@@ -393,6 +402,14 @@ MPRendezvousProcedure (
   // Count down the number with lock mechanism.
   //
   InterlockedDecrement (&mNumberToFinish);
+
+  //
+  // Place AP into the safe code
+  //
+  TopOfStack  = (UINT32) (UINTN) Stack + sizeof (Stack);
+  TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1);
+  CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, sizeof (mApHltLoopCodeTemplate));
+  TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack);
 }
 
 /**
@@ -731,6 +748,8 @@ InitSmmS3ResumeState (
   VOID                       *GuidHob;
   EFI_SMRAM_DESCRIPTOR       *SmramDescriptor;
   SMM_S3_RESUME_STATE        *SmmS3ResumeState;
+  EFI_PHYSICAL_ADDRESS       Address;
+  EFI_STATUS                 Status;
 
   if (!mAcpiS3Enable) {
     return;
@@ -773,6 +792,19 @@ InitSmmS3ResumeState (
   // Patch SmmS3ResumeState->SmmS3Cr3
   //
   InitSmmS3Cr3 ();
+
+  //
+  // Allocate safe memory in ACPI NVS for AP to execute hlt loop on S3 path
+  //
+  Address = BASE_4GB - 1;
+  Status  = gBS->AllocatePages (
+                   AllocateMaxAddress,
+                   EfiACPIMemoryNVS,
+                   EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)),
+                   &Address
+                   );
+  ASSERT_EFI_ERROR (Status);
+  mApHltLoopCode = (UINT8 *) (UINTN) Address;
 }
 
 /**
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
index 545b534..8b880d6 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
@@ -1,7 +1,7 @@
 /** @file
   SMM CPU misc functions for Ia32 arch specific.
   
-Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2015 - 2016, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -94,3 +94,28 @@ InitGdt (
   *GdtStepSize = GdtTableStepSize;
   return GdtTssTables;
 }
+
+/**
+  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
+
+  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
+  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+
+**/
+VOID
+TransferApToSafeState (
+  IN UINT32             ApHltLoopCode,
+  IN UINT32             TopOfStack
+  )
+{
+  SwitchStack (
+    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
+    NULL,
+    NULL,
+    (VOID *) (UINTN) TopOfStack
+    );
+  //
+  // It should never reach here
+  //
+  ASSERT (FALSE);
+}
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 9b119c8..6c98659 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -825,4 +825,17 @@ GetAcpiS3EnableFlag (
   VOID
   );
 
+/**
+  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
+
+  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
+  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+
+**/
+VOID
+TransferApToSafeState (
+  IN UINT32             ApHltLoopCode,
+  IN UINT32             TopOfStack
+  );
+
 #endif
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index b53aa45..bd465c7 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -1,7 +1,7 @@
 /** @file
   SMM CPU misc functions for x64 arch specific.
   
-Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2015 - 2016, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -68,3 +68,30 @@ InitGdt (
   *GdtStepSize = GdtTableStepSize;
   return GdtTssTables;
 }
+
+/**
+  Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
+
+  @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
+  @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+
+**/
+VOID
+TransferApToSafeState (
+  IN UINT32             ApHltLoopCode,
+  IN UINT32             TopOfStack
+  )
+{
+  SwitchStack (
+    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
+    NULL,
+    NULL,
+    (VOID *) (UINTN) TopOfStack
+    );
+  //
+  // It should never reach here
+  //
+  ASSERT (FALSE);
+}
+
+
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/3] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
  2016-11-11  5:45 [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Jeff Fan
  2016-11-11  5:45 ` [PATCH v2 1/3] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
@ 2016-11-11  5:45 ` Jeff Fan
  2016-11-11  5:45 ` [PATCH v2 3/3] UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code Jeff Fan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Jeff Fan @ 2016-11-11  5:45 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Paolo Bonzini, Jiewen Yao, Michael D Kinney

On S3 path, we may transfer to long mode (if DXE is long mode) to restore CPU
contexts with CR3 = SmmS3Cr3 (in SMM). AP will execute hlt-loop after CPU
contexts restoration. Once one NMI or SMI happens, APs may exit from hlt state
and execute the instruction after HLT instruction. If APs are running on long
mode, page table is required to fetch the instruction. However, CR3 pointer to
page table in SMM. APs will crash.

This fix is to disable long mode on APs and transfer to 32bit protected mode to
execute hlt-loop. Then CR3 and page table will no longer be required.

https://bugzilla.tianocore.org/show_bug.cgi?id=216

Reported-by: Laszlo Ersek <lersek@redhat.com>
Analyzed-by: Paolo Bonzini <pbonzini@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 42 ++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index bd465c7..62338b7 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -70,6 +70,37 @@ InitGdt (
 }
 
 /**
+  Get Protected mode code segment from current GDT table.
+
+  @return  Protected mode code segment value.
+**/
+UINT16
+GetProtectedModeCS (
+  VOID
+  )
+{
+  IA32_DESCRIPTOR          GdtrDesc;
+  IA32_SEGMENT_DESCRIPTOR  *GdtEntry;
+  UINTN                    GdtEntryCount;
+  UINT16                   Index;
+
+  Index = (UINT16) -1;
+  AsmReadGdtr (&GdtrDesc);
+  GdtEntryCount = (GdtrDesc.Limit + 1) / sizeof (IA32_SEGMENT_DESCRIPTOR);
+  GdtEntry = (IA32_SEGMENT_DESCRIPTOR *) GdtrDesc.Base;
+  for (Index = 0; Index < GdtEntryCount; Index++) {
+    if (GdtEntry->Bits.L == 0) {
+      if (GdtEntry->Bits.Type > 8 && GdtEntry->Bits.L == 0) {
+        break;
+      }
+    }
+    GdtEntry++;
+  }
+  ASSERT (Index != -1);
+  return Index * 8;
+}
+
+/**
   Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch.
 
   @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
@@ -82,11 +113,12 @@ TransferApToSafeState (
   IN UINT32             TopOfStack
   )
 {
-  SwitchStack (
-    (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
-    NULL,
-    NULL,
-    (VOID *) (UINTN) TopOfStack
+  AsmDisablePaging64 (
+    GetProtectedModeCS (),
+    (UINT32) (UINTN) ApHltLoopCode,
+    0,
+    0,
+    TopOfStack
     );
   //
   // It should never reach here
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/3] UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
  2016-11-11  5:45 [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Jeff Fan
  2016-11-11  5:45 ` [PATCH v2 1/3] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
  2016-11-11  5:45 ` [PATCH v2 2/3] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan
@ 2016-11-11  5:45 ` Jeff Fan
  2016-11-11 10:16   ` Paolo Bonzini
  2016-11-11 19:49 ` [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Laszlo Ersek
       [not found] ` <542CF652F8836A4AB8DBFAAD40ED192A4A2DCDE3@shsmsx102.ccr.corp.intel.com>
  4 siblings, 1 reply; 26+ messages in thread
From: Jeff Fan @ 2016-11-11  5:45 UTC (permalink / raw)
  To: edk2-devel; +Cc: Laszlo Ersek, Paolo Bonzini, Jiewen Yao, Michael D Kinney

We will put APs into hlt-loop in safe code. But we decrease mNumberToFinish
before APs enter into the safe code. Paolo pointed out this gap.

This patch is to move mNumberToFinish decreasing to the safe code. It could
make sure BSP could wait for all APs are running in safe code.

https://bugzilla.tianocore.org/show_bug.cgi?id=216

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 17 +++++++----------
 UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c |  6 ++++--
 UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    |  4 +++-
 UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  |  6 ++++--
 4 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
index e53e096..34547e0 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
@@ -79,9 +79,11 @@ BOOLEAN                      mAcpiS3Enable = TRUE;
 
 UINT8                        *mApHltLoopCode = NULL;
 UINT8                        mApHltLoopCodeTemplate[] = {
-                               0xFA,        // cli
-                               0xF4,        // hlt
-                               0xEB, 0xFC   // jmp $-2
+                               0x8B, 0x44, 0x24, 0x04,  // mov  eax, dword ptr [esp+4]
+                               0xF0, 0xFF, 0x08,        // lock dec  dword ptr [eax]
+                               0xFA,                    // cli
+                               0xF4,                    // hlt
+                               0xEB, 0xFC               // jmp $-2
                                };
 
 /**
@@ -399,17 +401,12 @@ MPRendezvousProcedure (
   }
 
   //
-  // Count down the number with lock mechanism.
-  //
-  InterlockedDecrement (&mNumberToFinish);
-
-  //
-  // Place AP into the safe code
+  // Place AP into the safe code, count down the number with lock mechanism in the safe code.
   //
   TopOfStack  = (UINT32) (UINTN) Stack + sizeof (Stack);
   TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1);
   CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, sizeof (mApHltLoopCodeTemplate));
-  TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack);
+  TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack, &mNumberToFinish);
 }
 
 /**
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
index 8b880d6..9760373 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
@@ -100,17 +100,19 @@ InitGdt (
 
   @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
   @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+  @param[in] NumberToFinish   Semaphore of APs finish count.
 
 **/
 VOID
 TransferApToSafeState (
   IN UINT32             ApHltLoopCode,
-  IN UINT32             TopOfStack
+  IN UINT32             TopOfStack,
+  IN UINT32             *NumberToFinish
   )
 {
   SwitchStack (
     (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
-    NULL,
+    NumberToFinish,
     NULL,
     (VOID *) (UINTN) TopOfStack
     );
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 6c98659..88d9c85 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -830,12 +830,14 @@ GetAcpiS3EnableFlag (
 
   @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
   @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+  @param[in] NumberToFinish   Semaphore of APs finish count.
 
 **/
 VOID
 TransferApToSafeState (
   IN UINT32             ApHltLoopCode,
-  IN UINT32             TopOfStack
+  IN UINT32             TopOfStack,
+  IN UINT32             *NumberToFinish
   );
 
 #endif
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index 62338b7..6844c3f 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -105,18 +105,20 @@ GetProtectedModeCS (
 
   @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
   @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
+  @param[in] NumberToFinish   Semaphore of APs finish count.
 
 **/
 VOID
 TransferApToSafeState (
   IN UINT32             ApHltLoopCode,
-  IN UINT32             TopOfStack
+  IN UINT32             TopOfStack,
+  IN UINT32             *NumberToFinish
   )
 {
   AsmDisablePaging64 (
     GetProtectedModeCS (),
     (UINT32) (UINTN) ApHltLoopCode,
-    0,
+    (UINT32) (UINTN) NumberToFinish,
     0,
     TopOfStack
     );
-- 
2.9.3.windows.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/3] UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
  2016-11-11  5:45 ` [PATCH v2 3/3] UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code Jeff Fan
@ 2016-11-11 10:16   ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-11-11 10:16 UTC (permalink / raw)
  To: Jeff Fan, edk2-devel; +Cc: Michael D Kinney, Laszlo Ersek, Jiewen Yao



On 11/11/2016 06:45, Jeff Fan wrote:
> We will put APs into hlt-loop in safe code. But we decrease mNumberToFinish
> before APs enter into the safe code. Paolo pointed out this gap.
> 
> This patch is to move mNumberToFinish decreasing to the safe code. It could
> make sure BSP could wait for all APs are running in safe code.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> Reported-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> ---
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 17 +++++++----------
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c |  6 ++++--
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    |  4 +++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  |  6 ++++--
>  4 files changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> index e53e096..34547e0 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
> @@ -79,9 +79,11 @@ BOOLEAN                      mAcpiS3Enable = TRUE;
>  
>  UINT8                        *mApHltLoopCode = NULL;
>  UINT8                        mApHltLoopCodeTemplate[] = {
> -                               0xFA,        // cli
> -                               0xF4,        // hlt
> -                               0xEB, 0xFC   // jmp $-2
> +                               0x8B, 0x44, 0x24, 0x04,  // mov  eax, dword ptr [esp+4]
> +                               0xF0, 0xFF, 0x08,        // lock dec  dword ptr [eax]
> +                               0xFA,                    // cli
> +                               0xF4,                    // hlt
> +                               0xEB, 0xFC               // jmp $-2
>                                 };
>  
>  /**
> @@ -399,17 +401,12 @@ MPRendezvousProcedure (
>    }
>  
>    //
> -  // Count down the number with lock mechanism.
> -  //
> -  InterlockedDecrement (&mNumberToFinish);
> -
> -  //
> -  // Place AP into the safe code
> +  // Place AP into the safe code, count down the number with lock mechanism in the safe code.
>    //
>    TopOfStack  = (UINT32) (UINTN) Stack + sizeof (Stack);
>    TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1);
>    CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, sizeof (mApHltLoopCodeTemplate));
> -  TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack);
> +  TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack, &mNumberToFinish);
>  }
>  
>  /**
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> index 8b880d6..9760373 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
> @@ -100,17 +100,19 @@ InitGdt (
>  
>    @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
>    @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +  @param[in] NumberToFinish   Semaphore of APs finish count.
>  
>  **/
>  VOID
>  TransferApToSafeState (
>    IN UINT32             ApHltLoopCode,
> -  IN UINT32             TopOfStack
> +  IN UINT32             TopOfStack,
> +  IN UINT32             *NumberToFinish
>    )
>  {
>    SwitchStack (
>      (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
> -    NULL,
> +    NumberToFinish,
>      NULL,
>      (VOID *) (UINTN) TopOfStack
>      );
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> index 6c98659..88d9c85 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
> @@ -830,12 +830,14 @@ GetAcpiS3EnableFlag (
>  
>    @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
>    @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +  @param[in] NumberToFinish   Semaphore of APs finish count.
>  
>  **/
>  VOID
>  TransferApToSafeState (
>    IN UINT32             ApHltLoopCode,
> -  IN UINT32             TopOfStack
> +  IN UINT32             TopOfStack,
> +  IN UINT32             *NumberToFinish
>    );
>  
>  #endif
> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> index 62338b7..6844c3f 100644
> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
> @@ -105,18 +105,20 @@ GetProtectedModeCS (
>  
>    @param[in] ApHltLoopCode    The 32-bit address of the safe hlt-loop function.
>    @param[in] TopOfStack       A pointer to the new stack to use for the ApHltLoopCode.
> +  @param[in] NumberToFinish   Semaphore of APs finish count.
>  
>  **/
>  VOID
>  TransferApToSafeState (
>    IN UINT32             ApHltLoopCode,
> -  IN UINT32             TopOfStack
> +  IN UINT32             TopOfStack,
> +  IN UINT32             *NumberToFinish
>    )
>  {
>    AsmDisablePaging64 (
>      GetProtectedModeCS (),
>      (UINT32) (UINTN) ApHltLoopCode,
> -    0,
> +    (UINT32) (UINTN) NumberToFinish,
>      0,
>      TopOfStack
>      );
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-11  5:45 [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Jeff Fan
                   ` (2 preceding siblings ...)
  2016-11-11  5:45 ` [PATCH v2 3/3] UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code Jeff Fan
@ 2016-11-11 19:49 ` Laszlo Ersek
  2016-11-13 12:51   ` Fan, Jeff
       [not found] ` <542CF652F8836A4AB8DBFAAD40ED192A4A2DCDE3@shsmsx102.ccr.corp.intel.com>
  4 siblings, 1 reply; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-11 19:49 UTC (permalink / raw)
  To: Jeff Fan; +Cc: edk2-devel, Jiewen Yao, Paolo Bonzini

On 11/11/16 06:45, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm
> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
> execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> v2:
>   1. Make stack alignment per Laszlo's comment.
>   2. Trim whitespace at end of end per Laszlo's comment.
>   3. Update year mark in file header.
>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
> 
> Jeff Fan (3):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 ++++++++++++++++++++++++++-
>  4 files changed, 136 insertions(+), 4 deletions(-)
> 

Applied this locally to master (ffd6b0b1b65e) for testing. I tested the
series with a suspend-resume loop -- not a busy loop, just manually. (So
there was always one second or so between adjacent steps.)

No crashes or emulation failures, but the "AP going lost" issue remains
present -- sometimes Linux cannot bring up one of the four VCPUs after
resume.

In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.

In the Ia32X64 case, I experienced the symptom after the 89th resume.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-11 19:49 ` [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Laszlo Ersek
@ 2016-11-13 12:51   ` Fan, Jeff
  2016-11-14  1:41     ` Yao, Jiewen
  2016-11-14  8:17     ` Laszlo Ersek
  0 siblings, 2 replies; 26+ messages in thread
From: Fan, Jeff @ 2016-11-13 12:51 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen, Paolo Bonzini

Laszlo,

Thanks your testing. It seems that there is still some unknown issue existing.

I suggest to push this serial of patches firstly, because they have big progress to solve the AP crashed issue in https://bugzilla.tianocore.org/show_bug.cgi?id=216.

I could submit another bug to handle "AP lost" issue.  Thus, JIewen's  or others' patches could be push as long as they have no additional issue except for "AP Lost:".

I could follow up to fix "AP Lost" issue.

Thanks!
Jeff


-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Saturday, November 12, 2016 3:49 AM
To: Fan, Jeff
Cc: edk2-devel@ml01.01.org; Yao, Jiewen; Paolo Bonzini
Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

On 11/11/16 06:45, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in 
> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit 
> from hlt state and execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> v2:
>   1. Make stack alignment per Laszlo's comment.
>   2. Trim whitespace at end of end per Laszlo's comment.
>   3. Update year mark in file header.
>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
> 
> Jeff Fan (3):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 
> ++++++++++++++++++++++++++-
>  4 files changed, 136 insertions(+), 4 deletions(-)
> 

Applied this locally to master (ffd6b0b1b65e) for testing. I tested the series with a suspend-resume loop -- not a busy loop, just manually. (So there was always one second or so between adjacent steps.)

No crashes or emulation failures, but the "AP going lost" issue remains present -- sometimes Linux cannot bring up one of the four VCPUs after resume.

In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.

In the Ia32X64 case, I experienced the symptom after the 89th resume.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-13 12:51   ` Fan, Jeff
@ 2016-11-14  1:41     ` Yao, Jiewen
  2016-11-14  8:17     ` Laszlo Ersek
  1 sibling, 0 replies; 26+ messages in thread
From: Yao, Jiewen @ 2016-11-14  1:41 UTC (permalink / raw)
  To: Fan, Jeff, Laszlo Ersek; +Cc: edk2-devel@ml01.01.org, Paolo Bonzini

Yes. I think it is good to check in because it did fix an existing issue.
I suggest we describe the remaining issue honestly in our checkin log.

Thank you
Yao Jiewen

From: Fan, Jeff
Sent: Sunday, November 13, 2016 8:52 PM
To: Laszlo Ersek <lersek@redhat.com>
Cc: edk2-devel@ml01.01.org; Yao, Jiewen <jiewen.yao@intel.com>; Paolo Bonzini <pbonzini@redhat.com>
Subject: RE: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

Laszlo,

Thanks your testing. It seems that there is still some unknown issue existing.

I suggest to push this serial of patches firstly, because they have big progress to solve the AP crashed issue in https://bugzilla.tianocore.org/show_bug.cgi?id=216.

I could submit another bug to handle "AP lost" issue.  Thus, JIewen's  or others' patches could be push as long as they have no additional issue except for "AP Lost:".

I could follow up to fix "AP Lost" issue.

Thanks!
Jeff


-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Saturday, November 12, 2016 3:49 AM
To: Fan, Jeff
Cc: edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Yao, Jiewen; Paolo Bonzini
Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

On 11/11/16 06:45, Jeff Fan wrote:
> On S3 path, we will wake up APs to restore CPU context in
> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit
> from hlt state and execute the instruction after HLT instruction.
>
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>
> I tested real platform with 64bit DXE.
>
> v2:
>   1. Make stack alignment per Laszlo's comment.
>   2. Trim whitespace at end of end per Laszlo's comment.
>   3. Update year mark in file header.
>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
>
> Jeff Fan (3):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
>
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63
> ++++++++++++++++++++++++++-
>  4 files changed, 136 insertions(+), 4 deletions(-)
>

Applied this locally to master (ffd6b0b1b65e) for testing. I tested the series with a suspend-resume loop -- not a busy loop, just manually. (So there was always one second or so between adjacent steps.)

No crashes or emulation failures, but the "AP going lost" issue remains present -- sometimes Linux cannot bring up one of the four VCPUs after resume.

In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.

In the Ia32X64 case, I experienced the symptom after the 89th resume.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-13 12:51   ` Fan, Jeff
  2016-11-14  1:41     ` Yao, Jiewen
@ 2016-11-14  8:17     ` Laszlo Ersek
  2016-11-14  8:50       ` Paolo Bonzini
  1 sibling, 1 reply; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-14  8:17 UTC (permalink / raw)
  To: Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen, Paolo Bonzini

On 11/13/16 13:51, Fan, Jeff wrote:
> Laszlo,
> 
> Thanks your testing. It seems that there is still some unknown issue existing.
> 
> I suggest to push this serial of patches firstly, because they have
> big progress to solve the AP crashed issue in
> https://bugzilla.tianocore.org/show_bug.cgi?id=216.

Sounds good to me.

> I could submit another bug to handle "AP lost" issue.

I hope that Paolo can continue to help us with the KVM trace analysis.

> Thus, JIewen's
> or others' patches could be push as long as they have no additional
> issue except for "AP Lost:".

I haven't gotten around testing Jiewen's v3 series yet. I think it would
be best if I could test Jiewen's v3 after this v2 series of yours is
committed. I'll report back with results.

Thanks
Laszlo

> 
> I could follow up to fix "AP Lost" issue.
> 
> Thanks!
> Jeff
> 
> 
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com] 
> Sent: Saturday, November 12, 2016 3:49 AM
> To: Fan, Jeff
> Cc: edk2-devel@ml01.01.org; Yao, Jiewen; Paolo Bonzini
> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
> 
> On 11/11/16 06:45, Jeff Fan wrote:
>> On S3 path, we will wake up APs to restore CPU context in 
>> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit 
>> from hlt state and execute the instruction after HLT instruction.
>>
>> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>>
>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>>
>> I tested real platform with 64bit DXE.
>>
>> v2:
>>   1. Make stack alignment per Laszlo's comment.
>>   2. Trim whitespace at end of end per Laszlo's comment.
>>   3. Update year mark in file header.
>>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
>>
>> Jeff Fan (3):
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
>>
>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 
>> ++++++++++++++++++++++++++-
>>  4 files changed, 136 insertions(+), 4 deletions(-)
>>
> 
> Applied this locally to master (ffd6b0b1b65e) for testing. I tested the series with a suspend-resume loop -- not a busy loop, just manually. (So there was always one second or so between adjacent steps.)
> 
> No crashes or emulation failures, but the "AP going lost" issue remains present -- sometimes Linux cannot bring up one of the four VCPUs after resume.
> 
> In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.
> 
> In the Ia32X64 case, I experienced the symptom after the 89th resume.
> 
> Thanks
> Laszlo
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14  8:17     ` Laszlo Ersek
@ 2016-11-14  8:50       ` Paolo Bonzini
  2016-11-14 10:39         ` Laszlo Ersek
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-11-14  8:50 UTC (permalink / raw)
  To: Laszlo Ersek, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen



On 14/11/2016 09:17, Laszlo Ersek wrote:
> On 11/13/16 13:51, Fan, Jeff wrote:
>> Laszlo,
>>
>> Thanks your testing. It seems that there is still some unknown issue existing.
>>
>> I suggest to push this serial of patches firstly, because they have
>> big progress to solve the AP crashed issue in
>> https://bugzilla.tianocore.org/show_bug.cgi?id=216.
> 
> Sounds good to me.
> 
>> I could submit another bug to handle "AP lost" issue.
> 
> I hope that Paolo can continue to help us with the KVM trace analysis.

I will, but it will take a few days.  In the meanwhile it would be nice
if you could take a look at using SendSmiIpiAllExcludingSelf() to bridge
the difference between 0xb2 on QEMU and on real hardware.

Paolo

>> Thus, JIewen's
>> or others' patches could be push as long as they have no additional
>> issue except for "AP Lost:".
> 
> I haven't gotten around testing Jiewen's v3 series yet. I think it would
> be best if I could test Jiewen's v3 after this v2 series of yours is
> committed. I'll report back with results.
> 
> Thanks
> Laszlo
> 
>>
>> I could follow up to fix "AP Lost" issue.
>>
>> Thanks!
>> Jeff
>>
>>
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:lersek@redhat.com] 
>> Sent: Saturday, November 12, 2016 3:49 AM
>> To: Fan, Jeff
>> Cc: edk2-devel@ml01.01.org; Yao, Jiewen; Paolo Bonzini
>> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
>>
>> On 11/11/16 06:45, Jeff Fan wrote:
>>> On S3 path, we will wake up APs to restore CPU context in 
>>> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit 
>>> from hlt state and execute the instruction after HLT instruction.
>>>
>>> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>>>
>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>>>
>>> I tested real platform with 64bit DXE.
>>>
>>> v2:
>>>   1. Make stack alignment per Laszlo's comment.
>>>   2. Trim whitespace at end of end per Laszlo's comment.
>>>   3. Update year mark in file header.
>>>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
>>>
>>> Jeff Fan (3):
>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
>>>
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 
>>> ++++++++++++++++++++++++++-
>>>  4 files changed, 136 insertions(+), 4 deletions(-)
>>>
>>
>> Applied this locally to master (ffd6b0b1b65e) for testing. I tested the series with a suspend-resume loop -- not a busy loop, just manually. (So there was always one second or so between adjacent steps.)
>>
>> No crashes or emulation failures, but the "AP going lost" issue remains present -- sometimes Linux cannot bring up one of the four VCPUs after resume.
>>
>> In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.
>>
>> In the Ia32X64 case, I experienced the symptom after the 89th resume.
>>
>> Thanks
>> Laszlo
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14  8:50       ` Paolo Bonzini
@ 2016-11-14 10:39         ` Laszlo Ersek
  2016-11-14 11:09           ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-14 10:39 UTC (permalink / raw)
  To: Paolo Bonzini, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

On 11/14/16 09:50, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 09:17, Laszlo Ersek wrote:
>> On 11/13/16 13:51, Fan, Jeff wrote:
>>> Laszlo,
>>>
>>> Thanks your testing. It seems that there is still some unknown issue existing.
>>>
>>> I suggest to push this serial of patches firstly, because they have
>>> big progress to solve the AP crashed issue in
>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216.
>>
>> Sounds good to me.
>>
>>> I could submit another bug to handle "AP lost" issue.
>>
>> I hope that Paolo can continue to help us with the KVM trace analysis.
> 
> I will, but it will take a few days.  In the meanwhile it would be nice
> if you could take a look at using SendSmiIpiAllExcludingSelf() to bridge
> the difference between 0xb2 on QEMU and on real hardware.

You've tried that:

https://www.mail-archive.com/edk2-devel@lists.01.org/msg02840.html
https://www.mail-archive.com/edk2-devel@lists.01.org/msg02923.html

Do you suggest to make the LocalApicLib instances usable at runtime?

For that I think we'll need to cover the LAPIC address range with a
runtime-marked EfiMemoryMappedIO area. This can be done in
"OvmfPkg/SmmControl2Dxe".

Also, we'll need a LocalApicLib instance that registers a callback for
SetVirtualAddressMap() and converts the LAPIC base address pointer.

Currently BaseXApicX2ApicLib.c's GetLocalApicBaseAddress() function uses
the MSR_IA32_APIC_BASE register if it's available -- based on CPUID --,
and falls back to PcdCpuLocalApicBaseAddress otherwise. And only
PcdCpuLocalApicBaseAddress is what we could replace with the virtual
pointer. We can't accommodate a guest OS that reprograms the LAPIC base
address.

Jeff, what do you think?

Anyway, I believe KVM doesn't support moving the LAPIC window; is that
right? (Independently, I seem to recall an attack that stole SMRAM
accesses by hiding SMRAM with the LAPIC window.)

Thanks
Laszlo


>>> Thus, JIewen's
>>> or others' patches could be push as long as they have no additional
>>> issue except for "AP Lost:".
>>
>> I haven't gotten around testing Jiewen's v3 series yet. I think it would
>> be best if I could test Jiewen's v3 after this v2 series of yours is
>> committed. I'll report back with results.
>>
>> Thanks
>> Laszlo
>>
>>>
>>> I could follow up to fix "AP Lost" issue.
>>>
>>> Thanks!
>>> Jeff
>>>
>>>
>>> -----Original Message-----
>>> From: Laszlo Ersek [mailto:lersek@redhat.com] 
>>> Sent: Saturday, November 12, 2016 3:49 AM
>>> To: Fan, Jeff
>>> Cc: edk2-devel@ml01.01.org; Yao, Jiewen; Paolo Bonzini
>>> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
>>>
>>> On 11/11/16 06:45, Jeff Fan wrote:
>>>> On S3 path, we will wake up APs to restore CPU context in 
>>>> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit 
>>>> from hlt state and execute the instruction after HLT instruction.
>>>>
>>>> But APs are not running on safe code, it leads OVMF S3 boot unstable.
>>>>
>>>> https://bugzilla.tianocore.org/show_bug.cgi?id=216
>>>>
>>>> I tested real platform with 64bit DXE.
>>>>
>>>> v2:
>>>>   1. Make stack alignment per Laszlo's comment.
>>>>   2. Trim whitespace at end of end per Laszlo's comment.
>>>>   3. Update year mark in file header.
>>>>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
>>>>
>>>> Jeff Fan (3):
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path
>>>>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
>>>>
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33 +++++++++++++-
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>>>>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63 
>>>> ++++++++++++++++++++++++++-
>>>>  4 files changed, 136 insertions(+), 4 deletions(-)
>>>>
>>>
>>> Applied this locally to master (ffd6b0b1b65e) for testing. I tested the series with a suspend-resume loop -- not a busy loop, just manually. (So there was always one second or so between adjacent steps.)
>>>
>>> No crashes or emulation failures, but the "AP going lost" issue remains present -- sometimes Linux cannot bring up one of the four VCPUs after resume.
>>>
>>> In the Ia32 case, this "AP lost" symptom surfaced after the 6th resume.
>>>
>>> In the Ia32X64 case, I experienced the symptom after the 89th resume.
>>>
>>> Thanks
>>> Laszlo
>>>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 10:39         ` Laszlo Ersek
@ 2016-11-14 11:09           ` Paolo Bonzini
  2016-11-14 11:27             ` Laszlo Ersek
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-11-14 11:09 UTC (permalink / raw)
  To: Laszlo Ersek, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen



On 14/11/2016 11:39, Laszlo Ersek wrote:
> You've tried that:
> 
> https://www.mail-archive.com/edk2-devel@lists.01.org/msg02840.html
> https://www.mail-archive.com/edk2-devel@lists.01.org/msg02923.html

Uh, right. :)

> Do you suggest to make the LocalApicLib instances usable at runtime?
> For that I think we'll need to cover the LAPIC address range with a
> runtime-marked EfiMemoryMappedIO area. This can be done in
> "OvmfPkg/SmmControl2Dxe".
> 
> Also, we'll need a LocalApicLib instance that registers a callback for
> SetVirtualAddressMap() and converts the LAPIC base address pointer.
> 
> Currently BaseXApicX2ApicLib.c's GetLocalApicBaseAddress() function uses
> the MSR_IA32_APIC_BASE register if it's available -- based on CPUID --,
> and falls back to PcdCpuLocalApicBaseAddress otherwise. And only
> PcdCpuLocalApicBaseAddress is what we could replace with the virtual
> pointer. We can't accommodate a guest OS that reprograms the LAPIC base
> address.
> 
> Jeff, what do you think?
> 
> Anyway, I believe KVM doesn't support moving the LAPIC window; is that
> right?

No, it doesn't.  Let's add a backdoor in QEMU, where writing a given
value to port 0xb2 would start generating SMIs to all CPUs.  Then you
can write this somewhere in the initialization code, and in the S3 boot
script.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 11:09           ` Paolo Bonzini
@ 2016-11-14 11:27             ` Laszlo Ersek
  2016-11-14 12:00               ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-14 11:27 UTC (permalink / raw)
  To: Paolo Bonzini, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

On 11/14/16 12:09, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 11:39, Laszlo Ersek wrote:
>> You've tried that:
>>
>> https://www.mail-archive.com/edk2-devel@lists.01.org/msg02840.html
>> https://www.mail-archive.com/edk2-devel@lists.01.org/msg02923.html
> 
> Uh, right. :)
> 
>> Do you suggest to make the LocalApicLib instances usable at runtime?
>> For that I think we'll need to cover the LAPIC address range with a
>> runtime-marked EfiMemoryMappedIO area. This can be done in
>> "OvmfPkg/SmmControl2Dxe".
>>
>> Also, we'll need a LocalApicLib instance that registers a callback for
>> SetVirtualAddressMap() and converts the LAPIC base address pointer.
>>
>> Currently BaseXApicX2ApicLib.c's GetLocalApicBaseAddress() function uses
>> the MSR_IA32_APIC_BASE register if it's available -- based on CPUID --,
>> and falls back to PcdCpuLocalApicBaseAddress otherwise. And only
>> PcdCpuLocalApicBaseAddress is what we could replace with the virtual
>> pointer. We can't accommodate a guest OS that reprograms the LAPIC base
>> address.
>>
>> Jeff, what do you think?
>>
>> Anyway, I believe KVM doesn't support moving the LAPIC window; is that
>> right?
> 
> No, it doesn't.  Let's add a backdoor in QEMU, where writing a given
> value to port 0xb2 would start generating SMIs to all CPUs.  Then you
> can write this somewhere in the initialization code, and in the S3 boot
> script.

Well...

http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html

Are you suggesting that I resurrect this patch? That would be my pleasure. Please say yes.

Also, no special handling would be necessary on the S3 resume path, because after resume, SMM clients like the variable driver would continue calling the Trigger() method. SmmControl2Dxe is a Runtime DXE Driver. The OVMF side patch looks like this (I think I never posted it, but I preserved it):

> From 2d0cf255858c265d8b0af36ab586b0dd6a9140fa Mon Sep 17 00:00:00 2001
> From: Laszlo Ersek <lersek@redhat.com>
> Date: Fri, 23 Oct 2015 17:10:38 +0200
> Subject: [PATCH] (SQUASH) OvmfPkg: SmmControl2Dxe: select broadcast SMI
>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  OvmfPkg/Include/IndustryStandard/Q35MchIch9.h |  6 ++++--
>  OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.c       | 12 +++++++++++-
>  2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/OvmfPkg/Include/IndustryStandard/Q35MchIch9.h b/OvmfPkg/Include/IndustryStandard/Q35MchIch9.h
> index 18b34a3d4f4e..3fa26395adc7 100644
> --- a/OvmfPkg/Include/IndustryStandard/Q35MchIch9.h
> +++ b/OvmfPkg/Include/IndustryStandard/Q35MchIch9.h
> @@ -83,8 +83,10 @@
>  //
>  // IO ports
>  //
> -#define ICH9_APM_CNT 0xB2
> -#define ICH9_APM_STS 0xB3
> +#define ICH9_APM_CNT                    0xB2
> +
> +#define ICH9_APM_STS                    0xB3
> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI   'Q'
>
>  //
>  // IO ports relative to PMBASE
> diff --git a/OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.c b/OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.c
> index e895fd638d48..4f60ca48bdea 100644
> --- a/OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.c
> +++ b/OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.c
> @@ -95,10 +95,20 @@ SmmControl2DxeTrigger (
>    // report about hardware status, while this register is fully governed by
>    // software.
>    //
> +  // On the QEMU platform, we use the status register to request a "broadcast"
> +  // SMI; i.e., to bring all VCPUs into SMM at once. Edk2 handles the case when
> +  // the SMI is raised only on the processor that calls Trigger(), but that
> +  // case has much worse performance.
> +  //
> +  // Note that we exploit the fact that the SMM_CORE never passes a non-NULL
> +  // DataPort pointer (that is, we exploit that it doesn't care about the
> +  // status register value).
> +  //
>    // Write to the status register first, as this won't trigger the SMI just
>    // yet. Then write to the control register.
>    //
> -  IoWrite8 (ICH9_APM_STS, DataPort    == NULL ? 0 : *DataPort);
> +  ASSERT (DataPort == NULL);
> +  IoWrite8 (ICH9_APM_STS, QEMU_ICH9_APM_STS_BROADCAST_SMI);
>    IoWrite8 (ICH9_APM_CNT, CommandPort == NULL ? 0 : *CommandPort);
>    return EFI_SUCCESS;
>  }
> --
> 2.9.2
>

(I guess there's a teeny-tiny window between the two IoWrite8() calls, but I don't think any guest OS would permit (or try) an S3 suspend while some firmware call is in flight.)

If you wish, I can apply these patches (both QEMU and OVMF) locally, and then retest Jeff's v2 and Jiewen's v3 patch sets.

Thank you!
Laszlo



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 11:27             ` Laszlo Ersek
@ 2016-11-14 12:00               ` Paolo Bonzini
  2016-11-14 18:07                 ` Laszlo Ersek
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-11-14 12:00 UTC (permalink / raw)
  To: Laszlo Ersek, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen



On 14/11/2016 12:27, Laszlo Ersek wrote:
> Well...
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html
> 
> Are you suggesting that I resurrect this patch? That would be my
> pleasure. Please say yes.

It's hard to say no when someone has written the code already. :)

Paolo

> Also, no special handling would be necessary on the S3 resume path,
> because after resume, SMM clients like the variable driver would
> continue calling the Trigger() method. SmmControl2Dxe is a Runtime
> DXE Driver. The OVMF side patch looks like this (I think I never
> posted it, but I preserved it):


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 12:00               ` Paolo Bonzini
@ 2016-11-14 18:07                 ` Laszlo Ersek
  2016-11-14 18:13                   ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-14 18:07 UTC (permalink / raw)
  To: Paolo Bonzini, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

On 11/14/16 13:00, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 12:27, Laszlo Ersek wrote:
>> Well...
>>
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html
>>
>> Are you suggesting that I resurrect this patch? That would be my
>> pleasure. Please say yes.
> 
> It's hard to say no when someone has written the code already. :)

Thanks. I refreshed both patches (OVMF and QEMU -- no code changes just
more precise commit messages). Unfortunately, quite a few things seem
broken, although these patches worked a year ago.

My QEMU base commit is current master 83c83f9a5266. My host kernel is
3.10.0-514.el7.x86_64.

*** So, when I test these two patches, based on edk2 master (no on-list
patches), Ia32 target, my boot hangs (spins) with the log ending in:

> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0

That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
"info cpus" prints:

* CPU #0: pc=0x000000007f1f7763 thread_id=17395
  CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
  CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
  CPU #3: pc=0x00000000fffffff0 thread_id=17398

and I've also seen a case where all the APs were stuck at the reset
vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They don't
spin, they're just stuck. The spinning comes from CPU#0, apparently in
MpInitChangeApLoopCallback.

*** I flipped the AP sync mode to traditional (considering the relaxed
mode shouldn't be required with the broadcast SMIs). This time the log
ends with:

> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitChangeApLoopCallback() done!

but then QEMU abort()s:

> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed

I see some ioeventfd stuff in the recent QEMU history; do you think it's
related?

*** My last attempt was even more strange. I applied Jeff's v2 (this
series), returned to the relaxed (= currently in-tree) sync mode, and
(of course) the broadcast SMI patches on both sides. This time I didn't
even boot an OS, I just entered the setup TUI, and selected the Reset
option. QEMU crashed again with:

> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed

I don't know what to look at, honestly. I think I'll check the reflog
for my local QEMU master branch, and return to one of my earlier pulls,
or else use v2.7.0 for testing.

FWIW, the broadcast SMIs work just fine as long as I'm in the firmware
(not booting an OS and not resetting, just browsing around); I verified
with GDB that the broadcast SMI branch was taken in QEMU repeatedly.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 18:07                 ` Laszlo Ersek
@ 2016-11-14 18:13                   ` Paolo Bonzini
  2016-11-14 23:56                     ` Laszlo Ersek
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-11-14 18:13 UTC (permalink / raw)
  To: Laszlo Ersek, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen



On 14/11/2016 19:07, Laszlo Ersek wrote:
> On 11/14/16 13:00, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>> Well...
>>>
>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html
>>>
>>> Are you suggesting that I resurrect this patch? That would be my
>>> pleasure. Please say yes.
>>
>> It's hard to say no when someone has written the code already. :)
> 
> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes just
> more precise commit messages). Unfortunately, quite a few things seem
> broken, although these patches worked a year ago.
> 
> My QEMU base commit is current master 83c83f9a5266. My host kernel is
> 3.10.0-514.el7.x86_64.
> 
> *** So, when I test these two patches, based on edk2 master (no on-list
> patches), Ia32 target, my boot hangs (spins) with the log ending in:
> 
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> 
> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
> "info cpus" prints:
> 
> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
> 
> and I've also seen a case where all the APs were stuck at the reset
> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They don't
> spin, they're just stuck. The spinning comes from CPU#0, apparently in
> MpInitChangeApLoopCallback.
> 
> *** I flipped the AP sync mode to traditional (considering the relaxed
> mode shouldn't be required with the broadcast SMIs). This time the log
> ends with:
> 
>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>> MpInitChangeApLoopCallback() done!
> 
> but then QEMU abort()s:
> 
>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
> 
> I see some ioeventfd stuff in the recent QEMU history; do you think it's
> related?

Yes, just try 2.7 for now or disable vhost.

Paolo

> *** My last attempt was even more strange. I applied Jeff's v2 (this
> series), returned to the relaxed (= currently in-tree) sync mode, and
> (of course) the broadcast SMI patches on both sides. This time I didn't
> even boot an OS, I just entered the setup TUI, and selected the Reset
> option. QEMU crashed again with:
> 
>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
> 
> I don't know what to look at, honestly. I think I'll check the reflog
> for my local QEMU master branch, and return to one of my earlier pulls,
> or else use v2.7.0 for testing.
> 
> FWIW, the broadcast SMIs work just fine as long as I'm in the firmware
> (not booting an OS and not resetting, just browsing around); I verified
> with GDB that the broadcast SMI branch was taken in QEMU repeatedly.
> 
> Thanks
> Laszlo
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 18:13                   ` Paolo Bonzini
@ 2016-11-14 23:56                     ` Laszlo Ersek
  2016-11-15  0:47                       ` Fan, Jeff
                                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-14 23:56 UTC (permalink / raw)
  To: Paolo Bonzini, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

On 11/14/16 19:13, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 19:07, Laszlo Ersek wrote:
>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>
>>>
>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>> Well...
>>>>
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html
>>>>
>>>> Are you suggesting that I resurrect this patch? That would be my
>>>> pleasure. Please say yes.
>>>
>>> It's hard to say no when someone has written the code already. :)
>>
>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes just
>> more precise commit messages). Unfortunately, quite a few things seem
>> broken, although these patches worked a year ago.
>>
>> My QEMU base commit is current master 83c83f9a5266. My host kernel is
>> 3.10.0-514.el7.x86_64.
>>
>> *** So, when I test these two patches, based on edk2 master (no on-list
>> patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>
>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>> "info cpus" prints:
>>
>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>
>> and I've also seen a case where all the APs were stuck at the reset
>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They don't
>> spin, they're just stuck. The spinning comes from CPU#0, apparently in
>> MpInitChangeApLoopCallback.
>>
>> *** I flipped the AP sync mode to traditional (considering the relaxed
>> mode shouldn't be required with the broadcast SMIs). This time the log
>> ends with:
>>
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitChangeApLoopCallback() done!
>>
>> but then QEMU abort()s:
>>
>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>
>> I see some ioeventfd stuff in the recent QEMU history; do you think it's
>> related?
> 
> Yes, just try 2.7 for now or disable vhost.

(1) I think I have some new results.

I used the gdbserver built into QEMU, and I (sort of) single-stepped the
MpInitChangeApLoopCallback() function in
"UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI
patches were applied to both QEMU and OVMF.

I set a breakpoint on RelocateApLoop(), so that when an AP would start
up, gdb would switch to it automatically (and it did in fact).

I liberally used "info cpus" from a separate terminal, and also "info
thread" from within gdb (which gives an incredibly cool insight into the
VCPU states!)

(2) Here's a few interesting results (strictly empirically):

* when I was stepping through the SendInitSipiSipiAllExcludingSelf()
  function *real slow*, manually, suddenly things worked

* I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
  the above-mentioned BSP function, the APs switched from "halted" to
  "running" after the *first* SIPI. Not the second SIPI, the first one.

Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My
explanation is terribly inexact, but:

* __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
  APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and KVM_APIC_SIPI)

* this function also calls kvm_vcpu_kick()

* the kvm_apic_accept_events() function processes the pending events.
  The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
  called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.

  And, in that state, a pending KVM_APIC_SIPI event moves the AP to
  KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
  that only one SIPI is awaited after the INIT before launching the APs.

* the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
  smp_send_reschedule() call, if the kicker and the "kickee" are
  different processors (for example, when the BSP kicks the AP)

  This seemed important because it suggested that host kernel
  scheduling jitter could delay the delivery (reception) of the INIT on
  the AP after the BSP sent it. If the BSP sent the first (and second)
  SIPI really fast after the INIT, then those SIPIs could be missed,
  and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.

  And this state (remember kvm_vcpu_reset() from above!) is consistent
  with the RIP being stuck at the reset vector address, with the AP
  neither running nor being halted.

(3) So, I looked at the Intel SDM. It says in the sysprog book (yes,
yes, I should be using the humongous fused edition),

8.4.4.1 Typical BSP Initialization Sequence

  15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
      up and initialize them:

      MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
      MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
                        ; to all APs into EAX.
      MOV [ESI], EAX; Broadcast INIT IPI to all APs

      ; 10-millisecond delay loop.

      MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
                        ; to all APs into EAX, where xx is the vector
                        ; computed in step 10.
      MOV [ESI], EAX; Broadcast SIPI IPI to all APs

      ; 200-microsecond delay loop

      MOV [ESI], EAX; Broadcast second SIPI IPI to all APs

     ; 200-microsecond delay loop

In the SendInitSipiSipiAllExcludingSelf(), we have a

  MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));

between the INIT and the first SIPI, and a

  MicroSecondDelay (200);

between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.

I think this could be too low for KVM. (It's telling that the value is a
PCD in the first place.)

(4) The circumstances where the "AP is lost" -- due to the missed
SIPI(s) I believe -- vary, interestingly. These results are from my
testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD
support. Jeff's v2 (== this series) is applied invariably as a basis
(because we agree that it fixes bugs).

* The "AP lost" issue persists at S3 resume even if I raise
  PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
  at the 43th resume.)

  If I set the kernel's "cpu_init_udelay" parameter similarly in
  addition, then that seems to make the symptom more frequent, not
  less, which is completely counter-intuitive :(

* If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
  then I can't even boot; the AP (or some APs) regularly lose the SIPI
  and remain stuck in "INIT received" (I think) when started from
  MpInitChangeApLoopCallback(). Hence the boot never completes.

  Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
  boot. (And, the original goal of the broadcast SMI patches is
  achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
  However, an AP still gets stuck during S3 occasionally :(

  And, cpu_init_udelay=100000 for the guest kernel makes it only worse.

Ultimately, I couldn't find a way to make S3 work reliably in the Ia32
SMM build. This v2 series doesn't help, broadcast SMIs don't help,
raising the INIT<->SIPI delay in the firmware doesn't help (it just
mitigates the bad effects of the broadcast SMIs :/), and raising the
same delay in the guest kernel only makes things worse.

Jeff, I think you should go ahead and commit this series. Paolo reviewed
patch #3 and I hope Mike or Jiewen can review the first two patches. For
OVMF they improve things (no more emulation failures), and I guess we
can figure out the lost AP issue later.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 23:56                     ` Laszlo Ersek
@ 2016-11-15  0:47                       ` Fan, Jeff
  2016-11-15  1:03                         ` Laszlo Ersek
  2016-11-15  1:19                       ` Fan, Jeff
  2016-11-15  1:27                       ` Laszlo Ersek
  2 siblings, 1 reply; 26+ messages in thread
From: Fan, Jeff @ 2016-11-15  0:47 UTC (permalink / raw)
  To: Laszlo Ersek, Paolo Bonzini; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

Laszlo,

I file https://bugzilla.tianocore.org/show_bug.cgi?id=230 to handle AP lost issue.

I will push the v2 serial of patches, Paolo has already added r-b signature on the first 2 patches in v1 serials.

Jeff

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Tuesday, November 15, 2016 7:56 AM
To: Paolo Bonzini; Fan, Jeff
Cc: edk2-devel@ml01.01.org; Yao, Jiewen
Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

On 11/14/16 19:13, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 19:07, Laszlo Ersek wrote:
>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>
>>>
>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>> Well...
>>>>
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.ht
>>>> ml 
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.ht
>>>> ml 
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.ht
>>>> ml
>>>>
>>>> Are you suggesting that I resurrect this patch? That would be my 
>>>> pleasure. Please say yes.
>>>
>>> It's hard to say no when someone has written the code already. :)
>>
>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes 
>> just more precise commit messages). Unfortunately, quite a few things 
>> seem broken, although these patches worked a year ago.
>>
>> My QEMU base commit is current master 83c83f9a5266. My host kernel is 
>> 3.10.0-514.el7.x86_64.
>>
>> *** So, when I test these two patches, based on edk2 master (no 
>> on-list patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>
>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>> "info cpus" prints:
>>
>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>
>> and I've also seen a case where all the APs were stuck at the reset 
>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They 
>> don't spin, they're just stuck. The spinning comes from CPU#0, 
>> apparently in MpInitChangeApLoopCallback.
>>
>> *** I flipped the AP sync mode to traditional (considering the 
>> relaxed mode shouldn't be required with the broadcast SMIs). This 
>> time the log ends with:
>>
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitChangeApLoopCallback() done!
>>
>> but then QEMU abort()s:
>>
>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>
>> I see some ioeventfd stuff in the recent QEMU history; do you think 
>> it's related?
> 
> Yes, just try 2.7 for now or disable vhost.

(1) I think I have some new results.

I used the gdbserver built into QEMU, and I (sort of) single-stepped the
MpInitChangeApLoopCallback() function in "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI patches were applied to both QEMU and OVMF.

I set a breakpoint on RelocateApLoop(), so that when an AP would start up, gdb would switch to it automatically (and it did in fact).

I liberally used "info cpus" from a separate terminal, and also "info thread" from within gdb (which gives an incredibly cool insight into the VCPU states!)

(2) Here's a few interesting results (strictly empirically):

* when I was stepping through the SendInitSipiSipiAllExcludingSelf()
  function *real slow*, manually, suddenly things worked

* I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
  the above-mentioned BSP function, the APs switched from "halted" to
  "running" after the *first* SIPI. Not the second SIPI, the first one.

Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My explanation is terribly inexact, but:

* __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
  APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and KVM_APIC_SIPI)

* this function also calls kvm_vcpu_kick()

* the kvm_apic_accept_events() function processes the pending events.
  The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
  called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.

  And, in that state, a pending KVM_APIC_SIPI event moves the AP to
  KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
  that only one SIPI is awaited after the INIT before launching the APs.

* the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
  smp_send_reschedule() call, if the kicker and the "kickee" are
  different processors (for example, when the BSP kicks the AP)

  This seemed important because it suggested that host kernel
  scheduling jitter could delay the delivery (reception) of the INIT on
  the AP after the BSP sent it. If the BSP sent the first (and second)
  SIPI really fast after the INIT, then those SIPIs could be missed,
  and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.

  And this state (remember kvm_vcpu_reset() from above!) is consistent
  with the RIP being stuck at the reset vector address, with the AP
  neither running nor being halted.

(3) So, I looked at the Intel SDM. It says in the sysprog book (yes, yes, I should be using the humongous fused edition),

8.4.4.1 Typical BSP Initialization Sequence

  15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
      up and initialize them:

      MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
      MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
                        ; to all APs into EAX.
      MOV [ESI], EAX; Broadcast INIT IPI to all APs

      ; 10-millisecond delay loop.

      MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
                        ; to all APs into EAX, where xx is the vector
                        ; computed in step 10.
      MOV [ESI], EAX; Broadcast SIPI IPI to all APs

      ; 200-microsecond delay loop

      MOV [ESI], EAX; Broadcast second SIPI IPI to all APs

     ; 200-microsecond delay loop

In the SendInitSipiSipiAllExcludingSelf(), we have a

  MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));

between the INIT and the first SIPI, and a

  MicroSecondDelay (200);

between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.

I think this could be too low for KVM. (It's telling that the value is a PCD in the first place.)

(4) The circumstances where the "AP is lost" -- due to the missed
SIPI(s) I believe -- vary, interestingly. These results are from my testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD support. Jeff's v2 (== this series) is applied invariably as a basis (because we agree that it fixes bugs).

* The "AP lost" issue persists at S3 resume even if I raise
  PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
  at the 43th resume.)

  If I set the kernel's "cpu_init_udelay" parameter similarly in
  addition, then that seems to make the symptom more frequent, not
  less, which is completely counter-intuitive :(

* If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
  then I can't even boot; the AP (or some APs) regularly lose the SIPI
  and remain stuck in "INIT received" (I think) when started from
  MpInitChangeApLoopCallback(). Hence the boot never completes.

  Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
  boot. (And, the original goal of the broadcast SMI patches is
  achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
  However, an AP still gets stuck during S3 occasionally :(

  And, cpu_init_udelay=100000 for the guest kernel makes it only worse.

Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 SMM build. This v2 series doesn't help, broadcast SMIs don't help, raising the INIT<->SIPI delay in the firmware doesn't help (it just mitigates the bad effects of the broadcast SMIs :/), and raising the same delay in the guest kernel only makes things worse.

Jeff, I think you should go ahead and commit this series. Paolo reviewed patch #3 and I hope Mike or Jiewen can review the first two patches. For OVMF they improve things (no more emulation failures), and I guess we can figure out the lost AP issue later.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-15  0:47                       ` Fan, Jeff
@ 2016-11-15  1:03                         ` Laszlo Ersek
  2016-11-15  1:04                           ` Fan, Jeff
  0 siblings, 1 reply; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-15  1:03 UTC (permalink / raw)
  To: Fan, Jeff, Paolo Bonzini; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

On 11/15/16 01:47, Fan, Jeff wrote:
> Laszlo,
> 
> I file https://bugzilla.tianocore.org/show_bug.cgi?id=230 to handle AP lost issue.
> 
> I will push the v2 serial of patches, Paolo has already added r-b signature on the first 2 patches in v1 serials.

Please add my

Tested-by: Laszlo Ersek <lersek@redhat.com>

to the first and the third patch.

If you can wait a bit, I might be able to extend my T-b to the second
patch as well.

(More details to follow soon.)

Thanks!
Laszlo

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com] 
> Sent: Tuesday, November 15, 2016 7:56 AM
> To: Paolo Bonzini; Fan, Jeff
> Cc: edk2-devel@ml01.01.org; Yao, Jiewen
> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
> 
> On 11/14/16 19:13, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 19:07, Laszlo Ersek wrote:
>>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>>> Well...
>>>>>
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.ht
>>>>> ml 
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.ht
>>>>> ml 
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.ht
>>>>> ml
>>>>>
>>>>> Are you suggesting that I resurrect this patch? That would be my 
>>>>> pleasure. Please say yes.
>>>>
>>>> It's hard to say no when someone has written the code already. :)
>>>
>>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes 
>>> just more precise commit messages). Unfortunately, quite a few things 
>>> seem broken, although these patches worked a year ago.
>>>
>>> My QEMU base commit is current master 83c83f9a5266. My host kernel is 
>>> 3.10.0-514.el7.x86_64.
>>>
>>> *** So, when I test these two patches, based on edk2 master (no 
>>> on-list patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>
>>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>>> "info cpus" prints:
>>>
>>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>>
>>> and I've also seen a case where all the APs were stuck at the reset 
>>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They 
>>> don't spin, they're just stuck. The spinning comes from CPU#0, 
>>> apparently in MpInitChangeApLoopCallback.
>>>
>>> *** I flipped the AP sync mode to traditional (considering the 
>>> relaxed mode shouldn't be required with the broadcast SMIs). This 
>>> time the log ends with:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>> MpInitChangeApLoopCallback() done!
>>>
>>> but then QEMU abort()s:
>>>
>>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>>
>>> I see some ioeventfd stuff in the recent QEMU history; do you think 
>>> it's related?
>>
>> Yes, just try 2.7 for now or disable vhost.
> 
> (1) I think I have some new results.
> 
> I used the gdbserver built into QEMU, and I (sort of) single-stepped the
> MpInitChangeApLoopCallback() function in "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
> For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI patches were applied to both QEMU and OVMF.
> 
> I set a breakpoint on RelocateApLoop(), so that when an AP would start up, gdb would switch to it automatically (and it did in fact).
> 
> I liberally used "info cpus" from a separate terminal, and also "info thread" from within gdb (which gives an incredibly cool insight into the VCPU states!)
> 
> (2) Here's a few interesting results (strictly empirically):
> 
> * when I was stepping through the SendInitSipiSipiAllExcludingSelf()
>   function *real slow*, manually, suddenly things worked
> 
> * I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
>   the above-mentioned BSP function, the APs switched from "halted" to
>   "running" after the *first* SIPI. Not the second SIPI, the first one.
> 
> Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My explanation is terribly inexact, but:
> 
> * __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
>   APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and KVM_APIC_SIPI)
> 
> * this function also calls kvm_vcpu_kick()
> 
> * the kvm_apic_accept_events() function processes the pending events.
>   The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
>   called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And, in that state, a pending KVM_APIC_SIPI event moves the AP to
>   KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
>   that only one SIPI is awaited after the INIT before launching the APs.
> 
> * the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
>   smp_send_reschedule() call, if the kicker and the "kickee" are
>   different processors (for example, when the BSP kicks the AP)
> 
>   This seemed important because it suggested that host kernel
>   scheduling jitter could delay the delivery (reception) of the INIT on
>   the AP after the BSP sent it. If the BSP sent the first (and second)
>   SIPI really fast after the INIT, then those SIPIs could be missed,
>   and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And this state (remember kvm_vcpu_reset() from above!) is consistent
>   with the RIP being stuck at the reset vector address, with the AP
>   neither running nor being halted.
> 
> (3) So, I looked at the Intel SDM. It says in the sysprog book (yes, yes, I should be using the humongous fused edition),
> 
> 8.4.4.1 Typical BSP Initialization Sequence
> 
>   15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
>       up and initialize them:
> 
>       MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
>       MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
>                         ; to all APs into EAX.
>       MOV [ESI], EAX; Broadcast INIT IPI to all APs
> 
>       ; 10-millisecond delay loop.
> 
>       MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
>                         ; to all APs into EAX, where xx is the vector
>                         ; computed in step 10.
>       MOV [ESI], EAX; Broadcast SIPI IPI to all APs
> 
>       ; 200-microsecond delay loop
> 
>       MOV [ESI], EAX; Broadcast second SIPI IPI to all APs
> 
>      ; 200-microsecond delay loop
> 
> In the SendInitSipiSipiAllExcludingSelf(), we have a
> 
>   MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));
> 
> between the INIT and the first SIPI, and a
> 
>   MicroSecondDelay (200);
> 
> between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
> UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.
> 
> I think this could be too low for KVM. (It's telling that the value is a PCD in the first place.)
> 
> (4) The circumstances where the "AP is lost" -- due to the missed
> SIPI(s) I believe -- vary, interestingly. These results are from my testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD support. Jeff's v2 (== this series) is applied invariably as a basis (because we agree that it fixes bugs).
> 
> * The "AP lost" issue persists at S3 resume even if I raise
>   PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
>   at the 43th resume.)
> 
>   If I set the kernel's "cpu_init_udelay" parameter similarly in
>   addition, then that seems to make the symptom more frequent, not
>   less, which is completely counter-intuitive :(
> 
> * If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
>   then I can't even boot; the AP (or some APs) regularly lose the SIPI
>   and remain stuck in "INIT received" (I think) when started from
>   MpInitChangeApLoopCallback(). Hence the boot never completes.
> 
>   Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
>   boot. (And, the original goal of the broadcast SMI patches is
>   achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
>   However, an AP still gets stuck during S3 occasionally :(
> 
>   And, cpu_init_udelay=100000 for the guest kernel makes it only worse.
> 
> Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 SMM build. This v2 series doesn't help, broadcast SMIs don't help, raising the INIT<->SIPI delay in the firmware doesn't help (it just mitigates the bad effects of the broadcast SMIs :/), and raising the same delay in the guest kernel only makes things worse.
> 
> Jeff, I think you should go ahead and commit this series. Paolo reviewed patch #3 and I hope Mike or Jiewen can review the first two patches. For OVMF they improve things (no more emulation failures), and I guess we can figure out the lost AP issue later.
> 
> Thanks
> Laszlo
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-15  1:03                         ` Laszlo Ersek
@ 2016-11-15  1:04                           ` Fan, Jeff
  0 siblings, 0 replies; 26+ messages in thread
From: Fan, Jeff @ 2016-11-15  1:04 UTC (permalink / raw)
  To: Laszlo Ersek, Paolo Bonzini; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

Sure. Please add it. :-)

-----Original Message-----
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek
Sent: Tuesday, November 15, 2016 9:03 AM
To: Fan, Jeff; Paolo Bonzini
Cc: edk2-devel@ml01.01.org; Yao, Jiewen
Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

On 11/15/16 01:47, Fan, Jeff wrote:
> Laszlo,
> 
> I file https://bugzilla.tianocore.org/show_bug.cgi?id=230 to handle AP lost issue.
> 
> I will push the v2 serial of patches, Paolo has already added r-b signature on the first 2 patches in v1 serials.

Please add my

Tested-by: Laszlo Ersek <lersek@redhat.com>

to the first and the third patch.

If you can wait a bit, I might be able to extend my T-b to the second patch as well.

(More details to follow soon.)

Thanks!
Laszlo

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, November 15, 2016 7:56 AM
> To: Paolo Bonzini; Fan, Jeff
> Cc: edk2-devel@ml01.01.org; Yao, Jiewen
> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on 
> S3 path
> 
> On 11/14/16 19:13, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 19:07, Laszlo Ersek wrote:
>>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>>> Well...
>>>>>
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.h
>>>>> t
>>>>> ml
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.h
>>>>> t
>>>>> ml
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.h
>>>>> t
>>>>> ml
>>>>>
>>>>> Are you suggesting that I resurrect this patch? That would be my 
>>>>> pleasure. Please say yes.
>>>>
>>>> It's hard to say no when someone has written the code already. :)
>>>
>>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes 
>>> just more precise commit messages). Unfortunately, quite a few 
>>> things seem broken, although these patches worked a year ago.
>>>
>>> My QEMU base commit is current master 83c83f9a5266. My host kernel 
>>> is 3.10.0-514.el7.x86_64.
>>>
>>> *** So, when I test these two patches, based on edk2 master (no 
>>> on-list patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>
>>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>>> "info cpus" prints:
>>>
>>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>>
>>> and I've also seen a case where all the APs were stuck at the reset 
>>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They 
>>> don't spin, they're just stuck. The spinning comes from CPU#0, 
>>> apparently in MpInitChangeApLoopCallback.
>>>
>>> *** I flipped the AP sync mode to traditional (considering the 
>>> relaxed mode shouldn't be required with the broadcast SMIs). This 
>>> time the log ends with:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>> MpInitChangeApLoopCallback() done!
>>>
>>> but then QEMU abort()s:
>>>
>>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>>
>>> I see some ioeventfd stuff in the recent QEMU history; do you think 
>>> it's related?
>>
>> Yes, just try 2.7 for now or disable vhost.
> 
> (1) I think I have some new results.
> 
> I used the gdbserver built into QEMU, and I (sort of) single-stepped 
> the
> MpInitChangeApLoopCallback() function in "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
> For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI patches were applied to both QEMU and OVMF.
> 
> I set a breakpoint on RelocateApLoop(), so that when an AP would start up, gdb would switch to it automatically (and it did in fact).
> 
> I liberally used "info cpus" from a separate terminal, and also "info 
> thread" from within gdb (which gives an incredibly cool insight into 
> the VCPU states!)
> 
> (2) Here's a few interesting results (strictly empirically):
> 
> * when I was stepping through the SendInitSipiSipiAllExcludingSelf()
>   function *real slow*, manually, suddenly things worked
> 
> * I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
>   the above-mentioned BSP function, the APs switched from "halted" to
>   "running" after the *first* SIPI. Not the second SIPI, the first one.
> 
> Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My explanation is terribly inexact, but:
> 
> * __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
>   APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and 
> KVM_APIC_SIPI)
> 
> * this function also calls kvm_vcpu_kick()
> 
> * the kvm_apic_accept_events() function processes the pending events.
>   The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
>   called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And, in that state, a pending KVM_APIC_SIPI event moves the AP to
>   KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
>   that only one SIPI is awaited after the INIT before launching the APs.
> 
> * the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
>   smp_send_reschedule() call, if the kicker and the "kickee" are
>   different processors (for example, when the BSP kicks the AP)
> 
>   This seemed important because it suggested that host kernel
>   scheduling jitter could delay the delivery (reception) of the INIT on
>   the AP after the BSP sent it. If the BSP sent the first (and second)
>   SIPI really fast after the INIT, then those SIPIs could be missed,
>   and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And this state (remember kvm_vcpu_reset() from above!) is consistent
>   with the RIP being stuck at the reset vector address, with the AP
>   neither running nor being halted.
> 
> (3) So, I looked at the Intel SDM. It says in the sysprog book (yes, 
> yes, I should be using the humongous fused edition),
> 
> 8.4.4.1 Typical BSP Initialization Sequence
> 
>   15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
>       up and initialize them:
> 
>       MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
>       MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
>                         ; to all APs into EAX.
>       MOV [ESI], EAX; Broadcast INIT IPI to all APs
> 
>       ; 10-millisecond delay loop.
> 
>       MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
>                         ; to all APs into EAX, where xx is the vector
>                         ; computed in step 10.
>       MOV [ESI], EAX; Broadcast SIPI IPI to all APs
> 
>       ; 200-microsecond delay loop
> 
>       MOV [ESI], EAX; Broadcast second SIPI IPI to all APs
> 
>      ; 200-microsecond delay loop
> 
> In the SendInitSipiSipiAllExcludingSelf(), we have a
> 
>   MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));
> 
> between the INIT and the first SIPI, and a
> 
>   MicroSecondDelay (200);
> 
> between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
> UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.
> 
> I think this could be too low for KVM. (It's telling that the value is 
> a PCD in the first place.)
> 
> (4) The circumstances where the "AP is lost" -- due to the missed
> SIPI(s) I believe -- vary, interestingly. These results are from my testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD support. Jeff's v2 (== this series) is applied invariably as a basis (because we agree that it fixes bugs).
> 
> * The "AP lost" issue persists at S3 resume even if I raise
>   PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
>   at the 43th resume.)
> 
>   If I set the kernel's "cpu_init_udelay" parameter similarly in
>   addition, then that seems to make the symptom more frequent, not
>   less, which is completely counter-intuitive :(
> 
> * If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
>   then I can't even boot; the AP (or some APs) regularly lose the SIPI
>   and remain stuck in "INIT received" (I think) when started from
>   MpInitChangeApLoopCallback(). Hence the boot never completes.
> 
>   Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
>   boot. (And, the original goal of the broadcast SMI patches is
>   achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
>   However, an AP still gets stuck during S3 occasionally :(
> 
>   And, cpu_init_udelay=100000 for the guest kernel makes it only worse.
> 
> Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 SMM build. This v2 series doesn't help, broadcast SMIs don't help, raising the INIT<->SIPI delay in the firmware doesn't help (it just mitigates the bad effects of the broadcast SMIs :/), and raising the same delay in the guest kernel only makes things worse.
> 
> Jeff, I think you should go ahead and commit this series. Paolo reviewed patch #3 and I hope Mike or Jiewen can review the first two patches. For OVMF they improve things (no more emulation failures), and I guess we can figure out the lost AP issue later.
> 
> Thanks
> Laszlo
> 

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 23:56                     ` Laszlo Ersek
  2016-11-15  0:47                       ` Fan, Jeff
@ 2016-11-15  1:19                       ` Fan, Jeff
  2016-11-15  1:30                         ` Laszlo Ersek
  2016-11-15  1:27                       ` Laszlo Ersek
  2 siblings, 1 reply; 26+ messages in thread
From: Fan, Jeff @ 2016-11-15  1:19 UTC (permalink / raw)
  To: Laszlo Ersek, Paolo Bonzini; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

Laszlo,

Have you tried to decrease  PcdCpuInitIpiDelayInMicroSeconds?

For PCD PcdCpuInitIpiDelayInMicroSeconds before two SIPIs, we introduced this PCD is just to do customization.
  MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));

Per our experience,  we could decrease this PCD value to 10us (microsecond) on some platforms/processors.  You may try it.

Thanks!
Jeff

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Tuesday, November 15, 2016 7:56 AM
To: Paolo Bonzini; Fan, Jeff
Cc: edk2-devel@ml01.01.org; Yao, Jiewen
Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

On 11/14/16 19:13, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 19:07, Laszlo Ersek wrote:
>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>
>>>
>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>> Well...
>>>>
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.ht
>>>> ml 
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.ht
>>>> ml 
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.ht
>>>> ml
>>>>
>>>> Are you suggesting that I resurrect this patch? That would be my 
>>>> pleasure. Please say yes.
>>>
>>> It's hard to say no when someone has written the code already. :)
>>
>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes 
>> just more precise commit messages). Unfortunately, quite a few things 
>> seem broken, although these patches worked a year ago.
>>
>> My QEMU base commit is current master 83c83f9a5266. My host kernel is 
>> 3.10.0-514.el7.x86_64.
>>
>> *** So, when I test these two patches, based on edk2 master (no 
>> on-list patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>
>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>> "info cpus" prints:
>>
>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>
>> and I've also seen a case where all the APs were stuck at the reset 
>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They 
>> don't spin, they're just stuck. The spinning comes from CPU#0, 
>> apparently in MpInitChangeApLoopCallback.
>>
>> *** I flipped the AP sync mode to traditional (considering the 
>> relaxed mode shouldn't be required with the broadcast SMIs). This 
>> time the log ends with:
>>
>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>> MpInitChangeApLoopCallback() done!
>>
>> but then QEMU abort()s:
>>
>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>
>> I see some ioeventfd stuff in the recent QEMU history; do you think 
>> it's related?
> 
> Yes, just try 2.7 for now or disable vhost.

(1) I think I have some new results.

I used the gdbserver built into QEMU, and I (sort of) single-stepped the
MpInitChangeApLoopCallback() function in "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI patches were applied to both QEMU and OVMF.

I set a breakpoint on RelocateApLoop(), so that when an AP would start up, gdb would switch to it automatically (and it did in fact).

I liberally used "info cpus" from a separate terminal, and also "info thread" from within gdb (which gives an incredibly cool insight into the VCPU states!)

(2) Here's a few interesting results (strictly empirically):

* when I was stepping through the SendInitSipiSipiAllExcludingSelf()
  function *real slow*, manually, suddenly things worked

* I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
  the above-mentioned BSP function, the APs switched from "halted" to
  "running" after the *first* SIPI. Not the second SIPI, the first one.

Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My explanation is terribly inexact, but:

* __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
  APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and KVM_APIC_SIPI)

* this function also calls kvm_vcpu_kick()

* the kvm_apic_accept_events() function processes the pending events.
  The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
  called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.

  And, in that state, a pending KVM_APIC_SIPI event moves the AP to
  KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
  that only one SIPI is awaited after the INIT before launching the APs.

* the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
  smp_send_reschedule() call, if the kicker and the "kickee" are
  different processors (for example, when the BSP kicks the AP)

  This seemed important because it suggested that host kernel
  scheduling jitter could delay the delivery (reception) of the INIT on
  the AP after the BSP sent it. If the BSP sent the first (and second)
  SIPI really fast after the INIT, then those SIPIs could be missed,
  and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.

  And this state (remember kvm_vcpu_reset() from above!) is consistent
  with the RIP being stuck at the reset vector address, with the AP
  neither running nor being halted.

(3) So, I looked at the Intel SDM. It says in the sysprog book (yes, yes, I should be using the humongous fused edition),

8.4.4.1 Typical BSP Initialization Sequence

  15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
      up and initialize them:

      MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
      MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
                        ; to all APs into EAX.
      MOV [ESI], EAX; Broadcast INIT IPI to all APs

      ; 10-millisecond delay loop.

      MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
                        ; to all APs into EAX, where xx is the vector
                        ; computed in step 10.
      MOV [ESI], EAX; Broadcast SIPI IPI to all APs

      ; 200-microsecond delay loop

      MOV [ESI], EAX; Broadcast second SIPI IPI to all APs

     ; 200-microsecond delay loop

In the SendInitSipiSipiAllExcludingSelf(), we have a

  MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));

between the INIT and the first SIPI, and a

  MicroSecondDelay (200);

between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.

I think this could be too low for KVM. (It's telling that the value is a PCD in the first place.)

(4) The circumstances where the "AP is lost" -- due to the missed
SIPI(s) I believe -- vary, interestingly. These results are from my testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD support. Jeff's v2 (== this series) is applied invariably as a basis (because we agree that it fixes bugs).

* The "AP lost" issue persists at S3 resume even if I raise
  PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
  at the 43th resume.)

  If I set the kernel's "cpu_init_udelay" parameter similarly in
  addition, then that seems to make the symptom more frequent, not
  less, which is completely counter-intuitive :(

* If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
  then I can't even boot; the AP (or some APs) regularly lose the SIPI
  and remain stuck in "INIT received" (I think) when started from
  MpInitChangeApLoopCallback(). Hence the boot never completes.

  Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
  boot. (And, the original goal of the broadcast SMI patches is
  achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
  However, an AP still gets stuck during S3 occasionally :(

  And, cpu_init_udelay=100000 for the guest kernel makes it only worse.

Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 SMM build. This v2 series doesn't help, broadcast SMIs don't help, raising the INIT<->SIPI delay in the firmware doesn't help (it just mitigates the bad effects of the broadcast SMIs :/), and raising the same delay in the guest kernel only makes things worse.

Jeff, I think you should go ahead and commit this series. Paolo reviewed patch #3 and I hope Mike or Jiewen can review the first two patches. For OVMF they improve things (no more emulation failures), and I guess we can figure out the lost AP issue later.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
       [not found] ` <542CF652F8836A4AB8DBFAAD40ED192A4A2DCDE3@shsmsx102.ccr.corp.intel.com>
@ 2016-11-15  1:21   ` Yao, Jiewen
  2016-11-15  1:24     ` Fan, Jeff
  0 siblings, 1 reply; 26+ messages in thread
From: Yao, Jiewen @ 2016-11-15  1:21 UTC (permalink / raw)
  To: Fan, Jeff, edk2-devel@lists.01.org; +Cc: Yao, Jiewen

HI jeff
The patch looks good and I have one minor suggestion on below comment:

Allocate safe memory in ACPI NVS for AP to execute hlt loop on S3 path
==>
Allocate safe memory in ACPI NVS for AP to execute hlt loop in protected mode on S3 path

I recommend we add "in protected mode" statement for the reason to allocate below 4GiB memory.

With that comment update, reviewed-by: Jiewen.yao@intel.com

Thank you
Yao Jiewen


> -----Original Message-----
> From: Fan, Jeff
> Sent: Tuesday, November 15, 2016 8:50 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Subject: FW: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
> 
> 
> 
> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Jeff Fan
> Sent: Friday, November 11, 2016 1:46 PM
> To: edk2-devel@lists.01.org
> Subject: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
> 
> On S3 path, we will wake up APs to restore CPU context in
> PiSmmCpuDxeSmm
> driver. In case, one NMI or SMI happens, APs may exit from hlt state and
> execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> v2:
>   1. Make stack alignment per Laszlo's comment.
>   2. Trim whitespace at end of end per Laszlo's comment.
>   3. Update year mark in file header.
>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
> 
> Jeff Fan (3):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3
> path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe
> code
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33
> +++++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63
> ++++++++++++++++++++++++++-
>  4 files changed, 136 insertions(+), 4 deletions(-)
> 
> --
> 2.9.3.windows.2
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-15  1:21   ` Yao, Jiewen
@ 2016-11-15  1:24     ` Fan, Jeff
  0 siblings, 0 replies; 26+ messages in thread
From: Fan, Jeff @ 2016-11-15  1:24 UTC (permalink / raw)
  To: Yao, Jiewen, edk2-devel@lists.01.org

Jiewen,

Good suggestion, I will update comments when pushing the patches.

I still wait for Laszlo' t-b comments before I push this serial of patches.

Jeff

-----Original Message-----
From: Yao, Jiewen 
Sent: Tuesday, November 15, 2016 9:22 AM
To: Fan, Jeff; edk2-devel@lists.01.org
Cc: Yao, Jiewen
Subject: RE: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

HI jeff
The patch looks good and I have one minor suggestion on below comment:

Allocate safe memory in ACPI NVS for AP to execute hlt loop on S3 path ==> Allocate safe memory in ACPI NVS for AP to execute hlt loop in protected mode on S3 path

I recommend we add "in protected mode" statement for the reason to allocate below 4GiB memory.

With that comment update, reviewed-by: Jiewen.yao@intel.com

Thank you
Yao Jiewen


> -----Original Message-----
> From: Fan, Jeff
> Sent: Tuesday, November 15, 2016 8:50 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>
> Subject: FW: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on 
> S3 path
> 
> 
> 
> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of 
> Jeff Fan
> Sent: Friday, November 11, 2016 1:46 PM
> To: edk2-devel@lists.01.org
> Subject: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 
> path
> 
> On S3 path, we will wake up APs to restore CPU context in 
> PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit 
> from hlt state and execute the instruction after HLT instruction.
> 
> But APs are not running on safe code, it leads OVMF S3 boot unstable.
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=216
> 
> I tested real platform with 64bit DXE.
> 
> v2:
>   1. Make stack alignment per Laszlo's comment.
>   2. Trim whitespace at end of end per Laszlo's comment.
>   3. Update year mark in file header.
>   4. Enhancement on InterlockedDecrement() per Paolo's comment.
> 
> Jeff Fan (3):
>   UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 
> path
>   UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code
> 
>  UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c             | 33
> +++++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 29 +++++++++++-
>  UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h    | 15 +++++++
>  UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c  | 63
> ++++++++++++++++++++++++++-
>  4 files changed, 136 insertions(+), 4 deletions(-)
> 
> --
> 2.9.3.windows.2
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-14 23:56                     ` Laszlo Ersek
  2016-11-15  0:47                       ` Fan, Jeff
  2016-11-15  1:19                       ` Fan, Jeff
@ 2016-11-15  1:27                       ` Laszlo Ersek
  2016-11-15  1:38                         ` Fan, Jeff
  2 siblings, 1 reply; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-15  1:27 UTC (permalink / raw)
  To: Paolo Bonzini, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

On 11/15/16 00:56, Laszlo Ersek wrote:
> On 11/14/16 19:13, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 19:07, Laszlo Ersek wrote:
>>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>>> Well...
>>>>>
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html
>>>>>
>>>>> Are you suggesting that I resurrect this patch? That would be my
>>>>> pleasure. Please say yes.
>>>>
>>>> It's hard to say no when someone has written the code already. :)
>>>
>>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes just
>>> more precise commit messages). Unfortunately, quite a few things seem
>>> broken, although these patches worked a year ago.
>>>
>>> My QEMU base commit is current master 83c83f9a5266. My host kernel is
>>> 3.10.0-514.el7.x86_64.
>>>
>>> *** So, when I test these two patches, based on edk2 master (no on-list
>>> patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>
>>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>>> "info cpus" prints:
>>>
>>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>>
>>> and I've also seen a case where all the APs were stuck at the reset
>>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They don't
>>> spin, they're just stuck. The spinning comes from CPU#0, apparently in
>>> MpInitChangeApLoopCallback.
>>>
>>> *** I flipped the AP sync mode to traditional (considering the relaxed
>>> mode shouldn't be required with the broadcast SMIs). This time the log
>>> ends with:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>> MpInitChangeApLoopCallback() done!
>>>
>>> but then QEMU abort()s:
>>>
>>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>>
>>> I see some ioeventfd stuff in the recent QEMU history; do you think it's
>>> related?
>>
>> Yes, just try 2.7 for now or disable vhost.
> 
> (1) I think I have some new results.
> 
> I used the gdbserver built into QEMU, and I (sort of) single-stepped the
> MpInitChangeApLoopCallback() function in
> "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
> For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI
> patches were applied to both QEMU and OVMF.
> 
> I set a breakpoint on RelocateApLoop(), so that when an AP would start
> up, gdb would switch to it automatically (and it did in fact).
> 
> I liberally used "info cpus" from a separate terminal, and also "info
> thread" from within gdb (which gives an incredibly cool insight into the
> VCPU states!)
> 
> (2) Here's a few interesting results (strictly empirically):
> 
> * when I was stepping through the SendInitSipiSipiAllExcludingSelf()
>   function *real slow*, manually, suddenly things worked
> 
> * I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
>   the above-mentioned BSP function, the APs switched from "halted" to
>   "running" after the *first* SIPI. Not the second SIPI, the first one.
> 
> Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My
> explanation is terribly inexact, but:
> 
> * __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
>   APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and KVM_APIC_SIPI)
> 
> * this function also calls kvm_vcpu_kick()
> 
> * the kvm_apic_accept_events() function processes the pending events.
>   The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
>   called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And, in that state, a pending KVM_APIC_SIPI event moves the AP to
>   KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
>   that only one SIPI is awaited after the INIT before launching the APs.
> 
> * the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
>   smp_send_reschedule() call, if the kicker and the "kickee" are
>   different processors (for example, when the BSP kicks the AP)
> 
>   This seemed important because it suggested that host kernel
>   scheduling jitter could delay the delivery (reception) of the INIT on
>   the AP after the BSP sent it. If the BSP sent the first (and second)
>   SIPI really fast after the INIT, then those SIPIs could be missed,
>   and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And this state (remember kvm_vcpu_reset() from above!) is consistent
>   with the RIP being stuck at the reset vector address, with the AP
>   neither running nor being halted.
> 
> (3) So, I looked at the Intel SDM. It says in the sysprog book (yes,
> yes, I should be using the humongous fused edition),
> 
> 8.4.4.1 Typical BSP Initialization Sequence
> 
>   15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
>       up and initialize them:
> 
>       MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
>       MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
>                         ; to all APs into EAX.
>       MOV [ESI], EAX; Broadcast INIT IPI to all APs
> 
>       ; 10-millisecond delay loop.
> 
>       MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
>                         ; to all APs into EAX, where xx is the vector
>                         ; computed in step 10.
>       MOV [ESI], EAX; Broadcast SIPI IPI to all APs
> 
>       ; 200-microsecond delay loop
> 
>       MOV [ESI], EAX; Broadcast second SIPI IPI to all APs
> 
>      ; 200-microsecond delay loop
> 
> In the SendInitSipiSipiAllExcludingSelf(), we have a
> 
>   MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));
> 
> between the INIT and the first SIPI, and a
> 
>   MicroSecondDelay (200);
> 
> between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
> UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.
> 
> I think this could be too low for KVM. (It's telling that the value is a
> PCD in the first place.)
> 
> (4) The circumstances where the "AP is lost" -- due to the missed
> SIPI(s) I believe -- vary, interestingly. These results are from my
> testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD
> support. Jeff's v2 (== this series) is applied invariably as a basis
> (because we agree that it fixes bugs).
> 
> * The "AP lost" issue persists at S3 resume even if I raise
>   PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
>   at the 43th resume.)
> 
>   If I set the kernel's "cpu_init_udelay" parameter similarly in
>   addition, then that seems to make the symptom more frequent, not
>   less, which is completely counter-intuitive :(
> 
> * If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
>   then I can't even boot; the AP (or some APs) regularly lose the SIPI
>   and remain stuck in "INIT received" (I think) when started from
>   MpInitChangeApLoopCallback(). Hence the boot never completes.
> 
>   Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
>   boot. (And, the original goal of the broadcast SMI patches is
>   achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
>   However, an AP still gets stuck during S3 occasionally :(
> 
>   And, cpu_init_udelay=100000 for the guest kernel makes it only worse.
> 
> Ultimately, I couldn't find a way to make S3 work reliably in the Ia32
> SMM build. This v2 series doesn't help, broadcast SMIs don't help,
> raising the INIT<->SIPI delay in the firmware doesn't help (it just
> mitigates the bad effects of the broadcast SMIs :/), and raising the
> same delay in the guest kernel only makes things worse.

Scratch that, I did find a good configuration:

- Jeff's v2, plus
- broadcast SMI, plus
- flipping back the AP sync mode to 0x00 (traditional) -- this is what
  I missed in the above tests
- no guest kernel cmdline params
- messing with PcdCpuInitIpiDelayInMicroSeconds is not necessary, my
  tests succeeded (*) both with its original value (10ms) and its
  tenfold increased value (100ms)

(*) "Success" means 100 successful S3 suspend/resume cycles, separately
with and without modifying PcdCpuInitIpiDelayInMicroSeconds (in total,
200 successful cycles), without losing APs.

This is unprecedentedly good for OVMF's Ia32 SMM build!


I'll soon submit some patches. I'll also drop the PcdCpuSmmApSyncTimeout
override, because the default is longer (hence safer), and with the
traditional sync method, in practice we don't have to wait for the BSP
on the AP at all (tested with the "taskset -c 1 efibootmgr" command).

... Okay, repeated the test, with the successful configuration, for
Ia32X64 as well:
- Jeff's v2, plus
- broadcast SMI, plus
- flipping back the AP sync mode to 0x00 (traditional),
- no guest kernel cmdline params,
- no PcdCpuInitIpiDelayInMicroSeconds changes relative to the DEC,
- no PcdCpuSmmApSyncTimeout changes relative to the DEC,
- 64-bit Fedora guest tested with 100 cycles of S3, no emulation
failures, no lost APs
- 64-bit Windows 8.1 guest tested with 33 cycles of S3, no emulation
failures, no lost APs

Series
Tested-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Laszlo


> Jeff, I think you should go ahead and commit this series. Paolo reviewed
> patch #3 and I hope Mike or Jiewen can review the first two patches. For
> OVMF they improve things (no more emulation failures), and I guess we
> can figure out the lost AP issue later.
> 
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-15  1:19                       ` Fan, Jeff
@ 2016-11-15  1:30                         ` Laszlo Ersek
  0 siblings, 0 replies; 26+ messages in thread
From: Laszlo Ersek @ 2016-11-15  1:30 UTC (permalink / raw)
  To: Fan, Jeff, Paolo Bonzini; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

On 11/15/16 02:19, Fan, Jeff wrote:
> Laszlo,
> 
> Have you tried to decrease  PcdCpuInitIpiDelayInMicroSeconds?
> 
> For PCD PcdCpuInitIpiDelayInMicroSeconds before two SIPIs, we introduced this PCD is just to do customization.
>   MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));
> 
> Per our experience,  we could decrease this PCD value to 10us (microsecond) on some platforms/processors.  You may try it.

I didn't try to decrease it, but experimenting with it should no longer
be necessary. (See my other email.) Moving to the broadcast SMI pattern
and the traditional AP sync method *really* fixes things (together with
this series of yours).

Thanks!
Laszlo


> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com] 
> Sent: Tuesday, November 15, 2016 7:56 AM
> To: Paolo Bonzini; Fan, Jeff
> Cc: edk2-devel@ml01.01.org; Yao, Jiewen
> Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
> 
> On 11/14/16 19:13, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 19:07, Laszlo Ersek wrote:
>>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>>> Well...
>>>>>
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.ht
>>>>> ml 
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.ht
>>>>> ml 
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.ht
>>>>> ml
>>>>>
>>>>> Are you suggesting that I resurrect this patch? That would be my 
>>>>> pleasure. Please say yes.
>>>>
>>>> It's hard to say no when someone has written the code already. :)
>>>
>>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes 
>>> just more precise commit messages). Unfortunately, quite a few things 
>>> seem broken, although these patches worked a year ago.
>>>
>>> My QEMU base commit is current master 83c83f9a5266. My host kernel is 
>>> 3.10.0-514.el7.x86_64.
>>>
>>> *** So, when I test these two patches, based on edk2 master (no 
>>> on-list patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>
>>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>>> "info cpus" prints:
>>>
>>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>>
>>> and I've also seen a case where all the APs were stuck at the reset 
>>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They 
>>> don't spin, they're just stuck. The spinning comes from CPU#0, 
>>> apparently in MpInitChangeApLoopCallback.
>>>
>>> *** I flipped the AP sync mode to traditional (considering the 
>>> relaxed mode shouldn't be required with the broadcast SMIs). This 
>>> time the log ends with:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>> MpInitChangeApLoopCallback() done!
>>>
>>> but then QEMU abort()s:
>>>
>>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>>
>>> I see some ioeventfd stuff in the recent QEMU history; do you think 
>>> it's related?
>>
>> Yes, just try 2.7 for now or disable vhost.
> 
> (1) I think I have some new results.
> 
> I used the gdbserver built into QEMU, and I (sort of) single-stepped the
> MpInitChangeApLoopCallback() function in "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
> For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI patches were applied to both QEMU and OVMF.
> 
> I set a breakpoint on RelocateApLoop(), so that when an AP would start up, gdb would switch to it automatically (and it did in fact).
> 
> I liberally used "info cpus" from a separate terminal, and also "info thread" from within gdb (which gives an incredibly cool insight into the VCPU states!)
> 
> (2) Here's a few interesting results (strictly empirically):
> 
> * when I was stepping through the SendInitSipiSipiAllExcludingSelf()
>   function *real slow*, manually, suddenly things worked
> 
> * I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
>   the above-mentioned BSP function, the APs switched from "halted" to
>   "running" after the *first* SIPI. Not the second SIPI, the first one.
> 
> Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My explanation is terribly inexact, but:
> 
> * __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
>   APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and KVM_APIC_SIPI)
> 
> * this function also calls kvm_vcpu_kick()
> 
> * the kvm_apic_accept_events() function processes the pending events.
>   The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
>   called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And, in that state, a pending KVM_APIC_SIPI event moves the AP to
>   KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
>   that only one SIPI is awaited after the INIT before launching the APs.
> 
> * the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
>   smp_send_reschedule() call, if the kicker and the "kickee" are
>   different processors (for example, when the BSP kicks the AP)
> 
>   This seemed important because it suggested that host kernel
>   scheduling jitter could delay the delivery (reception) of the INIT on
>   the AP after the BSP sent it. If the BSP sent the first (and second)
>   SIPI really fast after the INIT, then those SIPIs could be missed,
>   and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And this state (remember kvm_vcpu_reset() from above!) is consistent
>   with the RIP being stuck at the reset vector address, with the AP
>   neither running nor being halted.
> 
> (3) So, I looked at the Intel SDM. It says in the sysprog book (yes, yes, I should be using the humongous fused edition),
> 
> 8.4.4.1 Typical BSP Initialization Sequence
> 
>   15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
>       up and initialize them:
> 
>       MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
>       MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
>                         ; to all APs into EAX.
>       MOV [ESI], EAX; Broadcast INIT IPI to all APs
> 
>       ; 10-millisecond delay loop.
> 
>       MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
>                         ; to all APs into EAX, where xx is the vector
>                         ; computed in step 10.
>       MOV [ESI], EAX; Broadcast SIPI IPI to all APs
> 
>       ; 200-microsecond delay loop
> 
>       MOV [ESI], EAX; Broadcast second SIPI IPI to all APs
> 
>      ; 200-microsecond delay loop
> 
> In the SendInitSipiSipiAllExcludingSelf(), we have a
> 
>   MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));
> 
> between the INIT and the first SIPI, and a
> 
>   MicroSecondDelay (200);
> 
> between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
> UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.
> 
> I think this could be too low for KVM. (It's telling that the value is a PCD in the first place.)
> 
> (4) The circumstances where the "AP is lost" -- due to the missed
> SIPI(s) I believe -- vary, interestingly. These results are from my testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD support. Jeff's v2 (== this series) is applied invariably as a basis (because we agree that it fixes bugs).
> 
> * The "AP lost" issue persists at S3 resume even if I raise
>   PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
>   at the 43th resume.)
> 
>   If I set the kernel's "cpu_init_udelay" parameter similarly in
>   addition, then that seems to make the symptom more frequent, not
>   less, which is completely counter-intuitive :(
> 
> * If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
>   then I can't even boot; the AP (or some APs) regularly lose the SIPI
>   and remain stuck in "INIT received" (I think) when started from
>   MpInitChangeApLoopCallback(). Hence the boot never completes.
> 
>   Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
>   boot. (And, the original goal of the broadcast SMI patches is
>   achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
>   However, an AP still gets stuck during S3 occasionally :(
> 
>   And, cpu_init_udelay=100000 for the guest kernel makes it only worse.
> 
> Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 SMM build. This v2 series doesn't help, broadcast SMIs don't help, raising the INIT<->SIPI delay in the firmware doesn't help (it just mitigates the bad effects of the broadcast SMIs :/), and raising the same delay in the guest kernel only makes things worse.
> 
> Jeff, I think you should go ahead and commit this series. Paolo reviewed patch #3 and I hope Mike or Jiewen can review the first two patches. For OVMF they improve things (no more emulation failures), and I guess we can figure out the lost AP issue later.
> 
> Thanks
> Laszlo
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
  2016-11-15  1:27                       ` Laszlo Ersek
@ 2016-11-15  1:38                         ` Fan, Jeff
  0 siblings, 0 replies; 26+ messages in thread
From: Fan, Jeff @ 2016-11-15  1:38 UTC (permalink / raw)
  To: Laszlo Ersek, Paolo Bonzini; +Cc: edk2-devel@ml01.01.org, Yao, Jiewen

Laszlo,

That's great!  Thanks your details updating.

Jeff

-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com] 
Sent: Tuesday, November 15, 2016 9:28 AM
To: Paolo Bonzini; Fan, Jeff
Cc: edk2-devel@ml01.01.org; Yao, Jiewen
Subject: Re: [edk2] [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path

On 11/15/16 00:56, Laszlo Ersek wrote:
> On 11/14/16 19:13, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 19:07, Laszlo Ersek wrote:
>>> On 11/14/16 13:00, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 14/11/2016 12:27, Laszlo Ersek wrote:
>>>>> Well...
>>>>>
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.h
>>>>> tml 
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.h
>>>>> tml 
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.h
>>>>> tml
>>>>>
>>>>> Are you suggesting that I resurrect this patch? That would be my 
>>>>> pleasure. Please say yes.
>>>>
>>>> It's hard to say no when someone has written the code already. :)
>>>
>>> Thanks. I refreshed both patches (OVMF and QEMU -- no code changes 
>>> just more precise commit messages). Unfortunately, quite a few 
>>> things seem broken, although these patches worked a year ago.
>>>
>>> My QEMU base commit is current master 83c83f9a5266. My host kernel 
>>> is 3.10.0-514.el7.x86_64.
>>>
>>> *** So, when I test these two patches, based on edk2 master (no 
>>> on-list patches), Ia32 target, my boot hangs (spins) with the log ending in:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>
>>> That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
>>> "info cpus" prints:
>>>
>>> * CPU #0: pc=0x000000007f1f7763 thread_id=17395
>>>   CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
>>>   CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
>>>   CPU #3: pc=0x00000000fffffff0 thread_id=17398
>>>
>>> and I've also seen a case where all the APs were stuck at the reset 
>>> vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They 
>>> don't spin, they're just stuck. The spinning comes from CPU#0, 
>>> apparently in MpInitChangeApLoopCallback.
>>>
>>> *** I flipped the AP sync mode to traditional (considering the 
>>> relaxed mode shouldn't be required with the broadcast SMIs). This 
>>> time the log ends with:
>>>
>>>> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
>>>> MpInitChangeApLoopCallback() done!
>>>
>>> but then QEMU abort()s:
>>>
>>>> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
>>>> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed
>>>
>>> I see some ioeventfd stuff in the recent QEMU history; do you think 
>>> it's related?
>>
>> Yes, just try 2.7 for now or disable vhost.
> 
> (1) I think I have some new results.
> 
> I used the gdbserver built into QEMU, and I (sort of) single-stepped 
> the
> MpInitChangeApLoopCallback() function in 
> "UefiCpuPkg/Library/MpInitLib/DxeMpLib.c", and whatever else it called.
> For this I used the Ia32 1x2x2 configuration. Also, the broadcast SMI 
> patches were applied to both QEMU and OVMF.
> 
> I set a breakpoint on RelocateApLoop(), so that when an AP would start 
> up, gdb would switch to it automatically (and it did in fact).
> 
> I liberally used "info cpus" from a separate terminal, and also "info 
> thread" from within gdb (which gives an incredibly cool insight into 
> the VCPU states!)
> 
> (2) Here's a few interesting results (strictly empirically):
> 
> * when I was stepping through the SendInitSipiSipiAllExcludingSelf()
>   function *real slow*, manually, suddenly things worked
> 
> * I noticed that, while stepping through the INIT-SIPI-SIPI sequence in
>   the above-mentioned BSP function, the APs switched from "halted" to
>   "running" after the *first* SIPI. Not the second SIPI, the first one.
> 
> Then I went to the KVM code, and looked at arch/x86/kvm/lapic.c. My 
> explanation is terribly inexact, but:
> 
> * __apic_accept_irq() translates the LAPIC writes (APIC_DM_INIT and
>   APIC_DM_STARTUP) into pending events (KVM_APIC_INIT and 
> KVM_APIC_SIPI)
> 
> * this function also calls kvm_vcpu_kick()
> 
> * the kvm_apic_accept_events() function processes the pending events.
>   The KVM_APIC_INIT event, if pending, causes kvm_vcpu_reset() to be
>   called, and the AP to be moved to KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And, in that state, a pending KVM_APIC_SIPI event moves the AP to
>   KVM_MP_STATE_RUNNABLE. This confirms my surprising empirical result
>   that only one SIPI is awaited after the INIT before launching the APs.
> 
> * the kvm_vcpu_kick() function in "virt/kvm/kvm_main.c" boils down to a
>   smp_send_reschedule() call, if the kicker and the "kickee" are
>   different processors (for example, when the BSP kicks the AP)
> 
>   This seemed important because it suggested that host kernel
>   scheduling jitter could delay the delivery (reception) of the INIT on
>   the AP after the BSP sent it. If the BSP sent the first (and second)
>   SIPI really fast after the INIT, then those SIPIs could be missed,
>   and the AP would remain stuck in KVM_MP_STATE_INIT_RECEIVED state.
> 
>   And this state (remember kvm_vcpu_reset() from above!) is consistent
>   with the RIP being stuck at the reset vector address, with the AP
>   neither running nor being halted.
> 
> (3) So, I looked at the Intel SDM. It says in the sysprog book (yes, 
> yes, I should be using the humongous fused edition),
> 
> 8.4.4.1 Typical BSP Initialization Sequence
> 
>   15. Broadcasts an INIT-SIPI-SIPI IPI sequence to the APs to wake them
>       up and initialize them:
> 
>       MOV ESI, ICR_LOW; Load address of ICR low dword into ESI.
>       MOV EAX, 000C4500H; Load ICR encoding for broadcast INIT IPI
>                         ; to all APs into EAX.
>       MOV [ESI], EAX; Broadcast INIT IPI to all APs
> 
>       ; 10-millisecond delay loop.
> 
>       MOV EAX, 000C46XXH; Load ICR encoding for broadcast SIPI IP
>                         ; to all APs into EAX, where xx is the vector
>                         ; computed in step 10.
>       MOV [ESI], EAX; Broadcast SIPI IPI to all APs
> 
>       ; 200-microsecond delay loop
> 
>       MOV [ESI], EAX; Broadcast second SIPI IPI to all APs
> 
>      ; 200-microsecond delay loop
> 
> In the SendInitSipiSipiAllExcludingSelf(), we have a
> 
>   MicroSecondDelay (PcdGet32(PcdCpuInitIpiDelayInMicroSeconds));
> 
> between the INIT and the first SIPI, and a
> 
>   MicroSecondDelay (200);
> 
> between both SIPIs. PcdCpuInitIpiDelayInMicroSeconds is set (in
> UefiCpuPkg.dec) to 10000, which matches the above 10ms recommendation.
> 
> I think this could be too low for KVM. (It's telling that the value is 
> a PCD in the first place.)
> 
> (4) The circumstances where the "AP is lost" -- due to the missed
> SIPI(s) I believe -- vary, interestingly. These results are from my 
> testing with the Ia32 SMM build of OVMF, CPU topology 1x2x2, no XD 
> support. Jeff's v2 (== this series) is applied invariably as a basis 
> (because we agree that it fixes bugs).
> 
> * The "AP lost" issue persists at S3 resume even if I raise
>   PcdCpuInitIpiDelayInMicroSeconds to 1/10th of a second. (Reproduced
>   at the 43th resume.)
> 
>   If I set the kernel's "cpu_init_udelay" parameter similarly in
>   addition, then that seems to make the symptom more frequent, not
>   less, which is completely counter-intuitive :(
> 
> * If I apply the broadcast SMI patch to OVMF (on top of Jeff's v2),
>   then I can't even boot; the AP (or some APs) regularly lose the SIPI
>   and remain stuck in "INIT received" (I think) when started from
>   MpInitChangeApLoopCallback(). Hence the boot never completes.
> 
>   Raising PcdCpuInitIpiDelayInMicroSeconds as described above fixes the
>   boot. (And, the original goal of the broadcast SMI patches is
>   achieved, "efibootmgr" is pretty fast even when bound to VCPU#1.)
>   However, an AP still gets stuck during S3 occasionally :(
> 
>   And, cpu_init_udelay=100000 for the guest kernel makes it only worse.
> 
> Ultimately, I couldn't find a way to make S3 work reliably in the Ia32 
> SMM build. This v2 series doesn't help, broadcast SMIs don't help, 
> raising the INIT<->SIPI delay in the firmware doesn't help (it just 
> mitigates the bad effects of the broadcast SMIs :/), and raising the 
> same delay in the guest kernel only makes things worse.

Scratch that, I did find a good configuration:

- Jeff's v2, plus
- broadcast SMI, plus
- flipping back the AP sync mode to 0x00 (traditional) -- this is what
  I missed in the above tests
- no guest kernel cmdline params
- messing with PcdCpuInitIpiDelayInMicroSeconds is not necessary, my
  tests succeeded (*) both with its original value (10ms) and its
  tenfold increased value (100ms)

(*) "Success" means 100 successful S3 suspend/resume cycles, separately with and without modifying PcdCpuInitIpiDelayInMicroSeconds (in total,
200 successful cycles), without losing APs.

This is unprecedentedly good for OVMF's Ia32 SMM build!


I'll soon submit some patches. I'll also drop the PcdCpuSmmApSyncTimeout override, because the default is longer (hence safer), and with the traditional sync method, in practice we don't have to wait for the BSP on the AP at all (tested with the "taskset -c 1 efibootmgr" command).

... Okay, repeated the test, with the successful configuration, for
Ia32X64 as well:
- Jeff's v2, plus
- broadcast SMI, plus
- flipping back the AP sync mode to 0x00 (traditional),
- no guest kernel cmdline params,
- no PcdCpuInitIpiDelayInMicroSeconds changes relative to the DEC,
- no PcdCpuSmmApSyncTimeout changes relative to the DEC,
- 64-bit Fedora guest tested with 100 cycles of S3, no emulation failures, no lost APs
- 64-bit Windows 8.1 guest tested with 33 cycles of S3, no emulation failures, no lost APs

Series
Tested-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Laszlo


> Jeff, I think you should go ahead and commit this series. Paolo 
> reviewed patch #3 and I hope Mike or Jiewen can review the first two 
> patches. For OVMF they improve things (no more emulation failures), 
> and I guess we can figure out the lost AP issue later.
> 
> Thanks
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-11-15  1:38 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-11  5:45 [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Jeff Fan
2016-11-11  5:45 ` [PATCH v2 1/3] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan
2016-11-11  5:45 ` [PATCH v2 2/3] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan
2016-11-11  5:45 ` [PATCH v2 3/3] UefiCpuPkg/PiSmmCpuDxeSmm: Decrease mNumberToFinish in AP safe code Jeff Fan
2016-11-11 10:16   ` Paolo Bonzini
2016-11-11 19:49 ` [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path Laszlo Ersek
2016-11-13 12:51   ` Fan, Jeff
2016-11-14  1:41     ` Yao, Jiewen
2016-11-14  8:17     ` Laszlo Ersek
2016-11-14  8:50       ` Paolo Bonzini
2016-11-14 10:39         ` Laszlo Ersek
2016-11-14 11:09           ` Paolo Bonzini
2016-11-14 11:27             ` Laszlo Ersek
2016-11-14 12:00               ` Paolo Bonzini
2016-11-14 18:07                 ` Laszlo Ersek
2016-11-14 18:13                   ` Paolo Bonzini
2016-11-14 23:56                     ` Laszlo Ersek
2016-11-15  0:47                       ` Fan, Jeff
2016-11-15  1:03                         ` Laszlo Ersek
2016-11-15  1:04                           ` Fan, Jeff
2016-11-15  1:19                       ` Fan, Jeff
2016-11-15  1:30                         ` Laszlo Ersek
2016-11-15  1:27                       ` Laszlo Ersek
2016-11-15  1:38                         ` Fan, Jeff
     [not found] ` <542CF652F8836A4AB8DBFAAD40ED192A4A2DCDE3@shsmsx102.ccr.corp.intel.com>
2016-11-15  1:21   ` Yao, Jiewen
2016-11-15  1:24     ` Fan, Jeff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox