* [PATCH 0/2] Put AP into safe hlt-loop code on S3 path @ 2016-11-10 6:07 Jeff Fan 2016-11-10 6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan ` (4 more replies) 0 siblings, 5 replies; 15+ messages in thread From: Jeff Fan @ 2016-11-10 6:07 UTC (permalink / raw) To: edk2-devel On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit from hlt state and execute the instruction after HLT instruction. But APs are not running on safe code, it leads OVMF S3 boot unstable. https://bugzilla.tianocore.org/show_bug.cgi?id=216 I tested real platform with 64bit DXE. Jeff Fan (2): UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++ 4 files changed, 128 insertions(+) -- 2.9.3.windows.2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path 2016-11-10 6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan @ 2016-11-10 6:07 ` Jeff Fan 2016-11-10 8:50 ` Laszlo Ersek 2016-11-10 6:07 ` [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan ` (3 subsequent siblings) 4 siblings, 1 reply; 15+ messages in thread From: Jeff Fan @ 2016-11-10 6:07 UTC (permalink / raw) To: edk2-devel; +Cc: Laszlo Ersek, Paolo Bonzini, Jiewen Yao, Michael D Kinney On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm driver. However, we place AP in hlt-loop under 1MB space borrowed after CPU restoring CPU contexts. In case, one NMI or SMI happens, APs may exit from hlt state and execute the instruction after HLT instruction. But the code under 1MB is no longer safe at that time. This fix is to allocate one ACPI NVS range to place the AP hlt-loop code. When CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range. https://bugzilla.tianocore.org/show_bug.cgi?id=216 Reported-by: Laszlo Ersek <lersek@redhat.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jiewen Yao <jiewen.yao@intel.com> Cc: Michael D Kinney <michael.d.kinney@intel.com> Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Jeff Fan <jeff.fan@intel.com> --- UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 +++++++++++++++++++++++++++ UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++ UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 +++++++++++ UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 26 ++++++++++++++++++++++ 4 files changed, 95 insertions(+) diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c index 6a798ef..6f290b8 100644 --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c @@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE *mSmmS3ResumeState = NULL; BOOLEAN mAcpiS3Enable = TRUE; +UINT8 *mApHltLoopCode = NULL; +UINT8 mApHltLoopCodeTemplate[] = { + 0xFA, // cli + 0xF4, // hlt + 0xEB, 0xFC // jmp $-2 + }; + /** Get MSR spin lock by MSR index. @@ -376,6 +383,8 @@ MPRendezvousProcedure ( CPU_REGISTER_TABLE *RegisterTableList; UINT32 InitApicId; UINTN Index; + UINT32 TopOfStack; + UINT8 Stack[128]; ProgramVirtualWireMode (); DisableLvtInterrupts (); @@ -393,6 +402,13 @@ MPRendezvousProcedure ( // Count down the number with lock mechanism. // InterlockedDecrement (&mNumberToFinish); + + // + // Place AP into the safe code + // + TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack); + CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, sizeof (mApHltLoopCodeTemplate)); + TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack); } /** @@ -731,6 +747,8 @@ InitSmmS3ResumeState ( VOID *GuidHob; EFI_SMRAM_DESCRIPTOR *SmramDescriptor; SMM_S3_RESUME_STATE *SmmS3ResumeState; + EFI_PHYSICAL_ADDRESS Address; + EFI_STATUS Status; if (!mAcpiS3Enable) { return; @@ -773,6 +791,19 @@ InitSmmS3ResumeState ( // Patch SmmS3ResumeState->SmmS3Cr3 // InitSmmS3Cr3 (); + + // + // Allocate safe memory in ACPI NVS for AP to execute hlt loop on S3 path + // + Address = BASE_4GB - 1; + Status = gBS->AllocatePages ( + AllocateMaxAddress, + EfiACPIMemoryNVS, + EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)), + &Address + ); + ASSERT_EFI_ERROR (Status); + mApHltLoopCode = (UINT8 *) (UINTN) Address; } /** diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c index 545b534..3e99aab 100644 --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c @@ -94,3 +94,28 @@ InitGdt ( *GdtStepSize = GdtTableStepSize; return GdtTssTables; } + +/** + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. + + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. + +**/ +VOID +TransferApToSafeState ( + IN UINT32 ApHltLoopCode, + IN UINT32 TopOfStack + ) +{ + SwitchStack ( + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, + NULL, + NULL, + (VOID *) (UINTN) TopOfStack + ); + // + // It should never reach here + // + ASSERT (FALSE); +} diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h index 9b119c8..82fa5a6 100644 --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h @@ -825,4 +825,17 @@ GetAcpiS3EnableFlag ( VOID ); +/** + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. + + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. + +**/ +VOID +TransferApToSafeState ( + IN UINT32 ApHltLoopCode, + IN UINT32 TopOfStack + ); + #endif diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c index b53aa45..c05dec7 100644 --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c @@ -68,3 +68,29 @@ InitGdt ( *GdtStepSize = GdtTableStepSize; return GdtTssTables; } +/** + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. + + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. + +**/ +VOID +TransferApToSafeState ( + IN UINT32 ApHltLoopCode, + IN UINT32 TopOfStack + ) +{ + SwitchStack ( + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, + NULL, + NULL, + (VOID *) (UINTN) TopOfStack + ); + // + // It should never reach here + // + ASSERT (FALSE); +} + + -- 2.9.3.windows.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path 2016-11-10 6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan @ 2016-11-10 8:50 ` Laszlo Ersek 2016-11-10 9:00 ` Fan, Jeff 0 siblings, 1 reply; 15+ messages in thread From: Laszlo Ersek @ 2016-11-10 8:50 UTC (permalink / raw) To: Jeff Fan, edk2-devel; +Cc: Paolo Bonzini, Jiewen Yao, Michael D Kinney Before I test this, I have one question / suggestion: On 11/10/16 07:07, Jeff Fan wrote: > On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm > driver. However, we place AP in hlt-loop under 1MB space borrowed after CPU > restoring CPU contexts. > In case, one NMI or SMI happens, APs may exit from hlt state and execute the > instruction after HLT instruction. But the code under 1MB is no longer safe at > that time. > > This fix is to allocate one ACPI NVS range to place the AP hlt-loop code. When > CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range. > > https://bugzilla.tianocore.org/show_bug.cgi?id=216 > > Reported-by: Laszlo Ersek <lersek@redhat.com> > Cc: Laszlo Ersek <lersek@redhat.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Michael D Kinney <michael.d.kinney@intel.com> > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Jeff Fan <jeff.fan@intel.com> > --- > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 +++++++++++++++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 +++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 26 ++++++++++++++++++++++ > 4 files changed, 95 insertions(+) > > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c > index 6a798ef..6f290b8 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c > @@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE *mSmmS3ResumeState = NULL; > > BOOLEAN mAcpiS3Enable = TRUE; > > +UINT8 *mApHltLoopCode = NULL; > +UINT8 mApHltLoopCodeTemplate[] = { > + 0xFA, // cli > + 0xF4, // hlt > + 0xEB, 0xFC // jmp $-2 > + }; > + > /** > Get MSR spin lock by MSR index. > > @@ -376,6 +383,8 @@ MPRendezvousProcedure ( > CPU_REGISTER_TABLE *RegisterTableList; > UINT32 InitApicId; > UINTN Index; > + UINT32 TopOfStack; > + UINT8 Stack[128]; > > ProgramVirtualWireMode (); > DisableLvtInterrupts (); > @@ -393,6 +402,13 @@ MPRendezvousProcedure ( > // Count down the number with lock mechanism. > // > InterlockedDecrement (&mNumberToFinish); > + > + // > + // Place AP into the safe code > + // > + TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack); Here I suggest to append the following assignment: TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1); This will clear the least significant bits of TopOfStack, wasting at most (CPU_STACK_ALIGNMENT - 1) bytes from the Stack array. However, the stack will be aligned correctly. Not sure if it's necessary, but the patch reminds me of commit f98f5ec304ec ("S3Resume2Pei: align return stacks explicitly"). What do you think? Thanks! Laszlo > + CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, sizeof (mApHltLoopCodeTemplate)); > + TransferApToSafeState ((UINT32) (UINTN) mApHltLoopCode, TopOfStack); > } > > /** > @@ -731,6 +747,8 @@ InitSmmS3ResumeState ( > VOID *GuidHob; > EFI_SMRAM_DESCRIPTOR *SmramDescriptor; > SMM_S3_RESUME_STATE *SmmS3ResumeState; > + EFI_PHYSICAL_ADDRESS Address; > + EFI_STATUS Status; > > if (!mAcpiS3Enable) { > return; > @@ -773,6 +791,19 @@ InitSmmS3ResumeState ( > // Patch SmmS3ResumeState->SmmS3Cr3 > // > InitSmmS3Cr3 (); > + > + // > + // Allocate safe memory in ACPI NVS for AP to execute hlt loop on S3 path > + // > + Address = BASE_4GB - 1; > + Status = gBS->AllocatePages ( > + AllocateMaxAddress, > + EfiACPIMemoryNVS, > + EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)), > + &Address > + ); > + ASSERT_EFI_ERROR (Status); > + mApHltLoopCode = (UINT8 *) (UINTN) Address; > } > > /** > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c > index 545b534..3e99aab 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c > @@ -94,3 +94,28 @@ InitGdt ( > *GdtStepSize = GdtTableStepSize; > return GdtTssTables; > } > + > +/** > + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. > + > + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function > + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. > + > +**/ > +VOID > +TransferApToSafeState ( > + IN UINT32 ApHltLoopCode, > + IN UINT32 TopOfStack > + ) > +{ > + SwitchStack ( > + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, > + NULL, > + NULL, > + (VOID *) (UINTN) TopOfStack > + ); > + // > + // It should never reach here > + // > + ASSERT (FALSE); > +} > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > index 9b119c8..82fa5a6 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > @@ -825,4 +825,17 @@ GetAcpiS3EnableFlag ( > VOID > ); > > +/** > + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. > + > + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function > + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. > + > +**/ > +VOID > +TransferApToSafeState ( > + IN UINT32 ApHltLoopCode, > + IN UINT32 TopOfStack > + ); > + > #endif > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c > index b53aa45..c05dec7 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c > @@ -68,3 +68,29 @@ InitGdt ( > *GdtStepSize = GdtTableStepSize; > return GdtTssTables; > } > +/** > + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. > + > + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function > + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. > + > +**/ > +VOID > +TransferApToSafeState ( > + IN UINT32 ApHltLoopCode, > + IN UINT32 TopOfStack > + ) > +{ > + SwitchStack ( > + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, > + NULL, > + NULL, > + (VOID *) (UINTN) TopOfStack > + ); > + // > + // It should never reach here > + // > + ASSERT (FALSE); > +} > + > + > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path 2016-11-10 8:50 ` Laszlo Ersek @ 2016-11-10 9:00 ` Fan, Jeff 2016-11-10 9:30 ` Laszlo Ersek 0 siblings, 1 reply; 15+ messages in thread From: Fan, Jeff @ 2016-11-10 9:00 UTC (permalink / raw) To: Laszlo Ersek, edk2-devel@ml01.01.org Cc: Kinney, Michael D, Paolo Bonzini, Yao, Jiewen I agree with your proposal. Great suggestion. I will create v2 later. -----Original Message----- From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek Sent: Thursday, November 10, 2016 4:51 PM To: Fan, Jeff; edk2-devel@ml01.01.org Cc: Kinney, Michael D; Paolo Bonzini; Yao, Jiewen Subject: Re: [edk2] [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path Before I test this, I have one question / suggestion: On 11/10/16 07:07, Jeff Fan wrote: > On S3 path, we will wake up APs to restore CPU context in > PiSmmCpuDxeSmm driver. However, we place AP in hlt-loop under 1MB > space borrowed after CPU restoring CPU contexts. > In case, one NMI or SMI happens, APs may exit from hlt state and > execute the instruction after HLT instruction. But the code under 1MB > is no longer safe at that time. > > This fix is to allocate one ACPI NVS range to place the AP hlt-loop > code. When CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range. > > https://bugzilla.tianocore.org/show_bug.cgi?id=216 > > Reported-by: Laszlo Ersek <lersek@redhat.com> > Cc: Laszlo Ersek <lersek@redhat.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Michael D Kinney <michael.d.kinney@intel.com> > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Jeff Fan <jeff.fan@intel.com> > --- > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 +++++++++++++++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 +++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 26 > ++++++++++++++++++++++ > 4 files changed, 95 insertions(+) > > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c > b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c > index 6a798ef..6f290b8 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c > @@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE *mSmmS3ResumeState = NULL; > > BOOLEAN mAcpiS3Enable = TRUE; > > +UINT8 *mApHltLoopCode = NULL; > +UINT8 mApHltLoopCodeTemplate[] = { > + 0xFA, // cli > + 0xF4, // hlt > + 0xEB, 0xFC // jmp $-2 > + }; > + > /** > Get MSR spin lock by MSR index. > > @@ -376,6 +383,8 @@ MPRendezvousProcedure ( > CPU_REGISTER_TABLE *RegisterTableList; > UINT32 InitApicId; > UINTN Index; > + UINT32 TopOfStack; > + UINT8 Stack[128]; > > ProgramVirtualWireMode (); > DisableLvtInterrupts (); > @@ -393,6 +402,13 @@ MPRendezvousProcedure ( > // Count down the number with lock mechanism. > // > InterlockedDecrement (&mNumberToFinish); > + > + // > + // Place AP into the safe code > + // > + TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack); Here I suggest to append the following assignment: TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1); This will clear the least significant bits of TopOfStack, wasting at most (CPU_STACK_ALIGNMENT - 1) bytes from the Stack array. However, the stack will be aligned correctly. Not sure if it's necessary, but the patch reminds me of commit f98f5ec304ec ("S3Resume2Pei: align return stacks explicitly"). What do you think? Thanks! Laszlo > + CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, > + sizeof (mApHltLoopCodeTemplate)); TransferApToSafeState ((UINT32) > + (UINTN) mApHltLoopCode, TopOfStack); > } > > /** > @@ -731,6 +747,8 @@ InitSmmS3ResumeState ( > VOID *GuidHob; > EFI_SMRAM_DESCRIPTOR *SmramDescriptor; > SMM_S3_RESUME_STATE *SmmS3ResumeState; > + EFI_PHYSICAL_ADDRESS Address; > + EFI_STATUS Status; > > if (!mAcpiS3Enable) { > return; > @@ -773,6 +791,19 @@ InitSmmS3ResumeState ( > // Patch SmmS3ResumeState->SmmS3Cr3 > // > InitSmmS3Cr3 (); > + > + // > + // Allocate safe memory in ACPI NVS for AP to execute hlt loop on > + S3 path // Address = BASE_4GB - 1; Status = gBS->AllocatePages ( > + AllocateMaxAddress, > + EfiACPIMemoryNVS, > + EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)), > + &Address > + ); > + ASSERT_EFI_ERROR (Status); > + mApHltLoopCode = (UINT8 *) (UINTN) Address; > } > > /** > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c > b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c > index 545b534..3e99aab 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c > @@ -94,3 +94,28 @@ InitGdt ( > *GdtStepSize = GdtTableStepSize; > return GdtTssTables; > } > + > +/** > + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. > + > + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function > + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. > + > +**/ > +VOID > +TransferApToSafeState ( > + IN UINT32 ApHltLoopCode, > + IN UINT32 TopOfStack > + ) > +{ > + SwitchStack ( > + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, > + NULL, > + NULL, > + (VOID *) (UINTN) TopOfStack > + ); > + // > + // It should never reach here > + // > + ASSERT (FALSE); > +} > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > index 9b119c8..82fa5a6 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > @@ -825,4 +825,17 @@ GetAcpiS3EnableFlag ( > VOID > ); > > +/** > + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. > + > + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function > + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. > + > +**/ > +VOID > +TransferApToSafeState ( > + IN UINT32 ApHltLoopCode, > + IN UINT32 TopOfStack > + ); > + > #endif > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c > b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c > index b53aa45..c05dec7 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c > @@ -68,3 +68,29 @@ InitGdt ( > *GdtStepSize = GdtTableStepSize; > return GdtTssTables; > } > +/** > + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. > + > + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function > + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. > + > +**/ > +VOID > +TransferApToSafeState ( > + IN UINT32 ApHltLoopCode, > + IN UINT32 TopOfStack > + ) > +{ > + SwitchStack ( > + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, > + NULL, > + NULL, > + (VOID *) (UINTN) TopOfStack > + ); > + // > + // It should never reach here > + // > + ASSERT (FALSE); > +} > + > + > _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path 2016-11-10 9:00 ` Fan, Jeff @ 2016-11-10 9:30 ` Laszlo Ersek 0 siblings, 0 replies; 15+ messages in thread From: Laszlo Ersek @ 2016-11-10 9:30 UTC (permalink / raw) To: Fan, Jeff, edk2-devel@ml01.01.org Cc: Kinney, Michael D, Paolo Bonzini, Yao, Jiewen On 11/10/16 10:00, Fan, Jeff wrote: > I agree with your proposal. Great suggestion. I will create v2 later. Thank you. Since you're going to send a v2: while applying this patch for testing, git-am warned me that there were 7 new lines with trailing whitespace. Can you please trim those? Thanks Laszlo > -----Original Message----- > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Laszlo Ersek > Sent: Thursday, November 10, 2016 4:51 PM > To: Fan, Jeff; edk2-devel@ml01.01.org > Cc: Kinney, Michael D; Paolo Bonzini; Yao, Jiewen > Subject: Re: [edk2] [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path > > Before I test this, I have one question / suggestion: > > On 11/10/16 07:07, Jeff Fan wrote: >> On S3 path, we will wake up APs to restore CPU context in >> PiSmmCpuDxeSmm driver. However, we place AP in hlt-loop under 1MB >> space borrowed after CPU restoring CPU contexts. >> In case, one NMI or SMI happens, APs may exit from hlt state and >> execute the instruction after HLT instruction. But the code under 1MB >> is no longer safe at that time. >> >> This fix is to allocate one ACPI NVS range to place the AP hlt-loop >> code. When CPU finished restoration CPU contexts, AP will execute in this ACPI NVS range. >> >> https://bugzilla.tianocore.org/show_bug.cgi?id=216 >> >> Reported-by: Laszlo Ersek <lersek@redhat.com> >> Cc: Laszlo Ersek <lersek@redhat.com> >> Cc: Paolo Bonzini <pbonzini@redhat.com> >> Cc: Jiewen Yao <jiewen.yao@intel.com> >> Cc: Michael D Kinney <michael.d.kinney@intel.com> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Jeff Fan <jeff.fan@intel.com> >> --- >> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 +++++++++++++++++++++++++++ >> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 +++++++++++++++++++++ >> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 +++++++++++ >> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 26 >> ++++++++++++++++++++++ >> 4 files changed, 95 insertions(+) >> >> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c >> b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c >> index 6a798ef..6f290b8 100644 >> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c >> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c >> @@ -77,6 +77,13 @@ SMM_S3_RESUME_STATE *mSmmS3ResumeState = NULL; >> >> BOOLEAN mAcpiS3Enable = TRUE; >> >> +UINT8 *mApHltLoopCode = NULL; >> +UINT8 mApHltLoopCodeTemplate[] = { >> + 0xFA, // cli >> + 0xF4, // hlt >> + 0xEB, 0xFC // jmp $-2 >> + }; >> + >> /** >> Get MSR spin lock by MSR index. >> >> @@ -376,6 +383,8 @@ MPRendezvousProcedure ( >> CPU_REGISTER_TABLE *RegisterTableList; >> UINT32 InitApicId; >> UINTN Index; >> + UINT32 TopOfStack; >> + UINT8 Stack[128]; >> >> ProgramVirtualWireMode (); >> DisableLvtInterrupts (); >> @@ -393,6 +402,13 @@ MPRendezvousProcedure ( >> // Count down the number with lock mechanism. >> // >> InterlockedDecrement (&mNumberToFinish); >> + >> + // >> + // Place AP into the safe code >> + // >> + TopOfStack = (UINT32) (UINTN) Stack + sizeof (Stack); > > Here I suggest to append the following assignment: > > TopOfStack &= ~(UINT32) (CPU_STACK_ALIGNMENT - 1); > > This will clear the least significant bits of TopOfStack, wasting at most (CPU_STACK_ALIGNMENT - 1) bytes from the Stack array. However, the stack will be aligned correctly. > > Not sure if it's necessary, but the patch reminds me of commit f98f5ec304ec ("S3Resume2Pei: align return stacks explicitly"). > > What do you think? > > Thanks! > Laszlo > >> + CopyMem ((VOID *) (UINTN) mApHltLoopCode, mApHltLoopCodeTemplate, >> + sizeof (mApHltLoopCodeTemplate)); TransferApToSafeState ((UINT32) >> + (UINTN) mApHltLoopCode, TopOfStack); >> } >> >> /** >> @@ -731,6 +747,8 @@ InitSmmS3ResumeState ( >> VOID *GuidHob; >> EFI_SMRAM_DESCRIPTOR *SmramDescriptor; >> SMM_S3_RESUME_STATE *SmmS3ResumeState; >> + EFI_PHYSICAL_ADDRESS Address; >> + EFI_STATUS Status; >> >> if (!mAcpiS3Enable) { >> return; >> @@ -773,6 +791,19 @@ InitSmmS3ResumeState ( >> // Patch SmmS3ResumeState->SmmS3Cr3 >> // >> InitSmmS3Cr3 (); >> + >> + // >> + // Allocate safe memory in ACPI NVS for AP to execute hlt loop on >> + S3 path // Address = BASE_4GB - 1; Status = gBS->AllocatePages ( >> + AllocateMaxAddress, >> + EfiACPIMemoryNVS, >> + EFI_SIZE_TO_PAGES (sizeof (mApHltLoopCodeTemplate)), >> + &Address >> + ); >> + ASSERT_EFI_ERROR (Status); >> + mApHltLoopCode = (UINT8 *) (UINTN) Address; >> } >> >> /** >> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c >> b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c >> index 545b534..3e99aab 100644 >> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c >> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c >> @@ -94,3 +94,28 @@ InitGdt ( >> *GdtStepSize = GdtTableStepSize; >> return GdtTssTables; >> } >> + >> +/** >> + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. >> + >> + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function >> + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. >> + >> +**/ >> +VOID >> +TransferApToSafeState ( >> + IN UINT32 ApHltLoopCode, >> + IN UINT32 TopOfStack >> + ) >> +{ >> + SwitchStack ( >> + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, >> + NULL, >> + NULL, >> + (VOID *) (UINTN) TopOfStack >> + ); >> + // >> + // It should never reach here >> + // >> + ASSERT (FALSE); >> +} >> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h >> b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h >> index 9b119c8..82fa5a6 100644 >> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h >> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h >> @@ -825,4 +825,17 @@ GetAcpiS3EnableFlag ( >> VOID >> ); >> >> +/** >> + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. >> + >> + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function >> + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. >> + >> +**/ >> +VOID >> +TransferApToSafeState ( >> + IN UINT32 ApHltLoopCode, >> + IN UINT32 TopOfStack >> + ); >> + >> #endif >> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c >> b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c >> index b53aa45..c05dec7 100644 >> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c >> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c >> @@ -68,3 +68,29 @@ InitGdt ( >> *GdtStepSize = GdtTableStepSize; >> return GdtTssTables; >> } >> +/** >> + Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. >> + >> + @param[in] ApHltLoopCode The 32-bit address of the safe hlt-loop function >> + @param[in] TopOfStack A pointer to the new stack to use for the ApHltLoopCode. >> + >> +**/ >> +VOID >> +TransferApToSafeState ( >> + IN UINT32 ApHltLoopCode, >> + IN UINT32 TopOfStack >> + ) >> +{ >> + SwitchStack ( >> + (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, >> + NULL, >> + NULL, >> + (VOID *) (UINTN) TopOfStack >> + ); >> + // >> + // It should never reach here >> + // >> + ASSERT (FALSE); >> +} >> + >> + >> > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path 2016-11-10 6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan 2016-11-10 6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan @ 2016-11-10 6:07 ` Jeff Fan 2016-11-10 8:56 ` [PATCH 0/2] Put AP into safe hlt-loop code " Laszlo Ersek ` (2 subsequent siblings) 4 siblings, 0 replies; 15+ messages in thread From: Jeff Fan @ 2016-11-10 6:07 UTC (permalink / raw) To: edk2-devel; +Cc: Laszlo Ersek, Paolo Bonzini, Jiewen Yao, Michael D Kinney On S3 path, we may transfer to long mode (if DXE is long mode) to restore CPU contexts with CR3 = SmmS3Cr3 (in SMM). AP will execute hlt-loop after CPU contexts restoration. Once one NMI or SMI happens, APs may exit from hlt state and execute the instruction after HLT instruction. If APs are running on long mode, page table is required to fetch the instruction. However, CR3 pointer to page table in SMM. APs will crash. This fix is to disable long mode on APs and transfer to 32bit protected mode to execute hlt-loop. Then CR3 and page table will no longer be required. https://bugzilla.tianocore.org/show_bug.cgi?id=216 Reported-by: Laszlo Ersek <lersek@redhat.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jiewen Yao <jiewen.yao@intel.com> Cc: Michael D Kinney <michael.d.kinney@intel.com> Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Jeff Fan <jeff.fan@intel.com> --- UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 43 ++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 5 deletions(-) diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c index c05dec7..1db0a6e 100644 --- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c @@ -68,6 +68,38 @@ InitGdt ( *GdtStepSize = GdtTableStepSize; return GdtTssTables; } + +/** + Get Protected mode code segment from current GDT table. + + @return Protected mode code segment value. +**/ +UINT16 +GetProtectedModeCS ( + VOID + ) +{ + IA32_DESCRIPTOR GdtrDesc; + IA32_SEGMENT_DESCRIPTOR *GdtEntry; + UINTN GdtEntryCount; + UINT16 Index; + + Index = (UINT16) -1; + AsmReadGdtr (&GdtrDesc); + GdtEntryCount = (GdtrDesc.Limit + 1) / sizeof (IA32_SEGMENT_DESCRIPTOR); + GdtEntry = (IA32_SEGMENT_DESCRIPTOR *) GdtrDesc.Base; + for (Index = 0; Index < GdtEntryCount; Index++) { + if (GdtEntry->Bits.L == 0) { + if (GdtEntry->Bits.Type > 8 && GdtEntry->Bits.L == 0) { + break; + } + } + GdtEntry++; + } + ASSERT (Index != -1); + return Index * 8; +} + /** Transfer AP to safe hlt-loop after it finished restore CPU features on S3 patch. @@ -81,11 +113,12 @@ TransferApToSafeState ( IN UINT32 TopOfStack ) { - SwitchStack ( - (SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode, - NULL, - NULL, - (VOID *) (UINTN) TopOfStack + AsmDisablePaging64 ( + GetProtectedModeCS (), + (UINT32) (UINTN) ApHltLoopCode, + 0, + 0, + TopOfStack ); // // It should never reach here -- 2.9.3.windows.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan 2016-11-10 6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan 2016-11-10 6:07 ` [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan @ 2016-11-10 8:56 ` Laszlo Ersek 2016-11-10 9:59 ` Paolo Bonzini 2016-11-10 10:41 ` Laszlo Ersek 4 siblings, 0 replies; 15+ messages in thread From: Laszlo Ersek @ 2016-11-10 8:56 UTC (permalink / raw) To: Jeff Fan, edk2-devel; +Cc: Paolo Bonzini On 11/10/16 07:07, Jeff Fan wrote: > On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm > driver. In case, one NMI or SMI happens, APs may exit from hlt state and > execute the instruction after HLT instruction. > > But APs are not running on safe code, it leads OVMF S3 boot unstable. > > https://bugzilla.tianocore.org/show_bug.cgi?id=216 > > I tested real platform with 64bit DXE. > > Jeff Fan (2): > UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path > UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path > > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++ > 4 files changed, 128 insertions(+) > Can we please append, to both commit messages: Analyzed-by: Paolo Bonzini <pbonzini@redhat.com> I think we should give credit to Paolo for his work in this thread. I'll report back with test results (with my suggested rounding-down of TopOfStack in patch #1). Thanks Laszlo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan ` (2 preceding siblings ...) 2016-11-10 8:56 ` [PATCH 0/2] Put AP into safe hlt-loop code " Laszlo Ersek @ 2016-11-10 9:59 ` Paolo Bonzini 2016-11-11 6:32 ` Fan, Jeff 2016-11-10 10:41 ` Laszlo Ersek 4 siblings, 1 reply; 15+ messages in thread From: Paolo Bonzini @ 2016-11-10 9:59 UTC (permalink / raw) To: Jeff Fan, edk2-devel On 10/11/2016 07:07, Jeff Fan wrote: > On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm > driver. In case, one NMI or SMI happens, APs may exit from hlt state and > execute the instruction after HLT instruction. > > But APs are not running on safe code, it leads OVMF S3 boot unstable. > > https://bugzilla.tianocore.org/show_bug.cgi?id=216 > > I tested real platform with 64bit DXE. > > Jeff Fan (2): > UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path > UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path > > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++ > 4 files changed, 128 insertions(+) > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> It would be slightly more robust to do the "InterlockedDecrement (&mNumberToFinish);" while in safe state, but the race window is really really small. Paolo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 9:59 ` Paolo Bonzini @ 2016-11-11 6:32 ` Fan, Jeff 0 siblings, 0 replies; 15+ messages in thread From: Fan, Jeff @ 2016-11-11 6:32 UTC (permalink / raw) To: Paolo Bonzini, edk2-devel@lists.01.org Paolo, I added patch #3 in v2 to do InterlockedDecrement (&mNumberToFinish) in the safe code. This is very good comment to eliminate this gap. Thanks! Jeff -----Original Message----- From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini Sent: Thursday, November 10, 2016 5:59 PM To: Fan, Jeff; edk2-devel@lists.01.org Subject: Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path On 10/11/2016 07:07, Jeff Fan wrote: > On S3 path, we will wake up APs to restore CPU context in > PiSmmCpuDxeSmm driver. In case, one NMI or SMI happens, APs may exit > from hlt state and execute the instruction after HLT instruction. > > But APs are not running on safe code, it leads OVMF S3 boot unstable. > > https://bugzilla.tianocore.org/show_bug.cgi?id=216 > > I tested real platform with 64bit DXE. > > Jeff Fan (2): > UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path > UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 > path > > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 > +++++++++++++++++++++++++++ > 4 files changed, 128 insertions(+) > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> It would be slightly more robust to do the "InterlockedDecrement (&mNumberToFinish);" while in safe state, but the race window is really really small. Paolo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan ` (3 preceding siblings ...) 2016-11-10 9:59 ` Paolo Bonzini @ 2016-11-10 10:41 ` Laszlo Ersek 2016-11-10 11:17 ` Yao, Jiewen 2016-11-10 12:26 ` Paolo Bonzini 4 siblings, 2 replies; 15+ messages in thread From: Laszlo Ersek @ 2016-11-10 10:41 UTC (permalink / raw) To: Jeff Fan; +Cc: edk2-devel, Jiewen Yao, Paolo Bonzini On 11/10/16 07:07, Jeff Fan wrote: > On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm > driver. In case, one NMI or SMI happens, APs may exit from hlt state and > execute the instruction after HLT instruction. > > But APs are not running on safe code, it leads OVMF S3 boot unstable. > > https://bugzilla.tianocore.org/show_bug.cgi?id=216 > > I tested real platform with 64bit DXE. > > Jeff Fan (2): > UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path > UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path > > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++ > 4 files changed, 128 insertions(+) > I applied this on top of Jiewen's v2, for testing. This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot. The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS: # this works, quickly taskset -c 0 efibootmgr # this fails taskset -c 1 efibootmgr taskset: failed to set pid 0's affinity: Invalid argument # these work again, albeit more slowly (as expected) taskset -c 2 efibootmgr taskset -c 3 efibootmgr I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied). If I run the "info cpus" QEMU command, I get: * CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745 CPU #1: pc=0x00000000fffffff0 thread_id=22746 CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747 CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748 The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)? The gust kernel dmesg contains the following messages: > [ 55.805153] PM: Restoring platform NVS memory > [ 55.805153] Enabling non-boot CPUs ... > [ 55.805153] x86: Booting SMP configuration: > [ 55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1 > [ 65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE > [ 65.816738] Error taking CPU1 up: -5 > [ 65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2 > [ 65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock > [ 65.817029] kvm: enabling virtualization on CPU2 > [ 65.832296] KVM setup async PF for cpu 2 > [ 65.832607] kvm-stealtime: cpu 2, msr 17fd0e100 > [ 65.833031] CPU2 is up > [ 65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3 > [ 65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock > [ 65.833229] kvm: enabling virtualization on CPU3 > [ 65.848594] KVM setup async PF for cpu 3 > [ 65.848940] kvm-stealtime: cpu 3, msr 17fd8e100 > [ 65.849393] CPU3 is up > [ 65.849722] ACPI: Waking up from system sleep state S3 Note the 10 second gap where I put the marker (and the error message itself, too). Here's an excerpt from the KVM trace: > CPU-23509 [002] 8406.908787: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000 > CPU-23509 [002] 8406.908836: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 > CPU-23510 [003] 8406.908850: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000 > CPU-23510 [003] 8406.908881: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 > CPU-23511 [001] 8406.908908: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000 > CPU-23511 [001] 8406.908941: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 > CPU-23508 [005] 8406.908951: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000 > CPU-23508 [005] 8406.908989: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 > CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000 > CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000 > CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000 > CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 > CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 > CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 > CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 > CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 > CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 > CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM. So this series is a clear improvement, but something else remains amiss. If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist: - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume, - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume. Thanks Laszlo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 10:41 ` Laszlo Ersek @ 2016-11-10 11:17 ` Yao, Jiewen 2016-11-10 12:08 ` Laszlo Ersek 2016-11-10 12:26 ` Paolo Bonzini 1 sibling, 1 reply; 15+ messages in thread From: Yao, Jiewen @ 2016-11-10 11:17 UTC (permalink / raw) To: Laszlo Ersek, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Paolo Bonzini Hi Laszlo Thanks to test for us. Are you saying Jeff's patch introduces a new issue? Or is this a previous issue but just not fixed by Jeff's patch? Thank you Yao Jiewen From: Laszlo Ersek [mailto:lersek@redhat.com] Sent: Thursday, November 10, 2016 6:41 PM To: Fan, Jeff <jeff.fan@intel.com> Cc: edk2-devel@ml01.01.org; Yao, Jiewen <jiewen.yao@intel.com>; Paolo Bonzini <pbonzini@redhat.com> Subject: Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path On 11/10/16 07:07, Jeff Fan wrote: > On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm > driver. In case, one NMI or SMI happens, APs may exit from hlt state and > execute the instruction after HLT instruction. > > But APs are not running on safe code, it leads OVMF S3 boot unstable. > > https://bugzilla.tianocore.org/show_bug.cgi?id=216 > > I tested real platform with 64bit DXE. > > Jeff Fan (2): > UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path > UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path > > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++ > 4 files changed, 128 insertions(+) > I applied this on top of Jiewen's v2, for testing. This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot. The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS: # this works, quickly taskset -c 0 efibootmgr # this fails taskset -c 1 efibootmgr taskset: failed to set pid 0's affinity: Invalid argument # these work again, albeit more slowly (as expected) taskset -c 2 efibootmgr taskset -c 3 efibootmgr I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied). If I run the "info cpus" QEMU command, I get: * CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745 CPU #1: pc=0x00000000fffffff0 thread_id=22746 CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747 CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748 The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)? The gust kernel dmesg contains the following messages: > [ 55.805153] PM: Restoring platform NVS memory > [ 55.805153] Enabling non-boot CPUs ... > [ 55.805153] x86: Booting SMP configuration: > [ 55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1 > [ 65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE > [ 65.816738] Error taking CPU1 up: -5 > [ 65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2 > [ 65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock > [ 65.817029] kvm: enabling virtualization on CPU2 > [ 65.832296] KVM setup async PF for cpu 2 > [ 65.832607] kvm-stealtime: cpu 2, msr 17fd0e100 > [ 65.833031] CPU2 is up > [ 65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3 > [ 65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock > [ 65.833229] kvm: enabling virtualization on CPU3 > [ 65.848594] KVM setup async PF for cpu 3 > [ 65.848940] kvm-stealtime: cpu 3, msr 17fd8e100 > [ 65.849393] CPU3 is up > [ 65.849722] ACPI: Waking up from system sleep state S3 Note the 10 second gap where I put the marker (and the error message itself, too). Here's an excerpt from the KVM trace: > CPU-23509 [002] 8406.908787: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000 > CPU-23509 [002] 8406.908836: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 > CPU-23510 [003] 8406.908850: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000 > CPU-23510 [003] 8406.908881: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 > CPU-23511 [001] 8406.908908: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000 > CPU-23511 [001] 8406.908941: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 > CPU-23508 [005] 8406.908951: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000 > CPU-23508 [005] 8406.908989: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 > CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000 > CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000 > CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000 > CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 > CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 > CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 > CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 > CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 > CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 > CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM. So this series is a clear improvement, but something else remains amiss. If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist: - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume, - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume. Thanks Laszlo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 11:17 ` Yao, Jiewen @ 2016-11-10 12:08 ` Laszlo Ersek 2016-11-10 20:45 ` Laszlo Ersek 0 siblings, 1 reply; 15+ messages in thread From: Laszlo Ersek @ 2016-11-10 12:08 UTC (permalink / raw) To: Yao, Jiewen, Fan, Jeff; +Cc: edk2-devel@ml01.01.org, Paolo Bonzini On 11/10/16 12:17, Yao, Jiewen wrote: > Hi Laszlo > > Thanks to test for us. > > > > Are you saying Jeff’s patch introduces a new issue? > > Or is this a previous issue but just not fixed by Jeff’s patch? With your v2 series applied, Jeff's patches replace the crash / emulation failure symptoms during S3 resume with less intrusive symptoms, namely that some of the APs cannot be brought up by the OS, occasionally. Without your v2 series applied, Jeff's patches seem to present the same symptoms (OS cannot bring up some APs), although much less frequently. However, I cannot say definitively whether or not this exact same issues exists, on Ia32X64, with none of the patch sets applied. I haven't seen it before (on Ia32X64), but maybe I just haven't tried hard enough. I guess I should try harder and see if the "lost AP" issue exists without either patch set applied. Thanks Laszlo > > > > > > Thank you > > Yao Jiewen > > > > *From:*Laszlo Ersek [mailto:lersek@redhat.com] > *Sent:* Thursday, November 10, 2016 6:41 PM > *To:* Fan, Jeff <jeff.fan@intel.com> > *Cc:* edk2-devel@ml01.01.org; Yao, Jiewen <jiewen.yao@intel.com>; Paolo > Bonzini <pbonzini@redhat.com> > *Subject:* Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path > > > > On 11/10/16 07:07, Jeff Fan wrote: >> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm >> driver. In case, one NMI or SMI happens, APs may exit from hlt state and >> execute the instruction after HLT instruction. >> >> But APs are not running on safe code, it leads OVMF S3 boot unstable. >> >> https://bugzilla.tianocore.org/show_bug.cgi?id=216 >> >> I tested real platform with 64bit DXE. >> >> Jeff Fan (2): >> UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path >> UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path >> >> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ >> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ >> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ >> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++ >> 4 files changed, 128 insertions(+) >> > > I applied this on top of Jiewen's v2, for testing. > > This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot. > > The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS: > > # this works, quickly > taskset -c 0 efibootmgr > > # this fails > taskset -c 1 efibootmgr > taskset: failed to set pid 0's affinity: Invalid argument > > # these work again, albeit more slowly (as expected) > taskset -c 2 efibootmgr > taskset -c 3 efibootmgr > > I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied). > > If I run the "info cpus" QEMU command, I get: > > * CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745 > CPU #1: pc=0x00000000fffffff0 thread_id=22746 > CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747 > CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748 > > The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)? > > The gust kernel dmesg contains the following messages: > >> [ 55.805153] PM: Restoring platform NVS memory >> [ 55.805153] Enabling non-boot CPUs ... >> [ 55.805153] x86: Booting SMP configuration: >> [ 55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1 >> [ 65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE >> [ 65.816738] Error taking CPU1 up: -5 >> [ 65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2 >> [ 65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock >> [ 65.817029] kvm: enabling virtualization on CPU2 >> [ 65.832296] KVM setup async PF for cpu 2 >> [ 65.832607] kvm-stealtime: cpu 2, msr 17fd0e100 >> [ 65.833031] CPU2 is up >> [ 65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3 >> [ 65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock >> [ 65.833229] kvm: enabling virtualization on CPU3 >> [ 65.848594] KVM setup async PF for cpu 3 >> [ 65.848940] kvm-stealtime: cpu 3, msr 17fd8e100 >> [ 65.849393] CPU3 is up >> [ 65.849722] ACPI: Waking up from system sleep state S3 > > Note the 10 second gap where I put the marker (and the error message itself, too). > > Here's an excerpt from the KVM trace: > >> CPU-23509 [002] 8406.908787: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000 >> CPU-23509 [002] 8406.908836: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >> CPU-23510 [003] 8406.908850: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000 >> CPU-23510 [003] 8406.908881: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >> CPU-23511 [001] 8406.908908: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000 >> CPU-23511 [001] 8406.908941: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >> CPU-23508 [005] 8406.908951: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000 >> CPU-23508 [005] 8406.908989: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >> CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000 >> CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000 >> CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000 >> CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >> CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >> CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >> CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >> CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >> CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >> CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 > > It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM. > > So this series is a clear improvement, but something else remains amiss. > > If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist: > - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume, > - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume. > > Thanks > Laszlo > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 12:08 ` Laszlo Ersek @ 2016-11-10 20:45 ` Laszlo Ersek 0 siblings, 0 replies; 15+ messages in thread From: Laszlo Ersek @ 2016-11-10 20:45 UTC (permalink / raw) To: Yao, Jiewen, Fan, Jeff; +Cc: Paolo Bonzini, edk2-devel@ml01.01.org On 11/10/16 13:08, Laszlo Ersek wrote: > On 11/10/16 12:17, Yao, Jiewen wrote: >> Hi Laszlo >> >> Thanks to test for us. >> >> >> >> Are you saying Jeff’s patch introduces a new issue? >> >> Or is this a previous issue but just not fixed by Jeff’s patch? > > With your v2 series applied, Jeff's patches replace the crash / > emulation failure symptoms during S3 resume with less intrusive > symptoms, namely that some of the APs cannot be brought up by the OS, > occasionally. > > Without your v2 series applied, Jeff's patches seem to present the same > symptoms (OS cannot bring up some APs), although much less frequently. > However, I cannot say definitively whether or not this exact same issues > exists, on Ia32X64, with none of the patch sets applied. I haven't seen > it before (on Ia32X64), but maybe I just haven't tried hard enough. > > I guess I should try harder and see if the "lost AP" issue exists > without either patch set applied. With none of the patch sets applied, the "lost AP" issue doesn't exist; instead, the emulation failure is experienced. So, for Ia32X64, Jeff's not applied Jeff's applied --------------------------- ----------------- Jiewen's v2 not applied emulation failure, rare AP lost, rare Jiewen's v2 applied emulation failure, frequent AP lost, frequent Thanks Laszlo >> >> >> >> >> >> Thank you >> >> Yao Jiewen >> >> >> >> *From:*Laszlo Ersek [mailto:lersek@redhat.com] >> *Sent:* Thursday, November 10, 2016 6:41 PM >> *To:* Fan, Jeff <jeff.fan@intel.com> >> *Cc:* edk2-devel@ml01.01.org; Yao, Jiewen <jiewen.yao@intel.com>; Paolo >> Bonzini <pbonzini@redhat.com> >> *Subject:* Re: [edk2] [PATCH 0/2] Put AP into safe hlt-loop code on S3 path >> >> >> >> On 11/10/16 07:07, Jeff Fan wrote: >>> On S3 path, we will wake up APs to restore CPU context in PiSmmCpuDxeSmm >>> driver. In case, one NMI or SMI happens, APs may exit from hlt state and >>> execute the instruction after HLT instruction. >>> >>> But APs are not running on safe code, it leads OVMF S3 boot unstable. >>> >>> https://bugzilla.tianocore.org/show_bug.cgi?id=216 >>> >>> I tested real platform with 64bit DXE. >>> >>> Jeff Fan (2): >>> UefiCpuPkg/PiSmmCpuDxeSmm: Put AP into safe hlt-loop code on S3 path >>> UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode on S3 path >>> >>> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 31 ++++++++++++++ >>> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 25 ++++++++++++ >>> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 13 ++++++ >>> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 59 +++++++++++++++++++++++++++ >>> 4 files changed, 128 insertions(+) >>> >> >> I applied this on top of Jiewen's v2, for testing. >> >> This series (with my addition for patch #1) doesn't fix the boot failure in case 8. (See "case 8" in <https://lists.01.org/pipermail/edk2-devel/2016-November/004316.html>.) I don't think the series aims to do that at all, but since it modifies the Ia32/SmmFuncsArch.c file, I thought I'd give it a shot. >> >> The series (with my addition for patch #1) changed the behavior of S3 resume, in case 13. There seem to be no crashes / emulation failures now. However, in some of the tries, the resume seems to include a several second long busy loop, and after that -- although the guest OS does come back up --, I cannot access *some* of the APs from within the OS: >> >> # this works, quickly >> taskset -c 0 efibootmgr >> >> # this fails >> taskset -c 1 efibootmgr >> taskset: failed to set pid 0's affinity: Invalid argument >> >> # these work again, albeit more slowly (as expected) >> taskset -c 2 efibootmgr >> taskset -c 3 efibootmgr >> >> I've seen this symptom ("AP goes lost during S3 resume") with the Ia32 SMM build before (without Jiewen's v2 series applied). >> >> If I run the "info cpus" QEMU command, I get: >> >> * CPU #0: pc=0xffffffff8105eb26 (halted) thread_id=22745 >> CPU #1: pc=0x00000000fffffff0 thread_id=22746 >> CPU #2: pc=0xffffffff8105eb26 (halted) thread_id=22747 >> CPU #3: pc=0xffffffff8105eb26 (halted) thread_id=22748 >> >> The halted status for #0, #2 and #3 is fine; that's just Linux at work. CPU#1 is strange -- not halted, but somehow stuck in the reset vector (0xfffffff0)? >> >> The gust kernel dmesg contains the following messages: >> >>> [ 55.805153] PM: Restoring platform NVS memory >>> [ 55.805153] Enabling non-boot CPUs ... >>> [ 55.805153] x86: Booting SMP configuration: >>> [ 55.805516] smpboot: Booting Node 0 Processor 1 APIC 0x1 >>> [ 65.816049] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1 <- HERE >>> [ 65.816738] Error taking CPU1 up: -5 >>> [ 65.817050] smpboot: Booting Node 0 Processor 2 APIC 0x2 >>> [ 65.817029] kvm-clock: cpu 2, msr 1:7ffd6081, secondary cpu clock >>> [ 65.817029] kvm: enabling virtualization on CPU2 >>> [ 65.832296] KVM setup async PF for cpu 2 >>> [ 65.832607] kvm-stealtime: cpu 2, msr 17fd0e100 >>> [ 65.833031] CPU2 is up >>> [ 65.833242] smpboot: Booting Node 0 Processor 3 APIC 0x3 >>> [ 65.833229] kvm-clock: cpu 3, msr 1:7ffd60c1, secondary cpu clock >>> [ 65.833229] kvm: enabling virtualization on CPU3 >>> [ 65.848594] KVM setup async PF for cpu 3 >>> [ 65.848940] kvm-stealtime: cpu 3, msr 17fd8e100 >>> [ 65.849393] CPU3 is up >>> [ 65.849722] ACPI: Waking up from system sleep state S3 >> >> Note the 10 second gap where I put the marker (and the error message itself, too). >> >> Here's an excerpt from the KVM trace: >> >>> CPU-23509 [002] 8406.908787: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000 >>> CPU-23509 [002] 8406.908836: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >>> CPU-23510 [003] 8406.908850: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000 >>> CPU-23510 [003] 8406.908881: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >>> CPU-23511 [001] 8406.908908: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000 >>> CPU-23511 [001] 8406.908941: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >>> CPU-23508 [005] 8406.908951: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000 >>> CPU-23508 [005] 8406.908989: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >>> CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000 >>> CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000 >>> CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000 >>> CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >>> CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >>> CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >>> CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >>> CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >>> CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >>> CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >> >> It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM. >> >> So this series is a clear improvement, but something else remains amiss. >> >> If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist: >> - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume, >> - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume. >> >> Thanks >> Laszlo >> > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 10:41 ` Laszlo Ersek 2016-11-10 11:17 ` Yao, Jiewen @ 2016-11-10 12:26 ` Paolo Bonzini 2016-11-10 13:33 ` Laszlo Ersek 1 sibling, 1 reply; 15+ messages in thread From: Paolo Bonzini @ 2016-11-10 12:26 UTC (permalink / raw) To: Laszlo Ersek, Jeff Fan; +Cc: edk2-devel, Jiewen Yao On 10/11/2016 11:41, Laszlo Ersek wrote: > Here's an excerpt from the KVM trace: > >> CPU-23509 [002] 8406.908787: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000 >> CPU-23509 [002] 8406.908836: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >> CPU-23510 [003] 8406.908850: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000 >> CPU-23510 [003] 8406.908881: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >> CPU-23511 [001] 8406.908908: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000 >> CPU-23511 [001] 8406.908941: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >> CPU-23508 [005] 8406.908951: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000 >> CPU-23508 [005] 8406.908989: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >> CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000 >> CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000 >> CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000 >> CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >> CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >> CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >> CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >> CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >> CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >> CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 > > It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM. > > So this series is a clear improvement, but something else remains amiss. > > If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist: > - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume, > - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume. Any trace I can look at? What about case 14, with PcdCpuSmmStaticPageTable=TRUE? Paolo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Put AP into safe hlt-loop code on S3 path 2016-11-10 12:26 ` Paolo Bonzini @ 2016-11-10 13:33 ` Laszlo Ersek 0 siblings, 0 replies; 15+ messages in thread From: Laszlo Ersek @ 2016-11-10 13:33 UTC (permalink / raw) To: Paolo Bonzini, Jeff Fan; +Cc: edk2-devel, Jiewen Yao On 11/10/16 13:26, Paolo Bonzini wrote: > > > On 10/11/2016 11:41, Laszlo Ersek wrote: >> Here's an excerpt from the KVM trace: >> >>> CPU-23509 [002] 8406.908787: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x30000 >>> CPU-23509 [002] 8406.908836: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >>> CPU-23510 [003] 8406.908850: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x30000 >>> CPU-23510 [003] 8406.908881: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >>> CPU-23511 [001] 8406.908908: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x30000 >>> CPU-23511 [001] 8406.908941: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >>> CPU-23508 [005] 8406.908951: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000 >>> CPU-23508 [005] 8406.908989: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >>> CPU-23511 [001] 8406.920215: kvm_enter_smm: vcpu 3: entering SMM, smbase 0x7ffb7000 >>> CPU-23509 [002] 8406.920225: kvm_enter_smm: vcpu 1: entering SMM, smbase 0x7ffb3000 >>> CPU-23510 [003] 8406.920225: kvm_enter_smm: vcpu 2: entering SMM, smbase 0x7ffb5000 >>> CPU-23508 [005] 8406.920227: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >>> CPU-23508 [005] 8406.920262: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >>> CPU-23511 [001] 8406.920263: kvm_enter_smm: vcpu 3: leaving SMM, smbase 0x7ffb7000 >>> CPU-23508 [005] 8407.020292: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x7ffb1000 >>> CPU-23509 [006] 8407.020338: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffb3000 >>> CPU-23510 [003] 8407.020338: kvm_enter_smm: vcpu 2: leaving SMM, smbase 0x7ffb5000 >>> CPU-23508 [005] 8407.020338: kvm_enter_smm: vcpu 0: leaving SMM, smbase 0x7ffb1000 >> >> It seems that VCPU#0 still leaves (and then re-enters) SMM while VCPU#1 and VCPU#2 are firmly in SMM. >> >> So this series is a clear improvement, but something else remains amiss. >> >> If I remove Jiewen's v2 series, and apply only this one, then the symptom shows up much less frequently, but it does exist: >> - With (Jiewen's v2 + this one), testing case 13, I hit the symptom on the second resume, >> - With just this set applied, I hit the symptom (= one AP disappearing from Linux after resume) only on the 24th resume. > > Any trace I can look at? Thank you for asking / offering, I did stash the full trace :), I just didn't want to foist it upon you without you offering first :) http://people.redhat.com/lersek/s3-crash-8d1dfed7-ca92-4e25-8d2b-b1c9ac2a53db/case-13-jiewen-v2-jeff-fixup-trace.txt.bz2 > What about case 14, with > PcdCpuSmmStaticPageTable=TRUE? I didn't test case 14, because I consider it a more demanding test case than case 13. According to Jiewen's earlier explanation, PcdCpuSmmStaticPageTable=FALSE implements a subset of the protections that PcdCpuSmmStaticPageTable=TRUE implements. Thanks Laszlo ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-11-11 6:32 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-11-10 6:07 [PATCH 0/2] Put AP into safe hlt-loop code on S3 path Jeff Fan 2016-11-10 6:07 ` [PATCH 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: " Jeff Fan 2016-11-10 8:50 ` Laszlo Ersek 2016-11-10 9:00 ` Fan, Jeff 2016-11-10 9:30 ` Laszlo Ersek 2016-11-10 6:07 ` [PATCH 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Place AP to 32bit protected mode " Jeff Fan 2016-11-10 8:56 ` [PATCH 0/2] Put AP into safe hlt-loop code " Laszlo Ersek 2016-11-10 9:59 ` Paolo Bonzini 2016-11-11 6:32 ` Fan, Jeff 2016-11-10 10:41 ` Laszlo Ersek 2016-11-10 11:17 ` Yao, Jiewen 2016-11-10 12:08 ` Laszlo Ersek 2016-11-10 20:45 ` Laszlo Ersek 2016-11-10 12:26 ` Paolo Bonzini 2016-11-10 13:33 ` Laszlo Ersek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox