From: "Fan, Jeff" <jeff.fan@intel.com>
To: "Yao, Jiewen" <jiewen.yao@intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Laszlo Ersek <lersek@redhat.com>
Cc: "Kinney, Michael D" <michael.d.kinney@intel.com>,
"Tian, Feng" <feng.tian@intel.com>,
"edk2-devel@ml01.01.org" <edk2-devel@ml01.01.org>,
"Zeng, Star" <star.zeng@intel.com>
Subject: Re: [PATCH V2 0/6] Enable SMM page level protection.
Date: Thu, 10 Nov 2016 01:02:22 +0000 [thread overview]
Message-ID: <542CF652F8836A4AB8DBFAAD40ED192A4A2D6EB7@shsmsx102.ccr.corp.intel.com> (raw)
In-Reply-To: <74D8A39837DF1E4DA445A8C0B3885C50386CCA3D@shsmsx102.ccr.corp.intel.com>
I think it is necessary to place AP into one safe state: (hlt-loop, no page table required, > 1MB reserved space in non-SMM), just like we have done in MpInitExitBootServicesCallback() on normal boot path.
From: Yao, Jiewen
Sent: Thursday, November 10, 2016 8:51 AM
To: Paolo Bonzini; Laszlo Ersek
Cc: Kinney, Michael D; Tian, Feng; edk2-devel@ml01.01.org; Zeng, Star; Fan, Jeff
Subject: RE: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
Fix a typo.
From: Yao, Jiewen
Sent: Thursday, November 10, 2016 8:49 AM
To: 'Paolo Bonzini' <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
Subject: RE: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
[Jiewen] I hold different opinion on that.
If the AP is in hlt-loop at some below 1M memory with CR3 pointing to SMRAM, it has 2 issues.
* The below 1M memory (0x9f0fd) might be consumed by OS with other instruction.
* If AP starts running the code, it will get exception because CR3 is obviously wrong. The AP cannot fetch any code.
We might have 2 possibles way to trigger this scenario, at least.
A) Jeff and I have discovered one possible case – AP may receive NMI/SMI, such as periodic SMI, before OS sends INIT-SIPI-SIPI to wake up AP.
B) Paolo has found one real case - AP is in SMRAM when BSP is out and about to close SMRAM.
IMHO, letting BSP/AP sync in S3 resume just resolved B). But it does not help on A).
If the system has some special SMI, such as periodic SMI. It will definitely trigger case A).
So we have to fix the AP state anyway.
Now, if the AP state is fixed, I do not think we need worry about the BSP/AP out of sync issue.
BSP and AP can be independent. AP can receive NMI/SMI at any time and just work in HLT-loop.
Thank you
Yao Jiewen
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, November 10, 2016 7:00 AM
To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Tian, Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>; Zeng, Star <star.zeng@intel.com<mailto:star.zeng@intel.com>>; Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
> Another question I have -- and I feel I should really know it, but I
> don't... -- is *why* the APs are executing code from the page at
> 0x9f000.
This I can answer. :)
The APs have done their INIT-SIPI-SIPI, and then went into the CLI;HLT;JMP
loop. When the AP exits SMM, it is in the JMP instruction.
As suggested by Jiewen, edk2 could jump to a 32-bit loop that is _not_
in the 0-640K area (perhaps it could be in what your doc calls the
"permanent PEI memory for the S3 resume path"?). After thinking a
bit more about it, it seems simplest to me if CpuS3.c just uses
SwitchStack or AsmDisablePaging64 at the end of MPRendezvousProcedure,
to jump to a small stub like
POP EAX ; pop return address
POP EAX ; pop Context1 which is &mNumberToFinish
DEC [EAX]
1: CLI
HLT
JMP 1
> I could be utterly and inexcusably wrong, but I think that the
> RIP=0x9f0fd symptom is a red herring.
I wouldn't call it a red herring. After all, CR3 points to SMM
exactly because the CR3 that was set up for the 0x9f000 stub is
CpuS3.c's SMRAM page table root.
What _is_ a red herring is KVM's trace showing a RSM instruction
at RIP=0x9f0fd. That is clearly bogus, RSM was rather the last
instruction executed _before_ getting to that RIP.
> > vcpu#0 vcpu#1 vcpu#2 vcpu#3
> > ------ ------ ------ ------
> > enter enter
> > enter | enter |
> > | | | |
> > leave | | |
> > <--------------------------- BAD
> > enter | | |
> > | | | |
> > leave leave leave leave
>
> I don't understand why we don't get horrible faults on the APs
> *immediately* when the BSP closes/locks down SMRAM. Everything in SMRAM,
> page tables, executable code, everything, will read as 0xff on QEMU. How
> can the APs continue in SMM long enough to
>
> (a) time out and pull the BSP back into SMM,
> (b) complete the rendezvous and exit SMM?
Because the "0xff" only applies when you're out of SMM. The three
states (open, closed, closed/locked) only apply when you're not in SMM.
While the AP is in SMM they are executing in a separate address space
where SMRAM is "not closed". (In QEMU that's a separate AddressSpace
struct, smram_address_space in target-i386/kvm.c).
> I think I sort of answered question (2). (Apologies if Paolo and Jiewen
> explained the exact same thing before; I had to spell it out for
> myself.) That leaves question (1) open. Why enter SMM in
> S3ResumeExecuteBootScript() at all?
>
> Anyway, I think if the BSP and the APs are properly synchronized around
> the SMI injections in S3ResumeExecuteBootScript(), then this bug is
> fixed. In that case, the APs' RSMs will restore the full context for the
> APs, including their sleep in the HLT instruction, in CpuMpPei's wakeup
> buffer. The BSP will proceed, exit PEI (restoring the CpuMpPei wakeup
> buffer -- but the APs will sleep on), and then Linux will bring up the
> APs, after taking control.
Agreed.
Paolo
next prev parent reply other threads:[~2016-11-10 1:02 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-04 9:30 [PATCH V2 0/6] Enable SMM page level protection Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
2016-11-04 9:30 ` [PATCH V2 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
2016-11-04 22:40 ` [PATCH V2 0/6] Enable SMM page level protection Laszlo Ersek
2016-11-04 22:46 ` Yao, Jiewen
2016-11-04 23:08 ` Laszlo Ersek
2016-11-08 1:22 ` Laszlo Ersek
2016-11-08 12:59 ` Yao, Jiewen
2016-11-08 13:22 ` Laszlo Ersek
2016-11-08 13:41 ` Yao, Jiewen
2016-11-09 6:25 ` Yao, Jiewen
2016-11-09 11:30 ` Paolo Bonzini
2016-11-09 15:01 ` Yao, Jiewen
2016-11-09 15:54 ` Paolo Bonzini
2016-11-09 16:06 ` Paolo Bonzini
2016-11-09 22:28 ` Laszlo Ersek
2016-11-09 22:59 ` Paolo Bonzini
2016-11-09 23:27 ` Laszlo Ersek
2016-11-10 1:13 ` Yao, Jiewen
2016-11-10 6:30 ` Fan, Jeff
2016-11-10 0:49 ` Yao, Jiewen
2016-11-10 0:50 ` Yao, Jiewen
2016-11-10 1:02 ` Fan, Jeff [this message]
2016-11-09 20:46 ` Laszlo Ersek
2016-11-10 10:41 ` Yao, Jiewen
2016-11-10 12:01 ` Laszlo Ersek
2016-11-10 14:48 ` Yao, Jiewen
2016-11-10 14:53 ` Paolo Bonzini
2016-11-10 16:22 ` Laszlo Ersek
2016-11-10 16:39 ` Paolo Bonzini
2016-11-10 16:25 ` Laszlo Ersek
2016-11-10 12:27 ` Paolo Bonzini
2016-11-09 11:23 ` Paolo Bonzini
2016-11-09 15:16 ` Yao, Jiewen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=542CF652F8836A4AB8DBFAAD40ED192A4A2D6EB7@shsmsx102.ccr.corp.intel.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox