* CPU hotplug using SMM with QEMU+OVMF
@ 2019-08-13 14:16 Laszlo Ersek
  2019-08-13 16:09 ` Laszlo Ersek
  0 siblings, 1 reply; 69+ messages in thread

From: Laszlo Ersek @ 2019-08-13 14:16 UTC (permalink / raw)
To: edk2-devel-groups-io
Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Paolo Bonzini,
    Jiewen Yao, Yingwen Chen, Jun Nakajima, Boris Ostrovsky,
    Joao Marcal Lemos Martins, Phillip Goerl

Hi,

this message is a problem statement, and an initial recommendation for
solving it, from Jiewen, Paolo, Yingwen, and others. I'm cross-posting
the thread starter to the <devel@edk2.groups.io>, <rfc@edk2.groups.io>
and <qemu-devel@nongnu.org> lists. Please use "Reply All" when
commenting.

In response to the initial posting, I plan to ask a number of questions.

The related TianoCore bugzillas are:

  https://bugzilla.tianocore.org/show_bug.cgi?id=1512
  https://bugzilla.tianocore.org/show_bug.cgi?id=1515

SMM is used as a security barrier between the OS kernel and the
firmware. When a CPU is plugged into a running system where this
barrier otherwise works fine, the new CPU can be considered a means to
attack SMM. When the next SMI is raised (globally, or targeted at the
new CPU), the SMBASE for that CPU is still at 0x30000, which is normal
RAM, not SMRAM. Therefore the OS could place attack code in that area
prior to the SMI. Once in SMM, the new CPU would execute OS-owned code
(from normal RAM) with access to SMRAM and to other SMM-protected
stuff, such as flash. [I stole a few words from Paolo here.]

Jiewen summarized the problem as follows:

- Asset: SMM.

- Adversary:

  - System Software Attacker, who can control any OS memory or silicon
    register from OS level, or read/write BIOS data.

  - Simple hardware attacker, who can hot add or hot remove a CPU.

- Non-adversary: the attacker cannot modify the flash BIOS code or
  read-only BIOS data. The flash part itself is treated as TCB and
  protected.

- Threat: the attacker may hot add or hot remove a CPU, then modify
  system memory to tamper with the SMRAM contents, or trigger an SMI to
  gain privilege escalation by executing code in SMM mode.

We'd like to solve this problem for QEMU/KVM and OVMF.

(At the moment, CPU hotplug doesn't work with OVMF *iff* OVMF was built
with -D SMM_REQUIRE. SMBASE relocation never happens for the new CPU,
the SMM infrastructure in edk2 doesn't know about the new CPU, and so
when the first SMI is broadcast afterwards, we crash. We'd like this
functionality to *work*, in the first place -- but securely at that, so
that an actively malicious guest kernel can't break into SMM.)

Yingwen and Jiewen suggested the following process.

Legend:

- "New CPU":  CPU being hot-added
- "Host CPU": existing CPU
- (Flash):    code running from flash
- (SMM):      code running from SMRAM

Steps:

(01) New CPU:  (Flash) enter reset vector, Global SMI disabled by
     default.
(02) New CPU:  (Flash) configure memory control to let it access global
     host memory.
(03) New CPU:  (Flash) send board message to tell host CPU (GPIO->SCI)
     -- I am waiting for hot-add message. (NOTE: Host CPU can only send
     instruction in SMM mode. -- The register is SMM only)
(04) Host CPU: (OS) get message from board that a new CPU is added.
     (GPIO -> SCI)
(05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
     will not enter SMM because SMI is disabled)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
     rebase code.
(07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
(08) New CPU:  (Flash) Get message - Enable SMI.
(09) Host CPU: (SMM) Send SMI to the new CPU only.
(10) New CPU:  (SMM) Respond to the first SMI at 38000, and rebase
     SMBASE to TSEG.
(11) Host CPU: (SMM) Restore 38000.
(12) Host CPU: (SMM) Update located data structure to add the new CPU
     information. (This step will involve CPU_SERVICE protocol)

===================== (now, the next SMI will bring all CPUs into TSEG)

(13) New CPU:  (Flash) run MRC code, to init its own memory.
(14) New CPU:  (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 69+ messages in thread
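[Editorial note: the relocation handshake in steps (06) through (11) can
be sketched as a small, self-contained C model. Everything below is
invented for illustration -- the one-byte stand-in for the rebase stub,
the example TSEG address, and the way the directed SMI is modeled as a
direct assignment. It is not the edk2 or QEMU implementation; it only
shows the shape of the save/patch/relocate/restore sequence.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEFAULT_SMBASE 0x30000u     /* power-on SMBASE */
#define TSEG_SMBASE    0x7F000000u  /* example relocated SMBASE (invented) */

/* Tiny stand-in for the RAM at 0x38000 (default SMBASE + 0x8000). */
static uint8_t low_mem[0x100];

typedef struct {
    uint32_t smbase;      /* where this CPU enters SMM */
    int      smi_enabled; /* SMI mask state of the hot-added CPU */
} Cpu;

/* Steps (06)-(11): the host CPU saves 0x38000, installs a rebase stub,
 * enables SMI on the new CPU, sends a directed SMI (the stub then writes
 * the new SMBASE into the SMM save state), and restores the RAM. */
static void hotplug_relocate(Cpu *new_cpu, uint32_t new_smbase)
{
    uint8_t saved[sizeof low_mem];
    uint8_t rebase_stub[] = { 0x90 };   /* placeholder for the real stub */

    memcpy(saved, low_mem, sizeof low_mem);           /* (06) save      */
    memcpy(low_mem, rebase_stub, sizeof rebase_stub); /* (06) patch     */

    new_cpu->smi_enabled = 1;                         /* (07)/(08)      */

    if (new_cpu->smi_enabled)                         /* (09)/(10)      */
        new_cpu->smbase = new_smbase;                 /* stub "runs"    */

    memcpy(low_mem, saved, sizeof low_mem);           /* (11) restore   */
}
```

The point the model makes explicit is the security-relevant invariant:
after step (11), the contents of 0x38000 are bit-for-bit what the OS had
there before, while the new CPU's SMBASE now points into TSEG.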
* Re: CPU hotplug using SMM with QEMU+OVMF
  2019-08-13 14:16 CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek
@ 2019-08-13 16:09 ` Laszlo Ersek
  2019-08-13 16:18   ` Laszlo Ersek
  2019-08-14 13:20   ` Yao, Jiewen
  0 siblings, 2 replies; 69+ messages in thread

From: Laszlo Ersek @ 2019-08-13 16:09 UTC (permalink / raw)
To: edk2-devel-groups-io
Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Paolo Bonzini,
    Jiewen Yao, Yingwen Chen, Jun Nakajima, Boris Ostrovsky,
    Joao Marcal Lemos Martins, Phillip Goerl

On 08/13/19 16:16, Laszlo Ersek wrote:

> Yingwen and Jiewen suggested the following process.
>
> Legend:
>
> - "New CPU": CPU being hot-added
> - "Host CPU": existing CPU
> - (Flash): code running from flash
> - (SMM): code running from SMRAM
>
> Steps:
>
> (01) New CPU: (Flash) enter reset vector, Global SMI disabled by
>      default.

- What does "Global SMI disabled by default" mean? In particular, what
  is "global" here?

  Do you mean that the CPU being hot-plugged should mask (by default)
  broadcast SMIs? What about directed SMIs? (An attacker could try that
  too.)

  And what about other processors? (I'd assume step (01) is not
  relevant for other processors, but "global" is quite confusing here.)

- Does this part require a new branch somewhere in the OVMF SEC code?
  How do we determine whether the CPU executing SEC is the BSP or a
  hot-plugged AP?

- How do we tell the hot-plugged AP where to start execution? (I.e.,
  that it should execute code at a particular pflash location.)

  For example, in MpInitLib, we start a specific AP with
  INIT-SIPI-SIPI, where "SIPI" stores the startup address in the
  "Interrupt Command Register" (which is memory-mapped in xAPIC mode,
  and an MSR in x2APIC mode, apparently). That doesn't apply here --
  should QEMU auto-start the new CPU?

- What memory is used as stack by the new CPU, when it runs code from
  flash?

  QEMU does not emulate CAR (Cache As RAM). The new CPU doesn't have
  access to SMRAM. And we cannot use AcpiNVS or Reserved memory,
  because a malicious OS could use other CPUs -- or PCI device DMA --
  to attack the stack (unless QEMU forcibly paused other CPUs upon
  hotplug; I'm not sure).

- If an attempt is made to hotplug multiple CPUs in quick succession,
  does something serialize those attempts?

  Again, stack usage could be a concern, even with Cache-As-RAM --
  HyperThreads (logical processors) on a single core don't have
  dedicated cache.

  Does CPU hotplug apply only at the socket level? If the CPU is
  multi-core, what is responsible for hot-plugging all cores present in
  the socket?

> (02) New CPU: (Flash) configure memory control to let it access global
>      host memory.

In QEMU/KVM guests, we don't have to enable memory explicitly, it just
exists and works.

In OVMF X64 SEC, we can't access RAM above 4GB, but that shouldn't be
an issue per se.

> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
>      -- I am waiting for hot-add message.

Maybe we can simplify this in QEMU by broadcasting an SMI to existent
processors immediately upon plugging the new CPU.

>      (NOTE: Host CPU can only send instruction in SMM mode. -- The
>      register is SMM only)

Sorry, I don't follow -- what register are we talking about here, and
why is the BSP needed to send anything at all? What "instruction" do
you have in mind?

> (04) Host CPU: (OS) get message from board that a new CPU is added.
>      (GPIO -> SCI)
>
> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
>      will not enter SMM because SMI is disabled)

I don't understand the OS involvement here. But, again, perhaps QEMU
can force all existent CPUs into SMM immediately upon adding the new
CPU.

> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>      rebase code.
>
> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.

Aha, so this is the SMM-only register you mention in step (03). Is the
register specified in the Intel SDM?

> (08) New CPU: (Flash) Get message - Enable SMI.
>
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>
> (10) New CPU: (SMM) Respond to the first SMI at 38000, and rebase
>      SMBASE to TSEG.

What code does the new CPU execute after it completes step (10)? Does
it halt?

> (11) Host CPU: (SMM) Restore 38000.

These steps (i.e., (06) through (11)) don't appear RAS-specific. The
only platform-specific feature seems to be the SMI masking register,
which could be extracted into a new SmmCpuFeaturesLib API.

Thus, would you please consider open sourcing firmware code for steps
(06) through (11)?

Alternatively -- and in particular because the stack for step (01)
concerns me --, we could approach this from a high-level, functional
perspective. The states that really matter are the relocated SMBASE for
the new CPU, and the state of the full system, right at the end of step
(11).

When the SMM setup quiesces during normal firmware boot, OVMF could use
existent (finalized) SMBASE information to *pre-program* some virtual
QEMU hardware, with such state that would be expected, as "final"
state, of any new hotplugged CPU. Afterwards, if / when the hotplug
actually happens, QEMU could blanket-apply this state to the new CPU,
and broadcast a hardware SMI to all CPUs except the new one.

The hardware SMI should tell the firmware that the rest of the process
-- step (12) below, and onward -- is being requested.

If I understand right, this approach would produce a firmware & system
state that's identical to what's expected right after step (11):

- all SMBASEs relocated
- all preexistent CPUs in SMM
- new CPU halted / blocked from launch
- DRAM at 0x30000 / 0x38000 contains OS-owned data

Is my understanding correct that this is the expected state after step
(11)?

Three more comments on the "SMBASE pre-config" approach:

- the virtual hardware providing this feature should become locked
  after the configuration, until next platform reset

- the pre-config should occur via simple hardware accesses, so that it
  can be replayed at S3 resume, i.e. as part of the S3 boot script

- from the pre-configured state, and the APIC ID, QEMU itself could
  perhaps calculate the SMI stack location for the new processor.

> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>      information. (This step will involve CPU_SERVICE protocol)

I commented on EFI_SMM_CPU_SERVICE_PROTOCOL under bullet (4) of
<https://bugzilla.tianocore.org/show_bug.cgi?id=1512#c4>.

Calling EFI_SMM_ADD_PROCESSOR looks justified. What are some of the
other member functions used for? The scary one is
EFI_SMM_REGISTER_EXCEPTION_HANDLER.

> ===================== (now, the next SMI will bring all CPUs into TSEG)

OK... but what component injects that SMI, and when?

> (13) New CPU: (Flash) run MRC code, to init its own memory.

Why is this needed, esp. after step (10)? The new CPU has accessed DRAM
already. And why are we executing code from pflash, rather than from
SMRAM, given that we're past SMBASE relocation?

> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
>
> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.

I'm confused by these steps. I thought that step (12) would complete
the hotplug, by updating the administrative data structures internally.
And the next SMI -- raised for the usual purposes, such as a software
SMI for variable access -- would be handled like it always is, except
it would also pull the new CPU into SMM too.

Thanks!
Laszlo
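[Editorial note: the "SMBASE pre-config" alternative proposed in the
message above -- pre-program the final per-CPU SMBASE during boot, lock
the device until reset, apply the value at hotplug -- can be modeled in
a few lines of C. All names, the table layout, and the example SMBASE
arithmetic are invented; this is a sketch of the proposed semantics,
not of any existing QEMU device.]

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 8

/* Hypothetical lockable pre-config device: during normal boot, OVMF
 * programs the final SMBASE for every potential CPU (keyed by APIC ID),
 * then locks the device until the next platform reset. */
typedef struct {
    uint32_t smbase[MAX_CPUS];
    int      locked;
} SmbasePreconfig;

/* Returns 0 on success, -1 if locked or out of range. Because the
 * writes are simple "hardware" accesses, they could also be replayed
 * from the S3 boot script, as noted above. */
static int preconfig_set(SmbasePreconfig *p, unsigned apic_id,
                         uint32_t smbase)
{
    if (p->locked || apic_id >= MAX_CPUS)
        return -1;              /* writes after lock are dropped */
    p->smbase[apic_id] = smbase;
    return 0;
}

static void preconfig_lock(SmbasePreconfig *p)
{
    p->locked = 1;              /* stays locked until platform reset */
}

/* On hotplug, QEMU would blanket-apply this value to the new CPU,
 * instead of the guest running the relocation handshake at 0x38000. */
static uint32_t preconfig_get(const SmbasePreconfig *p, unsigned apic_id)
{
    return apic_id < MAX_CPUS ? p->smbase[apic_id] : 0;
}
```

The lock is what makes the scheme safe against the OS: once firmware has
locked the device, a malicious kernel can no longer redirect a future
hot-added CPU's SMBASE to attacker-controlled RAM.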
* Re: CPU hotplug using SMM with QEMU+OVMF
  2019-08-13 16:09 ` Laszlo Ersek
@ 2019-08-13 16:18   ` Laszlo Ersek
  0 siblings, 0 replies; 69+ messages in thread

From: Laszlo Ersek @ 2019-08-13 16:18 UTC (permalink / raw)
To: edk2-devel-groups-io
Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Paolo Bonzini,
    Jiewen Yao, Yingwen Chen, Jun Nakajima, Boris Ostrovsky,
    Joao Marcal Lemos Martins, Phillip Goerl

On 08/13/19 18:09, Laszlo Ersek wrote:
> On 08/13/19 16:16, Laszlo Ersek wrote:
>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>      rebase code.
>>
>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
>
> Aha, so this is the SMM-only register you mention in step (03). Is the
> register specified in the Intel SDM?
>
>> (08) New CPU: (Flash) Get message - Enable SMI.
>>
>> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>>
>> (10) New CPU: (SMM) Respond to the first SMI at 38000, and rebase
>>      SMBASE to TSEG.
>
> What code does the new CPU execute after it completes step (10)? Does
> it halt?
>
>> (11) Host CPU: (SMM) Restore 38000.
>
> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
> only platform-specific feature seems to be the SMI masking register,
> which could be extracted into a new SmmCpuFeaturesLib API.
>
> Thus, would you please consider open sourcing firmware code for steps
> (06) through (11)?
>
> Alternatively -- and in particular because the stack for step (01)
> concerns me --, we could approach this from a high-level, functional
> perspective. The states that really matter are the relocated SMBASE
> for the new CPU, and the state of the full system, right at the end
> of step (11).
>
> When the SMM setup quiesces during normal firmware boot, OVMF could
> use existent (finalized) SMBASE information to *pre-program* some
> virtual QEMU hardware, with such state that would be expected, as
> "final" state, of any new hotplugged CPU. Afterwards, if / when the
> hotplug actually happens, QEMU could blanket-apply this state to the
> new CPU, and broadcast a hardware SMI to all CPUs except the new one.
>
> The hardware SMI should tell the firmware that the rest of the
> process -- step (12) below, and onward -- is being requested.
>
> If I understand right, this approach would produce a firmware &
> system state that's identical to what's expected right after step
> (11):
>
> - all SMBASEs relocated
> - all preexistent CPUs in SMM
> - new CPU halted / blocked from launch
> - DRAM at 0x30000 / 0x38000 contains OS-owned data
>
> Is my understanding correct that this is the expected state after
> step (11)?

Revisiting some of my notes from earlier, such as
<https://bugzilla.redhat.com/show_bug.cgi?id=1454803#c46> -- apologies,
private BZ... --, we discussed some of this stuff with Mike on the
phone in April.

And, it looked like generating a hardware SMI in QEMU, in association
with the hotplug action that was being requested through the QEMU
monitor, would be the right approach.

By now I have forgotten about that discussion -- hence "revisiting my
notes" --, but luckily, it seems consistent with what I've proposed
above, under "alternatively".

Thanks,
Laszlo
* Re: CPU hotplug using SMM with QEMU+OVMF
  2019-08-13 16:09 ` Laszlo Ersek
  2019-08-13 16:18   ` Laszlo Ersek
@ 2019-08-14 13:20   ` Yao, Jiewen
  2019-08-14 14:04     ` Paolo Bonzini
  1 sibling, 1 reply; 69+ messages in thread

From: Yao, Jiewen @ 2019-08-14 13:20 UTC (permalink / raw)
To: Laszlo Ersek, edk2-devel-groups-io
Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Paolo Bonzini,
    Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky,
    Joao Marcal Lemos Martins, Phillip Goerl

My comments below.

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Wednesday, August 14, 2019 12:09 AM
> To: edk2-devel-groups-io <devel@edk2.groups.io>
> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
> Paolo Bonzini <pbonzini@redhat.com>; Yao, Jiewen
> <jiewen.yao@intel.com>; Chen, Yingwen <yingwen.chen@intel.com>;
> Nakajima, Jun <jun.nakajima@intel.com>; Boris Ostrovsky
> <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
> <joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com>
> Subject: Re: CPU hotplug using SMM with QEMU+OVMF
>
> On 08/13/19 16:16, Laszlo Ersek wrote:
>
> > Yingwen and Jiewen suggested the following process.
> >
> > Legend:
> >
> > - "New CPU": CPU being hot-added
> > - "Host CPU": existing CPU
> > - (Flash): code running from flash
> > - (SMM): code running from SMRAM
> >
> > Steps:
> >
> > (01) New CPU: (Flash) enter reset vector, Global SMI disabled by
> >      default.
>
> - What does "Global SMI disabled by default" mean? In particular, what
>   is "global" here?

[Jiewen] OK. Let's not use the term "global".

>   Do you mean that the CPU being hot-plugged should mask (by default)
>   broadcast SMIs? What about directed SMIs? (An attacker could try
>   that too.)

[Jiewen] I mean all SMIs are blocked for this specific hot-added CPU.

>   And what about other processors? (I'd assume step (01) is not
>   relevant for other processors, but "global" is quite confusing
>   here.)

[Jiewen] No impact to other processors.

> - Does this part require a new branch somewhere in the OVMF SEC code?
>   How do we determine whether the CPU executing SEC is the BSP or a
>   hot-plugged AP?

[Jiewen] I think this is blocked from the hardware perspective, since
the first instruction. There are some hardware-specific registers that
can be used to determine if the CPU is newly added. I don't think this
must be the same as on real hardware. You are free to invent some
registers in the device model, to be used in the OVMF hot plug driver.

> - How do we tell the hot-plugged AP where to start execution? (I.e.,
>   that it should execute code at a particular pflash location.)

[Jiewen] Same real mode reset vector at FFFF:FFF0.

>   For example, in MpInitLib, we start a specific AP with
>   INIT-SIPI-SIPI, where "SIPI" stores the startup address in the
>   "Interrupt Command Register" (which is memory-mapped in xAPIC mode,
>   and an MSR in x2APIC mode, apparently). That doesn't apply here --
>   should QEMU auto-start the new CPU?

[Jiewen] You can send INIT-SIPI-SIPI to the new CPU only after it can
access memory. SIPI needs to give the AP a below-1MB memory address as
the waking vector.

> - What memory is used as stack by the new CPU, when it runs code from
>   flash?

[Jiewen] Same as for the other CPUs during normal boot. You can use
special reserved memory.

>   QEMU does not emulate CAR (Cache As RAM). The new CPU doesn't have
>   access to SMRAM. And we cannot use AcpiNVS or Reserved memory,
>   because a malicious OS could use other CPUs -- or PCI device DMA --
>   to attack the stack (unless QEMU forcibly paused other CPUs upon
>   hotplug; I'm not sure).

[Jiewen] Excellent point! I don't think there is a problem for real
hardware, which always has CAR. Can QEMU provide some CPU-specific
space, such as an MMIO region?

> - If an attempt is made to hotplug multiple CPUs in quick succession,
>   does something serialize those attempts?

[Jiewen] The BIOS needs to consider this as an availability
requirement. I don't have a strong opinion. You can design a system
that requires hotplug to happen one-by-one, and fails the hot-add
otherwise. Or you can design a system without such a restriction.
Again, all we need to do is maintain the integrity of SMM. The
availability should be considered a separate requirement.

>   Again, stack usage could be a concern, even with Cache-As-RAM --
>   HyperThreads (logical processors) on a single core don't have
>   dedicated cache.

[Jiewen] Agree with you on the virtual environment. For real hardware,
we do socket-level hot-add only, so HT is not a concern. But if you
want to do that in a virtual environment, processor-specific memory
should be considered.

>   Does CPU hotplug apply only at the socket level? If the CPU is
>   multi-core, what is responsible for hot-plugging all cores present
>   in the socket?

[Jiewen] Ditto.

> > (02) New CPU: (Flash) configure memory control to let it access
> >      global host memory.
>
> In QEMU/KVM guests, we don't have to enable memory explicitly, it just
> exists and works.
>
> In OVMF X64 SEC, we can't access RAM above 4GB, but that shouldn't be
> an issue per se.

[Jiewen] Agree. I do not see the issue.

> > (03) New CPU: (Flash) send board message to tell host CPU
> >      (GPIO->SCI) -- I am waiting for hot-add message.
>
> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
> processors immediately upon plugging the new CPU.
>
> >      (NOTE: Host CPU can only send instruction in SMM mode. -- The
> >      register is SMM only)
>
> Sorry, I don't follow -- what register are we talking about here, and
> why is the BSP needed to send anything at all? What "instruction" do
> you have in mind?

[Jiewen] The new CPU does not enable SMI at reset. At some later point
in time, the CPU needs to enable SMI, right? The "instruction" here
means that the host CPUs need to tell the new CPU to enable SMI.

> > (04) Host CPU: (OS) get message from board that a new CPU is added.
> >      (GPIO -> SCI)
> >
> > (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
> >      will not enter SMM because SMI is disabled)
>
> I don't understand the OS involvement here. But, again, perhaps QEMU
> can force all existent CPUs into SMM immediately upon adding the new
> CPU.

[Jiewen] OS here means the host CPU running code in the OS environment,
not in the SMM environment.

> > (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >      rebase code.
> >
> > (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
>
> Aha, so this is the SMM-only register you mention in step (03). Is the
> register specified in the Intel SDM?

[Jiewen] Right. That is the register that lets the host CPU tell the
new CPU to enable SMI. It is a platform-specific register, not defined
in the SDM. You may invent one in the device model.

> > (08) New CPU: (Flash) Get message - Enable SMI.
> >
> > (09) Host CPU: (SMM) Send SMI to the new CPU only.
> >
> > (10) New CPU: (SMM) Respond to the first SMI at 38000, and rebase
> >      SMBASE to TSEG.
>
> What code does the new CPU execute after it completes step (10)? Does
> it halt?

[Jiewen] The new CPU exits SMM and returns to the original place --
where it was interrupted to enter SMM -- running code on the flash.

> > (11) Host CPU: (SMM) Restore 38000.
>
> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
> only platform-specific feature seems to be the SMI masking register,
> which could be extracted into a new SmmCpuFeaturesLib API.
>
> Thus, would you please consider open sourcing firmware code for steps
> (06) through (11)?
>
> Alternatively -- and in particular because the stack for step (01)
> concerns me --, we could approach this from a high-level, functional
> perspective. The states that really matter are the relocated SMBASE
> for the new CPU, and the state of the full system, right at the end
> of step (11).
>
> When the SMM setup quiesces during normal firmware boot, OVMF could
> use existent (finalized) SMBASE information to *pre-program* some
> virtual QEMU hardware, with such state that would be expected, as
> "final" state, of any new hotplugged CPU. Afterwards, if / when the
> hotplug actually happens, QEMU could blanket-apply this state to the
> new CPU, and broadcast a hardware SMI to all CPUs except the new one.
>
> The hardware SMI should tell the firmware that the rest of the
> process -- step (12) below, and onward -- is being requested.
>
> If I understand right, this approach would produce a firmware &
> system state that's identical to what's expected right after step
> (11):
>
> - all SMBASEs relocated
> - all preexistent CPUs in SMM
> - new CPU halted / blocked from launch
> - DRAM at 0x30000 / 0x38000 contains OS-owned data
>
> Is my understanding correct that this is the expected state after
> step (11)?

[Jiewen] I think you are correct.

> Three more comments on the "SMBASE pre-config" approach:
>
> - the virtual hardware providing this feature should become locked
>   after the configuration, until next platform reset
>
> - the pre-config should occur via simple hardware accesses, so that
>   it can be replayed at S3 resume, i.e. as part of the S3 boot script
>
> - from the pre-configured state, and the APIC ID, QEMU itself could
>   perhaps calculate the SMI stack location for the new processor.
>
> > (12) Host CPU: (SMM) Update located data structure to add the new
> >      CPU information. (This step will involve CPU_SERVICE protocol)
>
> I commented on EFI_SMM_CPU_SERVICE_PROTOCOL under bullet (4) of
> <https://bugzilla.tianocore.org/show_bug.cgi?id=1512#c4>.
>
> Calling EFI_SMM_ADD_PROCESSOR looks justified.

[Jiewen] I think you are correct. Also, REMOVE_PROCESSOR will be used
for the hot-remove action.

> What are some of the other member functions used for? The scary one is
> EFI_SMM_REGISTER_EXCEPTION_HANDLER.

[Jiewen] This is to register a new exception handler in SMM. I don't
think this API is involved in hot-add.

> > ===================== (now, the next SMI will bring all CPUs into
> > TSEG)
>
> OK... but what component injects that SMI, and when?

[Jiewen] Any SMI event. It could be a synchronous SMI or an
asynchronous SMI. It could come from software, such as an IO write, or
from hardware, such as a thermal event.

> > (13) New CPU: (Flash) run MRC code, to init its own memory.
>
> Why is this needed, esp. after step (10)? The new CPU has accessed
> DRAM already. And why are we executing code from pflash, rather than
> from SMRAM, given that we're past SMBASE relocation?

[Jiewen] On real hardware, it is needed because different CPUs may have
different capabilities to access different DIMMs. I do not think your
virtual platform needs it.

> > (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> >
> > (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
>
> I'm confused by these steps. I thought that step (12) would complete
> the hotplug, by updating the administrative data structures
> internally. And the next SMI -- raised for the usual purposes, such as
> a software SMI for variable access -- would be handled like it always
> is, except it would also pull the new CPU into SMM too.

[Jiewen] The OS needs to use the new CPU at some point, right? As such,
the OS needs to pull the new CPU into its own environment via
INIT-SIPI-SIPI.

> Thanks!
> Laszlo
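[Editorial note: the platform-specific "enable SMI" register that
Jiewen describes (not in the Intel SDM; to be invented in the device
model) needs essentially one security property: only code already
running in SMM may write it. A minimal C model of that property -- all
names are invented, and "being in SMM" is modeled as a plain flag:]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device-model register controlling the SMI mask of the
 * hot-added CPU. The invariant: a write is honored only if the writing
 * CPU is in SMM, so a malicious OS cannot unmask SMI on the new CPU
 * and trigger SMBASE relocation on its own terms. */
typedef struct {
    bool smi_enabled;       /* SMI mask state of the hot-added CPU */
} HotplugSmiReg;

/* 'writer_in_smm' models whether the writing CPU currently executes
 * in SMM; in a real device model, QEMU would check the vCPU state. */
static void smi_reg_write(HotplugSmiReg *r, bool writer_in_smm,
                          bool value)
{
    if (!writer_in_smm)
        return;             /* non-SMM writes are silently ignored */
    r->smi_enabled = value;
}
```

This corresponds to step (07) of the process: the host CPU, already in
SMM, performs the only write that can take effect.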
* Re: CPU hotplug using SMM with QEMU+OVMF
  2019-08-14 13:20 ` Yao, Jiewen
@ 2019-08-14 14:04   ` Paolo Bonzini
  2019-08-15  9:55     ` Yao, Jiewen
                       ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread

From: Paolo Bonzini @ 2019-08-14 14:04 UTC (permalink / raw)
To: Yao, Jiewen, Laszlo Ersek, edk2-devel-groups-io
Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen,
    Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins,
    Phillip Goerl

On 14/08/19 15:20, Yao, Jiewen wrote:

>> - Does this part require a new branch somewhere in the OVMF SEC code?
>>   How do we determine whether the CPU executing SEC is the BSP or a
>>   hot-plugged AP?
> [Jiewen] I think this is blocked from the hardware perspective, since
> the first instruction. There are some hardware-specific registers that
> can be used to determine if the CPU is newly added. I don't think this
> must be the same as on real hardware. You are free to invent some
> registers in the device model, to be used in the OVMF hot plug driver.

Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.

>> - How do we tell the hot-plugged AP where to start execution? (I.e.,
>>   that it should execute code at a particular pflash location.)
> [Jiewen] Same real mode reset vector at FFFF:FFF0.

You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.

We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.

> I don't think there is a problem for real hardware, which always has
> CAR. Can QEMU provide some CPU-specific space, such as an MMIO region?

Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted?

>> Does CPU hotplug apply only at the socket level? If the CPU is
>> multi-core, what is responsible for hot-plugging all cores present in
>> the socket?

I can answer this: the SMM handler would interact with the hotplug
controller in the same way that the ACPI DSDT does normally. This
supports multiple hotplugs already.

Writes to the hotplug controller from outside SMM would be ignored.

>>> (03) New CPU: (Flash) send board message to tell host CPU
>>>      (GPIO->SCI) -- I am waiting for hot-add message.
>>
>> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
>> processors immediately upon plugging the new CPU.

The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.

>>>      (NOTE: Host CPU can only send instruction in SMM mode. -- The
>>>      register is SMM only)
>>
>> Sorry, I don't follow -- what register are we talking about here, and
>> why is the BSP needed to send anything at all? What "instruction" do
>> you have in mind?
> [Jiewen] The new CPU does not enable SMI at reset. At some later point
> in time, the CPU needs to enable SMI, right? The "instruction" here
> means that the host CPUs need to tell the new CPU to enable SMI.

Right, this would be a write to the CPU hotplug controller.

>>> (04) Host CPU: (OS) get message from board that a new CPU is added.
>>>      (GPIO -> SCI)
>>>
>>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
>>>      will not enter SMM because SMI is disabled)
>>
>> I don't understand the OS involvement here. But, again, perhaps QEMU
>> can force all existent CPUs into SMM immediately upon adding the new
>> CPU.
> [Jiewen] OS here means the host CPU running code in the OS
> environment, not in the SMM environment.

See above.

>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>>      rebase code.
>>>
>>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
>>
>> Aha, so this is the SMM-only register you mention in step (03). Is
>> the register specified in the Intel SDM?
> [Jiewen] Right. That is the register that lets the host CPU tell the
> new CPU to enable SMI. It is a platform-specific register, not defined
> in the SDM. You may invent one in the device model.

See above.

>>> (10) New CPU: (SMM) Respond to the first SMI at 38000, and rebase
>>>      SMBASE to TSEG.
>>
>> What code does the new CPU execute after it completes step (10)?
>> Does it halt?
>
> [Jiewen] The new CPU exits SMM and returns to the original place --
> where it was interrupted to enter SMM -- running code on the flash.

So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and
(07).

>>> (11) Host CPU: (SMM) Restore 38000.
>>
>> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
>> only platform-specific feature seems to be the SMI masking register,
>> which could be extracted into a new SmmCpuFeaturesLib API.
>>
>> Thus, would you please consider open sourcing firmware code for
>> steps (06) through (11)?
>>
>> Alternatively -- and in particular because the stack for step (01)
>> concerns me --, we could approach this from a high-level, functional
>> perspective. The states that really matter are the relocated SMBASE
>> for the new CPU, and the state of the full system, right at the end
>> of step (11).
>>
>> When the SMM setup quiesces during normal firmware boot, OVMF could
>> use existent (finalized) SMBASE information to *pre-program* some
>> virtual QEMU hardware, with such state that would be expected, as
>> "final" state, of any new hotplugged CPU. Afterwards, if / when the
>> hotplug actually happens, QEMU could blanket-apply this state to the
>> new CPU, and broadcast a hardware SMI to all CPUs except the new
>> one.

I'd rather avoid this and stay as close as possible to real hardware.

Paolo
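[Editorial note: Paolo's 0xB2 suggestion boils down to the DSDT writing
a well-known value to the APM control port, and an edk2 software-SMI
handler dispatching on that value. The sketch below models the edk2
side; the value 0x04 is made up for illustration -- the real well-known
value would have to be agreed between QEMU and OVMF -- and the handler
body is a comment, not real code.]

```c
#include <assert.h>
#include <stdint.h>

#define APM_CNT_PORT        0xB2  /* APM control port written by the DSDT */
#define SMI_VAL_CPU_HOTPLUG 0x04  /* invented well-known value */

static int hotplug_handled;       /* records whether hotplug ran */

/* Called on every software SMI with the byte the guest wrote to port
 * 0xB2; dispatches to the hotplug path only for the agreed value. */
static void smi_dispatch(uint8_t apm_cnt_value)
{
    if (apm_cnt_value == SMI_VAL_CPU_HOTPLUG) {
        /* ...interact with the hotplug controller, relocate the new
         * CPU's SMBASE, register it via EFI_SMM_ADD_PROCESSOR... */
        hotplug_handled = 1;
    }
    /* other values fall through to the other registered SMI handlers,
     * e.g. the variable-services software SMI */
}
```

Because the dispatch happens inside SMM, and writes to the hotplug
controller from outside SMM are ignored, a malicious OS writing random
values to 0xB2 can at worst trigger the legitimate handler, not subvert
it.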
* Re: CPU hotplug using SMM with QEMU+OVMF 2019-08-14 14:04 ` Paolo Bonzini @ 2019-08-15 9:55 ` Yao, Jiewen 2019-08-15 16:04 ` Paolo Bonzini 2019-08-15 15:00 ` [edk2-devel] " Laszlo Ersek 2019-08-15 16:07 ` Igor Mammedov 2 siblings, 1 reply; 69+ messages in thread From: Yao, Jiewen @ 2019-08-15 9:55 UTC (permalink / raw) To: Paolo Bonzini, Laszlo Ersek, edk2-devel-groups-io Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Hi Paolo I am not sure what do you mean - "You do not need a reset vector ...". If so, where is the first instruction of the new CPU in the virtualization environment? Please help me understand that at first. Then we can continue the discussion. Thank you Yao Jiewen > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Wednesday, August 14, 2019 10:05 PM > To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek > <lersek@redhat.com>; edk2-devel-groups-io <devel@edk2.groups.io> > Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; > Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: CPU hotplug using SMM with QEMU+OVMF > > On 14/08/19 15:20, Yao, Jiewen wrote: > >> - Does this part require a new branch somewhere in the OVMF SEC code? > >> How do we determine whether the CPU executing SEC is BSP or > >> hot-plugged AP? > > [Jiewen] I think this is blocked from hardware perspective, since the first > instruction. > > There are some hardware specific registers can be used to determine if the > CPU is new added. > > I don’t think this must be same as the real hardware. > > You are free to invent some registers in device model to be used in OVMF > hot plug driver. 
> > Yes, this would be a new operation mode for QEMU, that only applies to > hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in > fact it doesn't reply to anything at all. > > >> - How do we tell the hot-plugged AP where to start execution? (I.e. that > >> it should execute code at a particular pflash location.) > > [Jiewen] Same real mode reset vector at FFFF:FFF0. > > You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > QEMU. The AP does not start execution at all when it is unplugged, so > no cache-as-RAM etc. > We only need to modify QEMU so that hot-plugged APs do not reply to > INIT/SIPI/SMI. > > > I don’t think there is problem for real hardware, who always has CAR. > > Can QEMU provide some CPU specific space, such as MMIO region? > > Why is a CPU-specific region needed if every other processor is in SMM > and thus trusted? > >> Does CPU hotplug apply only at the socket level? If the CPU is > >> multi-core, what is responsible for hot-plugging all cores present in > >> the socket? > > I can answer this: the SMM handler would interact with the hotplug > controller in the same way that ACPI DSDT does normally. This supports > multiple hotplugs already. > > Writes to the hotplug controller from outside SMM would be ignored. > > >>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI) > >>> -- I am waiting for hot-add message. > >> > >> Maybe we can simplify this in QEMU by broadcasting an SMI to existent > >> processors immediately upon plugging the new CPU. > > The QEMU DSDT could be modified (when secure boot is in effect) to OUT > to 0xB2 when hotplug happens. It could write a well-known value to > 0xB2, to be read by an SMI handler in edk2. > > > >> > >>> (NOTE: Host CPU can > only > >> send > >>> instruction in SMM mode. -- The register is SMM only) > >> > >> Sorry, I don't follow -- what register are we talking about here, and > >> why is the BSP needed to send anything at all? 
What "instruction" do you > >> have in mind? > > [Jiewen] The new CPU does not enable SMI at reset. > > At some point of time later, the CPU need enable SMI, right? > > The "instruction" here means, the host CPUs need tell to CPU to enable > SMI. > > Right, this would be a write to the CPU hotplug controller > > >>> (04) Host CPU: (OS) get message from board that a new CPU is added. > >>> (GPIO -> SCI) > >>> > >>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU > >>> will not enter CPU because SMI is disabled) > >> > >> I don't understand the OS involvement here. But, again, perhaps QEMU > can > >> force all existent CPUs into SMM immediately upon adding the new CPU. > > [Jiewen] OS here means the Host CPU running code in OS environment, not > in SMM environment. > > See above. > > >>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > >>> rebase code. > >>> > >>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI. > >> > >> Aha, so this is the SMM-only register you mention in step (03). Is the > >> register specified in the Intel SDM? > > [Jiewen] Right. That is the register to let host CPU tell new CPU to enable > SMI. > > It is platform specific register. Not defined in SDM. > > You may invent one in device model. > > See above. > > >>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE > to > >>> TSEG. > >> > >> What code does the new CPU execute after it completes step (10)? Does > it > >> halt? > > > > [Jiewen] The new CPU exits SMM and return to original place - where it is > > interrupted to enter SMM - running code on the flash. > > So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07). > > >>> (11) Host CPU: (SMM) Restore 38000. > >> > >> These steps (i.e., (06) through (11)) don't appear RAS-specific. The > >> only platform-specific feature seems to be SMI masking register, which > >> could be extracted into a new SmmCpuFeaturesLib API. 
> >> > >> Thus, would you please consider open sourcing firmware code for steps > >> (06) through (11)? > >> > >> Alternatively -- and in particular because the stack for step (01) > >> concerns me --, we could approach this from a high-level, functional > >> perspective. The states that really matter are the relocated SMBASE for > >> the new CPU, and the state of the full system, right at the end of step > >> (11). > >> > >> When the SMM setup quiesces during normal firmware boot, OVMF could > >> use > >> existent (finalized) SMBASE infomation to *pre-program* some virtual > >> QEMU hardware, with such state that would be expected, as "final" state, > >> of any new hotplugged CPU. Afterwards, if / when the hotplug actually > >> happens, QEMU could blanket-apply this state to the new CPU, and > >> broadcast a hardware SMI to all CPUs except the new one. > > I'd rather avoid this and stay as close as possible to real hardware. > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: CPU hotplug using SMM with QEMU+OVMF 2019-08-15 9:55 ` Yao, Jiewen @ 2019-08-15 16:04 ` Paolo Bonzini 0 siblings, 0 replies; 69+ messages in thread From: Paolo Bonzini @ 2019-08-15 16:04 UTC (permalink / raw) To: Yao, Jiewen, Laszlo Ersek, edk2-devel-groups-io Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 15/08/19 11:55, Yao, Jiewen wrote: > Hi Paolo > I am not sure what you mean by "You do not need a reset vector ...". > If so, where is the first instruction of the new CPU in the virtualization environment? > Please help me understand that first; then we can continue the discussion. The BSP starts running from 0xFFFFFFF0. APs do not start running at all and just sit waiting for an INIT-SIPI-SIPI sequence. Please see my proposal in the reply to Laszlo. Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-14 14:04 ` Paolo Bonzini 2019-08-15 9:55 ` Yao, Jiewen @ 2019-08-15 15:00 ` Laszlo Ersek 2019-08-15 16:16 ` Igor Mammedov 2019-08-15 16:21 ` Paolo Bonzini 2019-08-15 16:07 ` Igor Mammedov 2 siblings, 2 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-08-15 15:00 UTC (permalink / raw) To: devel, pbonzini, Yao, Jiewen Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/14/19 16:04, Paolo Bonzini wrote: > On 14/08/19 15:20, Yao, Jiewen wrote: >>> - Does this part require a new branch somewhere in the OVMF SEC code? >>> How do we determine whether the CPU executing SEC is BSP or >>> hot-plugged AP? >> [Jiewen] I think this is blocked from hardware perspective, since the first instruction. >> There are some hardware specific registers can be used to determine if the CPU is new added. >> I don’t think this must be same as the real hardware. >> You are free to invent some registers in device model to be used in OVMF hot plug driver. > > Yes, this would be a new operation mode for QEMU, that only applies to > hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in > fact it doesn't reply to anything at all. > >>> - How do we tell the hot-plugged AP where to start execution? (I.e. that >>> it should execute code at a particular pflash location.) >> [Jiewen] Same real mode reset vector at FFFF:FFF0. > > You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > QEMU. The AP does not start execution at all when it is unplugged, so > no cache-as-RAM etc. > > We only need to modify QEMU so that hot-plugged APIs do not reply to > INIT/SIPI/SMI. > >> I don’t think there is problem for real hardware, who always has CAR. >> Can QEMU provide some CPU specific space, such as MMIO region? > > Why is a CPU-specific region needed if every other processor is in SMM > and thus trusted. 
I was going through the steps Jiewen and Yingwen recommended. In step (02), the new CPU is expected to set up RAM access. In step (03), the new CPU, executing code from flash, is expected to "send board message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add message." For that action, the new CPU may need a stack (minimally if we want to use C function calls). Until step (03), there had been no word about any other (= pre-plugged) CPUs (more precisely, Jiewen even confirmed "No impact to other processors"), so I didn't assume that other CPUs had entered SMM. Paolo, I've attempted to read Jiewen's response, and yours, as carefully as I can. I'm still very confused. If you have a better understanding, could you please write up the 15-step process from the thread starter again, with all QEMU customizations applied? Such as, unnecessary steps removed, and platform specifics filled in. One more comment below: > >>> Does CPU hotplug apply only at the socket level? If the CPU is >>> multi-core, what is responsible for hot-plugging all cores present in >>> the socket? > > I can answer this: the SMM handler would interact with the hotplug > controller in the same way that ACPI DSDT does normally. This supports > multiple hotplugs already. > > Writes to the hotplug controller from outside SMM would be ignored. > >>>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI) >>>> -- I am waiting for hot-add message. >>> >>> Maybe we can simplify this in QEMU by broadcasting an SMI to existent >>> processors immediately upon plugging the new CPU. > > The QEMU DSDT could be modified (when secure boot is in effect) to OUT > to 0xB2 when hotplug happens. It could write a well-known value to > 0xB2, to be read by an SMI handler in edk2. (My comment below is general, and may not apply to this particular situation. I'm too confused to figure that out myself, sorry!) 
I dislike involving QEMU's generated DSDT in anything SMM (even injecting the SMI), because the AML interpreter runs in the OS. If a malicious OS kernel is a bit too enlightened about the DSDT, it could willfully diverge from the process that we design. If QEMU broadcast the SMI internally, the guest OS could not interfere with that. If the purpose of the SMI is specifically to force all CPUs into SMM (and thereby force them into trusted state), then the OS would be explicitly counter-interested in carrying out the AML operations from QEMU's DSDT. I'd be OK with an SMM / SMI involvement in QEMU's DSDT if, by diverging from that DSDT, the OS kernel could only mess with its own state, and not with the firmware's. Thanks Laszlo > > >>> >>>> (NOTE: Host CPU can only >>> send >>>> instruction in SMM mode. -- The register is SMM only) >>> >>> Sorry, I don't follow -- what register are we talking about here, and >>> why is the BSP needed to send anything at all? What "instruction" do you >>> have in mind? >> [Jiewen] The new CPU does not enable SMI at reset. >> At some point of time later, the CPU need enable SMI, right? >> The "instruction" here means, the host CPUs need tell to CPU to enable SMI. > > Right, this would be a write to the CPU hotplug controller > >>>> (04) Host CPU: (OS) get message from board that a new CPU is added. >>>> (GPIO -> SCI) >>>> >>>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU >>>> will not enter CPU because SMI is disabled) >>> >>> I don't understand the OS involvement here. But, again, perhaps QEMU can >>> force all existent CPUs into SMM immediately upon adding the new CPU. >> [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment. > > See above. > >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM >>>> rebase code. >>>> >>>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI. >>> >>> Aha, so this is the SMM-only register you mention in step (03). 
Is the >>> register specified in the Intel SDM? >> [Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI. >> It is platform specific register. Not defined in SDM. >> You may invent one in device model. > > See above. > >>>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to >>>> TSEG. >>> >>> What code does the new CPU execute after it completes step (10)? Does it >>> halt? >> >> [Jiewen] The new CPU exits SMM and return to original place - where it is >> interrupted to enter SMM - running code on the flash. > > So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07). > >>>> (11) Host CPU: (SMM) Restore 38000. >>> >>> These steps (i.e., (06) through (11)) don't appear RAS-specific. The >>> only platform-specific feature seems to be SMI masking register, which >>> could be extracted into a new SmmCpuFeaturesLib API. >>> >>> Thus, would you please consider open sourcing firmware code for steps >>> (06) through (11)? >>> >>> Alternatively -- and in particular because the stack for step (01) >>> concerns me --, we could approach this from a high-level, functional >>> perspective. The states that really matter are the relocated SMBASE for >>> the new CPU, and the state of the full system, right at the end of step >>> (11). >>> >>> When the SMM setup quiesces during normal firmware boot, OVMF could >>> use >>> existent (finalized) SMBASE infomation to *pre-program* some virtual >>> QEMU hardware, with such state that would be expected, as "final" state, >>> of any new hotplugged CPU. Afterwards, if / when the hotplug actually >>> happens, QEMU could blanket-apply this state to the new CPU, and >>> broadcast a hardware SMI to all CPUs except the new one. > > I'd rather avoid this and stay as close as possible to real hardware. > > Paolo > > > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-15 15:00 ` [edk2-devel] " Laszlo Ersek @ 2019-08-15 16:16 ` Igor Mammedov 2019-08-15 16:21 ` Paolo Bonzini 1 sibling, 0 replies; 69+ messages in thread From: Igor Mammedov @ 2019-08-15 16:16 UTC (permalink / raw) To: Laszlo Ersek Cc: devel, pbonzini, Yao, Jiewen, edk2-rfc-groups-io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Thu, 15 Aug 2019 17:00:16 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 08/14/19 16:04, Paolo Bonzini wrote: > > On 14/08/19 15:20, Yao, Jiewen wrote: > >>> - Does this part require a new branch somewhere in the OVMF SEC code? > >>> How do we determine whether the CPU executing SEC is BSP or > >>> hot-plugged AP? > >> [Jiewen] I think this is blocked from hardware perspective, since the first instruction. > >> There are some hardware specific registers can be used to determine if the CPU is new added. > >> I don’t think this must be same as the real hardware. > >> You are free to invent some registers in device model to be used in OVMF hot plug driver. > > > > Yes, this would be a new operation mode for QEMU, that only applies to > > hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in > > fact it doesn't reply to anything at all. > > > >>> - How do we tell the hot-plugged AP where to start execution? (I.e. that > >>> it should execute code at a particular pflash location.) > >> [Jiewen] Same real mode reset vector at FFFF:FFF0. > > > > You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > > QEMU. The AP does not start execution at all when it is unplugged, so > > no cache-as-RAM etc. > > > > We only need to modify QEMU so that hot-plugged APIs do not reply to > > INIT/SIPI/SMI. > > > >> I don’t think there is problem for real hardware, who always has CAR. > >> Can QEMU provide some CPU specific space, such as MMIO region? 
> > > > Why is a CPU-specific region needed if every other processor is in SMM > > and thus trusted. > > I was going through the steps Jiewen and Yingwen recommended. > > In step (02), the new CPU is expected to set up RAM access. In step > (03), the new CPU, executing code from flash, is expected to "send board > message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add > message." For that action, the new CPU may need a stack (minimally if we > want to use C function calls). > > Until step (03), there had been no word about any other (= pre-plugged) > CPUs (more precisely, Jiewen even confirmed "No impact to other > processors"), so I didn't assume that other CPUs had entered SMM. > > Paolo, I've attempted to read Jiewen's response, and yours, as carefully > as I can. I'm still very confused. If you have a better understanding, > could you please write up the 15-step process from the thread starter > again, with all QEMU customizations applied? Such as, unnecessary steps > removed, and platform specifics filled in. > > One more comment below: > > > > >>> Does CPU hotplug apply only at the socket level? If the CPU is > >>> multi-core, what is responsible for hot-plugging all cores present in > >>> the socket? > > > > I can answer this: the SMM handler would interact with the hotplug > > controller in the same way that ACPI DSDT does normally. This supports > > multiple hotplugs already. > > > > Writes to the hotplug controller from outside SMM would be ignored. > > > >>>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI) > >>>> -- I am waiting for hot-add message. > >>> > >>> Maybe we can simplify this in QEMU by broadcasting an SMI to existent > >>> processors immediately upon plugging the new CPU. > > > > The QEMU DSDT could be modified (when secure boot is in effect) to OUT > > to 0xB2 when hotplug happens. It could write a well-known value to > > 0xB2, to be read by an SMI handler in edk2. 
> > (My comment below is general, and may not apply to this particular > situation. I'm too confused to figure that out myself, sorry!) > > I dislike involving QEMU's generated DSDT in anything SMM (even > injecting the SMI), because the AML interpreter runs in the OS. > > If a malicious OS kernel is a bit too enlightened about the DSDT, it > could willfully diverge from the process that we design. If QEMU > broadcast the SMI internally, the guest OS could not interfere with that. > > If the purpose of the SMI is specifically to force all CPUs into SMM > (and thereby force them into trusted state), then the OS would be > explicitly counter-interested in carrying out the AML operations from > QEMU's DSDT. It shouldn't matter where the management SMI comes from, as long as the OS is unable to actually trigger an SMI with untrusted content at the SMBASE of a hot-plugged (parked) CPU. The worst that could happen is that the new CPU stays parked. > I'd be OK with an SMM / SMI involvement in QEMU's DSDT if, by diverging > from that DSDT, the OS kernel could only mess with its own state, and > not with the firmware's. > > Thanks > Laszlo > > > > > >>> > >>>> (NOTE: Host CPU can only > >>> send > >>>> instruction in SMM mode. -- The register is SMM only) > >>> > >>> Sorry, I don't follow -- what register are we talking about here, and > >>> why is the BSP needed to send anything at all? What "instruction" do you > >>> have in mind? > >> [Jiewen] The new CPU does not enable SMI at reset. > >> At some point of time later, the CPU need enable SMI, right? > >> The "instruction" here means, the host CPUs need tell to CPU to enable SMI. > > > > Right, this would be a write to the CPU hotplug controller > > > >>>> (04) Host CPU: (OS) get message from board that a new CPU is added. > >>>> (GPIO -> SCI) > >>>> > >>>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU > >>>> will not enter CPU because SMI is disabled) > >>> > >>> I don't understand the OS involvement here. 
But, again, perhaps QEMU can > >>> force all existent CPUs into SMM immediately upon adding the new CPU. > >> [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment. > > > > See above. > > > >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > >>>> rebase code. > >>>> > >>>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI. > >>> > >>> Aha, so this is the SMM-only register you mention in step (03). Is the > >>> register specified in the Intel SDM? > >> [Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI. > >> It is platform specific register. Not defined in SDM. > >> You may invent one in device model. > > > > See above. > > > >>>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to > >>>> TSEG. > >>> > >>> What code does the new CPU execute after it completes step (10)? Does it > >>> halt? > >> > >> [Jiewen] The new CPU exits SMM and return to original place - where it is > >> interrupted to enter SMM - running code on the flash. > > > > So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07). > > > >>>> (11) Host CPU: (SMM) Restore 38000. > >>> > >>> These steps (i.e., (06) through (11)) don't appear RAS-specific. The > >>> only platform-specific feature seems to be SMI masking register, which > >>> could be extracted into a new SmmCpuFeaturesLib API. > >>> > >>> Thus, would you please consider open sourcing firmware code for steps > >>> (06) through (11)? > >>> > >>> Alternatively -- and in particular because the stack for step (01) > >>> concerns me --, we could approach this from a high-level, functional > >>> perspective. The states that really matter are the relocated SMBASE for > >>> the new CPU, and the state of the full system, right at the end of step > >>> (11). 
> >>> > >>> When the SMM setup quiesces during normal firmware boot, OVMF could > >>> use > >>> existent (finalized) SMBASE infomation to *pre-program* some virtual > >>> QEMU hardware, with such state that would be expected, as "final" state, > >>> of any new hotplugged CPU. Afterwards, if / when the hotplug actually > >>> happens, QEMU could blanket-apply this state to the new CPU, and > >>> broadcast a hardware SMI to all CPUs except the new one. > > > > I'd rather avoid this and stay as close as possible to real hardware. > > > > Paolo > > > > > > > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-15 15:00 ` [edk2-devel] " Laszlo Ersek 2019-08-15 16:16 ` Igor Mammedov @ 2019-08-15 16:21 ` Paolo Bonzini 2019-08-16 2:46 ` Yao, Jiewen 2019-08-16 20:00 ` Laszlo Ersek 1 sibling, 2 replies; 69+ messages in thread From: Paolo Bonzini @ 2019-08-15 16:21 UTC (permalink / raw) To: Laszlo Ersek, devel, Yao, Jiewen Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 15/08/19 17:00, Laszlo Ersek wrote: > On 08/14/19 16:04, Paolo Bonzini wrote: >> On 14/08/19 15:20, Yao, Jiewen wrote: >>>> - Does this part require a new branch somewhere in the OVMF SEC code? >>>> How do we determine whether the CPU executing SEC is BSP or >>>> hot-plugged AP? >>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction. >>> There are some hardware specific registers can be used to determine if the CPU is new added. >>> I don’t think this must be same as the real hardware. >>> You are free to invent some registers in device model to be used in OVMF hot plug driver. >> >> Yes, this would be a new operation mode for QEMU, that only applies to >> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in >> fact it doesn't reply to anything at all. >> >>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that >>>> it should execute code at a particular pflash location.) >>> [Jiewen] Same real mode reset vector at FFFF:FFF0. >> >> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in >> QEMU. The AP does not start execution at all when it is unplugged, so >> no cache-as-RAM etc. >> >> We only need to modify QEMU so that hot-plugged APIs do not reply to >> INIT/SIPI/SMI. >> >>> I don’t think there is problem for real hardware, who always has CAR. >>> Can QEMU provide some CPU specific space, such as MMIO region? 
>> >> Why is a CPU-specific region needed if every other processor is in SMM >> and thus trusted? > > I was going through the steps Jiewen and Yingwen recommended. > > In step (02), the new CPU is expected to set up RAM access. In step > (03), the new CPU, executing code from flash, is expected to "send board > message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add > message." For that action, the new CPU may need a stack (minimally if we > want to use C function calls). > > Until step (03), there had been no word about any other (= pre-plugged) > CPUs (more precisely, Jiewen even confirmed "No impact to other > processors"), so I didn't assume that other CPUs had entered SMM. > > Paolo, I've attempted to read Jiewen's response, and yours, as carefully > as I can. I'm still very confused. If you have a better understanding, > could you please write up the 15-step process from the thread starter > again, with all QEMU customizations applied? Such as, unnecessary steps > removed, and platform specifics filled in.

Sure.

(01a) QEMU: create new CPU. The CPU already exists, but it does not start running code until unparked by the CPU hotplug controller.

(01b) QEMU: trigger SCI

(02-03) no equivalent

(04) Host CPU: (OS) execute GPE handler from DSDT

(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU will not enter SMM because SMI is disabled)

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM rebase code.

(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable new CPU

(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.

(08a) New CPU: (Low RAM) Enter protected mode.

(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.

(09) Host CPU: (SMM) Send SMI to the new CPU only.

(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to TSEG.

(11) Host CPU: (SMM) Restore 38000.

(12) Host CPU: (SMM) Update located data structure to add the new CPU information. 
(This step will involve CPU_SERVICE protocol)

(13) New CPU: (Flash) do whatever other initialization is needed

(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull the new CPU in.

In other words, the cache-as-RAM phase of 02-03 is replaced by the INIT-SIPI-SIPI sequence of 07b-08a-08b. >> The QEMU DSDT could be modified (when secure boot is in effect) to OUT >> to 0xB2 when hotplug happens. It could write a well-known value to >> 0xB2, to be read by an SMI handler in edk2. > > I dislike involving QEMU's generated DSDT in anything SMM (even > injecting the SMI), because the AML interpreter runs in the OS. > > If a malicious OS kernel is a bit too enlightened about the DSDT, it > could willfully diverge from the process that we design. If QEMU > broadcast the SMI internally, the guest OS could not interfere with that. > > If the purpose of the SMI is specifically to force all CPUs into SMM > (and thereby force them into trusted state), then the OS would be > explicitly counter-interested in carrying out the AML operations from > QEMU's DSDT. But since the hotplug controller would only be accessible from SMM, there would be no other way to invoke it than to follow the DSDT's instruction and write to 0xB2. FWIW, real hardware also has plenty of 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store access). Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-15 16:21 ` Paolo Bonzini @ 2019-08-16 2:46 ` Yao, Jiewen 2019-08-16 7:20 ` Paolo Bonzini 2019-08-16 20:00 ` Laszlo Ersek 1 sibling, 1 reply; 69+ messages in thread From: Yao, Jiewen @ 2019-08-16 2:46 UTC (permalink / raw) To: Paolo Bonzini, Laszlo Ersek, devel@edk2.groups.io Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Comment below: > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Friday, August 16, 2019 12:21 AM > To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, Jiewen > <jiewen.yao@intel.com> > Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; > Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > > On 15/08/19 17:00, Laszlo Ersek wrote: > > On 08/14/19 16:04, Paolo Bonzini wrote: > >> On 14/08/19 15:20, Yao, Jiewen wrote: > >>>> - Does this part require a new branch somewhere in the OVMF SEC > code? > >>>> How do we determine whether the CPU executing SEC is BSP or > >>>> hot-plugged AP? > >>> [Jiewen] I think this is blocked from hardware perspective, since the first > instruction. > >>> There are some hardware specific registers can be used to determine if > the CPU is new added. > >>> I don’t think this must be same as the real hardware. > >>> You are free to invent some registers in device model to be used in > OVMF hot plug driver. > >> > >> Yes, this would be a new operation mode for QEMU, that only applies to > >> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in > >> fact it doesn't reply to anything at all. 
> >> > >>>> - How do we tell the hot-plugged AP where to start execution? (I.e. > that > >>>> it should execute code at a particular pflash location.) > >>> [Jiewen] Same real mode reset vector at FFFF:FFF0. > >> > >> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > >> QEMU. The AP does not start execution at all when it is unplugged, so > >> no cache-as-RAM etc. > >> > >> We only need to modify QEMU so that hot-plugged APIs do not reply to > >> INIT/SIPI/SMI. > >> > >>> I don’t think there is problem for real hardware, who always has CAR. > >>> Can QEMU provide some CPU specific space, such as MMIO region? > >> > >> Why is a CPU-specific region needed if every other processor is in SMM > >> and thus trusted. > > > > I was going through the steps Jiewen and Yingwen recommended. > > > > In step (02), the new CPU is expected to set up RAM access. In step > > (03), the new CPU, executing code from flash, is expected to "send board > > message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add > > message." For that action, the new CPU may need a stack (minimally if we > > want to use C function calls). > > > > Until step (03), there had been no word about any other (= pre-plugged) > > CPUs (more precisely, Jiewen even confirmed "No impact to other > > processors"), so I didn't assume that other CPUs had entered SMM. > > > > Paolo, I've attempted to read Jiewen's response, and yours, as carefully > > as I can. I'm still very confused. If you have a better understanding, > > could you please write up the 15-step process from the thread starter > > again, with all QEMU customizations applied? Such as, unnecessary steps > > removed, and platform specifics filled in. > > Sure. > > (01a) QEMU: create new CPU. The CPU already exists, but it does not > start running code until unparked by the CPU hotplug controller. 
>
> (01b) QEMU: trigger SCI
>
> (02-03) no equivalent
>
> (04) Host CPU: (OS) execute GPE handler from DSDT
>
> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> will not enter SMM because SMI is disabled)
>
> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> rebase code.
>
> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> new CPU
>
> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.

[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no restriction that INIT/SIPI/SIPI can only be sent in SMM.

> (08a) New CPU: (Low RAM) Enter protected mode.

[Jiewen] NOTE: The new CPU still cannot use any physical memory, because the INIT/SIPI/SIPI may be sent by a malicious CPU in a non-SMM environment.

> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
>
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>
> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
> TSEG.
>
> (11) Host CPU: (SMM) Restore 38000.
>
> (12) Host CPU: (SMM) Update located data structure to add the new CPU
> information. (This step will involve CPU_SERVICE protocol)
>
> (13) New CPU: (Flash) do whatever other initialization is needed
>
> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
>
> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull the new CPU in.
>
>
> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> INIT-SIPI-SIPI sequence of 07b-08a-08b.

[Jiewen] I am OK with this proposal. I think the rule is the same - the new CPU CANNOT touch any system memory, no matter whether it comes from the reset vector or from INIT/SIPI/SIPI. Or I would say: if the new CPU wants to touch some memory before the first SMI, that memory should be CPU-specific or on the flash.

> >> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> >> to 0xB2 when hotplug happens. It could write a well-known value to
> >> 0xB2, to be read by an SMI handler in edk2. 
> > > > I dislike involving QEMU's generated DSDT in anything SMM (even > > injecting the SMI), because the AML interpreter runs in the OS. > > > > If a malicious OS kernel is a bit too enlightened about the DSDT, it > > could willfully diverge from the process that we design. If QEMU > > broadcast the SMI internally, the guest OS could not interfere with that. > > > > If the purpose of the SMI is specifically to force all CPUs into SMM > > (and thereby force them into trusted state), then the OS would be > > explicitly counter-interested in carrying out the AML operations from > > QEMU's DSDT. > > But since the hotplug controller would only be accessible from SMM, > there would be no other way to invoke it than to follow the DSDT's > instruction and write to 0xB2. FWIW, real hardware also has plenty of > 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store > access). > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-16 2:46 ` Yao, Jiewen @ 2019-08-16 7:20 ` Paolo Bonzini 2019-08-16 7:49 ` Yao, Jiewen 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-16 7:20 UTC (permalink / raw) To: Yao, Jiewen, Laszlo Ersek, devel@edk2.groups.io Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 16/08/19 04:46, Yao, Jiewen wrote: > Comment below: > > >> -----Original Message----- >> From: Paolo Bonzini [mailto:pbonzini@redhat.com] >> Sent: Friday, August 16, 2019 12:21 AM >> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, Jiewen >> <jiewen.yao@intel.com> >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list >> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun >> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl >> <phillip.goerl@oracle.com> >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF >> >> On 15/08/19 17:00, Laszlo Ersek wrote: >>> On 08/14/19 16:04, Paolo Bonzini wrote: >>>> On 14/08/19 15:20, Yao, Jiewen wrote: >>>>>> - Does this part require a new branch somewhere in the OVMF SEC >> code? >>>>>> How do we determine whether the CPU executing SEC is BSP or >>>>>> hot-plugged AP? >>>>> [Jiewen] I think this is blocked from hardware perspective, since the first >> instruction. >>>>> There are some hardware specific registers can be used to determine if >> the CPU is new added. >>>>> I don’t think this must be same as the real hardware. >>>>> You are free to invent some registers in device model to be used in >> OVMF hot plug driver. >>>> >>>> Yes, this would be a new operation mode for QEMU, that only applies to >>>> hot-plugged CPUs. 
In this mode the AP doesn't reply to INIT or SMI, in >>>> fact it doesn't reply to anything at all. >>>> >>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. >> that >>>>>> it should execute code at a particular pflash location.) >>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0. >>>> >>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in >>>> QEMU. The AP does not start execution at all when it is unplugged, so >>>> no cache-as-RAM etc. >>>> >>>> We only need to modify QEMU so that hot-plugged APIs do not reply to >>>> INIT/SIPI/SMI. >>>> >>>>> I don’t think there is problem for real hardware, who always has CAR. >>>>> Can QEMU provide some CPU specific space, such as MMIO region? >>>> >>>> Why is a CPU-specific region needed if every other processor is in SMM >>>> and thus trusted. >>> >>> I was going through the steps Jiewen and Yingwen recommended. >>> >>> In step (02), the new CPU is expected to set up RAM access. In step >>> (03), the new CPU, executing code from flash, is expected to "send board >>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add >>> message." For that action, the new CPU may need a stack (minimally if we >>> want to use C function calls). >>> >>> Until step (03), there had been no word about any other (= pre-plugged) >>> CPUs (more precisely, Jiewen even confirmed "No impact to other >>> processors"), so I didn't assume that other CPUs had entered SMM. >>> >>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully >>> as I can. I'm still very confused. If you have a better understanding, >>> could you please write up the 15-step process from the thread starter >>> again, with all QEMU customizations applied? Such as, unnecessary steps >>> removed, and platform specifics filled in. >> >> Sure. >> >> (01a) QEMU: create new CPU. The CPU already exists, but it does not >> start running code until unparked by the CPU hotplug controller. 
>> >> (01b) QEMU: trigger SCI >> >> (02-03) no equivalent >> >> (04) Host CPU: (OS) execute GPE handler from DSDT >> >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU >> will not enter CPU because SMI is disabled) >> >> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM >> rebase code. >> >> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable >> new CPU >> >> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. > [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no > restriction that INIT/SIPI/SIPI can only be sent in SMM. All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded before 07a, so this is okay. However I do see a problem, because a PCI device's DMA could overwrite 0x38000 between (06) and (10) and hijack the code that is executed in SMM. How is this avoided on real hardware? By the time the new CPU enters SMM, it doesn't run off cache-as-RAM anymore. Paolo >> (08a) New CPU: (Low RAM) Enter protected mode. > > [Jiewen] NOTE: The new CPU still cannot use any physical memory, because > the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment. > >> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop. >> >> (09) Host CPU: (SMM) Send SMI to the new CPU only. >> >> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to >> TSEG. >> >> (11) Host CPU: (SMM) Restore 38000. >> >> (12) Host CPU: (SMM) Update located data structure to add the new CPU >> information. (This step will involve CPU_SERVICE protocol) >> >> (13) New CPU: (Flash) do whatever other initialization is needed >> >> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI. >> >> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.. >> >> >> In other words, the cache-as-RAM phase of 02-03 is replaced by the >> INIT-SIPI-SIPI sequence of 07b-08a-08b. > [Jiewen] I am OK with this proposal. 
> I think the rule is same - the new CPU CANNOT touch any system memory, > no matter it is from reset-vector or from INIT/SIPI/SIPI. > Or I would say: if the new CPU want to touch some memory before first SMI, the memory should be > CPU specific or on the flash. > > > >>>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT >>>> to 0xB2 when hotplug happens. It could write a well-known value to >>>> 0xB2, to be read by an SMI handler in edk2. >>> >>> I dislike involving QEMU's generated DSDT in anything SMM (even >>> injecting the SMI), because the AML interpreter runs in the OS. >>> >>> If a malicious OS kernel is a bit too enlightened about the DSDT, it >>> could willfully diverge from the process that we design. If QEMU >>> broadcast the SMI internally, the guest OS could not interfere with that. >>> >>> If the purpose of the SMI is specifically to force all CPUs into SMM >>> (and thereby force them into trusted state), then the OS would be >>> explicitly counter-interested in carrying out the AML operations from >>> QEMU's DSDT. >> >> But since the hotplug controller would only be accessible from SMM, >> there would be no other way to invoke it than to follow the DSDT's >> instruction and write to 0xB2. FWIW, real hardware also has plenty of >> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store >> access). >> >> Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-16 7:20 ` Paolo Bonzini @ 2019-08-16 7:49 ` Yao, Jiewen 2019-08-16 20:15 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Yao, Jiewen @ 2019-08-16 7:49 UTC (permalink / raw) To: Paolo Bonzini, Laszlo Ersek, devel@edk2.groups.io Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl below > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Friday, August 16, 2019 3:20 PM > To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek > <lersek@redhat.com>; devel@edk2.groups.io > Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; > Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > > On 16/08/19 04:46, Yao, Jiewen wrote: > > Comment below: > > > > > >> -----Original Message----- > >> From: Paolo Bonzini [mailto:pbonzini@redhat.com] > >> Sent: Friday, August 16, 2019 12:21 AM > >> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, > Jiewen > >> <jiewen.yao@intel.com> > >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > >> <qemu-devel@nongnu.org>; Igor Mammedov > <imammedo@redhat.com>; > >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > >> <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; > >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl > >> <phillip.goerl@oracle.com> > >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > >> > >> On 15/08/19 17:00, Laszlo Ersek wrote: > >>> On 08/14/19 16:04, Paolo Bonzini wrote: > >>>> On 14/08/19 15:20, Yao, Jiewen wrote: > 
>>>>>> - Does this part require a new branch somewhere in the OVMF SEC > >> code? > >>>>>> How do we determine whether the CPU executing SEC is BSP or > >>>>>> hot-plugged AP? > >>>>> [Jiewen] I think this is blocked from hardware perspective, since the > first > >> instruction. > >>>>> There are some hardware specific registers can be used to determine > if > >> the CPU is new added. > >>>>> I don’t think this must be same as the real hardware. > >>>>> You are free to invent some registers in device model to be used in > >> OVMF hot plug driver. > >>>> > >>>> Yes, this would be a new operation mode for QEMU, that only applies > to > >>>> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, > in > >>>> fact it doesn't reply to anything at all. > >>>> > >>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. > >> that > >>>>>> it should execute code at a particular pflash location.) > >>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0. > >>>> > >>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > >>>> QEMU. The AP does not start execution at all when it is unplugged, > so > >>>> no cache-as-RAM etc. > >>>> > >>>> We only need to modify QEMU so that hot-plugged APIs do not reply > to > >>>> INIT/SIPI/SMI. > >>>> > >>>>> I don’t think there is problem for real hardware, who always has CAR. > >>>>> Can QEMU provide some CPU specific space, such as MMIO region? > >>>> > >>>> Why is a CPU-specific region needed if every other processor is in SMM > >>>> and thus trusted. > >>> > >>> I was going through the steps Jiewen and Yingwen recommended. > >>> > >>> In step (02), the new CPU is expected to set up RAM access. In step > >>> (03), the new CPU, executing code from flash, is expected to "send > board > >>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add > >>> message." For that action, the new CPU may need a stack (minimally if > we > >>> want to use C function calls). 
> >>> > >>> Until step (03), there had been no word about any other (= pre-plugged) > >>> CPUs (more precisely, Jiewen even confirmed "No impact to other > >>> processors"), so I didn't assume that other CPUs had entered SMM. > >>> > >>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully > >>> as I can. I'm still very confused. If you have a better understanding, > >>> could you please write up the 15-step process from the thread starter > >>> again, with all QEMU customizations applied? Such as, unnecessary > steps > >>> removed, and platform specifics filled in. > >> > >> Sure. > >> > >> (01a) QEMU: create new CPU. The CPU already exists, but it does not > >> start running code until unparked by the CPU hotplug controller. > >> > >> (01b) QEMU: trigger SCI > >> > >> (02-03) no equivalent > >> > >> (04) Host CPU: (OS) execute GPE handler from DSDT > >> > >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU > >> will not enter CPU because SMI is disabled) > >> > >> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > >> rebase code. > >> > >> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable > >> new CPU > >> > >> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. > > [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no > > restriction that INIT/SIPI/SIPI can only be sent in SMM. > > All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded > before 07a, so this is okay. [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a? I don’t see any extra step between 06 and 07a. What is the magic here? > However I do see a problem, because a PCI device's DMA could overwrite > 0x38000 between (06) and (10) and hijack the code that is executed in > SMM. How is this avoided on real hardware? By the time the new CPU > enters SMM, it doesn't run off cache-as-RAM anymore. [Jiewen] Interesting question. 
I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below: -- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data. -- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU. I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked. I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment? > Paolo > > >> (08a) New CPU: (Low RAM) Enter protected mode. > > > > [Jiewen] NOTE: The new CPU still cannot use any physical memory, > because > > the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment. > > > >> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop. > >> > >> (09) Host CPU: (SMM) Send SMI to the new CPU only. > >> > >> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to > >> TSEG. > >> > >> (11) Host CPU: (SMM) Restore 38000. > >> > >> (12) Host CPU: (SMM) Update located data structure to add the new CPU > >> information. (This step will involve CPU_SERVICE protocol) > >> > >> (13) New CPU: (Flash) do whatever other initialization is needed > >> > >> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI. > >> > >> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.. > >> > >> > >> In other words, the cache-as-RAM phase of 02-03 is replaced by the > >> INIT-SIPI-SIPI sequence of 07b-08a-08b. > > [Jiewen] I am OK with this proposal. > > I think the rule is same - the new CPU CANNOT touch any system memory, > > no matter it is from reset-vector or from INIT/SIPI/SIPI. > > Or I would say: if the new CPU want to touch some memory before first > SMI, the memory should be > > CPU specific or on the flash. > > > > > > > >>>> The QEMU DSDT could be modified (when secure boot is in effect) to > OUT > >>>> to 0xB2 when hotplug happens. 
It could write a well-known value to > >>>> 0xB2, to be read by an SMI handler in edk2. > >>> > >>> I dislike involving QEMU's generated DSDT in anything SMM (even > >>> injecting the SMI), because the AML interpreter runs in the OS. > >>> > >>> If a malicious OS kernel is a bit too enlightened about the DSDT, it > >>> could willfully diverge from the process that we design. If QEMU > >>> broadcast the SMI internally, the guest OS could not interfere with that. > >>> > >>> If the purpose of the SMI is specifically to force all CPUs into SMM > >>> (and thereby force them into trusted state), then the OS would be > >>> explicitly counter-interested in carrying out the AML operations from > >>> QEMU's DSDT. > >> > >> But since the hotplug controller would only be accessible from SMM, > >> there would be no other way to invoke it than to follow the DSDT's > >> instruction and write to 0xB2. FWIW, real hardware also has plenty of > >> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store > >> access). > >> > >> Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-16 7:49 ` Yao, Jiewen @ 2019-08-16 20:15 ` Laszlo Ersek 2019-08-16 22:19 ` Alex Williamson 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-08-16 20:15 UTC (permalink / raw) To: Yao, Jiewen, Paolo Bonzini, devel@edk2.groups.io Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl, Alex Williamson +Alex (direct question at the bottom) On 08/16/19 09:49, Yao, Jiewen wrote: > below > >> -----Original Message----- >> From: Paolo Bonzini [mailto:pbonzini@redhat.com] >> Sent: Friday, August 16, 2019 3:20 PM >> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek >> <lersek@redhat.com>; devel@edk2.groups.io >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list >> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun >> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl >> <phillip.goerl@oracle.com> >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF >> >> On 16/08/19 04:46, Yao, Jiewen wrote: >>> Comment below: >>> >>> >>>> -----Original Message----- >>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com] >>>> Sent: Friday, August 16, 2019 12:21 AM >>>> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, >> Jiewen >>>> <jiewen.yao@intel.com> >>>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list >>>> <qemu-devel@nongnu.org>; Igor Mammedov >> <imammedo@redhat.com>; >>>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun >>>> <jun.nakajima@intel.com>; Boris Ostrovsky >> <boris.ostrovsky@oracle.com>; >>>> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl >>>> <phillip.goerl@oracle.com> >>>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF >>>> >>>> On 15/08/19 17:00, 
Laszlo Ersek wrote: >>>>> On 08/14/19 16:04, Paolo Bonzini wrote: >>>>>> On 14/08/19 15:20, Yao, Jiewen wrote: >>>>>>>> - Does this part require a new branch somewhere in the OVMF SEC >>>> code? >>>>>>>> How do we determine whether the CPU executing SEC is BSP or >>>>>>>> hot-plugged AP? >>>>>>> [Jiewen] I think this is blocked from hardware perspective, since the >> first >>>> instruction. >>>>>>> There are some hardware specific registers can be used to determine >> if >>>> the CPU is new added. >>>>>>> I don’t think this must be same as the real hardware. >>>>>>> You are free to invent some registers in device model to be used in >>>> OVMF hot plug driver. >>>>>> >>>>>> Yes, this would be a new operation mode for QEMU, that only applies >> to >>>>>> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, >> in >>>>>> fact it doesn't reply to anything at all. >>>>>> >>>>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. >>>> that >>>>>>>> it should execute code at a particular pflash location.) >>>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0. >>>>>> >>>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in >>>>>> QEMU. The AP does not start execution at all when it is unplugged, >> so >>>>>> no cache-as-RAM etc. >>>>>> >>>>>> We only need to modify QEMU so that hot-plugged APIs do not reply >> to >>>>>> INIT/SIPI/SMI. >>>>>> >>>>>>> I don’t think there is problem for real hardware, who always has CAR. >>>>>>> Can QEMU provide some CPU specific space, such as MMIO region? >>>>>> >>>>>> Why is a CPU-specific region needed if every other processor is in SMM >>>>>> and thus trusted. >>>>> >>>>> I was going through the steps Jiewen and Yingwen recommended. >>>>> >>>>> In step (02), the new CPU is expected to set up RAM access. In step >>>>> (03), the new CPU, executing code from flash, is expected to "send >> board >>>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add >>>>> message." 
For that action, the new CPU may need a stack (minimally if >> we >>>>> want to use C function calls). >>>>> >>>>> Until step (03), there had been no word about any other (= pre-plugged) >>>>> CPUs (more precisely, Jiewen even confirmed "No impact to other >>>>> processors"), so I didn't assume that other CPUs had entered SMM. >>>>> >>>>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully >>>>> as I can. I'm still very confused. If you have a better understanding, >>>>> could you please write up the 15-step process from the thread starter >>>>> again, with all QEMU customizations applied? Such as, unnecessary >> steps >>>>> removed, and platform specifics filled in. >>>> >>>> Sure. >>>> >>>> (01a) QEMU: create new CPU. The CPU already exists, but it does not >>>> start running code until unparked by the CPU hotplug controller. >>>> >>>> (01b) QEMU: trigger SCI >>>> >>>> (02-03) no equivalent >>>> >>>> (04) Host CPU: (OS) execute GPE handler from DSDT >>>> >>>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU >>>> will not enter CPU because SMI is disabled) >>>> >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM >>>> rebase code. >>>> >>>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable >>>> new CPU >>>> >>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. >>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no >>> restriction that INIT/SIPI/SIPI can only be sent in SMM. >> >> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded >> before 07a, so this is okay. > [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a? > I don’t see any extra step between 06 and 07a. > What is the magic here? The magic is 07a itself, IIUC. The CPU hotplug controller would be accessible only in SMM. 
And until 07a happens, the new CPU ignores INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU would implement the new CPU's behavior like that. > > > >> However I do see a problem, because a PCI device's DMA could overwrite >> 0x38000 between (06) and (10) and hijack the code that is executed in >> SMM. How is this avoided on real hardware? By the time the new CPU >> enters SMM, it doesn't run off cache-as-RAM anymore. > [Jiewen] Interesting question. > I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below: > -- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data. > -- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU. We do have physical PCI(e) device assignment; sorry for not highlighting that earlier. That feature (VFIO) does rely on the (physical) IOMMU, and it makes sure that the assigned device can only access physical frames that belong to the virtual machine that the device is assigned to. However, as far as I know, VFIO doesn't try to restrict PCI DMA to subsets of guest RAM... I could be wrong about that, I vaguely recall RMRR support, which seems somewhat related. > I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked. > I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment? I think that would be a VFIO feature. Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM (expressed with guest-physical RAM addresses), perhaps permanently, perhaps just for a while -- not sure about coordination though --, could VFIO accommodate that (I guess by "punching holes" in the IOMMU page tables)? Thanks Laszlo > > > >> Paolo >> >>>> (08a) New CPU: (Low RAM) Enter protected mode. 
>>> >>> [Jiewen] NOTE: The new CPU still cannot use any physical memory, >> because >>> the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment. >>> >>>> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop. >>>> >>>> (09) Host CPU: (SMM) Send SMI to the new CPU only. >>>> >>>> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to >>>> TSEG. >>>> >>>> (11) Host CPU: (SMM) Restore 38000. >>>> >>>> (12) Host CPU: (SMM) Update located data structure to add the new CPU >>>> information. (This step will involve CPU_SERVICE protocol) >>>> >>>> (13) New CPU: (Flash) do whatever other initialization is needed >>>> >>>> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI. >>>> >>>> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.. >>>> >>>> >>>> In other words, the cache-as-RAM phase of 02-03 is replaced by the >>>> INIT-SIPI-SIPI sequence of 07b-08a-08b. >>> [Jiewen] I am OK with this proposal. >>> I think the rule is same - the new CPU CANNOT touch any system memory, >>> no matter it is from reset-vector or from INIT/SIPI/SIPI. >>> Or I would say: if the new CPU want to touch some memory before first >> SMI, the memory should be >>> CPU specific or on the flash. >>> >>> >>> >>>>>> The QEMU DSDT could be modified (when secure boot is in effect) to >> OUT >>>>>> to 0xB2 when hotplug happens. It could write a well-known value to >>>>>> 0xB2, to be read by an SMI handler in edk2. >>>>> >>>>> I dislike involving QEMU's generated DSDT in anything SMM (even >>>>> injecting the SMI), because the AML interpreter runs in the OS. >>>>> >>>>> If a malicious OS kernel is a bit too enlightened about the DSDT, it >>>>> could willfully diverge from the process that we design. If QEMU >>>>> broadcast the SMI internally, the guest OS could not interfere with that. 
>>>>> >>>>> If the purpose of the SMI is specifically to force all CPUs into SMM >>>>> (and thereby force them into trusted state), then the OS would be >>>>> explicitly counter-interested in carrying out the AML operations from >>>>> QEMU's DSDT. >>>> >>>> But since the hotplug controller would only be accessible from SMM, >>>> there would be no other way to invoke it than to follow the DSDT's >>>> instruction and write to 0xB2. FWIW, real hardware also has plenty of >>>> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store >>>> access). >>>> >>>> Paolo > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-16 20:15 ` Laszlo Ersek @ 2019-08-16 22:19 ` Alex Williamson 2019-08-17 0:20 ` Yao, Jiewen 0 siblings, 1 reply; 69+ messages in thread From: Alex Williamson @ 2019-08-16 22:19 UTC (permalink / raw) To: Laszlo Ersek Cc: Yao, Jiewen, Paolo Bonzini, devel@edk2.groups.io, edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Fri, 16 Aug 2019 22:15:15 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > +Alex (direct question at the bottom) > > On 08/16/19 09:49, Yao, Jiewen wrote: > > below > > > >> -----Original Message----- > >> From: Paolo Bonzini [mailto:pbonzini@redhat.com] > >> Sent: Friday, August 16, 2019 3:20 PM > >> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek > >> <lersek@redhat.com>; devel@edk2.groups.io > >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > >> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > >> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; > >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl > >> <phillip.goerl@oracle.com> > >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > >> > >> On 16/08/19 04:46, Yao, Jiewen wrote: > >>> Comment below: > >>> > >>> > >>>> -----Original Message----- > >>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com] > >>>> Sent: Friday, August 16, 2019 12:21 AM > >>>> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, > >> Jiewen > >>>> <jiewen.yao@intel.com> > >>>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > >>>> <qemu-devel@nongnu.org>; Igor Mammedov > >> <imammedo@redhat.com>; > >>>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > >>>> <jun.nakajima@intel.com>; Boris Ostrovsky > >> <boris.ostrovsky@oracle.com>; > >>>> Joao Marcal Lemos Martins 
<joao.m.martins@oracle.com>; Phillip Goerl > >>>> <phillip.goerl@oracle.com> > >>>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > >>>> > >>>> On 15/08/19 17:00, Laszlo Ersek wrote: > >>>>> On 08/14/19 16:04, Paolo Bonzini wrote: > >>>>>> On 14/08/19 15:20, Yao, Jiewen wrote: > >>>>>>>> - Does this part require a new branch somewhere in the OVMF SEC > >>>> code? > >>>>>>>> How do we determine whether the CPU executing SEC is BSP or > >>>>>>>> hot-plugged AP? > >>>>>>> [Jiewen] I think this is blocked from hardware perspective, since the > >> first > >>>> instruction. > >>>>>>> There are some hardware specific registers can be used to determine > >> if > >>>> the CPU is new added. > >>>>>>> I don’t think this must be same as the real hardware. > >>>>>>> You are free to invent some registers in device model to be used in > >>>> OVMF hot plug driver. > >>>>>> > >>>>>> Yes, this would be a new operation mode for QEMU, that only applies > >> to > >>>>>> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, > >> in > >>>>>> fact it doesn't reply to anything at all. > >>>>>> > >>>>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. > >>>> that > >>>>>>>> it should execute code at a particular pflash location.) > >>>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0. > >>>>>> > >>>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > >>>>>> QEMU. The AP does not start execution at all when it is unplugged, > >> so > >>>>>> no cache-as-RAM etc. > >>>>>> > >>>>>> We only need to modify QEMU so that hot-plugged APIs do not reply > >> to > >>>>>> INIT/SIPI/SMI. > >>>>>> > >>>>>>> I don’t think there is problem for real hardware, who always has CAR. > >>>>>>> Can QEMU provide some CPU specific space, such as MMIO region? > >>>>>> > >>>>>> Why is a CPU-specific region needed if every other processor is in SMM > >>>>>> and thus trusted. 
> >>>>> > >>>>> I was going through the steps Jiewen and Yingwen recommended. > >>>>> > >>>>> In step (02), the new CPU is expected to set up RAM access. In step > >>>>> (03), the new CPU, executing code from flash, is expected to "send > >> board > >>>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add > >>>>> message." For that action, the new CPU may need a stack (minimally if > >> we > >>>>> want to use C function calls). > >>>>> > >>>>> Until step (03), there had been no word about any other (= pre-plugged) > >>>>> CPUs (more precisely, Jiewen even confirmed "No impact to other > >>>>> processors"), so I didn't assume that other CPUs had entered SMM. > >>>>> > >>>>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully > >>>>> as I can. I'm still very confused. If you have a better understanding, > >>>>> could you please write up the 15-step process from the thread starter > >>>>> again, with all QEMU customizations applied? Such as, unnecessary > >> steps > >>>>> removed, and platform specifics filled in. > >>>> > >>>> Sure. > >>>> > >>>> (01a) QEMU: create new CPU. The CPU already exists, but it does not > >>>> start running code until unparked by the CPU hotplug controller. > >>>> > >>>> (01b) QEMU: trigger SCI > >>>> > >>>> (02-03) no equivalent > >>>> > >>>> (04) Host CPU: (OS) execute GPE handler from DSDT > >>>> > >>>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU > >>>> will not enter CPU because SMI is disabled) > >>>> > >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > >>>> rebase code. > >>>> > >>>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable > >>>> new CPU > >>>> > >>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. > >>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no > >>> restriction that INIT/SIPI/SIPI can only be sent in SMM. 
> >> > >> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded > >> before 07a, so this is okay. > > [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a? > > I don’t see any extra step between 06 and 07a. > > What is the magic here? > > The magic is 07a itself, IIUC. The CPU hotplug controller would be > accessible only in SMM. And until 07a happens, the new CPU ignores > INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU > would implement the new CPU's behavior like that. > > > > > > > > >> However I do see a problem, because a PCI device's DMA could overwrite > >> 0x38000 between (06) and (10) and hijack the code that is executed in > >> SMM. How is this avoided on real hardware? By the time the new CPU > >> enters SMM, it doesn't run off cache-as-RAM anymore. > > [Jiewen] Interesting question. > > I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below: > > -- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data. > > -- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU. > > We do have physical PCI(e) device assignment; sorry for not highlighting > that earlier. That feature (VFIO) does rely on the (physical) IOMMU, and > it makes sure that the assigned device can only access physical frames > that belong to the virtual machine that the device is assigned to. > > However, as far as I know, VFIO doesn't try to restrict PCI DMA to > subsets of guest RAM... I could be wrong about that, I vaguely recall > RMRR support, which seems somewhat related. > > > I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked. > > I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment? > > I think that would be a VFIO feature. 
> > Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM > (expressed with guest-physical RAM addresses), perhaps permanently, > perhaps just for a while -- not sure about coordination though --, could > VFIO accommodate that (I guess by "punching holes" in the IOMMU page > tables)? It depends. For starters, the vfio mapping API does not allow unmapping arbitrary sub-ranges of previous mappings. So the hole you want to punch would need to be independently mapped. From there you get into the issue of whether this range is a potential DMA target. If it is, then this is the path to data corruption. We cannot interfere with the operation of the device and we have little to no visibility of active DMA targets. If we're talking about RAM that is never a DMA target, perhaps e820 reserved memory, then we can make sure certain MemoryRegions are skipped when mapped by QEMU and would expect the guest to never map them through a vIOMMU as well. Maybe then it's a question of where we're trying to provide security (it might be more difficult if QEMU needs to sanitize vIOMMU mappings to actively prevent mapping reserved areas). Is there anything unique about the VM case here? Bare metal SMM needs to be concerned about protecting itself from I/O devices that operate outside of the realm of SMM mode as well, right? Is something "simple" like an AddressSpace switch necessary here, such that an I/O device always has a mapping to a safe guest RAM page while the vCPU AddressSpace can switch to some protected page? The IOMMU and vCPU mappings don't need to be the same. The vCPU is more under our control than the assigned device. 
Thanks, Alex ^ permalink raw reply [flat|nested] 69+ messages in thread
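Alex's point about the vfio mapping API — a previously created mapping cannot be split, so a hole can only be punched where the hole's range was mapped as its own entry — can be sketched with a toy model. This is an illustration only, not the real VFIO_IOMMU_MAP_DMA/UNMAP_DMA ABI; the exact-match unmap rule below is a simplification of the actual type1 semantics:

```python
# Toy model of the type1 constraint: an unmap request must line up with
# an existing mapping; arbitrary sub-ranges are refused.
class ToyContainer:
    def __init__(self):
        self.mappings = {}          # iova -> size

    def map_dma(self, iova, size):
        self.mappings[iova] = size

    def unmap_dma(self, iova, size):
        # Simplified: only an exact (iova, size) match succeeds
        # (the real API returns -EINVAL for a bad range).
        if self.mappings.get(iova) != size:
            return False
        del self.mappings[iova]
        return True

c = ToyContainer()
c.map_dma(0x0, 0x100000)                 # map first 1 MiB of guest RAM as one entry
print(c.unmap_dma(0x30000, 0x10000))     # hole punch at 0x30000: refused (False)
print(c.unmap_dma(0x0, 0x100000))        # exact unmap: works (True)

# Mapping the sensitive range independently makes the hole possible:
c.map_dma(0x0, 0x30000)
c.map_dma(0x30000, 0x10000)
c.map_dma(0x40000, 0xc0000)
print(c.unmap_dma(0x30000, 0x10000))     # now the hole can be punched (True)
```

In this model the hole at 0x30000 only succeeds once that range was "independently mapped", mirroring Alex's wording.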
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-16 22:19 ` Alex Williamson @ 2019-08-17 0:20 ` Yao, Jiewen 2019-08-18 19:50 ` Paolo Bonzini 0 siblings, 1 reply; 69+ messages in thread From: Yao, Jiewen @ 2019-08-17 0:20 UTC (permalink / raw) To: Alex Williamson, Laszlo Ersek Cc: Paolo Bonzini, devel@edk2.groups.io, edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl > -----Original Message----- > From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Saturday, August 17, 2019 6:20 AM > To: Laszlo Ersek <lersek@redhat.com> > Cc: Yao, Jiewen <jiewen.yao@intel.com>; Paolo Bonzini > <pbonzini@redhat.com>; devel@edk2.groups.io; edk2-rfc-groups-io > <rfc@edk2.groups.io>; qemu devel list <qemu-devel@nongnu.org>; Igor > Mammedov <imammedo@redhat.com>; Chen, Yingwen > <yingwen.chen@intel.com>; Nakajima, Jun <jun.nakajima@intel.com>; Boris > Ostrovsky <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com> > Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > > On Fri, 16 Aug 2019 22:15:15 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > > > +Alex (direct question at the bottom) > > > > On 08/16/19 09:49, Yao, Jiewen wrote: > > > below > > > > > >> -----Original Message----- > > >> From: Paolo Bonzini [mailto:pbonzini@redhat.com] > > >> Sent: Friday, August 16, 2019 3:20 PM > > >> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek > > >> <lersek@redhat.com>; devel@edk2.groups.io > > >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > > >> <qemu-devel@nongnu.org>; Igor Mammedov > <imammedo@redhat.com>; > > >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > > >> <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; > > >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip > Goerl > > >> <phillip.goerl@oracle.com> > > >> 
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > > >> > > >> On 16/08/19 04:46, Yao, Jiewen wrote: > > >>> Comment below: > > >>> > > >>> > > >>>> -----Original Message----- > > >>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com] > > >>>> Sent: Friday, August 16, 2019 12:21 AM > > >>>> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, > > >> Jiewen > > >>>> <jiewen.yao@intel.com> > > >>>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > > >>>> <qemu-devel@nongnu.org>; Igor Mammedov > > >> <imammedo@redhat.com>; > > >>>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > > >>>> <jun.nakajima@intel.com>; Boris Ostrovsky > > >> <boris.ostrovsky@oracle.com>; > > >>>> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip > Goerl > > >>>> <phillip.goerl@oracle.com> > > >>>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF > > >>>> > > >>>> On 15/08/19 17:00, Laszlo Ersek wrote: > > >>>>> On 08/14/19 16:04, Paolo Bonzini wrote: > > >>>>>> On 14/08/19 15:20, Yao, Jiewen wrote: > > >>>>>>>> - Does this part require a new branch somewhere in the OVMF > SEC > > >>>> code? > > >>>>>>>> How do we determine whether the CPU executing SEC is BSP > or > > >>>>>>>> hot-plugged AP? > > >>>>>>> [Jiewen] I think this is blocked from hardware perspective, since > the > > >> first > > >>>> instruction. > > >>>>>>> There are some hardware specific registers can be used to > determine > > >> if > > >>>> the CPU is new added. > > >>>>>>> I don’t think this must be same as the real hardware. > > >>>>>>> You are free to invent some registers in device model to be used > in > > >>>> OVMF hot plug driver. > > >>>>>> > > >>>>>> Yes, this would be a new operation mode for QEMU, that only > applies > > >> to > > >>>>>> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or > SMI, > > >> in > > >>>>>> fact it doesn't reply to anything at all. 
> > >>>>>> > > >>>>>>>> - How do we tell the hot-plugged AP where to start execution? > (I.e. > > >>>> that > > >>>>>>>> it should execute code at a particular pflash location.) > > >>>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0. > > >>>>>> > > >>>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > > >>>>>> QEMU. The AP does not start execution at all when it is > unplugged, > > >> so > > >>>>>> no cache-as-RAM etc. > > >>>>>> > > >>>>>> We only need to modify QEMU so that hot-plugged APIs do not > reply > > >> to > > >>>>>> INIT/SIPI/SMI. > > >>>>>> > > >>>>>>> I don’t think there is problem for real hardware, who always has > CAR. > > >>>>>>> Can QEMU provide some CPU specific space, such as MMIO > region? > > >>>>>> > > >>>>>> Why is a CPU-specific region needed if every other processor is in > SMM > > >>>>>> and thus trusted. > > >>>>> > > >>>>> I was going through the steps Jiewen and Yingwen recommended. > > >>>>> > > >>>>> In step (02), the new CPU is expected to set up RAM access. In step > > >>>>> (03), the new CPU, executing code from flash, is expected to "send > > >> board > > >>>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add > > >>>>> message." For that action, the new CPU may need a stack > (minimally if > > >> we > > >>>>> want to use C function calls). > > >>>>> > > >>>>> Until step (03), there had been no word about any other (= > pre-plugged) > > >>>>> CPUs (more precisely, Jiewen even confirmed "No impact to other > > >>>>> processors"), so I didn't assume that other CPUs had entered SMM. > > >>>>> > > >>>>> Paolo, I've attempted to read Jiewen's response, and yours, as > carefully > > >>>>> as I can. I'm still very confused. If you have a better understanding, > > >>>>> could you please write up the 15-step process from the thread > starter > > >>>>> again, with all QEMU customizations applied? Such as, unnecessary > > >> steps > > >>>>> removed, and platform specifics filled in. 
> > >>>> > > >>>> Sure. > > >>>> > > >>>> (01a) QEMU: create new CPU. The CPU already exists, but it does > not > > >>>> start running code until unparked by the CPU hotplug > controller. > > >>>> > > >>>> (01b) QEMU: trigger SCI > > >>>> > > >>>> (02-03) no equivalent > > >>>> > > >>>> (04) Host CPU: (OS) execute GPE handler from DSDT > > >>>> > > >>>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New > CPU > > >>>> will not enter CPU because SMI is disabled) > > >>>> > > >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > > >>>> rebase code. > > >>>> > > >>>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable > > >>>> new CPU > > >>>> > > >>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. > > >>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is > no > > >>> restriction that INIT/SIPI/SIPI can only be sent in SMM. > > >> > > >> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded > > >> before 07a, so this is okay. > > > [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is > delivered at 07a? > > > I don’t see any extra step between 06 and 07a. > > > What is the magic here? > > > > The magic is 07a itself, IIUC. The CPU hotplug controller would be > > accessible only in SMM. And until 07a happens, the new CPU ignores > > INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU > > would implement the new CPU's behavior like that. [Jiewen] Got it. Looks fine to me. > > >> However I do see a problem, because a PCI device's DMA could > overwrite > > >> 0x38000 between (06) and (10) and hijack the code that is executed in > > >> SMM. How is this avoided on real hardware? By the time the new > CPU > > >> enters SMM, it doesn't run off cache-as-RAM anymore. > > > [Jiewen] Interesting question. > > > I don’t think the DMA attack is considered in threat model for the virtual > environment. 
We only list adversary below: > > > -- Adversary: System Software Attacker, who can control any OS memory > or silicon register from OS level, or read write BIOS data. > > > -- Adversary: Simple hardware attacker, who can hot add or hot remove > a CPU. > > > > We do have physical PCI(e) device assignment; sorry for not highlighting > > that earlier. [Jiewen] That is OK. Then we MUST add the third adversary. -- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world. NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE. In the real world: #1: the SMM MUST be non-DMA capable region. #2: the MMIO MUST be non-DMA capable region. #3: the stolen memory MIGHT be DMA capable region or non-DMA capable region. It depends upon the silicon design. #4: the normal OS accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be DMA capable region. As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4. I assume the virtual environment is designed in the same way. Please correct me if I am wrong. >> That feature (VFIO) does rely on the (physical) IOMMU, and > > it makes sure that the assigned device can only access physical frames > > that belong to the virtual machine that the device is assigned to. [Jiewen] Thank you! Good to know. I found https://www.kernel.org/doc/Documentation/vfio.txt Is that what you described above? Anyway, I believe the problem is clear and the solution in the real world is clear. I will leave the virtual world discussion to Alex, Paolo, Laszlo. If you need any of my input, please let me know. 
> > > > > I agree it is a threat from real hardware perspective. SMM may check > VTd to make sure the 38000 is blocked. > > > I doubt if it is a threat in virtual environment. Do we have a way to block > DMA in virtual environment? > > > > I think that would be a VFIO feature. > > > > Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM > > (expressed with guest-physical RAM addresses), perhaps permanently, > > perhaps just for a while -- not sure about coordination though --, could > > VFIO accommodate that (I guess by "punching holes" in the IOMMU page > > tables)? > > It depends. For starters, the vfio mapping API does not allow > unmapping arbitrary sub-ranges of previous mappings. So the hole you > want to punch would need to be independently mapped. From there you > get into the issue of whether this range is a potential DMA target. If > it is, then this is the path to data corruption. We cannot interfere > with the operation of the device and we have little to no visibility of > active DMA targets. > > If we're talking about RAM that is never a DMA target, perhaps e820 > reserved memory, then we can make sure certainly MemoryRegions are > skipped when mapped by QEMU and would expect the guest to never map > them through a vIOMMU as well. Maybe then it's a question of where > we're trying to provide security (it might be more difficult if QEMU > needs to sanitize vIOMMU mappings to actively prevent mapping > reserved areas). > > Is there anything unique about the VM case here? Bare metal SMM needs > to be concerned about protecting itself from I/O devices that operate > outside of the realm of SMM mode as well, right? Is something "simple" > like an AddressSpace switch necessary here, such that an I/O device > always has a mapping to a safe guest RAM page while the vCPU > AddressSpace can switch to some protected page? The IOMMU and vCPU > mappings don't need to be the same. The vCPU is more under our control > than the assigned device. 
> > FWIW, RMRRs are a VT-d specific mechanism to define an address range as > persistently, identity mapped for one or more devices. IOW, the device > would always map that range. I don't think that's what you're after > here. RMRRs are also an abomination that I hope we never find a > requirement for in a VM. Thanks, > > Alex ^ permalink raw reply [flat|nested] 69+ messages in thread
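Jiewen's region taxonomy (#1–#4) and the IOMMU-protection requirement he derives from it can be restated as a small decision table. The region names and requirement levels are taken directly from his message; only the executable encoding is new:

```python
# Region taxonomy from the thread: which regions are DMA-capable, and
# where IOMMU protection is required.
REGIONS = {
    "SMM":           {"dma_capable": False, "iommu_required": "no"},      # #1
    "MMIO":          {"dma_capable": False, "iommu_required": "no"},      # #2
    "stolen memory": {"dma_capable": None,  "iommu_required": "maybe"},   # #3: silicon-dependent
    "OS memory":     {"dma_capable": True,  "iommu_required": "yes"},     # #4
}

def iommu_required(region):
    """Return 'no', 'maybe', or 'yes' for a region name."""
    return REGIONS[region]["iommu_required"]

for name, attrs in REGIONS.items():
    print(f"{name:14s} IOMMU protection: {attrs['iommu_required']}")
```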
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-17 0:20 ` Yao, Jiewen @ 2019-08-18 19:50 ` Paolo Bonzini 2019-08-18 23:00 ` Yao, Jiewen 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-18 19:50 UTC (permalink / raw) To: Yao, Jiewen, Alex Williamson, Laszlo Ersek Cc: devel@edk2.groups.io, edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 17/08/19 02:20, Yao, Jiewen wrote: > [Jiewen] That is OK. Then we MUST add the third adversary. > -- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world. > NOTE: The DMA attack in the real world is out of scope. That is be handled by IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE. > > In the real world: > #1: the SMM MUST be non-DMA capable region. > #2: the MMIO MUST be non-DMA capable region. > #3: the stolen memory MIGHT be DMA capable region or non-DMA capable > region. It depends upon the silicon design. > #4: the normal OS accessible memory - including ACPI reclaim, ACPI > NVS, and reserved memory not included by #3 - MUST be DMA capable region. > As such, IOMMU protection is NOT required for #1 and #2. IOMMU > protection MIGHT be required for #3 and MUST be required for #4. > I assume the virtual environment is designed in the same way. Please > correct me if I am wrong. > Correct. The 0x30000...0x3ffff area is the only problematic one; Igor's idea (or a variant, for example optionally remapping 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive. Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-18 19:50 ` Paolo Bonzini @ 2019-08-18 23:00 ` Yao, Jiewen 2019-08-19 14:10 ` Paolo Bonzini 2019-08-21 15:48 ` [edk2-rfc] " Michael D Kinney 0 siblings, 2 replies; 69+ messages in thread From: Yao, Jiewen @ 2019-08-18 23:00 UTC (permalink / raw) To: Paolo Bonzini Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl In the real world, we deprecate AB-seg usage because it is vulnerable to an SMM cache-poisoning attack. I assume cache poisoning is out of scope in the virtual world, or there is a way to prevent AB-seg cache poisoning. Thank you! Yao, Jiewen > On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: > >> On 17/08/19 02:20, Yao, Jiewen wrote: >> [Jiewen] That is OK. Then we MUST add the third adversary. >> -- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world. >> NOTE: The DMA attack in the real world is out of scope. That is be handled by IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE. >> >> In the real world: >> #1: the SMM MUST be non-DMA capable region. >> #2: the MMIO MUST be non-DMA capable region. >> #3: the stolen memory MIGHT be DMA capable region or non-DMA capable >> region. It depends upon the silicon design. >> #4: the normal OS accessible memory - including ACPI reclaim, ACPI >> NVS, and reserved memory not included by #3 - MUST be DMA capable region. >> As such, IOMMU protection is NOT required for #1 and #2. IOMMU >> protection MIGHT be required for #3 and MUST be required for #4. >> I assume the virtual environment is designed in the same way. Please >> correct me if I am wrong. >> > > Correct. 
The 0x30000...0x3ffff area is the only problematic one; > Igor's idea (or a variant, for example optionally remapping > 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive. > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
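The remapping variant quoted above — aliasing 0xa0000..0xaffff SMRAM over 0x30000 — amounts to a conditional address decode. The sketch below is a hypothetical model, not QEMU code; the addresses come from the messages:

```python
# While the alias is active, CPU accesses to 0x30000..0x3ffff are
# serviced by A-seg SMRAM at 0xa0000..0xaffff, so the SMBASE relocation
# trampoline never sits in DMA-reachable normal RAM.
ALIAS_BASE = 0x30000    # default SMBASE region (SMI entry point at SMBASE + 0x8000)
ALIAS_SIZE = 0x10000
ASEG_BASE  = 0xa0000    # A-seg SMRAM backing the alias

def decode(addr, alias_enabled):
    if alias_enabled and ALIAS_BASE <= addr < ALIAS_BASE + ALIAS_SIZE:
        return ASEG_BASE + (addr - ALIAS_BASE)
    return addr         # normal RAM decode

print(hex(decode(0x38000, alias_enabled=True)))    # 0xa8000 -> SMI entry lands in SMRAM
print(hex(decode(0x38000, alias_enabled=False)))   # 0x38000 -> ordinary (DMA-reachable) RAM
```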
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-18 23:00 ` Yao, Jiewen @ 2019-08-19 14:10 ` Paolo Bonzini 2019-08-21 12:07 ` Laszlo Ersek 2019-08-21 15:48 ` [edk2-rfc] " Michael D Kinney 1 sibling, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-19 14:10 UTC (permalink / raw) To: Yao, Jiewen Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 19/08/19 01:00, Yao, Jiewen wrote: > in real world, we deprecate AB-seg usage because they are vulnerable > to smm cache poison attack. I assume cache poison is out of scope in > the virtual world, or there is a way to prevent ABseg cache poison. Indeed the SMRR would not cover the A-seg on real hardware. However, if the chipset allowed aliasing A-seg SMRAM to 0x30000, it would only be used for SMBASE relocation of hotplugged CPU. The firmware would still keep low SMRAM disabled, *except around SMBASE relocation of hotplugged CPUs*. To avoid cache poisoning attacks, you only have to issue a WBINVD before enabling low SMRAM and before disabling it. Hotplug SMI is not a performance-sensitive path, so it's not a big deal. So I guess you agree that PCI DMA attacks are a potential vector also on real hardware. As Alex pointed out, VT-d is not a solution because there could be legitimate DMA happening during CPU hotplug. For OVMF we'll probably go with Igor's idea, it would be nice if Intel chipsets supported it too. :) Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
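The sequence Paolo describes — low SMRAM opened only around SMBASE relocation of a hotplugged CPU, with WBINVD on both edges to defeat cache poisoning — can be written out as a sketch. Every helper below is a stand-in for a hardware operation, not a real firmware API:

```python
# Hotplug SMI handler sequence, modeled with a log of operations.
log = []
def wbinvd():                 log.append("wbinvd")
def set_low_smram(enabled):   log.append(f"low_smram={'on' if enabled else 'off'}")
def relocate_smbase(cpu):     log.append(f"relocate cpu{cpu}")

def hotplug_smi_handler(cpu):
    wbinvd()                  # flush possibly poisoned lines before opening SMRAM
    set_low_smram(True)       # alias/low SMRAM enabled only for this window
    relocate_smbase(cpu)      # new CPU takes its first SMI, rebases into TSEG
    wbinvd()                  # flush again before handing 0x30000 back to RAM
    set_low_smram(False)

hotplug_smi_handler(4)
print(log)
```

Since hotplug SMI is not performance-sensitive, the two WBINVDs cost nothing that matters, which is Paolo's point.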
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-19 14:10 ` Paolo Bonzini @ 2019-08-21 12:07 ` Laszlo Ersek 0 siblings, 0 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-08-21 12:07 UTC (permalink / raw) To: Paolo Bonzini, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/19/19 16:10, Paolo Bonzini wrote: > On 19/08/19 01:00, Yao, Jiewen wrote: >> in real world, we deprecate AB-seg usage because they are vulnerable >> to smm cache poison attack. I assume cache poison is out of scope in >> the virtual world, or there is a way to prevent ABseg cache poison. > > Indeed the SMRR would not cover the A-seg on real hardware. However, if > the chipset allowed aliasing A-seg SMRAM to 0x30000, it would only be > used for SMBASE relocation of hotplugged CPU. The firmware would still > keep low SMRAM disabled, *except around SMBASE relocation of hotplugged > CPUs*. To avoid cache poisoning attacks, you only have to issue a > WBINVD before enabling low SMRAM and before disabling it. Hotplug SMI > is not a performance-sensitive path, so it's not a big deal. > > So I guess you agree that PCI DMA attacks are a potential vector also on > real hardware. As Alex pointed out, VT-d is not a solution because > there could be legitimate DMA happening during CPU hotplug. Alex, thank you for the help! Please let us know if we should remove you from the CC list, in order not to clutter your inbox. (I've kept your address for now, for saying thanks. Feel free to stop reading here. Thanks!) > For OVMF > we'll probably go with Igor's idea, it would be nice if Intel chipsets > supported it too. :) So what is Igor's idea? Please do spoon-feed it to me. I've seen the POC patch but the memory region manipulation isn't obvious to me. Regarding TSEG, QEMU doesn't implement it differently from normal RAM. 
Instead, if memory serves, there is an extra "black hole" region that is overlaid, which hides the RAM contents when TSEG is supposed to be closed (and the guest is not running in SMM). But this time we're doing something else, right? Is the idea to overlay the RAM range at 0x30000 with a window (alias) into the "compatible" SMRAM at 0xA0000-0xBFFFF? I don't know how the "compatible" SMRAM is implemented in QEMU. Does the compatible SMRAM behave in sync with TSEG? OVMF doesn't configure or touch compatible SMRAM at all, at the moment. Thanks Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-18 23:00 ` Yao, Jiewen 2019-08-19 14:10 ` Paolo Bonzini @ 2019-08-21 15:48 ` Michael D Kinney 2019-08-21 17:05 ` Paolo Bonzini 2019-08-22 17:53 ` Laszlo Ersek 1 sibling, 2 replies; 69+ messages in thread From: Michael D Kinney @ 2019-08-21 15:48 UTC (permalink / raw) To: rfc@edk2.groups.io, Yao, Jiewen, Paolo Bonzini, Kinney, Michael D Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Perhaps there is a way to avoid the 3000:8000 startup vector. If a CPU is added after a cold reset, it is already in a different state because one of the active CPUs needs to release it by interacting with the hot plug controller. Can the SMRR for CPUs in that state be pre-programmed to match the SMRR in the rest of the active CPUs? For OVMF we expect all the active CPUs to use the same SMRR value, so a check can be made to verify that all the active CPUs have the same SMRR value. If they do, then any CPU released through the hot plug controller can have its SMRR pre-programmed and the initial SMI will start within TSEG. We just need to decide what to do in the unexpected case where all the active CPUs do not have the same SMRR value. This should also reduce the total number of steps. 
Mike > -----Original Message----- > From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On > Behalf Of Yao, Jiewen > Sent: Sunday, August 18, 2019 4:01 PM > To: Paolo Bonzini <pbonzini@redhat.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; Laszlo > Ersek <lersek@redhat.com>; devel@edk2.groups.io; edk2- > rfc-groups-io <rfc@edk2.groups.io>; qemu devel list > <qemu-devel@nongnu.org>; Igor Mammedov > <imammedo@redhat.com>; Chen, Yingwen > <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > in real world, we deprecate AB-seg usage because they > are vulnerable to smm cache poison attack. > I assume cache poison is out of scope in the virtual > world, or there is a way to prevent ABseg cache poison. > > thank you! > Yao, Jiewen > > > > 在 2019年8月19日,上午3:50,Paolo Bonzini > <pbonzini@redhat.com> 写道: > > > >> On 17/08/19 02:20, Yao, Jiewen wrote: > >> [Jiewen] That is OK. Then we MUST add the third > adversary. > >> -- Adversary: Simple hardware attacker, who can use > device to perform DMA attack in the virtual world. > >> NOTE: The DMA attack in the real world is out of > scope. That is be handled by IOMMU in the real world, > such as VTd. -- Please do clarify if this is TRUE. > >> > >> In the real world: > >> #1: the SMM MUST be non-DMA capable region. > >> #2: the MMIO MUST be non-DMA capable region. > >> #3: the stolen memory MIGHT be DMA capable region or > non-DMA capable > >> region. It depends upon the silicon design. > >> #4: the normal OS accessible memory - including ACPI > reclaim, ACPI > >> NVS, and reserved memory not included by #3 - MUST be > DMA capable region. > >> As such, IOMMU protection is NOT required for #1 and > #2. IOMMU > >> protection MIGHT be required for #3 and MUST be > required for #4. 
> >> I assume the virtual environment is designed in the > same way. Please > >> correct me if I am wrong. > >> > > > > Correct. The 0x30000...0x3ffff area is the only > problematic one; > > Igor's idea (or a variant, for example optionally > remapping > > 0xa0000..0xaffff SMRAM to 0x30000) is becoming more > and more attractive. > > > > Paolo > > ^ permalink raw reply [flat|nested] 69+ messages in thread
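Mike's uniformity check — verify that every active CPU reports the same SMRR before pre-programming a hot-added CPU — could look like the following. The `msr_read` callback and the fake TSEG values are assumptions for illustration; 0x1F2/0x1F3 are the architectural IA32_SMRR_PHYSBASE/IA32_SMRR_PHYSMASK MSR numbers:

```python
# Check that all active CPUs agree on the SMRR pair before a hot-added
# CPU's SMRR is pre-programmed to match.
IA32_SMRR_PHYSBASE = 0x1F2
IA32_SMRR_PHYSMASK = 0x1F3

def uniform_smrr(msr_read, cpus):
    """Return the shared (base, mask) if all active CPUs agree, else None."""
    values = {(msr_read(c, IA32_SMRR_PHYSBASE),
               msr_read(c, IA32_SMRR_PHYSMASK)) for c in cpus}
    return values.pop() if len(values) == 1 else None

# Fake per-CPU MSR state: assumed TSEG at 0x7f000000, 8 MiB, WB memtype.
fake_msrs = {(c, IA32_SMRR_PHYSBASE): 0x7f000006 for c in range(4)}
fake_msrs.update({(c, IA32_SMRR_PHYSMASK): 0xff800800 for c in range(4)})

print(uniform_smrr(lambda c, m: fake_msrs[(c, m)], range(4)))
```

If the function returns None — the unexpected case Mike mentions — the firmware has to decide how to proceed before releasing any CPU through the hot plug controller.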
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-21 15:48 ` [edk2-rfc] " Michael D Kinney @ 2019-08-21 17:05 ` Paolo Bonzini 2019-08-21 17:25 ` Michael D Kinney 2019-08-22 17:59 ` Laszlo Ersek 2019-08-22 17:53 ` Laszlo Ersek 1 sibling, 2 replies; 69+ messages in thread From: Paolo Bonzini @ 2019-08-21 17:05 UTC (permalink / raw) To: Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 21/08/19 17:48, Kinney, Michael D wrote: > Perhaps there is a way to avoid the 3000:8000 startup > vector. > > If a CPU is added after a cold reset, it is already in a > different state because one of the active CPUs needs to > release it by interacting with the hot plug controller. > > Can the SMRR for CPUs in that state be pre-programmed to > match the SMRR in the rest of the active CPUs? > > For OVMF we expect all the active CPUs to use the same > SMRR value, so a check can be made to verify that all > the active CPUs have the same SMRR value. If they do, > then any CPU released through the hot plug controller > can have its SMRR pre-programmed and the initial SMI > will start within TSEG. > > We just need to decide what to do in the unexpected > case where all the active CPUs do not have the same > SMRR value. > > This should also reduce the total number of steps. The problem is not the SMRR but the SMBASE. If the SMBASE area is outside TSEG, it is vulnerable to DMA attacks independent of the SMRR. SMBASE is also different for all CPUs, so it cannot be preprogrammed. (As an aside, virt platforms are also immune to cache poisoning so they don't have SMRR yet - we could use them for SMM_CODE_CHK_EN and block execution outside SMRR but we never got round to it). An even simpler alternative would be to make A0000h the initial SMBASE. 
However, I would like to understand what hardware platforms plan to do, if anything. Paolo > Mike > >> -----Original Message----- >> From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On >> Behalf Of Yao, Jiewen >> Sent: Sunday, August 18, 2019 4:01 PM >> To: Paolo Bonzini <pbonzini@redhat.com> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Laszlo >> Ersek <lersek@redhat.com>; devel@edk2.groups.io; edk2- >> rfc-groups-io <rfc@edk2.groups.io>; qemu devel list >> <qemu-devel@nongnu.org>; Igor Mammedov >> <imammedo@redhat.com>; Chen, Yingwen >> <yingwen.chen@intel.com>; Nakajima, Jun >> <jun.nakajima@intel.com>; Boris Ostrovsky >> <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins >> <joao.m.martins@oracle.com>; Phillip Goerl >> <phillip.goerl@oracle.com> >> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using >> SMM with QEMU+OVMF >> >> in real world, we deprecate AB-seg usage because they >> are vulnerable to smm cache poison attack. >> I assume cache poison is out of scope in the virtual >> world, or there is a way to prevent ABseg cache poison. >> >> thank you! >> Yao, Jiewen >> >> >>> 在 2019年8月19日,上午3:50,Paolo Bonzini >> <pbonzini@redhat.com> 写道: >>> >>>> On 17/08/19 02:20, Yao, Jiewen wrote: >>>> [Jiewen] That is OK. Then we MUST add the third >> adversary. >>>> -- Adversary: Simple hardware attacker, who can use >> device to perform DMA attack in the virtual world. >>>> NOTE: The DMA attack in the real world is out of >> scope. That is be handled by IOMMU in the real world, >> such as VTd. -- Please do clarify if this is TRUE. >>>> >>>> In the real world: >>>> #1: the SMM MUST be non-DMA capable region. >>>> #2: the MMIO MUST be non-DMA capable region. >>>> #3: the stolen memory MIGHT be DMA capable region or >> non-DMA capable >>>> region. It depends upon the silicon design. >>>> #4: the normal OS accessible memory - including ACPI >> reclaim, ACPI >>>> NVS, and reserved memory not included by #3 - MUST be >> DMA capable region. 
>>>> As such, IOMMU protection is NOT required for #1 and >> #2. IOMMU >>>> protection MIGHT be required for #3 and MUST be >> required for #4. >>>> I assume the virtual environment is designed in the >> same way. Please >>>> correct me if I am wrong. >>>> >>> >>> Correct. The 0x30000...0x3ffff area is the only >> problematic one; >>> Igor's idea (or a variant, for example optionally >> remapping >>> 0xa0000..0xaffff SMRAM to 0x30000) is becoming more >> and more attractive. >>> >>> Paolo >> >> > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-21 17:05 ` Paolo Bonzini @ 2019-08-21 17:25 ` Michael D Kinney 2019-08-21 17:39 ` Paolo Bonzini 2019-08-22 17:59 ` Laszlo Ersek 1 sibling, 1 reply; 69+ messages in thread From: Michael D Kinney @ 2019-08-21 17:25 UTC (permalink / raw) To: Paolo Bonzini, rfc@edk2.groups.io, Yao, Jiewen, Kinney, Michael D Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Could we have an initial SMBASE that is within TSEG. If we bring in hot plug CPUs one at a time, then initial SMBASE in TSEG can reprogram the SMBASE to the correct value for that CPU. Can we add a register to the hot plug controller that allows the BSP to set the initial SMBASE value for a hot added CPU? The default can be 3000:8000 for compatibility. Another idea is when the SMI handler runs for a hot add CPU event, the SMM monarch programs the hot plug controller register with the SMBASE to use for the CPU that is being added. As each CPU is added, a different SMBASE value can be programmed by the SMM Monarch. 
Mike > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Wednesday, August 21, 2019 10:06 AM > To: Kinney, Michael D <michael.d.kinney@intel.com>; > rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; Laszlo > Ersek <lersek@redhat.com>; devel@edk2.groups.io; qemu > devel list <qemu-devel@nongnu.org>; Igor Mammedov > <imammedo@redhat.com>; Chen, Yingwen > <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > On 21/08/19 17:48, Kinney, Michael D wrote: > > Perhaps there is a way to avoid the 3000:8000 startup > vector. > > > > If a CPU is added after a cold reset, it is already in > a different > > state because one of the active CPUs needs to release > it by > > interacting with the hot plug controller. > > > > Can the SMRR for CPUs in that state be pre-programmed > to match the > > SMRR in the rest of the active CPUs? > > > > For OVMF we expect all the active CPUs to use the same > SMRR value, so > > a check can be made to verify that all the active CPUs > have the same > > SMRR value. If they do, then any CPU released through > the hot plug > > controller can have its SMRR pre-programmed and the > initial SMI will > > start within TSEG. > > > > We just need to decide what to do in the unexpected > case where all the > > active CPUs do not have the same SMRR value. > > > > This should also reduce the total number of steps. > > The problem is not the SMRR but the SMBASE. If the > SMBASE area is outside TSEG, it is vulnerable to DMA > attacks independent of the SMRR. > SMBASE is also different for all CPUs, so it cannot be > preprogrammed. 
> > (As an aside, virt platforms are also immune to cache > poisoning so they don't have SMRR yet - we could use > them for SMM_CODE_CHK_EN and block execution outside > SMRR but we never got round to it). > > An even simpler alternative would be to make A0000h the > initial SMBASE. > However, I would like to understand what hardware > platforms plan to do, if anything. > > Paolo > > > Mike > > > >> -----Original Message----- > >> From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] > On Behalf Of > >> Yao, Jiewen > >> Sent: Sunday, August 18, 2019 4:01 PM > >> To: Paolo Bonzini <pbonzini@redhat.com> > >> Cc: Alex Williamson <alex.williamson@redhat.com>; > Laszlo Ersek > >> <lersek@redhat.com>; devel@edk2.groups.io; edk2- rfc- > groups-io > >> <rfc@edk2.groups.io>; qemu devel list <qemu- > devel@nongnu.org>; Igor > >> Mammedov <imammedo@redhat.com>; Chen, Yingwen > >> <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; > >> Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao > Marcal Lemos > >> Martins <joao.m.martins@oracle.com>; Phillip Goerl > >> <phillip.goerl@oracle.com> > >> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug > using SMM with > >> QEMU+OVMF > >> > >> in real world, we deprecate AB-seg usage because they > are vulnerable > >> to smm cache poison attack. > >> I assume cache poison is out of scope in the virtual > world, or there > >> is a way to prevent ABseg cache poison. > >> > >> thank you! > >> Yao, Jiewen > >> > >> > >>> 在 2019年8月19日,上午3:50,Paolo Bonzini > >> <pbonzini@redhat.com> 写道: > >>> > >>>> On 17/08/19 02:20, Yao, Jiewen wrote: > >>>> [Jiewen] That is OK. Then we MUST add the third > >> adversary. > >>>> -- Adversary: Simple hardware attacker, who can use > >> device to perform DMA attack in the virtual world. > >>>> NOTE: The DMA attack in the real world is out of > >> scope. That is be handled by IOMMU in the real world, > such as VTd. -- > >> Please do clarify if this is TRUE. 
> >>>> > >>>> In the real world: > >>>> #1: the SMM MUST be non-DMA capable region. > >>>> #2: the MMIO MUST be non-DMA capable region. > >>>> #3: the stolen memory MIGHT be DMA capable region > or > >> non-DMA capable > >>>> region. It depends upon the silicon design. > >>>> #4: the normal OS accessible memory - including > ACPI > >> reclaim, ACPI > >>>> NVS, and reserved memory not included by #3 - MUST > be > >> DMA capable region. > >>>> As such, IOMMU protection is NOT required for #1 > and > >> #2. IOMMU > >>>> protection MIGHT be required for #3 and MUST be > >> required for #4. > >>>> I assume the virtual environment is designed in the > >> same way. Please > >>>> correct me if I am wrong. > >>>> > >>> > >>> Correct. The 0x30000...0x3ffff area is the only > >> problematic one; > >>> Igor's idea (or a variant, for example optionally > >> remapping > >>> 0xa0000..0xaffff SMRAM to 0x30000) is becoming more > >> and more attractive. > >>> > >>> Paolo > >> > >> > > ^ permalink raw reply [flat|nested] 69+ messages in thread
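The hot plug controller register idea from Mike's message above can be sketched as a toy Python model: the SMM monarch programs a per-CPU SMBASE into a controller register before each hot-added CPU takes its first SMI. The register name, the TSEG base, and the 64 KiB per-CPU stride are all invented for illustration; nothing here reflects actual QEMU or silicon behavior.

```python
# Rough model of the proposal: before each hot-added CPU takes its first
# SMI, the SMM monarch writes that CPU's SMBASE into a hot plug
# controller register, and QEMU latches it into the CPU.  Names and the
# 64 KiB stride are illustrative assumptions.

TSEG_BASE = 0x7F000000          # assumed TSEG location (example value)
DEFAULT_SMBASE = 0x30000        # architectural default (3000:8000 entry)
SMBASE_STRIDE = 0x10000         # one 64 KiB tile per CPU (assumption)

class HotplugController:
    def __init__(self):
        self.next_smbase = DEFAULT_SMBASE   # compatibility default

class CPU:
    def __init__(self):
        self.smbase = DEFAULT_SMBASE

def monarch_hot_add(controller, cpu_index):
    """SMM monarch programs the controller, then releases the new CPU."""
    controller.next_smbase = TSEG_BASE + cpu_index * SMBASE_STRIDE
    cpu = CPU()
    # QEMU would copy the register into the CPU's internal SMBASE:
    cpu.smbase = controller.next_smbase
    return cpu

controller = HotplugController()
cpus = [monarch_hot_add(controller, i) for i in range(4)]
```

Each hot-added CPU then receives its first SMI already inside TSEG, which is the property the proposal is after.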
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-21 17:25 ` Michael D Kinney @ 2019-08-21 17:39 ` Paolo Bonzini 2019-08-21 20:17 ` Michael D Kinney 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-21 17:39 UTC (permalink / raw) To: Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 21/08/19 19:25, Kinney, Michael D wrote: > Could we have an initial SMBASE that is within TSEG. > > If we bring in hot plug CPUs one at a time, then initial > SMBASE in TSEG can reprogram the SMBASE to the correct > value for that CPU. > > Can we add a register to the hot plug controller that > allows the BSP to set the initial SMBASE value for > a hot added CPU? The default can be 3000:8000 for > compatibility. > > Another idea is when the SMI handler runs for a hot add > CPU event, the SMM monarch programs the hot plug controller > register with the SMBASE to use for the CPU that is being > added. As each CPU is added, a different SMBASE value can > be programmed by the SMM Monarch. Yes, all of these would work. Again, I'm interested in having something that has a hope of being implemented in real hardware. Another, far easier to implement possibility could be a lockable MSR (could be the existing MSR_SMM_FEATURE_CONTROL) that allows programming the SMBASE outside SMM. It would be nice if such a bit could be defined by Intel. Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-21 17:39 ` Paolo Bonzini @ 2019-08-21 20:17 ` Michael D Kinney 2019-08-22 6:18 ` Paolo Bonzini 0 siblings, 1 reply; 69+ messages in thread From: Michael D Kinney @ 2019-08-21 20:17 UTC (permalink / raw) To: Paolo Bonzini, rfc@edk2.groups.io, Yao, Jiewen, Kinney, Michael D Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Paolo, It makes sense to match real HW. That puts us back to the reset vector and handling the initial SMI at 3000:8000. That is all workable from a FW implementation perspective. It looks like the only issue left is DMA. DMA protection of memory ranges is a chipset feature. For the current QEMU implementation, what ranges of memory are guaranteed to be protected from DMA? Is it only A/B seg and TSEG? Thanks, Mike > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Wednesday, August 21, 2019 10:40 AM > To: Kinney, Michael D <michael.d.kinney@intel.com>; > rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; Laszlo > Ersek <lersek@redhat.com>; devel@edk2.groups.io; qemu > devel list <qemu-devel@nongnu.org>; Igor Mammedov > <imammedo@redhat.com>; Chen, Yingwen > <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > On 21/08/19 19:25, Kinney, Michael D wrote: > > Could we have an initial SMBASE that is within TSEG. > > > > If we bring in hot plug CPUs one at a time, then > initial SMBASE in > > TSEG can reprogram the SMBASE to the correct value for > that CPU.
> > > > Can we add a register to the hot plug controller that > allows the BSP > > to set the initial SMBASE value for a hot added CPU? > The default can > > be 3000:8000 for compatibility. > > > > Another idea is when the SMI handler runs for a hot > add CPU event, the > > SMM monarch programs the hot plug controller register > with the SMBASE > > to use for the CPU that is being added. As each CPU > is added, a > > different SMBASE value can be programmed by the SMM > Monarch. > > Yes, all of these would work. Again, I'm interested in > having something that has a hope of being implemented in > real hardware. > > Another, far easier to implement possibility could be a > lockable MSR (could be the existing > MSR_SMM_FEATURE_CONTROL) that allows programming the > SMBASE outside SMM. It would be nice if such a bit > could be defined by Intel. > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-21 20:17 ` Michael D Kinney @ 2019-08-22 6:18 ` Paolo Bonzini 2019-08-22 18:29 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-22 6:18 UTC (permalink / raw) To: Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, Laszlo Ersek, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 21/08/19 22:17, Kinney, Michael D wrote: > Paolo, > > It makes sense to match real HW. Note that it'd also be fine to match some kind of official Intel specification even if no processor (currently?) supports it. > That puts us back to > the reset vector and handling the initial SMI at > 3000:8000. That is all workable from a FW implementation > perspective. It look like the only issue left is DMA. > > DMA protection of memory ranges is a chipset feature. > For the current QEMU implementation, what ranges of > memory are guaranteed to be protected from DMA? Is > it only A/B seg and TSEG? Yes. Paolo >> Yes, all of these would work. Again, I'm interested in >> having something that has a hope of being implemented in >> real hardware. >> >> Another, far easier to implement possibility could be a >> lockable MSR (could be the existing >> MSR_SMM_FEATURE_CONTROL) that allows programming the >> SMBASE outside SMM. It would be nice if such a bit >> could be defined by Intel. >> >> Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 6:18 ` Paolo Bonzini @ 2019-08-22 18:29 ` Laszlo Ersek 2019-08-22 18:51 ` Paolo Bonzini 2019-08-22 20:13 ` Michael D Kinney 0 siblings, 2 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-08-22 18:29 UTC (permalink / raw) To: Paolo Bonzini, Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/22/19 08:18, Paolo Bonzini wrote: > On 21/08/19 22:17, Kinney, Michael D wrote: >> Paolo, >> >> It makes sense to match real HW. > > Note that it'd also be fine to match some kind of official Intel > specification even if no processor (currently?) supports it. I agree, because... >> That puts us back to the reset vector and handling the initial SMI at >> 3000:8000. That is all workable from a FW implementation >> perspective. that would suggest that matching reset vector code already exists, and it would "only" need to be upstreamed to edk2. :) >> It look like the only issue left is DMA. >> >> DMA protection of memory ranges is a chipset feature. For the current >> QEMU implementation, what ranges of memory are guaranteed to be >> protected from DMA? Is it only A/B seg and TSEG? > > Yes. ( This thread (esp. Jiewen's and Mike's messages) is the first time that I've heard about the *existence* of such RAM ranges / the chipset feature. :) Out of interest (independently of virtualization), how is a general purpose OS informed by the firmware, "never try to set up DMA to this RAM area"? Is this communicated through ACPI _CRS perhaps? ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory Access)". 
It writes, For example, if a platform implements a PCI bus that cannot access all of physical memory, it has a _DMA object under that PCI bus that describes the ranges of physical memory that can be accessed by devices on that bus. Sorry about the digression, and also about being late to this thread, continually -- I'm primarily following and learning. ) Thanks! Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
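The _DMA object quoted above boils down to a list of bus-reachable physical ranges: a buffer is safe for DMA only if it falls wholly inside one of them. Below is a minimal sketch with made-up ranges (note the deliberate hole covering the 0x30000 area discussed in this thread); real _DMA contents are platform-specific.

```python
# Sketch of the _DMA semantics: the object describes the physical ranges
# that devices under a bus *can* reach, so anything outside those ranges
# must never be used as a DMA buffer.  The ranges here are example
# values only, not a real platform's _DMA contents.

dma_capable = [(0x00000000, 0x0002FFFF),   # low RAM, below 0x30000
               (0x00040000, 0x7EFFFFFF)]   # RAM above the hole

def dma_allowed(addr, length):
    """True if [addr, addr+length) lies wholly in a DMA-capable range."""
    end = addr + length - 1
    return any(lo <= addr and end <= hi for lo, hi in dma_capable)
```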
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 18:29 ` Laszlo Ersek @ 2019-08-22 18:51 ` Paolo Bonzini 2019-08-23 14:53 ` Laszlo Ersek 2019-08-22 20:13 ` Michael D Kinney 1 sibling, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-22 18:51 UTC (permalink / raw) To: Laszlo Ersek, Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 22/08/19 20:29, Laszlo Ersek wrote: > On 08/22/19 08:18, Paolo Bonzini wrote: >> On 21/08/19 22:17, Kinney, Michael D wrote: >>> DMA protection of memory ranges is a chipset feature. For the current >>> QEMU implementation, what ranges of memory are guaranteed to be >>> protected from DMA? Is it only A/B seg and TSEG? >> >> Yes. > > This thread (esp. Jiewen's and Mike's messages) are the first time that > I've heard about the *existence* of such RAM ranges / the chipset > feature. :) > > Out of interest (independently of virtualization), how is a general > purpose OS informed by the firmware, "never try to set up DMA to this > RAM area"? Is this communicated through ACPI _CRS perhaps? > > ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory > Access)". It writes, > > For example, if a platform implements a PCI bus that cannot access > all of physical memory, it has a _DMA object under that PCI bus that > describes the ranges of physical memory that can be accessed by > devices on that bus. > > Sorry about the digression, and also about being late to this thread, > continually -- I'm primarily following and learning. It's much simpler: these ranges are not in e820, for example kernel: BIOS-e820: [mem 0x0000000000059000-0x000000000008bfff] usable kernel: BIOS-e820: [mem 0x000000000008c000-0x00000000000fffff] reserved The ranges are not special-cased in any way by QEMU. 
Simply, AB-segs and TSEG RAM are not part of the address space except when in SMM. Therefore, DMA to those ranges ends up respectively to low VGA RAM[1] and to the bit bucket. When AB-segs are open, for example, DMA to that area becomes possible. Paolo [1] old timers may remember DEF SEG=&HB800: BLOAD "foo.img",0. It still works with some disk device models. ^ permalink raw reply [flat|nested] 69+ messages in thread
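Paolo's point can be illustrated by parsing the two example e820 lines he gives: the guest kernel only ever sees "usable" ranges, and SMRAM simply is not among them. A small sketch (the parser is a toy, not how Linux actually does it):

```python
# The guest learns which RAM is usable purely from e820; SMRAM ranges
# are simply absent or reserved.  A trivial parser over the example
# lines from the mail:

import re

e820_lines = [
    "BIOS-e820: [mem 0x0000000000059000-0x000000000008bfff] usable",
    "BIOS-e820: [mem 0x000000000008c000-0x00000000000fffff] reserved",
]

def usable_ranges(lines):
    ranges = []
    for line in lines:
        m = re.search(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (\w+)", line)
        if m and m.group(3) == "usable":
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges

def is_usable(addr, ranges):
    return any(lo <= addr <= hi for lo, hi in ranges)

ranges = usable_ranges(e820_lines)
```

In this excerpt the A/B segments (0xA0000..0xBFFFF) fall in the reserved entry, so a well-behaved OS never places DMA buffers there.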
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 18:51 ` Paolo Bonzini @ 2019-08-23 14:53 ` Laszlo Ersek 0 siblings, 0 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-08-23 14:53 UTC (permalink / raw) To: Paolo Bonzini, Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/22/19 20:51, Paolo Bonzini wrote: > On 22/08/19 20:29, Laszlo Ersek wrote: >> On 08/22/19 08:18, Paolo Bonzini wrote: >>> On 21/08/19 22:17, Kinney, Michael D wrote: >>>> DMA protection of memory ranges is a chipset feature. For the current >>>> QEMU implementation, what ranges of memory are guaranteed to be >>>> protected from DMA? Is it only A/B seg and TSEG? >>> >>> Yes. >> >> This thread (esp. Jiewen's and Mike's messages) are the first time that >> I've heard about the *existence* of such RAM ranges / the chipset >> feature. :) >> >> Out of interest (independently of virtualization), how is a general >> purpose OS informed by the firmware, "never try to set up DMA to this >> RAM area"? Is this communicated through ACPI _CRS perhaps? >> >> ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory >> Access)". It writes, >> >> For example, if a platform implements a PCI bus that cannot access >> all of physical memory, it has a _DMA object under that PCI bus that >> describes the ranges of physical memory that can be accessed by >> devices on that bus. >> >> Sorry about the digression, and also about being late to this thread, >> continually -- I'm primarily following and learning. 
> It's much simpler: these ranges are not in e820, for example > kernel: BIOS-e820: [mem 0x0000000000059000-0x000000000008bfff] usable > kernel: BIOS-e820: [mem 0x000000000008c000-0x00000000000fffff] reserved (1) Sorry, my _DMA quote was a detour from QEMU -- I wondered how a physical machine with actual RAM at 0x30000, and also chipset level protection against DMA to/from that RAM range, would expose the fact to the OS (so that the OS does not innocently try to set up DMA there). (2) In case of QEMU+OVMF, "e820" is not defined at the firmware level. While - QEMU exports an "e820 map" (and OVMF does utilize that), - and Linux parses the UEFI memmap into an "e820 map" (so that dependent logic only needs to deal with e820), in edk2 the concepts are "GCD memory space map" and "UEFI memmap". So what OVMF does is, it reserves the TSEG area in the UEFI memmap: https://github.com/tianocore/edk2/commit/b09c1c6f2569a (This was later de-constified for the extended TSEG size, in commit 23bfb5c0aab6, "OvmfPkg/PlatformPei: prepare for PcdQ35TsegMbytes becoming dynamic", 2017-07-05). This is just to say that with OVMF, TSEG is not absent from the UEFI memmap, it is reserved instead. (Apologies if I misunderstood and you didn't actually claim otherwise.) > The ranges are not special-cased in any way by QEMU. Simply, AB-segs > and TSEG RAM are not part of the address space except when in SMM. (or when TSEG is not locked, and open; but:) yes, this matches my understanding. > Therefore, DMA to those ranges ends up respectively to low VGA RAM[1] > and to the bit bucket. When AB-segs are open, for example, DMA to that > area becomes possible. Which seems to imply that, if we alias 0x30000 to the AB-segs, and rely on the AB-segs for CPU hotplug, OVMF should close and lock down the AB-segs at first boot. Correct? (Because OVMF doesn't do anything about AB at the moment.) Thanks Laszlo > > Paolo > > [1] old timers may remember DEF SEG=&HB800: BLOAD "foo.img",0. 
It still > works with some disk device models. > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 18:29 ` Laszlo Ersek 2019-08-22 18:51 ` Paolo Bonzini @ 2019-08-22 20:13 ` Michael D Kinney 1 sibling, 0 replies; 69+ messages in thread From: Michael D Kinney @ 2019-08-22 20:13 UTC (permalink / raw) To: devel@edk2.groups.io, lersek@redhat.com, Paolo Bonzini, rfc@edk2.groups.io, Yao, Jiewen, Kinney, Michael D Cc: Alex Williamson, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Laszlo, I believe all the code for the AP startup vector is already in edk2. It is a combination of the reset vector code in UefiCpuPkg/ResetVector/Vtf0 and an IA32/X64 specific feature in the GenFv tool. It sets up a 4KB aligned location near 4GB which can be used to start an AP using INIT-SIPI-SIPI. DI is set to 'AP' if the processor is not the BSP. This can be used to choose to put the APs into a wait loop executing from the protected FLASH region. The SMM Monarch on a hot add event can use the Local APIC to send an INIT-SIPI-SIPI to wake the AP at the 4KB startup vector in FLASH. Later the SMM Monarch can use the Local APIC to send an SMI to pull the hot added CPU into SMM. It is not clear if we have to do both SIPI followed by the SMI or if we can just do the SMI. 
Best regards, Mike > -----Original Message----- > From: devel@edk2.groups.io > [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek > Sent: Thursday, August 22, 2019 11:29 AM > To: Paolo Bonzini <pbonzini@redhat.com>; Kinney, > Michael D <michael.d.kinney@intel.com>; > rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; > devel@edk2.groups.io; qemu devel list <qemu- > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > On 08/22/19 08:18, Paolo Bonzini wrote: > > On 21/08/19 22:17, Kinney, Michael D wrote: > >> Paolo, > >> > >> It makes sense to match real HW. > > > > Note that it'd also be fine to match some kind of > official Intel > > specification even if no processor (currently?) > supports it. > > I agree, because... > > >> That puts us back to the reset vector and handling > the initial SMI at > >> 3000:8000. That is all workable from a FW > implementation > >> perspective. > > that would suggest that matching reset vector code > already exists, and it would "only" need to be > upstreamed to edk2. :) > > >> It look like the only issue left is DMA. > >> > >> DMA protection of memory ranges is a chipset > feature. For the current > >> QEMU implementation, what ranges of memory are > guaranteed to be > >> protected from DMA? Is it only A/B seg and TSEG? > > > > Yes. > > ( > > This thread (esp. Jiewen's and Mike's messages) are the > first time that I've heard about the *existence* of > such RAM ranges / the chipset feature. :) > > Out of interest (independently of virtualization), how > is a general purpose OS informed by the firmware, > "never try to set up DMA to this RAM area"? 
Is this > communicated through ACPI _CRS perhaps? > > ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA > (Direct Memory Access)". It writes, > > For example, if a platform implements a PCI bus > that cannot access > all of physical memory, it has a _DMA object under > that PCI bus that > describes the ranges of physical memory that can be > accessed by > devices on that bus. > > Sorry about the digression, and also about being late > to this thread, continually -- I'm primarily following > and learning. > > ) > > Thanks! > Laszlo > > ^ permalink raw reply [flat|nested] 69+ messages in thread
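The INIT-SIPI-SIPI sequence Mike describes can be sketched as a small model of the local APIC ICR encodings involved. This captures only the architectural case of a 4 KiB-aligned startup vector below 1 MiB; the near-4GB flash vector set up by Vtf0/GenFv relies on mechanics not modeled here. The delivery-mode values follow the SDM; the delays a real BSP must insert between the writes are elided.

```python
# Model of the wake-up sequence: three writes to the local APIC
# interrupt command register (ICR).  A SIPI's 8-bit vector selects a
# 4 KiB-aligned real-mode start address (vector << 12).

ICR_INIT = 0x500      # delivery mode 101b (INIT); vector field ignored
ICR_SIPI = 0x600      # delivery mode 110b (start-up IPI)

def sipi_vector(start_addr):
    assert start_addr % 0x1000 == 0 and start_addr < 0x100000, \
        "a SIPI vector can only encode a 4 KiB-aligned address below 1 MiB"
    return start_addr >> 12

def wake_ap(start_addr):
    """Return the ICR low-dword values for the INIT, SIPI, SIPI writes."""
    v = sipi_vector(start_addr)
    return [ICR_INIT, ICR_SIPI | v, ICR_SIPI | v]
```

For example, waking an AP at the default relocation entry point 0x38000 (SMBASE 0x30000 + 0x8000) encodes vector 0x38.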
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-21 17:05 ` Paolo Bonzini 2019-08-21 17:25 ` Michael D Kinney @ 2019-08-22 17:59 ` Laszlo Ersek 2019-08-22 18:43 ` Paolo Bonzini 1 sibling, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-08-22 17:59 UTC (permalink / raw) To: Paolo Bonzini, Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/21/19 19:05, Paolo Bonzini wrote: > On 21/08/19 17:48, Kinney, Michael D wrote: >> Perhaps there is a way to avoid the 3000:8000 startup >> vector. >> >> If a CPU is added after a cold reset, it is already in a >> different state because one of the active CPUs needs to >> release it by interacting with the hot plug controller. >> >> Can the SMRR for CPUs in that state be pre-programmed to >> match the SMRR in the rest of the active CPUs? >> >> For OVMF we expect all the active CPUs to use the same >> SMRR value, so a check can be made to verify that all >> the active CPUs have the same SMRR value. If they do, >> then any CPU released through the hot plug controller >> can have its SMRR pre-programmed and the initial SMI >> will start within TSEG. >> >> We just need to decide what to do in the unexpected >> case where all the active CPUs do not have the same >> SMRR value. >> >> This should also reduce the total number of steps. > > The problem is not the SMRR but the SMBASE. If the SMBASE area is > outside TSEG, it is vulnerable to DMA attacks independent of the SMRR. > SMBASE is also different for all CPUs, so it cannot be preprogrammed. The firmware and QEMU could agree on a formula, which would compute the CPU-specific SMBASE from a value pre-programmed by the firmware, and the initial APIC ID of the hot-added CPU. Yes, it would duplicate code -- the calculation -- between QEMU and edk2. 
While that's not optimal, it wouldn't be a first. Thanks Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
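Laszlo's formula idea could look like the sketch below: one base value pre-programmed by the firmware, plus a fixed per-CPU tile indexed by initial APIC ID. The tile size and base are assumptions chosen for illustration, not values taken from edk2 or QEMU; the point is only that both sides can compute the same CPU-specific SMBASE from shared inputs.

```python
# Sketch of a shared QEMU/firmware formula: SMBASE derived from one
# programmed base plus the hot-added CPU's initial APIC ID.  The tile
# size (save state + handler spacing) is an assumed constant.

SMM_TILE_SIZE = 0x2000   # assumed per-CPU spacing (illustrative)

def smbase_for(programmed_base, initial_apic_id):
    return programmed_base + initial_apic_id * SMM_TILE_SIZE

base = 0x7F000000        # value the firmware would pre-program (example)
smbases = [smbase_for(base, apic_id) for apic_id in range(8)]
```

The downside Paolo raises still applies: this exact computation would have to be kept in lockstep in both code bases.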
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 17:59 ` Laszlo Ersek @ 2019-08-22 18:43 ` Paolo Bonzini 2019-08-22 20:06 ` Michael D Kinney 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-22 18:43 UTC (permalink / raw) To: Laszlo Ersek, Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 22/08/19 19:59, Laszlo Ersek wrote: > The firmware and QEMU could agree on a formula, which would compute the > CPU-specific SMBASE from a value pre-programmed by the firmware, and the > initial APIC ID of the hot-added CPU. > > Yes, it would duplicate code -- the calculation -- between QEMU and > edk2. While that's not optimal, it wouldn't be a first. No, that would be unmaintainable. The best solution to me seems to be to make SMBASE programmable from non-SMM code if some special conditions hold. Michael, would it be possible to get in contact with the Intel architects? Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 18:43 ` Paolo Bonzini @ 2019-08-22 20:06 ` Michael D Kinney 2019-08-22 22:18 ` Paolo Bonzini 0 siblings, 1 reply; 69+ messages in thread From: Michael D Kinney @ 2019-08-22 20:06 UTC (permalink / raw) To: Paolo Bonzini, Laszlo Ersek, rfc@edk2.groups.io, Yao, Jiewen, Kinney, Michael D Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Paolo, The SMBASE register is internal and cannot be directly accessed by any CPU. There is an SMBASE field that is a member of the SMM Save State area and can only be modified from SMM and requires the execution of an RSM instruction from SMM for the SMBASE register to be updated from the current SMBASE field value. The new SMBASE register value is only used on the next SMI. https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf Vol 3C - Section 34.11 The default base address for the SMRAM is 30000H. This value is contained in an internal processor register called the SMBASE register. The operating system or executive can relocate the SMRAM by setting the SMBASE field in the saved state map (at offset 7EF8H) to a new value (see Figure 34-4). The RSM instruction reloads the internal SMBASE register with the value in the SMBASE field each time it exits SMM. All subsequent SMI requests will use the new SMBASE value to find the starting address for the SMI handler (at SMBASE + 8000H) and the SMRAM state save area (from SMBASE + FE00H to SMBASE + FFFFH). (The processor resets the value in its internal SMBASE register to 30000H on a RESET, but does not change it on an INIT.) One idea to work around these issues is to start up OVMF with the maximum number of CPUs. 
All the CPUs would be assigned their SMBASE values using the initial 3000:8000 SMI vector, at a safe time during FW init, because there is a guarantee of no DMA at that point. Once all the CPUs have been initialized for SMM, the CPUs that are not needed can be hot removed. As noted above, the SMBASE value does not change on an INIT. So as long as the hot add operation does not do a RESET, the SMBASE value must be preserved. Of course, this is not a good idea from a boot performance perspective, especially if the max CPU count is large. Another idea is to emulate this behavior: the hot plug controller could provide registers (only accessible from SMM) to assign the SMBASE address for every CPU. When a CPU is hot added, QEMU can set the internal SMBASE register value from the hot plug controller register value. If the SMM Monarch sends an INIT or an SMI from the Local APIC to the hot added CPU, then the SMBASE register should not be modified and the CPU starts execution within TSEG the first time it receives an SMI. Jiewen and I can collect specific questions on this topic and continue the discussion here. For example, I do not think there is any method other than what I referenced above to program the SMBASE register, but I can ask if there are any other methods. 
Thanks, Mike > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Thursday, August 22, 2019 11:43 AM > To: Laszlo Ersek <lersek@redhat.com>; Kinney, Michael D > <michael.d.kinney@intel.com>; rfc@edk2.groups.io; Yao, > Jiewen <jiewen.yao@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; > devel@edk2.groups.io; qemu devel list <qemu- > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > On 22/08/19 19:59, Laszlo Ersek wrote: > > The firmware and QEMU could agree on a formula, which > would compute > > the CPU-specific SMBASE from a value pre-programmed by > the firmware, > > and the initial APIC ID of the hot-added CPU. > > > > Yes, it would duplicate code -- the calculation -- > between QEMU and > > edk2. While that's not optimal, it wouldn't be a first. > > No, that would be unmaintainable. The best solution to > me seems to be to make SMBASE programmable from non-SMM > code if some special conditions hold. Michael, would it > be possible to get in contact with the Intel architects? > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
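The SDM mechanism Mike quotes (the relocation handler runs from SMBASE+8000h, writes the new base into the save state map at offset 7EF8h, and RSM loads it into the internal register) can be modeled in a few lines. Offsets come from the SDM excerpt above; the TSEG address used is an arbitrary example and the rest of the machinery is heavily simplified.

```python
# Model of SMBASE relocation: only the *next* SMI uses the new base,
# because RSM is what copies the save state field into the internal
# SMBASE register.

SMBASE_OFFSET = 0x7EF8   # SMBASE field in the SMM save state map (SDM)
DEFAULT_SMBASE = 0x30000

class ModelCPU:
    def __init__(self):
        self.smbase = DEFAULT_SMBASE      # internal, inaccessible register
        self.save_state = {}

    def smi(self):
        """SMI entry: CPU saves state, handler runs at smbase + 0x8000."""
        self.save_state[SMBASE_OFFSET] = self.smbase
        return self.smbase + 0x8000

    def relocation_handler(self, new_base):
        # Handler may rewrite the SMBASE field in the save state map.
        self.save_state[SMBASE_OFFSET] = new_base

    def rsm(self):
        # RSM reloads the internal SMBASE register from the field.
        self.smbase = self.save_state[SMBASE_OFFSET]

cpu = ModelCPU()
first_entry = cpu.smi()           # handler executes at 0x38000
cpu.relocation_handler(0x7F000000)
cpu.rsm()                         # internal SMBASE updated on exit
second_entry = cpu.smi()          # next SMI already lands in TSEG
```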
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 20:06 ` Michael D Kinney @ 2019-08-22 22:18 ` Paolo Bonzini 2019-08-22 22:32 ` Michael D Kinney 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-22 22:18 UTC (permalink / raw) To: Kinney, Michael D, Laszlo Ersek, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl

On 22/08/19 22:06, Kinney, Michael D wrote: > The SMBASE register is internal and cannot be directly accessed > by any CPU. There is an SMBASE field that is member of the SMM Save > State area and can only be modified from SMM and requires the > execution of an RSM instruction from SMM for the SMBASE register to > be updated from the current SMBASE field value. The new SMBASE > register value is only used on the next SMI.

Actually there is also an SMBASE MSR, even though in current silicon it's read-only and its use is theoretically limited to SMM-transfer monitors. If that MSR could be made accessible somehow outside SMM, that would be great.

> Once all the CPUs have been initialized for SMM, the CPUs that are not needed > can be hot removed. As noted above, the SMBASE value does not change on > an INIT. So as long as the hot add operation does not do a RESET, the > SMBASE value must be preserved.

IIRC, hot-remove + hot-add will unplug/plug a completely different CPU.

> Another idea is to emulate this behavior. If the hot plug controller > provide registers (only accessible from SMM) to assign the SMBASE address > for every CPU. When a CPU is hot added, QEMU can set the internal SMBASE > register value from the hot plug controller register value. If the SMM > Monarch sends an INIT or an SMI from the Local APIC to the hot added CPU, > then the SMBASE register should not be modified and the CPU starts execution > within TSEG the first time it receives an SMI. 
Yes, this would work. But again---if the issue is real on current hardware too, I'd rather have a matching solution for virtual platforms. If current hardware, for example, remembers an INIT-preserved SMBASE across hot-remove/hot-add, we could emulate that.

I guess the fundamental question is: how do bare metal platforms avoid this issue, or plan to avoid this issue? Once we know that, we can use that information to find a way to implement it in KVM. Only if it is impossible will we adopt a different strategy that is specific to our platform.

Paolo

> Jiewen and I can collect specific questions on this topic and continue > the discussion here. For example, I do not think there is any method > other than what I referenced above to program the SMBASE register, but > I can ask if there are any other methods. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 22:18 ` Paolo Bonzini @ 2019-08-22 22:32 ` Michael D Kinney 2019-08-22 23:11 ` Paolo Bonzini 0 siblings, 1 reply; 69+ messages in thread From: Michael D Kinney @ 2019-08-22 22:32 UTC (permalink / raw) To: Paolo Bonzini, Laszlo Ersek, rfc@edk2.groups.io, Yao, Jiewen, Kinney, Michael D Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Paolo, It is my understanding that real HW hot plug uses the SDM defined methods. Meaning the initial SMI is to 3000:8000 and they rebase to TSEG in the first SMI. They must have chipset specific methods to protect 3000:8000 from DMA. Can we add a chipset feature to prevent DMA to 64KB range from 0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be updated so the Guest OS knows to not use that range for DMA? Thanks, Mike > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Thursday, August 22, 2019 3:18 PM > To: Kinney, Michael D <michael.d.kinney@intel.com>; > Laszlo Ersek <lersek@redhat.com>; rfc@edk2.groups.io; > Yao, Jiewen <jiewen.yao@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; > devel@edk2.groups.io; qemu devel list <qemu- > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > On 22/08/19 22:06, Kinney, Michael D wrote: > > The SMBASE register is internal and cannot be directly > accessed by any > > CPU. 
There is an SMBASE field that is member of the > SMM Save State > > area and can only be modified from SMM and requires the > execution of > > an RSM instruction from SMM for the SMBASE register to > be updated from > > the current SMBASE field value. The new SMBASE > register value is only > > used on the next SMI. > > Actually there is also an SMBASE MSR, even though in > current silicon it's read-only and its use is > theoretically limited to SMM-transfer monitors. If that > MSR could be made accessible somehow outside SMM, that > would be great. > > > Once all the CPUs have been initialized for SMM, the > CPUs that are not > > needed can be hot removed. As noted above, the SMBASE > value does not > > change on an INIT. So as long as the hot add operation > does not do a > > RESET, the SMBASE value must be preserved. > > IIRC, hot-remove + hot-add will unplugs/plugs a > completely different CPU. > > > Another idea is to emulate this behavior. If the hot > plug controller > > provide registers (only accessible from SMM) to assign > the SMBASE > > address for every CPU. When a CPU is hot added, QEMU > can set the > > internal SMBASE register value from the hot plug > controller register > > value. If the SMM Monarch sends an INIT or an SMI from > the Local APIC > > to the hot added CPU, then the SMBASE register should > not be modified > > and the CPU starts execution within TSEG the first time > it receives an SMI. > > Yes, this would work. But again---if the issue is real > on current hardware too, I'd rather have a matching > solution for virtual platforms. > > If the current hardware for example remembers INIT- > preserved across hot-remove/hot-add, we could emulate > that. > > I guess the fundamental question is: how do bare metal > platforms avoid this issue, or plan to avoid this issue? > Once we know that, we can use that information to find a > way to implement it in KVM. 
Only if it is impossible > we'll have a different strategy that is specific to our > platform. > > Paolo > > > Jiewen and I can collect specific questions on this > topic and continue > > the discussion here. For example, I do not think there > is any method > > other than what I referenced above to program the > SMBASE register, but > > I can ask if there are any other methods. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 22:32 ` Michael D Kinney @ 2019-08-22 23:11 ` Paolo Bonzini 2019-08-23 1:02 ` Michael D Kinney 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-22 23:11 UTC (permalink / raw) To: Kinney, Michael D, Laszlo Ersek, rfc@edk2.groups.io, Yao, Jiewen Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 23/08/19 00:32, Kinney, Michael D wrote: > Paolo, > > It is my understanding that real HW hot plug uses the SDM defined > methods. Meaning the initial SMI is to 3000:8000 and they rebase > to TSEG in the first SMI. They must have chipset specific methods > to protect 3000:8000 from DMA. It would be great if you could check. > Can we add a chipset feature to prevent DMA to 64KB range from > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be > updated so the Guest OS knows to not use that range for DMA? If real hardware does it at the chipset level, we will probably use Igor's suggestion of aliasing A-seg to 3000:0000. Before starting the new CPU, the SMI handler can prepare the SMBASE relocation trampoline at A000:8000 and the hot-plugged CPU will find it at 3000:8000 when it receives the initial SMI. Because this is backed by RAM at 0xA0000-0xAFFFF, DMA cannot access it and would still go through to RAM at 0x30000. Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-22 23:11 ` Paolo Bonzini @ 2019-08-23 1:02 ` Michael D Kinney 2019-08-23 5:00 ` Yao, Jiewen 0 siblings, 1 reply; 69+ messages in thread From: Michael D Kinney @ 2019-08-23 1:02 UTC (permalink / raw) To: Paolo Bonzini, Laszlo Ersek, rfc@edk2.groups.io, Yao, Jiewen, Kinney, Michael D Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Paolo, I find the following links related to the discussions here along with one example feature called GENPROTRANGE. https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/images-media/day1_trusted-computing_200-250.pdf https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Hot-Add_flow.pdf https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131292.pdf Best regards, Mike > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Thursday, August 22, 2019 4:12 PM > To: Kinney, Michael D <michael.d.kinney@intel.com>; > Laszlo Ersek <lersek@redhat.com>; rfc@edk2.groups.io; > Yao, Jiewen <jiewen.yao@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; > devel@edk2.groups.io; qemu devel list <qemu- > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > On 23/08/19 00:32, Kinney, Michael D wrote: > > Paolo, > > > > It is my understanding that real HW hot plug uses the > SDM defined > > methods. Meaning the initial SMI is to 3000:8000 and > they rebase to > > TSEG in the first SMI. They must have chipset specific > methods to > > protect 3000:8000 from DMA. 
> > It would be great if you could check. > > > Can we add a chipset feature to prevent DMA to 64KB > range from > > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI > content can be > > updated so the Guest OS knows to not use that range for > DMA? > > If real hardware does it at the chipset level, we will > probably use Igor's suggestion of aliasing A-seg to > 3000:0000. Before starting the new CPU, the SMI handler > can prepare the SMBASE relocation trampoline at > A000:8000 and the hot-plugged CPU will find it at > 3000:8000 when it receives the initial SMI. Because this > is backed by RAM at 0xA0000-0xAFFFF, DMA cannot access it > and would still go through to RAM at 0x30000. > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-23 1:02 ` Michael D Kinney @ 2019-08-23 5:00 ` Yao, Jiewen 2019-08-23 15:25 ` Michael D Kinney 0 siblings, 1 reply; 69+ messages in thread From: Yao, Jiewen @ 2019-08-23 5:00 UTC (permalink / raw) To: Kinney, Michael D, Paolo Bonzini, Laszlo Ersek, rfc@edk2.groups.io Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl Thank you Mike! That is good reference on the real hardware behavior. (Glad it is public.) For threat model, the unique part in virtual environment is temp RAM. The temp RAM in real platform is per CPU cache, while the temp RAM in virtual platform is global memory. That brings one more potential attack surface in virtual environment, if hot-added CPU need run code with stack or heap before SMI rebase. Other threats, such as SMRAM or DMA, are same. Thank you Yao Jiewen > -----Original Message----- > From: Kinney, Michael D > Sent: Friday, August 23, 2019 9:03 AM > To: Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek > <lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen > <jiewen.yao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; devel@edk2.groups.io; > qemu devel list <qemu-devel@nongnu.org>; Igor Mammedov > <imammedo@redhat.com>; Chen, Yingwen <yingwen.chen@intel.com>; > Nakajima, Jun <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com> > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with > QEMU+OVMF > > Paolo, > > I find the following links related to the discussions here > along with one example feature called GENPROTRANGE. 
> > https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/ima > ges-media/day1_trusted-computing_200-250.pdf > https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Ho > t-Add_flow.pdf > https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131 > 292.pdf > > Best regards, > > Mike > > > -----Original Message----- > > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > > Sent: Thursday, August 22, 2019 4:12 PM > > To: Kinney, Michael D <michael.d.kinney@intel.com>; > > Laszlo Ersek <lersek@redhat.com>; rfc@edk2.groups.io; > > Yao, Jiewen <jiewen.yao@intel.com> > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > devel@edk2.groups.io; qemu devel list <qemu- > > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > > <jun.nakajima@intel.com>; Boris Ostrovsky > > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > > <joao.m.martins@oracle.com>; Phillip Goerl > > <phillip.goerl@oracle.com> > > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using > > SMM with QEMU+OVMF > > > > On 23/08/19 00:32, Kinney, Michael D wrote: > > > Paolo, > > > > > > It is my understanding that real HW hot plug uses the > > SDM defined > > > methods. Meaning the initial SMI is to 3000:8000 and > > they rebase to > > > TSEG in the first SMI. They must have chipset specific > > methods to > > > protect 3000:8000 from DMA. > > > > It would be great if you could check. > > > > > Can we add a chipset feature to prevent DMA to 64KB > > range from > > > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI > > content can be > > > updated so the Guest OS knows to not use that range for > > DMA? > > > > If real hardware does it at the chipset level, we will > > probably use Igor's suggestion of aliasing A-seg to > > 3000:0000. 
Before starting the new CPU, the SMI handler > > can prepare the SMBASE relocation trampoline at > > A000:8000 and the hot-plugged CPU will find it at > > 3000:8000 when it receives the initial SMI. Because this > > is backed by RAM at 0xA0000-0xAFFFF, DMA cannot access it > > and would still go through to RAM at 0x30000. > > > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-23 5:00 ` Yao, Jiewen @ 2019-08-23 15:25 ` Michael D Kinney 2019-08-24 1:48 ` Yao, Jiewen 2019-08-26 15:30 ` [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek 0 siblings, 2 replies; 69+ messages in thread From: Michael D Kinney @ 2019-08-23 15:25 UTC (permalink / raw) To: Yao, Jiewen, Paolo Bonzini, Laszlo Ersek, rfc@edk2.groups.io, Kinney, Michael D Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl

Hi Jiewen,

If a hot add CPU needs to run any code before the first SMI, I would recommend it only execute code from a write protected FLASH range, without a stack, and then wait for the first SMI.

For this OVMF use case, is any CPU init required before the first SMI?

From Paolo's list of steps, are steps (8a) and (8b) really required? Can the SMI monarch use the Local APIC to send a directed SMI to the hot added CPU? The SMI monarch needs to know the APIC ID of the hot added CPU.

Do we also need to handle the case where multiple CPUs are added at once? I think we would need to serialize the use of 3000:8000 for the SMM rebase operation on each hot added CPU. It would be simpler if we could guarantee that only one CPU can be added or removed at a time, and that the complete flow of adding a CPU to SMM and the OS is completed before another add/remove event needs to be processed. 
Mike > -----Original Message----- > From: Yao, Jiewen > Sent: Thursday, August 22, 2019 10:00 PM > To: Kinney, Michael D <michael.d.kinney@intel.com>; > Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek > <lersek@redhat.com>; rfc@edk2.groups.io > Cc: Alex Williamson <alex.williamson@redhat.com>; > devel@edk2.groups.io; qemu devel list <qemu- > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl > <phillip.goerl@oracle.com> > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with QEMU+OVMF > > Thank you Mike! > > That is good reference on the real hardware behavior. > (Glad it is public.) > > For threat model, the unique part in virtual environment > is temp RAM. > The temp RAM in real platform is per CPU cache, while > the temp RAM in virtual platform is global memory. > That brings one more potential attack surface in virtual > environment, if hot-added CPU need run code with stack > or heap before SMI rebase. > > Other threats, such as SMRAM or DMA, are same. 
> > Thank you > Yao Jiewen > > > > -----Original Message----- > > From: Kinney, Michael D > > Sent: Friday, August 23, 2019 9:03 AM > > To: Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek > > <lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen > > <jiewen.yao@intel.com>; Kinney, Michael D > <michael.d.kinney@intel.com> > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > devel@edk2.groups.io; qemu devel list <qemu- > devel@nongnu.org>; Igor > > Mammedov <imammedo@redhat.com>; Chen, Yingwen > > <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; > > Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao > Marcal Lemos > > Martins <joao.m.martins@oracle.com>; Phillip Goerl > > <phillip.goerl@oracle.com> > > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using > SMM with > > QEMU+OVMF > > > > Paolo, > > > > I find the following links related to the discussions > here along with > > one example feature called GENPROTRANGE. > > > > https://csrc.nist.gov/CSRC/media/Presentations/The- > Whole-is-Greater/im > > a ges-media/day1_trusted-computing_200-250.pdf > > https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc- > Rene_CPU_Ho > > t-Add_flow.pdf > > https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh- > datasheet-1131 > > 292.pdf > > > > Best regards, > > > > Mike > > > > > -----Original Message----- > > > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > > > Sent: Thursday, August 22, 2019 4:12 PM > > > To: Kinney, Michael D <michael.d.kinney@intel.com>; > Laszlo Ersek > > > <lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen > > > <jiewen.yao@intel.com> > > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > > devel@edk2.groups.io; qemu devel list <qemu- > devel@nongnu.org>; Igor > > > Mammedov <imammedo@redhat.com>; Chen, Yingwen > > > <yingwen.chen@intel.com>; Nakajima, Jun > <jun.nakajima@intel.com>; > > > Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao > Marcal Lemos > > > Martins <joao.m.martins@oracle.com>; Phillip Goerl > > > 
<phillip.goerl@oracle.com> > > > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug > using SMM with > > > QEMU+OVMF > > > > > > On 23/08/19 00:32, Kinney, Michael D wrote: > > > > Paolo, > > > > > > > > It is my understanding that real HW hot plug uses > the > > > SDM defined > > > > methods. Meaning the initial SMI is to 3000:8000 > and > > > they rebase to > > > > TSEG in the first SMI. They must have chipset > specific > > > methods to > > > > protect 3000:8000 from DMA. > > > > > > It would be great if you could check. > > > > > > > Can we add a chipset feature to prevent DMA to > 64KB > > > range from > > > > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI > > > content can be > > > > updated so the Guest OS knows to not use that > range for > > > DMA? > > > > > > If real hardware does it at the chipset level, we > will probably use > > > Igor's suggestion of aliasing A-seg to 3000:0000. > Before starting > > > the new CPU, the SMI handler can prepare the SMBASE > relocation > > > trampoline at > > > A000:8000 and the hot-plugged CPU will find it at > > > 3000:8000 when it receives the initial SMI. Because > this is backed > > > by RAM at 0xA0000-0xAFFFF, DMA cannot access it and > would still go > > > through to RAM at 0x30000. > > > > > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-23 15:25 ` Michael D Kinney @ 2019-08-24 1:48 ` Yao, Jiewen 2019-08-27 18:31 ` Igor Mammedov 2019-08-26 15:30 ` [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek 1 sibling, 1 reply; 69+ messages in thread From: Yao, Jiewen @ 2019-08-24 1:48 UTC (permalink / raw) To: Kinney, Michael D, Paolo Bonzini, Laszlo Ersek, rfc@edk2.groups.io Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl I give my thought. Paolo may add more. > -----Original Message----- > From: Kinney, Michael D > Sent: Friday, August 23, 2019 11:25 PM > To: Yao, Jiewen <jiewen.yao@intel.com>; Paolo Bonzini > <pbonzini@redhat.com>; Laszlo Ersek <lersek@redhat.com>; > rfc@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com>; devel@edk2.groups.io; > qemu devel list <qemu-devel@nongnu.org>; Igor Mammedov > <imammedo@redhat.com>; Chen, Yingwen <yingwen.chen@intel.com>; > Nakajima, Jun <jun.nakajima@intel.com>; Boris Ostrovsky > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > <joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com> > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with > QEMU+OVMF > > Hi Jiewen, > > If a hot add CPU needs to run any code before the > first SMI, I would recommend is only executes code > from a write protected FLASH range without a stack > and then wait for the first SMI. [Jiewen] Right. Another option from Paolo, the new CPU will not run until 0x7b. To mitigate DMA threat, someone need guarantee the low memory SIPI vector is DMA protected. NOTE: The LOW memory *could* be mapped to write protected FLASH AREA via PAM register. The Host CPU may setup that in SMM. If that is the case, we don’t need worry DMA. 
I copied the detailed steps here, because I found it hard to dig them out again.

====================
(01a) QEMU: create new CPU. The CPU already exists, but it does not start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT
(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU will not enter SMM because SMI is disabled)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM rebase code.
(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable new CPU
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
(08a) New CPU: (Low RAM) Enter protected mode.
(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
(09) Host CPU: (SMM) Send SMI to the new CPU only.
(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to TSEG.
(11) Host CPU: (SMM) Restore 38000.
(12) Host CPU: (SMM) Update located data structure to add the new CPU information. (This step will involve CPU_SERVICE protocol)
(13) New CPU: (Flash) do whatever other initialization is needed
(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
====================

> > For this OVMF use case, is any CPU init required > before the first SMI?

[Jiewen] I am not sure what the detailed action in 08b is. And I am not sure what your "init" means here? Personally, I don’t think we need too much init work, such as Microcode or MTRR. But we need detail info.

> From Paolo's list of steps are steps (8a) and (8b) > really required? Can the SMI monarch use the Local > APIC to send a directed SMI to the hot added CPU? > The SMI monarch needs to know the APIC ID of the > hot added CPU.

[Jiewen] I think it depends upon the virtual hardware design. Leave question to Paolo.

Do we also need to handle the case
I think we > would need to serialize the use of 3000:8000 for the > SMM rebase operation on each hot added CPU. > It would be simpler if we can guarantee that only > one CPU can be added or removed at a time and the > complete flow of adding a CPU to SMM and the OS > needs to be completed before another add/remove > event needs to be processed.

[Jiewen] Right. I treat multiple CPUs hot-added at the same time as a potential threat. We don’t want to trust the end user.

The solution could be:
1) Let trusted hardware guarantee hot-add one by one.
2) Let trusted software (SMM and init code) guarantee SMREBASE one by one (including any code that runs before SMREBASE).
3) Let trusted software (SMM and init code) support SMREBASE simultaneously (including any code that runs before SMREBASE).

Solution #1 or #2 would be the simple solution.

> Mike > > > -----Original Message----- > > From: Yao, Jiewen > > Sent: Thursday, August 22, 2019 10:00 PM > > To: Kinney, Michael D <michael.d.kinney@intel.com>; > > Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek > > <lersek@redhat.com>; rfc@edk2.groups.io > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > devel@edk2.groups.io; qemu devel list <qemu- > > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > > <jun.nakajima@intel.com>; Boris Ostrovsky > > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > > <joao.m.martins@oracle.com>; Phillip Goerl > > <phillip.goerl@oracle.com> > > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using > > SMM with QEMU+OVMF > > > > Thank you Mike! > > > > That is good reference on the real hardware behavior. > > (Glad it is public.) > > > > For threat model, the unique part in virtual environment > > is temp RAM. > > The temp RAM in real platform is per CPU cache, while > > the temp RAM in virtual platform is global memory. 
> > That brings one more potential attack surface in virtual > > environment, if hot-added CPU need run code with stack > > or heap before SMI rebase. > > > > Other threats, such as SMRAM or DMA, are same. > > > > Thank you > > Yao Jiewen > > > > > > > -----Original Message----- > > > From: Kinney, Michael D > > > Sent: Friday, August 23, 2019 9:03 AM > > > To: Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek > > > <lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen > > > <jiewen.yao@intel.com>; Kinney, Michael D > > <michael.d.kinney@intel.com> > > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > > devel@edk2.groups.io; qemu devel list <qemu- > > devel@nongnu.org>; Igor > > > Mammedov <imammedo@redhat.com>; Chen, Yingwen > > > <yingwen.chen@intel.com>; Nakajima, Jun > > <jun.nakajima@intel.com>; > > > Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao > > Marcal Lemos > > > Martins <joao.m.martins@oracle.com>; Phillip Goerl > > > <phillip.goerl@oracle.com> > > > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using > > SMM with > > > QEMU+OVMF > > > > > > Paolo, > > > > > > I find the following links related to the discussions > > here along with > > > one example feature called GENPROTRANGE. 
> > > > > > https://csrc.nist.gov/CSRC/media/Presentations/The- > > Whole-is-Greater/im > > > a ges-media/day1_trusted-computing_200-250.pdf > > > https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc- > > Rene_CPU_Ho > > > t-Add_flow.pdf > > > https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh- > > datasheet-1131 > > > 292.pdf > > > > > > Best regards, > > > > > > Mike > > > > > > > -----Original Message----- > > > > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > > > > Sent: Thursday, August 22, 2019 4:12 PM > > > > To: Kinney, Michael D <michael.d.kinney@intel.com>; > > Laszlo Ersek > > > > <lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen > > > > <jiewen.yao@intel.com> > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > > > devel@edk2.groups.io; qemu devel list <qemu- > > devel@nongnu.org>; Igor > > > > Mammedov <imammedo@redhat.com>; Chen, Yingwen > > > > <yingwen.chen@intel.com>; Nakajima, Jun > > <jun.nakajima@intel.com>; > > > > Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao > > Marcal Lemos > > > > Martins <joao.m.martins@oracle.com>; Phillip Goerl > > > > <phillip.goerl@oracle.com> > > > > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug > > using SMM with > > > > QEMU+OVMF > > > > > > > > On 23/08/19 00:32, Kinney, Michael D wrote: > > > > > Paolo, > > > > > > > > > > It is my understanding that real HW hot plug uses > > the > > > > SDM defined > > > > > methods. Meaning the initial SMI is to 3000:8000 > > and > > > > they rebase to > > > > > TSEG in the first SMI. They must have chipset > > specific > > > > methods to > > > > > protect 3000:8000 from DMA. > > > > > > > > It would be great if you could check. > > > > > > > > > Can we add a chipset feature to prevent DMA to > > 64KB > > > > range from > > > > > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI > > > > content can be > > > > > updated so the Guest OS knows to not use that > > range for > > > > DMA? 
> > > > > > > > If real hardware does it at the chipset level, we > > will probably use > > > > Igor's suggestion of aliasing A-seg to 3000:0000. > > Before starting > > > > the new CPU, the SMI handler can prepare the SMBASE > > relocation > > > > trampoline at > > > > A000:8000 and the hot-plugged CPU will find it at > > > > 3000:8000 when it receives the initial SMI. Because > > this is backed > > > > by RAM at 0xA0000-0xAFFFF, DMA cannot access it and > > would still go > > > > through to RAM at 0x30000. > > > > > > > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-24 1:48 ` Yao, Jiewen @ 2019-08-27 18:31 ` Igor Mammedov 2019-08-29 17:01 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-08-27 18:31 UTC (permalink / raw) To: Yao, Jiewen Cc: Kinney, Michael D, Paolo Bonzini, Laszlo Ersek, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Sat, 24 Aug 2019 01:48:09 +0000 "Yao, Jiewen" <jiewen.yao@intel.com> wrote: > I give my thought. > Paolo may add more. Here are some ideas I have on the topic. > > > -----Original Message----- > > From: Kinney, Michael D > > Sent: Friday, August 23, 2019 11:25 PM > > To: Yao, Jiewen <jiewen.yao@intel.com>; Paolo Bonzini > > <pbonzini@redhat.com>; Laszlo Ersek <lersek@redhat.com>; > > rfc@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com> > > Cc: Alex Williamson <alex.williamson@redhat.com>; devel@edk2.groups.io; > > qemu devel list <qemu-devel@nongnu.org>; Igor Mammedov > > <imammedo@redhat.com>; Chen, Yingwen <yingwen.chen@intel.com>; > > Nakajima, Jun <jun.nakajima@intel.com>; Boris Ostrovsky > > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > > <joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com> > > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with > > QEMU+OVMF > > > > Hi Jiewen, > > > > If a hot add CPU needs to run any code before the > > first SMI, I would recommend is only executes code > > from a write protected FLASH range without a stack > > and then wait for the first SMI. > [Jiewen] Right. > > Another option from Paolo, the new CPU will not run until 0x7b. > To mitigate DMA threat, someone need guarantee the low memory SIPI vector is DMA protected. > > NOTE: The LOW memory *could* be mapped to write protected FLASH AREA via PAM register. The Host CPU may setup that in SMM. 
> If that is the case, we don’t need to worry about DMA. > > I copied the detailed steps here, because I found it hard to dig them out again. *) In light of using dedicated SMRAM at 30000 with a pre-configured relocation vector for initial relocation, which is not reachable from non-SMM mode: > ==================== > (01a) QEMU: create new CPU. The CPU already exists, but it does not > start running code until unparked by the CPU hotplug controller. we might not need a parked CPU (if we ignore an attacker's attempt to send SMI to several new CPUs, see below for the issue it causes) > (01b) QEMU: trigger SCI > > (02-03) no equivalent > > (04) Host CPU: (OS) execute GPE handler from DSDT > > (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU > will not enter SMM because SMI is disabled) I think only the CPU that does the write will enter SMM, and we might not need to pull all already initialized CPUs into SMM. At this step we could also send a directed SMI to a new CPU from the host CPU that entered SMM on the write. > (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > rebase code. could skip this step as well (*) > (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable > new CPU ditto > (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. we need to wake up the new CPU somehow so it would process the SMI pending from (05)/(09) before jumping to the SIPI vector > (08a) New CPU: (Low RAM) Enter protected mode. > > (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop. both of these steps could be changed to just a cli;hlt loop, or do an INIT reset. if the SMI relocation handler and/or the host CPU pulls the new CPU into OVMF, we actually don't care about the SIPI vector, as all firmware initialization for the new CPU is done in SMM mode (07b triggers 10). Thus eliminating one attack vector to protect from. > (09) Host CPU: (SMM) Send SMI to the new CPU only. could be done at (05) > (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to > TSEG. 
it could also pull itself into other OVMF structures (assuming it can use TSEG as stack, though that's rather complex) or just do the relocation and let the host CPU fill in OVMF structures for the new CPU (12). > (11) Host CPU: (SMM) Restore 38000. could skip this step as well (*) > (12) Host CPU: (SMM) Update located data structure to add the new CPU > information. (This step will involve CPU_SERVICE protocol) > > (13) New CPU: (Flash) do whatever other initialization is needed do we actually need it? > (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI. > > (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in. > ==================== > > > > > For this OVMF use case, is any CPU init required > > before the first SMI? > [Jiewen] I am not sure what the detailed action in 08b is. > And I am not sure what your "init" means here? > Personally, I don’t think we need too much init work, such as Microcode or MTRR. > But we need detailed info. Wouldn't it be preferable to do it in SMM mode? > > From Paolo's list of steps, are steps (8a) and (8b) > > really required? Can the SMI monarch use the Local > > APIC to send a directed SMI to the hot-added CPU? > > The SMI monarch needs to know the APIC ID of the > > hot-added CPU. > [Jiewen] I think it depends upon the virtual hardware design. > Leave the question to Paolo. it's not really needed as described in (8x); it could be just a cli;hlt loop so that our SIPI could land at sensible code and stop the new CPU. it could even be an attacker's code if we do all initialization in SMM mode. > Do we also need to handle the case > > where multiple CPUs are added at once? I think we > > would need to serialize the use of 3000:8000 for the > > SMM rebase operation on each hot added CPU. > > It would be simpler if we can guarantee that only > > one CPU can be added or removed at a time and the > > complete flow of adding a CPU to SMM and the OS > > needs to be completed before another add/remove > > event needs to be processed. > [Jiewen] Right. 
> I treat multiple CPU hot-add at the same time as a potential threat. the problem I see here is the race of saving/restoring to/from SMBASE at 30000, so a CPU exiting SMM can't be sure whether it restores its own saved area or another CPU's saved state. (I couldn't find in the SDM what would happen in this case) If we consider the non-attack flow, then we can serialize sending SMIs to new CPUs (one at a time) from the GPE handler and ensure that only one CPU can do relocation at a time (i.e. non-enforced serialization). In the attack case, the attacker would only be able to trigger the above race. > We don’t want to trust the end user. > The solution could be: > 1) Let trusted hardware guarantee hot-add one by one. so far in QEMU it's not possible. We might be able to implement a "parking/unparking" chipset feature, but that would mean inventing and maintaining an ABI for it, which I'd like to avoid if possible. That's why I'm curious about what happens if a CPU exits SMM mode with another CPU's saved register state in case of the race, and whether we could ignore the consequences of it. (it's fine for the guest OS to crash or the new CPU not to work; the attacker would only affect itself) > 2) Let trusted software (SMM and init code) guarantee SMREBASE one by one (include any code that runs before SMREBASE) that would mean pulling all present CPUs into SMM mode so no attack code could be executing before doing hotplug. With a lot of present CPUs it could be quite expensive, and unlike physical hardware, guest's CPUs could be preempted arbitrarily long, causing long delays. > 3) Let trusted software (SMM and init code) support SMREBASE simultaneously (include any code that runs before SMREBASE). Is it really possible to do in software? Potentially it could be done in hardware (QEMU/KVM) if each CPU has its own SMRAM at 30000, so CPUs relocated in parallel won't trample over each other. 
But KVM has only 2 address spaces (normal RAM and SMM) so it won't just work out of the box (and I recall that Paolo had some reservations about adding more). Also it would mean adding an ABI for initializing those SMRAM blocks from another CPU, which could be complicated. > Solution #1 or #2 are simple solutions. let's first see if we can ignore the race, and if we can't, then we'll probably end up with implementing some form of #1 > > > Mike > > > > > -----Original Message----- > > > From: Yao, Jiewen > > > Sent: Thursday, August 22, 2019 10:00 PM > > > To: Kinney, Michael D <michael.d.kinney@intel.com>; > > > Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek > > > <lersek@redhat.com>; rfc@edk2.groups.io > > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > > devel@edk2.groups.io; qemu devel list <qemu- > > > devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>; > > > Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun > > > <jun.nakajima@intel.com>; Boris Ostrovsky > > > <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins > > > <joao.m.martins@oracle.com>; Phillip Goerl > > > <phillip.goerl@oracle.com> > > > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using > > > SMM with QEMU+OVMF > > > > > > Thank you Mike! > > > > > > That is a good reference on the real hardware behavior. > > > (Glad it is public.) > > > > > > For the threat model, the unique part in a virtual environment > > > is temp RAM. > > > The temp RAM in a real platform is per-CPU cache, while > > > the temp RAM in a virtual platform is global memory. > > > That brings one more potential attack surface in the virtual > > > environment, if the hot-added CPU needs to run code with stack > > > or heap before the SMI rebase. > > > > > > Other threats, such as SMRAM or DMA, are the same. 
> > > > > > Thank you > > > Yao Jiewen > > > > > > > > > > -----Original Message----- > > > > From: Kinney, Michael D > > > > Sent: Friday, August 23, 2019 9:03 AM > > > > To: Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek > > > > <lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen > > > > <jiewen.yao@intel.com>; Kinney, Michael D > > > <michael.d.kinney@intel.com> > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > > > devel@edk2.groups.io; qemu devel list <qemu- > > > devel@nongnu.org>; Igor > > > > Mammedov <imammedo@redhat.com>; Chen, Yingwen > > > > <yingwen.chen@intel.com>; Nakajima, Jun > > > <jun.nakajima@intel.com>; > > > > Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao > > > Marcal Lemos > > > > Martins <joao.m.martins@oracle.com>; Phillip Goerl > > > > <phillip.goerl@oracle.com> > > > > Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using > > > SMM with > > > > QEMU+OVMF > > > > > > > > Paolo, > > > > > > > > I find the following links related to the discussions > > > here along with > > > > one example feature called GENPROTRANGE. 
> > > > > > > > https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/images-media/day1_trusted-computing_200-250.pdf > > > > https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Hot-Add_flow.pdf > > > > https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131292.pdf > > > > > > > > Best regards, > > > > > > > > Mike > > > > > > > > > -----Original Message----- > > > > > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > > > > > Sent: Thursday, August 22, 2019 4:12 PM > > > > > To: Kinney, Michael D <michael.d.kinney@intel.com>; > > > Laszlo Ersek > > > > > <lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen > > > > > <jiewen.yao@intel.com> > > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; > > > > > devel@edk2.groups.io; qemu devel list <qemu- > > > devel@nongnu.org>; Igor > > > > > Mammedov <imammedo@redhat.com>; Chen, Yingwen > > > > > <yingwen.chen@intel.com>; Nakajima, Jun > > > <jun.nakajima@intel.com>; > > > > > Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao > > > Marcal Lemos > > > > > Martins <joao.m.martins@oracle.com>; Phillip Goerl > > > > > <phillip.goerl@oracle.com> > > > > > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug > > > using SMM with > > > > > QEMU+OVMF > > > > > > > > > > On 23/08/19 00:32, Kinney, Michael D wrote: > > > > > > Paolo, > > > > > > > > > > > > It is my understanding that real HW hot plug uses > > > the > > > > > SDM defined > > > > > > methods. Meaning the initial SMI is to 3000:8000 > > > and > > > > > they rebase to > > > > > > TSEG in the first SMI. They must have chipset > > > specific > > > > > methods to > > > > > > protect 3000:8000 from DMA. > > > > > > > > > > It would be great if you could check. 
> > > > > > > > > > > Can we add a chipset feature to prevent DMA to > > > 64KB > > > > > range from > > > > > > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI > > > > > content can be > > > > > > updated so the Guest OS knows to not use that > > > range for > > > > > DMA? > > > > > > > > > > If real hardware does it at the chipset level, we > > > will probably use > > > > > Igor's suggestion of aliasing A-seg to 3000:0000. > > > Before starting > > > > > the new CPU, the SMI handler can prepare the SMBASE > > > relocation > > > > > trampoline at > > > > > A000:8000 and the hot-plugged CPU will find it at > > > > > 3000:8000 when it receives the initial SMI. Because > > > this is backed > > > > > by RAM at 0xA0000-0xAFFFF, DMA cannot access it and > > > would still go > > > > > through to RAM at 0x30000. > > > > > > > > > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
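Jiewen's option (2) above, trusted software guaranteeing SMREBASE one by one, essentially reduces to a mutual-exclusion gate around the shared save area at the default SMBASE. The sketch below is purely illustrative (all names are invented; real SMM code would use firmware-appropriate synchronization primitives, not C11 atomics):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative sketch: serialize SMBASE relocation so that only one
 * hot-added CPU at a time uses the shared save area at 0x30000.
 * Every name here is invented for the example. */

static atomic_flag relocation_in_progress = ATOMIC_FLAG_INIT;

/* Returns true if the caller won the right to relocate now; a loser
 * would have to wait (or be re-targeted later) until end_relocation(). */
bool try_begin_relocation(void)
{
    return !atomic_flag_test_and_set(&relocation_in_progress);
}

void end_relocation(void)
{
    atomic_flag_clear(&relocation_in_progress);
}
```

In the flow discussed above, the host CPU would take the gate before sending the directed SMI to one new CPU and release it only after that CPU's relocation completes, so two concurrently hot-added CPUs can never save/restore through 0x30000 at the same time.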
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-27 18:31 ` Igor Mammedov @ 2019-08-29 17:01 ` Laszlo Ersek 2019-08-30 14:48 ` Igor Mammedov 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-08-29 17:01 UTC (permalink / raw) To: Igor Mammedov, Yao, Jiewen Cc: Kinney, Michael D, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/27/19 20:31, Igor Mammedov wrote: > On Sat, 24 Aug 2019 01:48:09 +0000 > "Yao, Jiewen" <jiewen.yao@intel.com> wrote: >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU >> will not enter CPU because SMI is disabled) > I think only CPU that does the write will enter SMM That used to be the case (and it is still the default QEMU behavior, if broadcast SMI is not negotiated). However, OVMF does negotiate broadcast SMI whenever QEMU offers the feature. Broadcast SMI is important for the stability of the edk2 SMM infrastructure on QEMU/KVM, we've found. https://bugzilla.redhat.com/show_bug.cgi?id=1412313 https://bugzilla.redhat.com/show_bug.cgi?id=1412327 > and we might not need to pull in all already initialized CPUs into SMM. That, on the other hand, could be a valid idea. But then the CPU should use a different method for raising a synchronous SMI for itself (not a write to IO port 0xB2). Is a "directed SMI for self" possible? > [...] I've tried to read through the procedure with your suggested changes, but I'm failing at composing a coherent mental image, in this email response format. If you have the time, can you write up the suggested list of steps in a "flat" format? (I believe you are suggesting to eliminate some steps completely.) ... 
jumping to another point: >> 2) Let trusted software (SMM and init code) guarantee SMREBASE one by one (include any code that runs before SMREBASE) > that would mean pulling all present CPUs into SMM mode so no attack > code could be executing before doing hotplug. With a lot of present CPUs > it could be quite expensive and unlike physical hardware, guest's CPUs > could be preempted arbitrarily long causing long delays. I agree with your analysis, but I slightly disagree about the impact: - CPU hotplug is not a frequent administrative action, so the CPU load should be temporary (it should be a spike). I don't worry that it would trip up OS kernel code. (SMI handling is known to take long on physical platforms too.) In practice, all "normal" SMIs are broadcast already (for example when calling the runtime UEFI variable services from the OS kernel). - The fact that QEMU/KVM introduces some jitter into the execution of multi-core code (including SMM code) has proved useful in the past, for catching edk2 regressions. Again, this is not a strong disagreement from my side. I'm open to better ways for syncing CPUs during multi-CPU hotplug. (Digression: I expect someone could be curious why (a) I find it acceptable (even beneficial) that "some jitter" injected by the QEMU/KVM scheduling exposes multi-core regressions in edk2, but at the same time (b) I found it really important to add broadcast SMI to QEMU and OVMF. After all, both "jitter" and "unicast SMIs" are QEMU/KVM platform specifics, so why the different treatment? The reason is that the "jitter" does not interfere with normal operation, and it has been good for catching *regressions*. IOW, there is a working edk2 state, someone posts a patch, works on physical hardware, but breaks on QEMU/KVM --> then we can still reject or rework or revert the patch. And we're back to a working state again (in the best case, with a fixed feature patch). 
With the unicast SMIs however, it was impossible to enable the SMM stack reliably in the first place. There was no functional state to return to. Digression ends.) > lets first see if if we can ignore race Makes me uncomfortable, but if this is the consensus, I'll go along. > and if it's not then > we probably end up with implementing some form of #1 OK. Thanks! Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
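On the "directed SMI for self" question raised above: the local APIC can, in principle, deliver an SMI through its Interrupt Command Register, using the SMI delivery mode together with the "self" destination shorthand. The sketch below only composes the xAPIC ICR value; whether the virtual platform honors a self-directed SMI sent this way is exactly the open question in the thread, so treat this as an assumption, not OVMF code:

```c
#include <stdint.h>

/* Sketch: composing an xAPIC ICR (low dword) value for a directed SMI
 * to self.  In the xAPIC register layout, bits 10:8 are the delivery
 * mode (010b = SMI) and bits 19:18 the destination shorthand
 * (01b = self); the vector field is ignored for SMI delivery. */

#define APIC_DELIVERY_MODE_SMI   (2u << 8)
#define APIC_DEST_SHORTHAND_SELF (1u << 18)

uint32_t make_self_smi_icr(void)
{
    return APIC_DELIVERY_MODE_SMI | APIC_DEST_SHORTHAND_SELF;
}
/* The value would then be written to the ICR low register (offset
 * 0x300 from the APIC MMIO base) to trigger delivery. */
```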
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-29 17:01 ` Laszlo Ersek @ 2019-08-30 14:48 ` Igor Mammedov 2019-08-30 18:46 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-08-30 14:48 UTC (permalink / raw) To: Laszlo Ersek Cc: Yao, Jiewen, Kinney, Michael D, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Thu, 29 Aug 2019 19:01:35 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 08/27/19 20:31, Igor Mammedov wrote: > > On Sat, 24 Aug 2019 01:48:09 +0000 > > "Yao, Jiewen" <jiewen.yao@intel.com> wrote: > > >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU > >> will not enter SMM because SMI is disabled) > > I think only CPU that does the write will enter SMM > > That used to be the case (and it is still the default QEMU behavior, if > broadcast SMI is not negotiated). However, OVMF does negotiate broadcast > SMI whenever QEMU offers the feature. Broadcast SMI is important for the > stability of the edk2 SMM infrastructure on QEMU/KVM, we've found. > > https://bugzilla.redhat.com/show_bug.cgi?id=1412313 > https://bugzilla.redhat.com/show_bug.cgi?id=1412327 > > > and we might not need to pull in all already initialized CPUs into SMM. > > That, on the other hand, could be a valid idea. But then the CPU should > use a different method for raising a synchronous SMI for itself (not a > write to IO port 0xB2). Is a "directed SMI for self" possible? theoretically, depending on the argument in 0xB3, it should be possible to raise a directed SMI even if broadcast ones are negotiated. > > [...] > > I've tried to read through the procedure with your suggested changes, > but I'm failing at composing a coherent mental image, in this email > response format. > > If you have the time, can you write up the suggested list of steps in a > "flat" format? 
(I believe you are suggesting to eliminate some steps > completely.) if I'd sum it up: (01) On boot, firmware maps and initializes the SMI handler at the default SMBASE (30000) (using dedicated SMRAM at 30000 would allow us to avoid the save/restore steps and make the SMM handler pointer not vulnerable to DMA attacks) (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI (03) on receiving SCI, host CPU calls the GPE cpu hotplug handler, which writes to IO port 0xB2 (broadcast SMI) (04) firmware waits for all existing CPUs to rendezvous in SMM mode, new CPU(s) have SMI pending but do nothing yet (05) host CPU wakes up one new CPU (INIT-SIPI-SIPI) SIPI vector points to RO flash HLT loop. (how will the host CPU know which new CPUs to relocate? possibly reuse the QEMU CPU hotplug MMIO interface???) (06) new CPU does relocation. (in case an attacker sends SIPI to several new CPUs, open question how to detect collision of several CPUs at the same default SMBASE) (07) once the new CPU is relocated, the host CPU completes initialization, returns from the IO port write and executes the rest of the GPE handler, telling the OS to online the new CPU. > ... jumping to another point: > > >> 2) Let trusted software (SMM and init code) guarantee SMREBASE one by one (include any code that runs before SMREBASE) > > that would mean pulling all present CPUs into SMM mode so no attack > > code could be executing before doing hotplug. With a lot of present CPUs > > it could be quite expensive and unlike physical hardware, guest's CPUs > > could be preempted arbitrarily long causing long delays. > > I agree with your analysis, but I slightly disagree about the impact: > > - CPU hotplug is not a frequent administrative action, so the CPU load > should be temporary (it should be a spike). I don't worry that it would > trip up OS kernel code. (SMI handling is known to take long on physical > platforms too.) In practice, all "normal" SMIs are broadcast already (for > example when calling the runtime UEFI variable services from the OS kernel). 
> > - The fact that QEMU/KVM introduces some jitter into the execution of > multi-core code (including SMM code) has proved useful in the past, for > catching edk2 regressions. > > Again, this is not a strong disagreement from my side. I'm open to > better ways for syncing CPUs during multi-CPU hotplug. > > (Digression: > > I expect someone could be curious why (a) I find it acceptable (even > beneficial) that "some jitter" injected by the QEMU/KVM scheduling > exposes multi-core regressions in edk2, but at the same time (b) I found > it really important to add broadcast SMI to QEMU and OVMF. After all, > both "jitter" and "unicast SMIs" are QEMU/KVM platform specifics, so why > the different treatment? > > The reason is that the "jitter" does not interfere with normal > operation, and it has been good for catching *regressions*. IOW, there > is a working edk2 state, someone posts a patch, works on physical > hardware, but breaks on QEMU/KVM --> then we can still reject or rework > or revert the patch. And we're back to a working state again (in the > best case, with a fixed feature patch). > > With the unicast SMIs however, it was impossible to enable the SMM stack > reliably in the first place. There was no functional state to return to. I don't really get the last statement, but then I know nothing about OVMF. I don't insist on unicast SMI being used, it's just some ideas about what we could do. It could be done later; broadcast SMI (might not be the best) is sufficient to implement CPU hotplug. > Digression ends.) > > > lets first see if if we can ignore race > > Makes me uncomfortable, but if this is the consensus, I'll go along. same here; as mentioned in another reply, it's only possible in the attack case (multiple SMIs + multiple SIPIs), so it could be fine to just explode in case it happens (the point is fw is not leaking anything from SMRAM and the OS did something illegal). > > and if it's not then > > we probably end up with implementing some form of #1 > > OK. 
> > Thanks! > Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
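Igor's flat list in the message above can be read as a single host-side driver sequence. The toy sketch below replaces each hardware action of steps (03)-(07) with a stub that records its name, purely to make the ordering concrete; every helper name is invented and nothing here is real firmware code:

```c
#include <string.h>

/* A minimal walk-through of the flat hotplug flow, steps (03)-(07),
 * with each hardware action replaced by a stub that logs its name.
 * All identifiers are invented for illustration. */

char log_buf[256];

static void act(const char *step)
{
    strcat(log_buf, step);
    strcat(log_buf, ";");
}

static void broadcast_smi(void)       { act("smi-broadcast"); }
static void rendezvous_existing(void) { act("rendezvous"); }
static void wake_new_cpu(void)        { act("init-sipi-sipi"); }
static void relocate_new_cpu(void)    { act("smbase-relocate"); }
static void notify_os(void)           { act("os-online"); }

void gpe_cpu_hotplug(void)
{
    broadcast_smi();       /* (03) GPE handler writes IO port 0xB2 */
    rendezvous_existing(); /* (04) existing CPUs rendezvous in SMM  */
    wake_new_cpu();        /* (05) host CPU wakes one new CPU       */
    relocate_new_cpu();    /* (06) new CPU handles pending SMI,
                                   relocates its SMBASE             */
    notify_os();           /* (07) GPE handler tells OS to online   */
}
```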
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-30 14:48 ` Igor Mammedov @ 2019-08-30 18:46 ` Laszlo Ersek 2019-09-02 8:45 ` Igor Mammedov 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-08-30 18:46 UTC (permalink / raw) To: Igor Mammedov Cc: Yao, Jiewen, Kinney, Michael D, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/30/19 16:48, Igor Mammedov wrote: > (01) On boot firmware maps and initializes SMI handler at default SMBASE (30000) > (using dedicated SMRAM at 30000 would allow us to avoid save/restore > steps and make SMM handler pointer not vulnerable to DMA attacks) > > (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI > > (03) on receiving SCI, host CPU calls GPE cpu hotplug handler > which writes to IO port 0xB2 (broadcast SMI) > > (04) firmware waits for all existing CPUs rendezvous in SMM mode, > new CPU(s) have SMI pending but does nothing yet > > (05) host CPU wakes up one new CPU (INIT-INIT-SIPI) > SIPI vector points to RO flash HLT loop. > (how host CPU will know which new CPUs to relocate? > possibly reuse QEMU CPU hotplug MMIO interface???) > > (06) new CPU does relocation. > (in case of attacker sends SIPI to several new CPUs, > open question how to detect collision of several CPUs at the same default SMBASE) > > (07) once new CPU relocated host CPU completes initialization, returns > from IO port write and executes the rest of GPE handler, telling OS > to online new CPU. In step (03), it is the OS that handles the SCI; it transfers control to ACPI. The AML can write to IO port 0xB2 only because the OS allows it. If the OS decides to omit that step, and sends an INIT-SIPI-SIPI directly to the new CPU, can it steal the CPU? Thanks! Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-30 18:46 ` Laszlo Ersek @ 2019-09-02 8:45 ` Igor Mammedov 2019-09-02 19:09 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-09-02 8:45 UTC (permalink / raw) To: Laszlo Ersek Cc: Yao, Jiewen, Kinney, Michael D, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Fri, 30 Aug 2019 20:46:14 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 08/30/19 16:48, Igor Mammedov wrote: > > > (01) On boot firmware maps and initializes SMI handler at default SMBASE (30000) > > (using dedicated SMRAM at 30000 would allow us to avoid save/restore > > steps and make SMM handler pointer not vulnerable to DMA attacks) > > > > (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI > > > > (03) on receiving SCI, host CPU calls GPE cpu hotplug handler > > which writes to IO port 0xB2 (broadcast SMI) > > > > (04) firmware waits for all existing CPUs rendezvous in SMM mode, > > new CPU(s) have SMI pending but does nothing yet > > > > (05) host CPU wakes up one new CPU (INIT-INIT-SIPI) > > SIPI vector points to RO flash HLT loop. > > (how host CPU will know which new CPUs to relocate? > > possibly reuse QEMU CPU hotplug MMIO interface???) > > > > (06) new CPU does relocation. > > (in case of attacker sends SIPI to several new CPUs, > > open question how to detect collision of several CPUs at the same default SMBASE) > > > > (07) once new CPU relocated host CPU completes initialization, returns > > from IO port write and executes the rest of GPE handler, telling OS > > to online new CPU. > > In step (03), it is the OS that handles the SCI; it transfers control to > ACPI. The AML can write to IO port 0xB2 only because the OS allows it. 
> > If the OS decides to omit that step, and sends an INIT-SIPI-SIPI > directly to the new CPU, can it steal the CPU? It sure can but this way it won't get access to privileged SMRAM so OS can't subvert firmware. The next time SMI broadcast is sent the CPU will use SMI handler at default 30000 SMBASE. It's up to us to define behavior here (for example relocation handler can put such CPU in shutdown state). It's in the best interest of OS to cooperate and execute AML provided by firmware, if it does not follow proper cpu hotplug flow we can't guarantee that stolen CPU will work. > Thanks! > Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
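The behavior Igor describes, where the relocation handler refuses (for example, parks or shuts down) a CPU whose hot-add the firmware did not initiate, amounts to a membership check against bookkeeping that the host CPU fills in before waking the new CPU. A hypothetical sketch of that check (the pending list is an invented stand-in for whatever structure the firmware would actually maintain):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: the relocation handler only relocates CPUs
 * whose hot-add the firmware itself initiated.  A CPU "stolen" by the
 * OS via a direct INIT-SIPI-SIPI would fail this check and could be
 * parked or shut down instead of being relocated. */

#define MAX_PENDING 8

uint32_t pending_apic_ids[MAX_PENDING];  /* filled by host CPU in SMM */
int      pending_count;

bool relocation_allowed(uint32_t apic_id)
{
    for (int i = 0; i < pending_count; i++) {
        if (pending_apic_ids[i] == apic_id)
            return true;
    }
    return false;  /* unexpected CPU: refuse relocation */
}
```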
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-09-02 8:45 ` Igor Mammedov @ 2019-09-02 19:09 ` Laszlo Ersek 2019-09-03 14:53 ` [Qemu-devel] " Igor Mammedov 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-09-02 19:09 UTC (permalink / raw) To: Igor Mammedov Cc: Yao, Jiewen, Kinney, Michael D, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 09/02/19 10:45, Igor Mammedov wrote: > On Fri, 30 Aug 2019 20:46:14 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> On 08/30/19 16:48, Igor Mammedov wrote: >> >>> (01) On boot firmware maps and initializes SMI handler at default SMBASE (30000) >>> (using dedicated SMRAM at 30000 would allow us to avoid save/restore >>> steps and make SMM handler pointer not vulnerable to DMA attacks) >>> >>> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI >>> >>> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler >>> which writes to IO port 0xB2 (broadcast SMI) >>> >>> (04) firmware waits for all existing CPUs rendezvous in SMM mode, >>> new CPU(s) have SMI pending but does nothing yet >>> >>> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI) >>> SIPI vector points to RO flash HLT loop. >>> (how host CPU will know which new CPUs to relocate? >>> possibly reuse QEMU CPU hotplug MMIO interface???) >>> >>> (06) new CPU does relocation. >>> (in case of attacker sends SIPI to several new CPUs, >>> open question how to detect collision of several CPUs at the same default SMBASE) >>> >>> (07) once new CPU relocated host CPU completes initialization, returns >>> from IO port write and executes the rest of GPE handler, telling OS >>> to online new CPU. >> >> In step (03), it is the OS that handles the SCI; it transfers control to >> ACPI. The AML can write to IO port 0xB2 only because the OS allows it. 
>> >> If the OS decides to omit that step, and sends an INIT-SIPI-SIPI >> directly to the new CPU, can it steal the CPU? > It sure can but this way it won't get access to privileged SMRAM > so OS can't subvert firmware. > The next time SMI broadcast is sent the CPU will use SMI handler at > default 30000 SMBASE. It's up to us to define behavior here (for example > relocation handler can put such CPU in shutdown state). > > It's in the best interest of OS to cooperate and execute AML > provided by firmware, if it does not follow proper cpu hotplug flow > we can't guarantee that stolen CPU will work. This sounds convincing enough, for the hotplugged CPU; thanks. So now my concern is with step (01). While preparing for the initial relocation (of cold-plugged CPUs), the code assumes the memory at the default SMBASE (0x30000) is normal RAM. Is it not a problem that the area is written initially while running in normal 32-bit or 64-bit mode, but then executed (in response to the first, synchronous, SMI) as SMRAM? Basically I'm confused by the alias. TSEG (and presumably, A/B seg) work like this: - when open, looks like RAM to normal mode and SMM - when closed, looks like black-hole to normal mode, and like RAM to SMM The generic edk2 code knows this, and manages the SMRAM areas accordingly. The area at 0x30000 is different: - looks like RAM to both normal mode and SMM If we set up the alias at 0x30000 into A/B seg, - will that *permanently* hide the normal RAM at 0x30000? - will 0x30000 start behaving like A/B seg? Basically my concern is that the universal code in edk2 might or might not keep A/B seg open while initially populating the area at the default SMBASE. 
Specifically, I can imagine two issues: - if the alias into A/B seg is inactive during the initial population, then the initial writes go to RAM, but the execution (the first SMBASE relocation) will occur from A/B seg through the alias - alternatively, if the alias is always active, but A/B seg is closed during initial population (which happens in normal mode), then the initial writes go to the black hole, and execution will occur from a "blank" A/B seg. Am I seeing things? (Sorry, I keep feeling dumber and dumber in this thread.) Anyway, I guess we could try and see if OVMF still boots with the alias... Thanks Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
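The SMRAM visibility rules Laszlo spells out above can be restated as a tiny truth-table model, which also makes the anomaly of the 0x30000 area explicit: today it reads as RAM in both modes, with no open/closed state at all. This only restates the rules from the email; it is not QEMU or edk2 code:

```c
#include <stdbool.h>

typedef enum { NORMAL_MODE, SMM_MODE } CpuMode;
typedef enum { SEG_OPEN, SEG_CLOSED } SegState;

/* TSEG (and presumably A/B seg): looks like RAM when open; when
 * closed, a black hole to normal mode but RAM to SMM. */
bool seg_reads_as_ram(CpuMode m, SegState s)
{
    return s == SEG_OPEN || m == SMM_MODE;
}

/* The area at the default SMBASE (0x30000) today: plain RAM to both
 * normal mode and SMM, with no open/closed state. */
bool smbase_area_reads_as_ram(CpuMode m)
{
    (void)m;  /* mode makes no difference */
    return true;
}
```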
* Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-09-02 19:09 ` Laszlo Ersek @ 2019-09-03 14:53 ` Igor Mammedov 2019-09-03 17:20 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-09-03 14:53 UTC (permalink / raw) To: Laszlo Ersek Cc: Chen, Yingwen, devel@edk2.groups.io, Phillip Goerl, qemu devel list, Alex Williamson, Yao, Jiewen, Nakajima, Jun, Kinney, Michael D, Paolo Bonzini, Boris Ostrovsky, rfc@edk2.groups.io, Joao Marcal Lemos Martins On Mon, 2 Sep 2019 21:09:58 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 09/02/19 10:45, Igor Mammedov wrote: > > On Fri, 30 Aug 2019 20:46:14 +0200 > > Laszlo Ersek <lersek@redhat.com> wrote: > > > >> On 08/30/19 16:48, Igor Mammedov wrote: > >> > >>> (01) On boot firmware maps and initializes SMI handler at default SMBASE (30000) > >>> (using dedicated SMRAM at 30000 would allow us to avoid save/restore > >>> steps and make SMM handler pointer not vulnerable to DMA attacks) > >>> > >>> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI > >>> > >>> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler > >>> which writes to IO port 0xB2 (broadcast SMI) > >>> > >>> (04) firmware waits for all existing CPUs rendezvous in SMM mode, > >>> new CPU(s) have SMI pending but does nothing yet > >>> > >>> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI) > >>> SIPI vector points to RO flash HLT loop. > >>> (how host CPU will know which new CPUs to relocate? > >>> possibly reuse QEMU CPU hotplug MMIO interface???) > >>> > >>> (06) new CPU does relocation. > >>> (in case of attacker sends SIPI to several new CPUs, > >>> open question how to detect collision of several CPUs at the same default SMBASE) > >>> > >>> (07) once new CPU relocated host CPU completes initialization, returns > >>> from IO port write and executes the rest of GPE handler, telling OS > >>> to online new CPU. 
> >> > >> In step (03), it is the OS that handles the SCI; it transfers control to > >> ACPI. The AML can write to IO port 0xB2 only because the OS allows it. > >> > >> If the OS decides to omit that step, and sends an INIT-SIPI-SIPI > >> directly to the new CPU, can it steal the CPU? > > It sure can but this way it won't get access to privileged SMRAM > > so OS can't subvert firmware. > > The next time SMI broadcast is sent the CPU will use SMI handler at > > default 30000 SMBASE. It's up to us to define behavior here (for example > > relocation handler can put such CPU in shutdown state). > > > > It's in the best interest of OS to cooperate and execute AML > > provided by firmware, if it does not follow proper cpu hotplug flow > > we can't guarantee that stolen CPU will work. > > This sounds convincing enough, for the hotplugged CPU; thanks. > > So now my concern is with step (01). While preparing for the initial > relocation (of cold-plugged CPUs), the code assumes the memory at the > default SMBASE (0x30000) is normal RAM. > > Is it not a problem that the area is written initially while running in > normal 32-bit or 64-bit mode, but then executed (in response to the > first, synchronous, SMI) as SMRAM? currently there is no SMRAM at 0x30000, so all access falls through into RAM address space and we are about to change that. but firmware doesn't have to use it as RAM, it can check if QEMU supports SMRAM at 0x30000 and if supported map it to configure and then lock it down. > Basically I'm confused by the alias. > > TSEG (and presumably, A/B seg) work like this: > - when open, looks like RAM to normal mode and SMM > - when closed, looks like black-hole to normal mode, and like RAM to SMM > > The generic edk2 code knows this, and manages the SMRAM areas accordingly. > > The area at 0x30000 is different: > - looks like RAM to both normal mode and SMM > > If we set up the alias at 0x30000 into A/B seg, > - will that *permanently* hide the normal RAM at 0x30000? 
> - will 0x30000 start behaving like A/B seg? > > Basically my concern is that the universal code in edk2 might or might > not keep A/B seg open while initially populating the area at the default > SMBASE. Specifically, I can imagine two issues: > > - if the alias into A/B seg is inactive during the initial population, > then the initial writes go to RAM, but the execution (the first SMBASE > relocation) will occur from A/B seg through the alias > > - alternatively, if the alias is always active, but A/B seg is closed > during initial population (which happens in normal mode), then the > initial writes go to the black hole, and execution will occur from a > "blank" A/B seg. > > Am I seeing things? (Sorry, I keep feeling dumber and dumber in this > thread.) I don't really know how firmware uses A/B segments and I'm afraid that cannibalizing one for configuring 0x30000 might break something. Since we are inventing something out of q35 spec anyway, How about leaving A/B/TSEG to be and using fwcfg to configure when/where SMRAM(0x30000+128K) should be mapped into RAM address space. I see a couple of options: 1: use identity mapping where SMRAM(0x30000+128K) maps into the same range in RAM address space when firmware writes into fwcfg file and unmaps/locks on the second write (until HW reset) 2: let firmware choose where to map SMRAM(0x30000+128K) in RAM address space, logic is essentially the same as above only firmware picks and writes into fwcfg an address where SMRAM(0x30000+128K) should be mapped. > Anyway, I guess we could try and see if OVMF still boots with the alias... > > Thanks > Laszlo > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-09-03 14:53 ` [Qemu-devel] " Igor Mammedov @ 2019-09-03 17:20 ` Laszlo Ersek 2019-09-04 9:52 ` imammedo 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-09-03 17:20 UTC (permalink / raw) To: Igor Mammedov Cc: Chen, Yingwen, devel@edk2.groups.io, Phillip Goerl, qemu devel list, Alex Williamson, Yao, Jiewen, Nakajima, Jun, Kinney, Michael D, Paolo Bonzini, Boris Ostrovsky, rfc@edk2.groups.io, Joao Marcal Lemos Martins On 09/03/19 16:53, Igor Mammedov wrote: > On Mon, 2 Sep 2019 21:09:58 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> On 09/02/19 10:45, Igor Mammedov wrote: >>> On Fri, 30 Aug 2019 20:46:14 +0200 >>> Laszlo Ersek <lersek@redhat.com> wrote: >>> >>>> On 08/30/19 16:48, Igor Mammedov wrote: >>>> >>>>> (01) On boot firmware maps and initializes SMI handler at default SMBASE (30000) >>>>> (using dedicated SMRAM at 30000 would allow us to avoid save/restore >>>>> steps and make SMM handler pointer not vulnerable to DMA attacks) >>>>> >>>>> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI >>>>> >>>>> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler >>>>> which writes to IO port 0xB2 (broadcast SMI) >>>>> >>>>> (04) firmware waits for all existing CPUs rendezvous in SMM mode, >>>>> new CPU(s) have SMI pending but does nothing yet >>>>> >>>>> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI) >>>>> SIPI vector points to RO flash HLT loop. >>>>> (how host CPU will know which new CPUs to relocate? >>>>> possibly reuse QEMU CPU hotplug MMIO interface???) >>>>> >>>>> (06) new CPU does relocation. >>>>> (in case of attacker sends SIPI to several new CPUs, >>>>> open question how to detect collision of several CPUs at the same default SMBASE) >>>>> >>>>> (07) once new CPU relocated host CPU completes initialization, returns >>>>> from IO port write and executes the rest of GPE handler, telling OS >>>>> to online new CPU. 
>>>> >>>> In step (03), it is the OS that handles the SCI; it transfers control to >>>> ACPI. The AML can write to IO port 0xB2 only because the OS allows it. >>>> >>>> If the OS decides to omit that step, and sends an INIT-SIPI-SIPI >>>> directly to the new CPU, can it steal the CPU? >>> It sure can but this way it won't get access to privileged SMRAM >>> so OS can't subvert firmware. >>> The next time SMI broadcast is sent the CPU will use SMI handler at >>> default 30000 SMBASE. It's up to us to define behavior here (for example >>> relocation handler can put such CPU in shutdown state). >>> >>> It's in the best interest of OS to cooperate and execute AML >>> provided by firmware, if it does not follow proper cpu hotplug flow >>> we can't guarantee that stolen CPU will work. >> >> This sounds convincing enough, for the hotplugged CPU; thanks. >> >> So now my concern is with step (01). While preparing for the initial >> relocation (of cold-plugged CPUs), the code assumes the memory at the >> default SMBASE (0x30000) is normal RAM. >> >> Is it not a problem that the area is written initially while running in >> normal 32-bit or 64-bit mode, but then executed (in response to the >> first, synchronous, SMI) as SMRAM? > > currently there is no SMRAM at 0x30000, so all access falls through > into RAM address space and we are about to change that. > > but firmware doesn't have to use it as RAM, it can check if QEMU > supports SMRAM at 0x30000 and if supported map it to configure > and then lock it down. I'm sure you are *technically* right, but you seem to be assuming that I can modify or rearrange anything I want in edk2. :) If we can solve the above in OVMF platform code, that's great. If not (e.g. UefiCpuPkg code needs to be updated), then things will get tricky. If we can introduce another platform hook for this, that would help. I can't say before I try. > > >> Basically I'm confused by the alias. 
>> >> TSEG (and presumably, A/B seg) work like this: >> - when open, looks like RAM to normal mode and SMM >> - when closed, looks like black-hole to normal mode, and like RAM to SMM >> >> The generic edk2 code knows this, and manages the SMRAM areas accordingly. >> >> The area at 0x30000 is different: >> - looks like RAM to both normal mode and SMM >> >> If we set up the alias at 0x30000 into A/B seg, >> - will that *permanently* hide the normal RAM at 0x30000? >> - will 0x30000 start behaving like A/B seg? >> >> Basically my concern is that the universal code in edk2 might or might >> not keep A/B seg open while initially populating the area at the default >> SMBASE. Specifically, I can imagine two issues: >> >> - if the alias into A/B seg is inactive during the initial population, >> then the initial writes go to RAM, but the execution (the first SMBASE >> relocation) will occur from A/B seg through the alias >> >> - alternatively, if the alias is always active, but A/B seg is closed >> during initial population (which happens in normal mode), then the >> initial writes go to the black hole, and execution will occur from a >> "blank" A/B seg. >> >> Am I seeing things? (Sorry, I keep feeling dumber and dumber in this >> thread.) > > I don't really know how firmware uses A/B segments and I'm afraid that > cannibalizing one for configuring 0x30000 might break something. > > Since we are inventing something out of q35 spec anyway, How about > leaving A/B/TSEG to be and using fwcfg to configure when/where > SMRAM(0x30000+128K) should be mapped into RAM address space. 
> > I see a couple of options: > 1: use identity mapping where SMRAM(0x30000+128K) maps into the same > range in RAM address space when firmware writes into fwcfg > file and unmaps/locks on the second write (until HW reset) > 2: let firmware choose where to map SMRAM(0x30000+128K) in RAM address > space, logic is essentially the same as above only firmware > picks and writes into fwcfg an address where SMRAM(0x30000+128K) > should be mapped. Option#1 would be similar to how TSEG works now, correct? IOW normal RAM (from the QEMU perspective) is exposed as "SMRAM" to the guest, hidden with a "black hole" overlay (outside of SMM) if SMRAM is closed. If that's correct, then #1 looks more attractive to me than #2. Thanks Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-09-03 17:20 ` Laszlo Ersek @ 2019-09-04 9:52 ` imammedo 2019-09-05 13:08 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: imammedo @ 2019-09-04 9:52 UTC (permalink / raw) To: Laszlo Ersek Cc: Chen, Yingwen, devel@edk2.groups.io, Phillip Goerl, qemu devel list, Alex Williamson, Yao, Jiewen, Nakajima, Jun, Kinney, Michael D, Paolo Bonzini, Boris Ostrovsky, rfc@edk2.groups.io, Joao Marcal Lemos Martins On Tue, 3 Sep 2019 19:20:25 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 09/03/19 16:53, Igor Mammedov wrote: > > On Mon, 2 Sep 2019 21:09:58 +0200 > > Laszlo Ersek <lersek@redhat.com> wrote: > > > >> On 09/02/19 10:45, Igor Mammedov wrote: > >>> On Fri, 30 Aug 2019 20:46:14 +0200 > >>> Laszlo Ersek <lersek@redhat.com> wrote: > >>> > >>>> On 08/30/19 16:48, Igor Mammedov wrote: > >>>> > >>>>> (01) On boot firmware maps and initializes SMI handler at default SMBASE (30000) > >>>>> (using dedicated SMRAM at 30000 would allow us to avoid save/restore > >>>>> steps and make SMM handler pointer not vulnerable to DMA attacks) > >>>>> > >>>>> (02) QEMU hotplugs a new CPU in reset-ed state and sends SCI > >>>>> > >>>>> (03) on receiving SCI, host CPU calls GPE cpu hotplug handler > >>>>> which writes to IO port 0xB2 (broadcast SMI) > >>>>> > >>>>> (04) firmware waits for all existing CPUs rendezvous in SMM mode, > >>>>> new CPU(s) have SMI pending but does nothing yet > >>>>> > >>>>> (05) host CPU wakes up one new CPU (INIT-INIT-SIPI) > >>>>> SIPI vector points to RO flash HLT loop. > >>>>> (how host CPU will know which new CPUs to relocate? > >>>>> possibly reuse QEMU CPU hotplug MMIO interface???) > >>>>> > >>>>> (06) new CPU does relocation. 
> >>>>> (in case of attacker sends SIPI to several new CPUs, > >>>>> open question how to detect collision of several CPUs at the same default SMBASE) > >>>>> > >>>>> (07) once new CPU relocated host CPU completes initialization, returns > >>>>> from IO port write and executes the rest of GPE handler, telling OS > >>>>> to online new CPU. > >>>> > >>>> In step (03), it is the OS that handles the SCI; it transfers control to > >>>> ACPI. The AML can write to IO port 0xB2 only because the OS allows it. > >>>> > >>>> If the OS decides to omit that step, and sends an INIT-SIPI-SIPI > >>>> directly to the new CPU, can it steal the CPU? > >>> It sure can but this way it won't get access to privileged SMRAM > >>> so OS can't subvert firmware. > >>> The next time SMI broadcast is sent the CPU will use SMI handler at > >>> default 30000 SMBASE. It's up to us to define behavior here (for example > >>> relocation handler can put such CPU in shutdown state). > >>> > >>> It's in the best interest of OS to cooperate and execute AML > >>> provided by firmware, if it does not follow proper cpu hotplug flow > >>> we can't guarantee that stolen CPU will work. > >> > >> This sounds convincing enough, for the hotplugged CPU; thanks. > >> > >> So now my concern is with step (01). While preparing for the initial > >> relocation (of cold-plugged CPUs), the code assumes the memory at the > >> default SMBASE (0x30000) is normal RAM. > >> > >> Is it not a problem that the area is written initially while running in > >> normal 32-bit or 64-bit mode, but then executed (in response to the > >> first, synchronous, SMI) as SMRAM? > > > > currently there is no SMRAM at 0x30000, so all access falls through > > into RAM address space and we are about to change that. > > > > but firmware doesn't have to use it as RAM, it can check if QEMU > > supports SMRAM at 0x30000 and if supported map it to configure > > and then lock it down. 
> > I'm sure you are *technically* right, but you seem to be assuming that I > can modify or rearrange anything I want in edk2. :) yep, I'm looking at it from theoretical perspective so far, but what could be done in reality might be limited. > If we can solve the above in OVMF platform code, that's great. If not > (e.g. UefiCpuPkg code needs to be updated), then things will get tricky. > If we can introduce another platform hook for this, that would help. I > can't say before I try. > > > > > > > >> Basically I'm confused by the alias. > >> > >> TSEG (and presumably, A/B seg) work like this: > >> - when open, looks like RAM to normal mode and SMM > >> - when closed, looks like black-hole to normal mode, and like RAM to SMM > >> > >> The generic edk2 code knows this, and manages the SMRAM areas accordingly. > >> > >> The area at 0x30000 is different: > >> - looks like RAM to both normal mode and SMM > >> > >> If we set up the alias at 0x30000 into A/B seg, > >> - will that *permanently* hide the normal RAM at 0x30000? > >> - will 0x30000 start behaving like A/B seg? > >> > >> Basically my concern is that the universal code in edk2 might or might > >> not keep A/B seg open while initially populating the area at the default > >> SMBASE. Specifically, I can imagine two issues: > >> > >> - if the alias into A/B seg is inactive during the initial population, > >> then the initial writes go to RAM, but the execution (the first SMBASE > >> relocation) will occur from A/B seg through the alias > >> > >> - alternatively, if the alias is always active, but A/B seg is closed > >> during initial population (which happens in normal mode), then the > >> initial writes go to the black hole, and execution will occur from a > >> "blank" A/B seg. > >> > >> Am I seeing things? (Sorry, I keep feeling dumber and dumber in this > >> thread.) > > > > I don't really know how firmware uses A/B segments and I'm afraid that > > cannibalizing one for configuring 0x30000 might break something. 
> > > > Since we are inventing something out of q35 spec anyway, How about > > leaving A/B/TSEG to be and using fwcfg to configure when/where > > SMRAM(0x30000+128K) should be mapped into RAM address space. > > > > I see a couple of options: > > 1: use identity mapping where SMRAM(0x30000+128K) maps into the same > > range in RAM address space when firmware writes into fwcfg > > file and unmaps/locks on the second write (until HW reset) > > 2: let firmware choose where to map SMRAM(0x30000+128K) in RAM address > > space, logic is essentially the same as above only firmware > > picks and writes into fwcfg an address where SMRAM(0x30000+128K) > > should be mapped. > > Option#1 would be similar to how TSEG works now, correct? IOW normal RAM > (from the QEMU perspective) is exposed as "SMRAM" to the guest, hidden > with a "black hole" overlay (outside of SMM) if SMRAM is closed. it could be stolen RAM + black hole like TSEG, assuming fw can live without RAM(0x30000+128K) range (in this case fwcfg interface would only work for locking down the range) or we can actually have a dedicated SMRAM (like in my earlier RFC), in this case FW can use RAM(0x30000+128K) when SMRAM isn't mapped into RAM address space (in this case fwcfg would be used to temporarily map SMRAM into normal RAM and unmap/lock after SMI relocation handler was initialized). If possible I'd prefer a simpler TSEG like variant. > > If that's correct, then #1 looks more attractive to me than #2. > > Thanks > Laszlo > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-09-04 9:52 ` imammedo @ 2019-09-05 13:08 ` Laszlo Ersek 2019-09-05 15:45 ` Igor Mammedov 2019-09-05 15:49 ` [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address Igor Mammedov 0 siblings, 2 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-09-05 13:08 UTC (permalink / raw) To: Igor Mammedov Cc: Chen, Yingwen, devel@edk2.groups.io, Phillip Goerl, qemu devel list, Alex Williamson, Yao, Jiewen, Nakajima, Jun, Kinney, Michael D, Paolo Bonzini, Boris Ostrovsky, rfc@edk2.groups.io, Joao Marcal Lemos Martins On 09/04/19 11:52, Igor Mammedov wrote: > it could be stolen RAM + black hole like TSEG, assuming fw can live without RAM(0x30000+128K) range > (in this case fwcfg interface would only work for locking down the range) > > or > > we can actually have a dedicated SMRAM (like in my earlier RFC), > in this case FW can use RAM(0x30000+128K) when SMRAM isn't mapped into RAM address space > (in this case fwcfg would be used to temporarily map SMRAM into normal RAM and unmap/lock > after SMI relocation handler was initialized). > > If possible I'd prefer a simpler TSEG like variant. I think TSEG-like behavior is between these two. That is, I believe we should have explicit open/close/lock operations. And, when the range is closed (meaning, closed+unlocked, or closed+locked), then the black hole should take effect for code that's not running in SMM. Put differently, its like the second choice, except the range never appears as normal RAM. "When SMRAM isn't mapped into RAM address space", then the address range shows "nothing" (black hole). Regarding "fw can live without RAM(0x30000+128K) range" -- do you mean whether the firmware could use another RAM area for fw_cfg DMA? If that's the question, then I wouldn't worry about it. I'd remove the 0x30000+128K range from the memory map, so the fw_cfg stuff (or anything else) would never allocate memory from the range. 
It's much more concerning to me, however, how the SMM infrastructure would deal with a hole in the memory map right there. Thanks Laszlo
* Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-09-05 13:08 ` Laszlo Ersek @ 2019-09-05 15:45 ` Igor Mammedov 2019-09-05 15:49 ` [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address Igor Mammedov 0 siblings, 0 replies; 69+ messages in thread From: Igor Mammedov @ 2019-09-05 15:45 UTC (permalink / raw) To: Laszlo Ersek Cc: Chen, Yingwen, devel@edk2.groups.io, Phillip Goerl, qemu devel list, Alex Williamson, Yao, Jiewen, Nakajima, Jun, Kinney, Michael D, Paolo Bonzini, Boris Ostrovsky, rfc@edk2.groups.io, Joao Marcal Lemos Martins On Thu, 5 Sep 2019 15:08:31 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 09/04/19 11:52, Igor Mammedov wrote: > > > it could be stolen RAM + black hole like TSEG, assuming fw can live without RAM(0x30000+128K) range > > (in this case fwcfg interface would only work for locking down the range) > > > > or > > > > we can actually have a dedicated SMRAM (like in my earlier RFC), > > in this case FW can use RAM(0x30000+128K) when SMRAM isn't mapped into RAM address space > > (in this case fwcfg would be used to temporarily map SMRAM into normal RAM and unmap/lock > > after SMI relocation handler was initialized). > > > > If possible I'd prefer a simpler TSEG like variant. > > I think TSEG-like behavior is between these two. That is, I believe we > should have explicit open/close/lock operations. And, when the range is > closed (meaning, closed+unlocked, or closed+locked), then the black hole > should take effect for code that's not running in SMM. > > Put differently, its like the second choice, except the range never > appears as normal RAM. "When SMRAM isn't mapped into RAM address space", > then the address range shows "nothing" (black hole). I guess we're at the point where a patch is better than words; I'll send one as a reply here shortly. I've just implemented a subset of the above (opened, closed+locked).
> Regarding "fw can live without RAM(0x30000+128K) range" -- do you mean > whether the firmware could use another RAM area for fw_cfg DMA? > > If that's the question, then I wouldn't worry about it. I'd remove the > 0x30000+128K range from the memory map, so the fw_cfg stuff (or anything > else) would never allocate memory from the range. It's much more > concerning to me however how the SMM infrastructure would deal with a > hole in the memory map right there. I didn't mean fwcfg in this context; what I meant was whether the firmware could avoid using the RAM(0x30000+128K) range (since it becomes unusable after locking). Looks like you just answered that here.
* [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address 2019-09-05 13:08 ` Laszlo Ersek 2019-09-05 15:45 ` Igor Mammedov @ 2019-09-05 15:49 ` Igor Mammedov 2019-09-09 19:15 ` Laszlo Ersek 1 sibling, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-09-05 15:49 UTC (permalink / raw) To: qemu-devel Cc: yingwen.chen, devel, phillip.goerl, lersek, alex.williamson, jiewen.yao, jun.nakajima, michael.d.kinney, pbonzini, boris.ostrovsky, rfc, joao.m.martins lpc already has an SMI negotiation feature; extend it by adding the opt-in ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT to the supported features. Writing this bit into the "etc/smi/requested-features" fw_cfg file tells QEMU to alias the 0x30000,128K RAM range into SMRAM address space and mask this region from normal RAM address space (reads return 0xff and writes are ignored, i.e. guest code should be able to deal with the unusable 0x30000,128K RAM range once ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT is activated). To make the negotiated change effective, the guest should read the "etc/smi/features-ok" fw_cfg file, which activates the negotiated features and locks down negotiating capabilities until hard reset. Flow for initializing the SMI handler on the guest side: 1. set the SMI handler entry point at the default SMBASE location 2. check that the host supports ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT in "etc/smi/supported-features" and, if supported, set it in "etc/smi/requested-features" 3. read "etc/smi/features-ok"; if the returned value is 1, the features negotiated at step 2 were activated successfully.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- include/hw/i386/ich9.h | 11 ++++++-- hw/i386/pc.c | 4 ++- hw/i386/pc_q35.c | 3 ++- hw/isa/lpc_ich9.c | 58 +++++++++++++++++++++++++++++++++++++++++- 4 files changed, 71 insertions(+), 5 deletions(-) diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h index 72e803f6e2..c28685b753 100644 --- a/include/hw/i386/ich9.h +++ b/include/hw/i386/ich9.h @@ -12,11 +12,14 @@ #include "hw/acpi/acpi.h" #include "hw/acpi/ich9.h" #include "hw/pci/pci_bus.h" +#include "qemu/units.h" void ich9_lpc_set_irq(void *opaque, int irq_num, int level); int ich9_lpc_map_irq(PCIDevice *pci_dev, int intx); PCIINTxRoute ich9_route_intx_pin_to_irq(void *opaque, int pirq_pin); -void ich9_lpc_pm_init(PCIDevice *pci_lpc, bool smm_enabled); +void ich9_lpc_pm_init(PCIDevice *pci_lpc, bool smm_enabled, + MemoryRegion *system_memory, MemoryRegion *ram, + MemoryRegion *smram); I2CBus *ich9_smb_init(PCIBus *bus, int devfn, uint32_t smb_io_base); void ich9_generate_smi(void); @@ -71,6 +74,8 @@ typedef struct ICH9LPCState { uint8_t smi_features_ok; /* guest-visible, read-only; selecting it * triggers feature lockdown */ uint64_t smi_negotiated_features; /* guest-invisible, host endian */ + MemoryRegion smbase_blackhole; + MemoryRegion smbase_window; /* isa bus */ ISABus *isa_bus; @@ -248,5 +253,7 @@ typedef struct ICH9LPCState { /* bit positions used in fw_cfg SMI feature negotiation */ #define ICH9_LPC_SMI_F_BROADCAST_BIT 0 - +#define ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT 1 +#define ICH9_LPC_SMBASE_ADDR 0x30000 +#define ICH9_LPC_SMBASE_RAM_SIZE (128 * KiB) #endif /* HW_ICH9_H */ diff --git a/hw/i386/pc.c b/hw/i386/pc.c index c14ed86439..99a98303eb 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -119,7 +119,9 @@ struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; /* Physical Address of PVH entry point read from kernel ELF NOTE */ static size_t pvh_start_addr; -GlobalProperty pc_compat_4_1[] = {}; +GlobalProperty pc_compat_4_1[] = { + { "ICH9-LPC", 
"x-smi-locked-smbase", "off" }, +}; const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1); GlobalProperty pc_compat_4_0[] = {}; diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index d4e8a1cb9f..50462686a0 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -292,7 +292,8 @@ static void pc_q35_init(MachineState *machine) 0xff0104); /* connect pm stuff to lpc */ - ich9_lpc_pm_init(lpc, pc_machine_is_smm_enabled(pcms)); + ich9_lpc_pm_init(lpc, pc_machine_is_smm_enabled(pcms), get_system_memory(), + ram_memory, MEMORY_REGION(object_resolve_path("/machine/smram", NULL))); if (pcms->sata_enabled) { /* ahci and SATA device, for q35 1 ahci controller is built-in */ diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c index 17c292e306..17a8cd1b51 100644 --- a/hw/isa/lpc_ich9.c +++ b/hw/isa/lpc_ich9.c @@ -359,6 +359,38 @@ static void ich9_set_sci(void *opaque, int irq_num, int level) } } +static uint64_t smbase_blackhole_read(void *ptr, hwaddr reg, unsigned size) +{ + return 0xffffffff; +} + +static void smbase_blackhole_write(void *opaque, hwaddr addr, uint64_t val, + unsigned width) +{ + /* nothing */ +} + +static const MemoryRegionOps smbase_blackhole_ops = { + .read = smbase_blackhole_read, + .write = smbase_blackhole_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid.min_access_size = 1, + .valid.max_access_size = 4, + .impl.min_access_size = 4, + .impl.max_access_size = 4, + .endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void ich9_lpc_smbase_locked_update(ICH9LPCState *lpc) +{ + bool en = lpc->smi_negotiated_features & ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT; + + memory_region_transaction_begin(); + memory_region_set_enabled(&lpc->smbase_blackhole, en); + memory_region_set_enabled(&lpc->smbase_window, en); + memory_region_transaction_commit(); +} + static void smi_features_ok_callback(void *opaque) { ICH9LPCState *lpc = opaque; @@ -379,9 +411,13 @@ static void smi_features_ok_callback(void *opaque) /* valid feature subset requested, lock it down, report success 
*/ lpc->smi_negotiated_features = guest_features; lpc->smi_features_ok = 1; + + ich9_lpc_smbase_locked_update(lpc); } -void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) +void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled, + MemoryRegion *system_memory, MemoryRegion *ram, + MemoryRegion *smram) { ICH9LPCState *lpc = ICH9_LPC_DEVICE(lpc_pci); qemu_irq sci_irq; @@ -413,6 +449,20 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) &lpc->smi_features_ok, sizeof lpc->smi_features_ok, true); + + memory_region_init_io(&lpc->smbase_blackhole, OBJECT(lpc), + &smbase_blackhole_ops, NULL, + "smbase-blackhole", ICH9_LPC_SMBASE_RAM_SIZE); + memory_region_set_enabled(&lpc->smbase_blackhole, false); + memory_region_add_subregion_overlap(system_memory, ICH9_LPC_SMBASE_ADDR, + &lpc->smbase_blackhole, 1); + + + memory_region_init_alias(&lpc->smbase_window, OBJECT(lpc), + "smbase-window", ram, + ICH9_LPC_SMBASE_ADDR, ICH9_LPC_SMBASE_RAM_SIZE); + memory_region_set_enabled(&lpc->smbase_window, false); + memory_region_add_subregion(smram, 0x30000, &lpc->smbase_window); } ich9_lpc_reset(DEVICE(lpc)); @@ -508,6 +558,7 @@ static int ich9_lpc_post_load(void *opaque, int version_id) ich9_lpc_pmbase_sci_update(lpc); ich9_lpc_rcba_update(lpc, 0 /* disabled ICH9_LPC_RCBA_EN */); ich9_lpc_pmcon_update(lpc); + ich9_lpc_smbase_locked_update(lpc); return 0; } @@ -567,6 +618,8 @@ static void ich9_lpc_reset(DeviceState *qdev) memset(lpc->smi_guest_features_le, 0, sizeof lpc->smi_guest_features_le); lpc->smi_features_ok = 0; lpc->smi_negotiated_features = 0; + + ich9_lpc_smbase_locked_update(lpc); } /* root complex register block is mapped into memory space */ @@ -697,6 +750,7 @@ static void ich9_lpc_realize(PCIDevice *d, Error **errp) qdev_init_gpio_out_named(dev, lpc->gsi, ICH9_GPIO_GSI, GSI_NUM_PINS); isa_bus_irqs(isa_bus, lpc->gsi); + } static bool ich9_rst_cnt_needed(void *opaque) @@ -764,6 +818,8 @@ static Property ich9_lpc_properties[] = { 
DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true), DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features, ICH9_LPC_SMI_F_BROADCAST_BIT, true), + DEFINE_PROP_BIT64("x-smi-locked-smbase", ICH9LPCState, smi_host_features, + ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT, true), DEFINE_PROP_END_OF_LIST(), }; -- 2.18.1 ^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address 2019-09-05 15:49 ` [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address Igor Mammedov @ 2019-09-09 19:15 ` Laszlo Ersek 2019-09-09 19:20 ` Laszlo Ersek 2019-09-10 15:58 ` Igor Mammedov 0 siblings, 2 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-09-09 19:15 UTC (permalink / raw) To: Igor Mammedov, qemu-devel Cc: yingwen.chen, devel, phillip.goerl, alex.williamson, jiewen.yao, jun.nakajima, michael.d.kinney, pbonzini, boris.ostrovsky, rfc, joao.m.martins Hi Igor, On 09/05/19 17:49, Igor Mammedov wrote: > lpc already has SMI negotiation feature, extend it by adding > optin ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT to supported features. > > Writing this bit into "etc/smi/requested-features" fw_cfg file, > tells QEMU to alias 0x30000,128K RAM range into SMRAM address > space and mask this region from normal RAM address space > (reads return 0xff and writes are ignored, i.e. guest code > should be able to deal with not usable 0x30000,128K RAM range > once ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT is activated). > > To make negotiated change effective, guest should read > "etc/smi/features-ok" fw_cfg file, which activates negotiated > features and locks down negotiating capabilities until hard reset. > > Flow for initializing SMI handler on guest side: > 1. set SMI handler entry point at default SMBASE location > 2. check that host supports ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT > in "etc/smi/supported-features" and set if supported set > it in "etc/smi/requested-features" > 3. read "etc/smi/features-ok", if returned value is 1 > negotiated at step 2 features are activated successfully. Tying the [0x30000+128K) lockdown to the broadcast SMI negotiation is a simplification for QEMU, but it is a complication for OVMF. (This QEMU patch ties those things together in effect because "etc/smi/features-ok" can be selected for lockdown only once.) 
In OVMF, at least 6 modules are involved in SMM setup. Here I'm only going to list some steps for 4 modules (skipping "OvmfPkg/SmmAccess/SmmAccess2Dxe.inf" and "UefiCpuPkg/CpuIo2Smm/CpuIo2Smm.inf"). (1) The "OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.inf" driver is launched, and it produces the EFI_SMM_CONTROL2_PROTOCOL. EFI_SMM_CONTROL2_PROTOCOL.Trigger() is the standard / abstract method for synchronously raising an SMI. The OVMF implementation writes to IO port 0xB2. Because OVMF exposes this protocol to the rest of the firmware, it first negotiates SMI broadcast, if QEMU offers it. The idea is that, without negotiating SMI broadcast (if it's available), EFI_SMM_CONTROL2_PROTOCOL is not fully configured, and should not be exposed. (Because, Trigger() wouldn't work properly). Incomplete / halfway functional protocols are not to be published. That is, we have (1a) negotiate SMI broadcast (1b) install EFI_SMM_CONTROL2_PROTOCOL. (2) Dependent on EFI_SMM_CONTROL2_PROTOCOL, the SMM IPL (Initial Program Load -- "MdeModulePkg/Core/PiSmmCore/PiSmmIpl.inf") is launched. This module (2a) registers a callback for EFI_SMM_CONFIGURATION_PROTOCOL, (2b) loads the SMM Core ("MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf") into SMRAM and starts it. (3) The SMM Core launches the SMM processor driver ("UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf"). The SMM processor driver (3a) performs the initial SMBASE relocation, (3b) and then installs EFI_SMM_CONFIGURATION_PROTOCOL. (Side remark: the SMM processor driver does not use IO port 0xB2 (it does not call Trigger()); it uses LAPIC accesses. This is by design (PI spec); Trigger() is supposed to be called after the relocation is done, and not for starting the relocation.) (4) The SMM IPL's callback fires. It uses EFI_SMM_CONFIGURATION_PROTOCOL to connect the platform-independent SMM entry point (= central high-level SMI handler), which is in the SMM Core, into the low-level (CPU-specific) SMI handler in the SMM processor driver. 
At this point, SMIs are considered fully functional. General drivers that are split into privileged (SMM) and unprivileged (runtime DXE) halves, such as the variable service driver, can use EFI_SMM_COMMUNICATION_PROTOCOL to submit messages to the privileged (SMM) halves. And that boils down to EFI_SMM_CONTROL2_PROTOCOL.Trigger() calls, which depends on SMI broadcast. --*-- The present QEMU patch requires the firmware to (i) negotiate SMI broadcast and to (ii) lock down [0x30000+128K) at the same time. If OVMF does both in step (1a) -- i.e. where it currently negotiates the broadcast --, then step (3a) breaks: because the initial SMBASE relocation depends on RAM at [0x30000+128K). In a theoretical ordering perspective, we could perhaps move the logic from step (1a) between steps (3a) and (3b). There are two problems with that: - The platform logic from step (1a) doesn't belong in the SMM processor driver (even if we managed to hook it in). - In step (1b), we'd be installing a protocol (EFI_SMM_CONTROL2_PROTOCOL) that is simply not set up correctly -- it's incomplete. Can QEMU offer this new "[0x30000+128K) lockdown" hardware feature in a separate platform device? (Such as a PCI device with fixed (QEMU-specified) B/D/F, and config space register(s).) It would be less difficult to lock such hardware down in isolation: I wouldn't even attempt to inject that action between steps (3a) and (3b), but write it as a new, independent End-of-DXE handler, in "OvmfPkg/SmmAccess/SmmAccess2Dxe.inf". (That driver already offers SMRAM open/close/lock services.) I would also reserve the memory away at that time -- I don't expect the firmware to keep anything that low. (Allocations are generally served top-down.) --*-- ... I've done some testing too. Applying the QEMU patch on top of 89ea03a7dc83, my plan was: - do not change OVMF, just see if it continues booting with the QEMU patch - then negotiate bit#1 too, in step (1a) -- this is when I'd expect (3a) to break. 
Unfortunately, the result is worse than that; even without negotiating bit#1 (i.e. in the baseline test), the firmware crashes (reboots) in step (3a). I've checked "info mtree", and all occurences of "smbase-blackhole" and "smbase-blackhole" are marked [disabled]. I'm not sure what's wrong with the baseline test (i.e. without negotiating bit#1). If I drop the patch (build QEMU at 89ea03a7dc83), then things work fine. Thank you! Laszlo > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > include/hw/i386/ich9.h | 11 ++++++-- > hw/i386/pc.c | 4 ++- > hw/i386/pc_q35.c | 3 ++- > hw/isa/lpc_ich9.c | 58 +++++++++++++++++++++++++++++++++++++++++- > 4 files changed, 71 insertions(+), 5 deletions(-) > > diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h > index 72e803f6e2..c28685b753 100644 > --- a/include/hw/i386/ich9.h > +++ b/include/hw/i386/ich9.h > @@ -12,11 +12,14 @@ > #include "hw/acpi/acpi.h" > #include "hw/acpi/ich9.h" > #include "hw/pci/pci_bus.h" > +#include "qemu/units.h" > > void ich9_lpc_set_irq(void *opaque, int irq_num, int level); > int ich9_lpc_map_irq(PCIDevice *pci_dev, int intx); > PCIINTxRoute ich9_route_intx_pin_to_irq(void *opaque, int pirq_pin); > -void ich9_lpc_pm_init(PCIDevice *pci_lpc, bool smm_enabled); > +void ich9_lpc_pm_init(PCIDevice *pci_lpc, bool smm_enabled, > + MemoryRegion *system_memory, MemoryRegion *ram, > + MemoryRegion *smram); > I2CBus *ich9_smb_init(PCIBus *bus, int devfn, uint32_t smb_io_base); > > void ich9_generate_smi(void); > @@ -71,6 +74,8 @@ typedef struct ICH9LPCState { > uint8_t smi_features_ok; /* guest-visible, read-only; selecting it > * triggers feature lockdown */ > uint64_t smi_negotiated_features; /* guest-invisible, host endian */ > + MemoryRegion smbase_blackhole; > + MemoryRegion smbase_window; > > /* isa bus */ > ISABus *isa_bus; > @@ -248,5 +253,7 @@ typedef struct ICH9LPCState { > > /* bit positions used in fw_cfg SMI feature negotiation */ > #define ICH9_LPC_SMI_F_BROADCAST_BIT 0 > - > 
+#define ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT 1 > +#define ICH9_LPC_SMBASE_ADDR 0x30000 > +#define ICH9_LPC_SMBASE_RAM_SIZE (128 * KiB) > #endif /* HW_ICH9_H */ > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index c14ed86439..99a98303eb 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -119,7 +119,9 @@ struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; > /* Physical Address of PVH entry point read from kernel ELF NOTE */ > static size_t pvh_start_addr; > > -GlobalProperty pc_compat_4_1[] = {}; > +GlobalProperty pc_compat_4_1[] = { > + { "ICH9-LPC", "x-smi-locked-smbase", "off" }, > +}; > const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1); > > GlobalProperty pc_compat_4_0[] = {}; > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index d4e8a1cb9f..50462686a0 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -292,7 +292,8 @@ static void pc_q35_init(MachineState *machine) > 0xff0104); > > /* connect pm stuff to lpc */ > - ich9_lpc_pm_init(lpc, pc_machine_is_smm_enabled(pcms)); > + ich9_lpc_pm_init(lpc, pc_machine_is_smm_enabled(pcms), get_system_memory(), > + ram_memory, MEMORY_REGION(object_resolve_path("/machine/smram", NULL))); > > if (pcms->sata_enabled) { > /* ahci and SATA device, for q35 1 ahci controller is built-in */ > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > index 17c292e306..17a8cd1b51 100644 > --- a/hw/isa/lpc_ich9.c > +++ b/hw/isa/lpc_ich9.c > @@ -359,6 +359,38 @@ static void ich9_set_sci(void *opaque, int irq_num, int level) > } > } > > +static uint64_t smbase_blackhole_read(void *ptr, hwaddr reg, unsigned size) > +{ > + return 0xffffffff; > +} > + > +static void smbase_blackhole_write(void *opaque, hwaddr addr, uint64_t val, > + unsigned width) > +{ > + /* nothing */ > +} > + > +static const MemoryRegionOps smbase_blackhole_ops = { > + .read = smbase_blackhole_read, > + .write = smbase_blackhole_write, > + .endianness = DEVICE_NATIVE_ENDIAN, > + .valid.min_access_size = 1, > + .valid.max_access_size = 4, > + 
.impl.min_access_size = 4, > + .impl.max_access_size = 4, > + .endianness = DEVICE_LITTLE_ENDIAN, > +}; > + > +static void ich9_lpc_smbase_locked_update(ICH9LPCState *lpc) > +{ > + bool en = lpc->smi_negotiated_features & ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT; > + > + memory_region_transaction_begin(); > + memory_region_set_enabled(&lpc->smbase_blackhole, en); > + memory_region_set_enabled(&lpc->smbase_window, en); > + memory_region_transaction_commit(); > +} > + > static void smi_features_ok_callback(void *opaque) > { > ICH9LPCState *lpc = opaque; > @@ -379,9 +411,13 @@ static void smi_features_ok_callback(void *opaque) > /* valid feature subset requested, lock it down, report success */ > lpc->smi_negotiated_features = guest_features; > lpc->smi_features_ok = 1; > + > + ich9_lpc_smbase_locked_update(lpc); > } > > -void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) > +void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled, > + MemoryRegion *system_memory, MemoryRegion *ram, > + MemoryRegion *smram) > { > ICH9LPCState *lpc = ICH9_LPC_DEVICE(lpc_pci); > qemu_irq sci_irq; > @@ -413,6 +449,20 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) > &lpc->smi_features_ok, > sizeof lpc->smi_features_ok, > true); > + > + memory_region_init_io(&lpc->smbase_blackhole, OBJECT(lpc), > + &smbase_blackhole_ops, NULL, > + "smbase-blackhole", ICH9_LPC_SMBASE_RAM_SIZE); > + memory_region_set_enabled(&lpc->smbase_blackhole, false); > + memory_region_add_subregion_overlap(system_memory, ICH9_LPC_SMBASE_ADDR, > + &lpc->smbase_blackhole, 1); > + > + > + memory_region_init_alias(&lpc->smbase_window, OBJECT(lpc), > + "smbase-window", ram, > + ICH9_LPC_SMBASE_ADDR, ICH9_LPC_SMBASE_RAM_SIZE); > + memory_region_set_enabled(&lpc->smbase_window, false); > + memory_region_add_subregion(smram, 0x30000, &lpc->smbase_window); > } > > ich9_lpc_reset(DEVICE(lpc)); > @@ -508,6 +558,7 @@ static int ich9_lpc_post_load(void *opaque, int version_id) > ich9_lpc_pmbase_sci_update(lpc); > 
ich9_lpc_rcba_update(lpc, 0 /* disabled ICH9_LPC_RCBA_EN */); > ich9_lpc_pmcon_update(lpc); > + ich9_lpc_smbase_locked_update(lpc); > return 0; > } > > @@ -567,6 +618,8 @@ static void ich9_lpc_reset(DeviceState *qdev) > memset(lpc->smi_guest_features_le, 0, sizeof lpc->smi_guest_features_le); > lpc->smi_features_ok = 0; > lpc->smi_negotiated_features = 0; > + > + ich9_lpc_smbase_locked_update(lpc); > } > > /* root complex register block is mapped into memory space */ > @@ -697,6 +750,7 @@ static void ich9_lpc_realize(PCIDevice *d, Error **errp) > qdev_init_gpio_out_named(dev, lpc->gsi, ICH9_GPIO_GSI, GSI_NUM_PINS); > > isa_bus_irqs(isa_bus, lpc->gsi); > + > } > > static bool ich9_rst_cnt_needed(void *opaque) > @@ -764,6 +818,8 @@ static Property ich9_lpc_properties[] = { > DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true), > DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features, > ICH9_LPC_SMI_F_BROADCAST_BIT, true), > + DEFINE_PROP_BIT64("x-smi-locked-smbase", ICH9LPCState, smi_host_features, > + ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT, true), > DEFINE_PROP_END_OF_LIST(), > }; > > ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address 2019-09-09 19:15 ` Laszlo Ersek @ 2019-09-09 19:20 ` Laszlo Ersek 2019-09-10 15:58 ` Igor Mammedov 1 sibling, 0 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-09-09 19:20 UTC (permalink / raw) To: Igor Mammedov, qemu-devel Cc: yingwen.chen, devel, phillip.goerl, alex.williamson, jiewen.yao, jun.nakajima, michael.d.kinney, pbonzini, boris.ostrovsky, rfc, joao.m.martins On 09/09/19 21:15, Laszlo Ersek wrote: > ... I've done some testing too. Applying the QEMU patch on top of > 89ea03a7dc83, my plan was: > > - do not change OVMF, just see if it continues booting with the QEMU > patch > > - then negotiate bit#1 too, in step (1a) -- this is when I'd expect (3a) > to break. > > Unfortunately, the result is worse than that; even without negotiating > bit#1 (i.e. in the baseline test), the firmware crashes (reboots) in > step (3a). I've checked "info mtree", and all occurences of > "smbase-blackhole" and "smbase-blackhole" are marked [disabled]. I'm not > sure what's wrong with the baseline test (i.e. without negotiating > bit#1). If I drop the patch (build QEMU at 89ea03a7dc83), then things > work fine. Sorry, there's a typo above: I pasted "smbase-blackhole" twice. The second instance was meant to be "smbase-window". I checked all instances of both regions in the info mtree output, I just fumbled the pasting. Thanks Laszlo
* Re: [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address 2019-09-09 19:15 ` Laszlo Ersek 2019-09-09 19:20 ` Laszlo Ersek @ 2019-09-10 15:58 ` Igor Mammedov 2019-09-11 17:30 ` Laszlo Ersek 1 sibling, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-09-10 15:58 UTC (permalink / raw) To: Laszlo Ersek Cc: qemu-devel, yingwen.chen, devel, phillip.goerl, alex.williamson, jiewen.yao, jun.nakajima, michael.d.kinney, pbonzini, boris.ostrovsky, rfc, joao.m.martins On Mon, 9 Sep 2019 21:15:44 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > Hi Igor, > > On 09/05/19 17:49, Igor Mammedov wrote: > > lpc already has SMI negotiation feature, extend it by adding > > optin ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT to supported features. > > > > Writing this bit into "etc/smi/requested-features" fw_cfg file, > > tells QEMU to alias 0x30000,128K RAM range into SMRAM address > > space and mask this region from normal RAM address space > > (reads return 0xff and writes are ignored, i.e. guest code > > should be able to deal with not usable 0x30000,128K RAM range > > once ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT is activated). > > > > To make negotiated change effective, guest should read > > "etc/smi/features-ok" fw_cfg file, which activates negotiated > > features and locks down negotiating capabilities until hard reset. > > > > Flow for initializing SMI handler on guest side: > > 1. set SMI handler entry point at default SMBASE location > > 2. check that host supports ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT > > in "etc/smi/supported-features" and set if supported set > > it in "etc/smi/requested-features" > > 3. read "etc/smi/features-ok", if returned value is 1 > > negotiated at step 2 features are activated successfully. > > Tying the [0x30000+128K) lockdown to the broadcast SMI negotiation is a > simplification for QEMU, but it is a complication for OVMF. 
> > (This QEMU patch ties those things together in effect because > "etc/smi/features-ok" can be selected for lockdown only once.) > > In OVMF, at least 6 modules are involved in SMM setup. Here I'm only > going to list some steps for 4 modules (skipping > "OvmfPkg/SmmAccess/SmmAccess2Dxe.inf" and > "UefiCpuPkg/CpuIo2Smm/CpuIo2Smm.inf"). > > > (1) The "OvmfPkg/SmmControl2Dxe/SmmControl2Dxe.inf" driver is launched, > and it produces the EFI_SMM_CONTROL2_PROTOCOL. > > EFI_SMM_CONTROL2_PROTOCOL.Trigger() is the standard / abstract method > for synchronously raising an SMI. The OVMF implementation writes to IO > port 0xB2. > > Because OVMF exposes this protocol to the rest of the firmware, it first > negotiates SMI broadcast, if QEMU offers it. The idea is that, without > negotiating SMI broadcast (if it's available), EFI_SMM_CONTROL2_PROTOCOL > is not fully configured, and should not be exposed. (Because, Trigger() > wouldn't work properly). Incomplete / halfway functional protocols are > not to be published. > > That is, we have > > (1a) negotiate SMI broadcast > (1b) install EFI_SMM_CONTROL2_PROTOCOL. > > > (2) Dependent on EFI_SMM_CONTROL2_PROTOCOL, the SMM IPL (Initial Program > Load -- "MdeModulePkg/Core/PiSmmCore/PiSmmIpl.inf") is launched. > > This module > (2a) registers a callback for EFI_SMM_CONFIGURATION_PROTOCOL, > (2b) loads the SMM Core ("MdeModulePkg/Core/PiSmmCore/PiSmmCore.inf") > into SMRAM and starts it. > > > (3) The SMM Core launches the SMM processor driver > ("UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf"). > > The SMM processor driver > (3a) performs the initial SMBASE relocation, > (3b) and then installs EFI_SMM_CONFIGURATION_PROTOCOL. > > (Side remark: the SMM processor driver does not use IO port 0xB2 (it > does not call Trigger()); it uses LAPIC accesses. This is by design (PI > spec); Trigger() is supposed to be called after the relocation is done, > and not for starting the relocation.) > > > (4) The SMM IPL's callback fires. 
It uses EFI_SMM_CONFIGURATION_PROTOCOL > to connect the platform-independent SMM entry point (= central > high-level SMI handler), which is in the SMM Core, into the low-level > (CPU-specific) SMI handler in the SMM processor driver. > > At this point, SMIs are considered fully functional. General drivers > that are split into privileged (SMM) and unprivileged (runtime DXE) > halves, such as the variable service driver, can use > EFI_SMM_COMMUNICATION_PROTOCOL to submit messages to the privileged > (SMM) halves. And that boils down to EFI_SMM_CONTROL2_PROTOCOL.Trigger() > calls, which depends on SMI broadcast. > > --*-- > > The present QEMU patch requires the firmware to (i) negotiate SMI > broadcast and to (ii) lock down [0x30000+128K) at the same time. > > If OVMF does both in step (1a) -- i.e. where it currently negotiates the > broadcast --, then step (3a) breaks: because the initial SMBASE > relocation depends on RAM at [0x30000+128K). > > In a theoretical ordering perspective, we could perhaps move the logic > from step (1a) between steps (3a) and (3b). There are two problems with > that: > > - The platform logic from step (1a) doesn't belong in the SMM processor > driver (even if we managed to hook it in). > > - In step (1b), we'd be installing a protocol > (EFI_SMM_CONTROL2_PROTOCOL) that is simply not set up correctly -- it's > incomplete. > > > Can QEMU offer this new "[0x30000+128K) lockdown" hardware feature in a > separate platform device? (Such as a PCI device with fixed > (QEMU-specified) B/D/F, and config space register(s).) It looks like fwcfg smi feature negotiation isn't reusable in this case. I'd prefer not to add another device just for another SMI feature negotiation/activation. How about stealing reserved register from pci-host similar to your extended TSEG commit (2f295167 q35/mch: implement extended TSEG sizes)? 
(Looking into spec can (ab)use 0x58 or 0x59 register) > It would be less difficult to lock such hardware down in isolation: I > wouldn't even attempt to inject that action between steps (3a) and (3b), > but write it as a new, independent End-of-DXE handler, in > "OvmfPkg/SmmAccess/SmmAccess2Dxe.inf". (That driver already offers SMRAM > open/close/lock services.) I would also reserve the memory away at that > time -- I don't expect the firmware to keep anything that low. > (Allocations are generally served top-down.) > > --*-- > > ... I've done some testing too. Applying the QEMU patch on top of > 89ea03a7dc83, my plan was: > > - do not change OVMF, just see if it continues booting with the QEMU > patch > > - then negotiate bit#1 too, in step (1a) -- this is when I'd expect (3a) > to break. > > Unfortunately, the result is worse than that; even without negotiating > bit#1 (i.e. in the baseline test), the firmware crashes (reboots) in > step (3a). I've checked "info mtree", and all occurences of > "smbase-blackhole" and "smbase-blackhole" are marked [disabled]. I'm not > sure what's wrong with the baseline test (i.e. without negotiating > bit#1). If I drop the patch (build QEMU at 89ea03a7dc83), then things > work fine. that was a bug in my code, which always made lock down effective on feature_ok selection, which breaks relocation for reasons you've explained above. diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c index 17a8cd1b51..32ddf54fc2 100644 --- a/hw/isa/lpc_ich9.c +++ b/hw/isa/lpc_ich9.c @@ -383,7 +383,7 @@ static const MemoryRegionOps smbase_blackhole_ops = { static void ich9_lpc_smbase_locked_update(ICH9LPCState *lpc) { - bool en = lpc->smi_negotiated_features & ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT; + bool en = lpc->smi_negotiated_features & (UINT64_C(1) << ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT); memory_region_transaction_begin(); memory_region_set_enabled(&lpc->smbase_blackhole, en); > > Thank you! 
> Laszlo > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > include/hw/i386/ich9.h | 11 ++++++-- > > hw/i386/pc.c | 4 ++- > > hw/i386/pc_q35.c | 3 ++- > > hw/isa/lpc_ich9.c | 58 +++++++++++++++++++++++++++++++++++++++++- > > 4 files changed, 71 insertions(+), 5 deletions(-) > > > > diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h > > index 72e803f6e2..c28685b753 100644 > > --- a/include/hw/i386/ich9.h > > +++ b/include/hw/i386/ich9.h > > @@ -12,11 +12,14 @@ > > #include "hw/acpi/acpi.h" > > #include "hw/acpi/ich9.h" > > #include "hw/pci/pci_bus.h" > > +#include "qemu/units.h" > > > > void ich9_lpc_set_irq(void *opaque, int irq_num, int level); > > int ich9_lpc_map_irq(PCIDevice *pci_dev, int intx); > > PCIINTxRoute ich9_route_intx_pin_to_irq(void *opaque, int pirq_pin); > > -void ich9_lpc_pm_init(PCIDevice *pci_lpc, bool smm_enabled); > > +void ich9_lpc_pm_init(PCIDevice *pci_lpc, bool smm_enabled, > > + MemoryRegion *system_memory, MemoryRegion *ram, > > + MemoryRegion *smram); > > I2CBus *ich9_smb_init(PCIBus *bus, int devfn, uint32_t smb_io_base); > > > > void ich9_generate_smi(void); > > @@ -71,6 +74,8 @@ typedef struct ICH9LPCState { > > uint8_t smi_features_ok; /* guest-visible, read-only; selecting it > > * triggers feature lockdown */ > > uint64_t smi_negotiated_features; /* guest-invisible, host endian */ > > + MemoryRegion smbase_blackhole; > > + MemoryRegion smbase_window; > > > > /* isa bus */ > > ISABus *isa_bus; > > @@ -248,5 +253,7 @@ typedef struct ICH9LPCState { > > > > /* bit positions used in fw_cfg SMI feature negotiation */ > > #define ICH9_LPC_SMI_F_BROADCAST_BIT 0 > > - > > +#define ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT 1 > > +#define ICH9_LPC_SMBASE_ADDR 0x30000 > > +#define ICH9_LPC_SMBASE_RAM_SIZE (128 * KiB) > > #endif /* HW_ICH9_H */ > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > > index c14ed86439..99a98303eb 100644 > > --- a/hw/i386/pc.c > > +++ b/hw/i386/pc.c > > @@ -119,7 +119,9 @@ struct hpet_fw_config 
hpet_cfg = {.count = UINT8_MAX}; > > /* Physical Address of PVH entry point read from kernel ELF NOTE */ > > static size_t pvh_start_addr; > > > > -GlobalProperty pc_compat_4_1[] = {}; > > +GlobalProperty pc_compat_4_1[] = { > > + { "ICH9-LPC", "x-smi-locked-smbase", "off" }, > > +}; > > const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1); > > > > GlobalProperty pc_compat_4_0[] = {}; > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > > index d4e8a1cb9f..50462686a0 100644 > > --- a/hw/i386/pc_q35.c > > +++ b/hw/i386/pc_q35.c > > @@ -292,7 +292,8 @@ static void pc_q35_init(MachineState *machine) > > 0xff0104); > > > > /* connect pm stuff to lpc */ > > - ich9_lpc_pm_init(lpc, pc_machine_is_smm_enabled(pcms)); > > + ich9_lpc_pm_init(lpc, pc_machine_is_smm_enabled(pcms), get_system_memory(), > > + ram_memory, MEMORY_REGION(object_resolve_path("/machine/smram", NULL))); > > > > if (pcms->sata_enabled) { > > /* ahci and SATA device, for q35 1 ahci controller is built-in */ > > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > > index 17c292e306..17a8cd1b51 100644 > > --- a/hw/isa/lpc_ich9.c > > +++ b/hw/isa/lpc_ich9.c > > @@ -359,6 +359,38 @@ static void ich9_set_sci(void *opaque, int irq_num, int level) > > } > > } > > > > +static uint64_t smbase_blackhole_read(void *ptr, hwaddr reg, unsigned size) > > +{ > > + return 0xffffffff; > > +} > > + > > +static void smbase_blackhole_write(void *opaque, hwaddr addr, uint64_t val, > > + unsigned width) > > +{ > > + /* nothing */ > > +} > > + > > +static const MemoryRegionOps smbase_blackhole_ops = { > > + .read = smbase_blackhole_read, > > + .write = smbase_blackhole_write, > > + .endianness = DEVICE_NATIVE_ENDIAN, > > + .valid.min_access_size = 1, > > + .valid.max_access_size = 4, > > + .impl.min_access_size = 4, > > + .impl.max_access_size = 4, > > + .endianness = DEVICE_LITTLE_ENDIAN, > > +}; > > + > > +static void ich9_lpc_smbase_locked_update(ICH9LPCState *lpc) > > +{ > > + bool en = 
lpc->smi_negotiated_features & ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT; > > + > > + memory_region_transaction_begin(); > > + memory_region_set_enabled(&lpc->smbase_blackhole, en); > > + memory_region_set_enabled(&lpc->smbase_window, en); > > + memory_region_transaction_commit(); > > +} > > + > > static void smi_features_ok_callback(void *opaque) > > { > > ICH9LPCState *lpc = opaque; > > @@ -379,9 +411,13 @@ static void smi_features_ok_callback(void *opaque) > > /* valid feature subset requested, lock it down, report success */ > > lpc->smi_negotiated_features = guest_features; > > lpc->smi_features_ok = 1; > > + > > + ich9_lpc_smbase_locked_update(lpc); > > } > > > > -void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) > > +void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled, > > + MemoryRegion *system_memory, MemoryRegion *ram, > > + MemoryRegion *smram) > > { > > ICH9LPCState *lpc = ICH9_LPC_DEVICE(lpc_pci); > > qemu_irq sci_irq; > > @@ -413,6 +449,20 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) > > &lpc->smi_features_ok, > > sizeof lpc->smi_features_ok, > > true); > > + > > + memory_region_init_io(&lpc->smbase_blackhole, OBJECT(lpc), > > + &smbase_blackhole_ops, NULL, > > + "smbase-blackhole", ICH9_LPC_SMBASE_RAM_SIZE); > > + memory_region_set_enabled(&lpc->smbase_blackhole, false); > > + memory_region_add_subregion_overlap(system_memory, ICH9_LPC_SMBASE_ADDR, > > + &lpc->smbase_blackhole, 1); > > + > > + > > + memory_region_init_alias(&lpc->smbase_window, OBJECT(lpc), > > + "smbase-window", ram, > > + ICH9_LPC_SMBASE_ADDR, ICH9_LPC_SMBASE_RAM_SIZE); > > + memory_region_set_enabled(&lpc->smbase_window, false); > > + memory_region_add_subregion(smram, 0x30000, &lpc->smbase_window); > > } > > > > ich9_lpc_reset(DEVICE(lpc)); > > @@ -508,6 +558,7 @@ static int ich9_lpc_post_load(void *opaque, int version_id) > > ich9_lpc_pmbase_sci_update(lpc); > > ich9_lpc_rcba_update(lpc, 0 /* disabled ICH9_LPC_RCBA_EN */); > > 
ich9_lpc_pmcon_update(lpc); > > + ich9_lpc_smbase_locked_update(lpc); > > return 0; > > } > > > > @@ -567,6 +618,8 @@ static void ich9_lpc_reset(DeviceState *qdev) > > memset(lpc->smi_guest_features_le, 0, sizeof lpc->smi_guest_features_le); > > lpc->smi_features_ok = 0; > > lpc->smi_negotiated_features = 0; > > + > > + ich9_lpc_smbase_locked_update(lpc); > > } > > > > /* root complex register block is mapped into memory space */ > > @@ -697,6 +750,7 @@ static void ich9_lpc_realize(PCIDevice *d, Error **errp) > > qdev_init_gpio_out_named(dev, lpc->gsi, ICH9_GPIO_GSI, GSI_NUM_PINS); > > > > isa_bus_irqs(isa_bus, lpc->gsi); > > + > > } > > > > static bool ich9_rst_cnt_needed(void *opaque) > > @@ -764,6 +818,8 @@ static Property ich9_lpc_properties[] = { > > DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true), > > DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features, > > ICH9_LPC_SMI_F_BROADCAST_BIT, true), > > + DEFINE_PROP_BIT64("x-smi-locked-smbase", ICH9LPCState, smi_host_features, > > + ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT, true), > > DEFINE_PROP_END_OF_LIST(), > > }; > > > > >
* Re: [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address 2019-09-10 15:58 ` Igor Mammedov @ 2019-09-11 17:30 ` Laszlo Ersek 2019-09-17 13:11 ` [edk2-devel] " Igor Mammedov 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-09-11 17:30 UTC (permalink / raw) To: Igor Mammedov Cc: qemu-devel, yingwen.chen, devel, phillip.goerl, alex.williamson, jiewen.yao, jun.nakajima, michael.d.kinney, pbonzini, boris.ostrovsky, rfc, joao.m.martins On 09/10/19 17:58, Igor Mammedov wrote: > On Mon, 9 Sep 2019 21:15:44 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: [...] > It looks like fwcfg smi feature negotiation isn't reusable in this case. > I'd prefer not to add another device just for another SMI feature > negotiation/activation. I thought it could be a register on the new CPU hotplug controller that we're going to need anyway (if I understand correctly, at least). But: > How about stealing reserved register from pci-host similar to your > extended TSEG commit (2f295167 q35/mch: implement extended TSEG sizes)? > (Looking into spec can (ab)use 0x58 or 0x59 register) Yes, that should work. In fact, I had considered 0x58 / 0x59 when looking for unused registers for extended TSEG configuration: http://mid.mail-archive.com/d8802612-0b11-776f-b209-53bbdaf67515@redhat.com So I'm OK with this, thank you. More below: >> ... I've done some testing too. Applying the QEMU patch on top of >> 89ea03a7dc83, my plan was: >> >> - do not change OVMF, just see if it continues booting with the QEMU >> patch >> >> - then negotiate bit#1 too, in step (1a) -- this is when I'd expect (3a) >> to break. >> >> Unfortunately, the result is worse than that; even without negotiating >> bit#1 (i.e. in the baseline test), the firmware crashes (reboots) in >> step (3a). I've checked "info mtree", and all occurences of >> "smbase-blackhole" and "smbase-blackhole" are marked [disabled]. I'm not >> sure what's wrong with the baseline test (i.e. 
without negotiating >> bit#1). If I drop the patch (build QEMU at 89ea03a7dc83), then things >> work fine. > > that was a bug in my code, which always made lock down effective on > feature_ok selection, which breaks relocation for reasons you've > explained above. > > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > index 17a8cd1b51..32ddf54fc2 100644 > --- a/hw/isa/lpc_ich9.c > +++ b/hw/isa/lpc_ich9.c > @@ -383,7 +383,7 @@ static const MemoryRegionOps smbase_blackhole_ops = { > > static void ich9_lpc_smbase_locked_update(ICH9LPCState *lpc) > { > - bool en = lpc->smi_negotiated_features & ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT; > + bool en = lpc->smi_negotiated_features & (UINT64_C(1) << ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT); > > memory_region_transaction_begin(); > memory_region_set_enabled(&lpc->smbase_blackhole, en); I see. ICH9_LPC_SMI_F_LOCKED_SMBASE_BIT is 1, with the intended value for bitmask checking being 1LLU<<1 == 2LLU. Due to the bug, the function would check value 1 in the bitmask -- which in fact corresponds to bit#0. Bit#0 happens to be ICH9_LPC_SMI_F_BROADCAST_BIT. And because OVMF would negotiate that feature (= broadcast SMI) even in the baseline test, it ended up enabling the "locked smbase" feature too. So ultimately I think we can consider this a valid test (= with meaningful result); the result is that we can't reuse these fw_cfg files for "locked smbase", as discussed above. Thanks! Laszlo
* Re: [edk2-devel] [PATCH] q35: lpc: allow to lock down 128K RAM at default SMBASE address 2019-09-11 17:30 ` Laszlo Ersek @ 2019-09-17 13:11 ` Igor Mammedov 2019-09-17 14:38 ` [staging/branch]: CdePkg - C Development Environment Package Minnow Ware 0 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-09-17 13:11 UTC (permalink / raw) To: Laszlo Ersek Cc: devel, qemu-devel, yingwen.chen, phillip.goerl, alex.williamson, jiewen.yao, jun.nakajima, michael.d.kinney, pbonzini, boris.ostrovsky, rfc, joao.m.martins On Wed, 11 Sep 2019 19:30:46 +0200 "Laszlo Ersek" <lersek@redhat.com> wrote: > On 09/10/19 17:58, Igor Mammedov wrote: > > On Mon, 9 Sep 2019 21:15:44 +0200 > > Laszlo Ersek <lersek@redhat.com> wrote: > > [...] > > > It looks like fwcfg smi feature negotiation isn't reusable in this case. > > I'd prefer not to add another device just for another SMI feature > > negotiation/activation. > > I thought it could be a register on the new CPU hotplug controller that > we're going to need anyway (if I understand correctly, at least). If we don't have to 'park' hotplugged CPUs, then I don't see a need for an extra controller. > But: > > > How about stealing reserved register from pci-host similar to your > > extended TSEG commit (2f295167 q35/mch: implement extended TSEG sizes)? > > (Looking into spec can (ab)use 0x58 or 0x59 register) > > Yes, that should work. > > In fact, I had considered 0x58 / 0x59 when looking for unused registers > for extended TSEG configuration: > > http://mid.mail-archive.com/d8802612-0b11-776f-b209-53bbdaf67515@redhat.com > > So I'm OK with this, thank you. Thanks for the tip! ... patches with a stolen register are on the way to the mailing list.
* [staging/branch]: CdePkg - C Development Environment Package 2019-09-17 13:11 ` [edk2-devel] " Igor Mammedov @ 2019-09-17 14:38 ` Minnow Ware 0 siblings, 0 replies; 69+ messages in thread From: Minnow Ware @ 2019-09-17 14:38 UTC (permalink / raw) To: devel@edk2.groups.io; +Cc: michael.d.kinney@intel.com, Richardson, Brian [-- Attachment #1: Type: text/plain, Size: 2611 bytes --] Hi UEFI community, I’d like to introduce the CdePkg to edk2-staging. The package is not yet completed but ready to demonstrate it’s power, probably also for modernFW. A couple of years ago, after an UEFI BIOS project on AMD platform I decided to write my own ANSI C Library for UEFI Shell and POST. My design goals were: 1. to rewrite the whole thing from scratch, without using any public source code from GNU, BSD, Watcom or Intel EDK2 / tiano core 2. completeness: full blown C90 + C95 support, no C99, no non-specified extensions at all , e.g. itoa(), stricmp()... 3. small code size, for UEFI-POST-driver uses a C-Library-Driver, that contains core/worker functions for realloc() == malloc() and free(), entire printf-family, entire scanf-family. UEFI-POST-driver just uses small wrapper functions to run the C-Library-Driver code. 1. stable, exact, chipset independent (w/o ACPI timer) "clock()” with CLOCKS_PER_SEC == 1000 2. complete set of the Microsoft C-compiler intrinsic functions 3. ROM-able! Runs with stack but w/o any static storage duration in .data segment, e.g. for rand(), strtok(), tmpfile() This is required for early PEI before memory sizing, when PEI-images run directly out of flash. 1. 
Microsoft bug compatible (as far as possible) * to save myself a lifetime of writing documentation https://github.com/JoaquinConoBolillo/torito-C-Library/blob/master/implemented.md * use original Microsoft header files for UEFI Shell Apps created in VS2017/19 * “debug”-mode for UEFI Shell executables in VS2017/19, that truly runs on Windows (this works when using library functions only: no HW access, no UEFI-API use) to debug the library itself – but this just links the same .OBJ module with the WinNT-EntryPoint instead of the UEFI-EntryPoint (the entry point module pulls in the appropriate OS-interface branch dispatcher) 8. all that in one single C-Library CdeLib.lib The Readme.MD is here: https://github.com/MinnowWare/CdePkg#cdepkg For now CdePkg targets Microsoft VS2017/19 only; it shall be adjusted to other compilers/tool chains too, once it is feature complete and accepted by the UEFI community. The CdePkg is integrated into the “vUDK2018”-EDK2, which in turn runs in a MinnowBoard build. It can be emulated in the Nt32Pkg, since EmulatorPkg in “vUDK2018” doesn’t support Windows… I would like to move the “vUDK2018”-EDK2 to the edk2-staging branch CdePkg, but need to have access granted. Can anyone kindly grant access rights to me? Best Regards, Kilian [-- Attachment #2: Type: text/html, Size: 12701 bytes --] ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-23 15:25 ` Michael D Kinney 2019-08-24 1:48 ` Yao, Jiewen @ 2019-08-26 15:30 ` Laszlo Ersek 2019-08-27 16:23 ` Igor Mammedov 1 sibling, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-08-26 15:30 UTC (permalink / raw) To: Kinney, Michael D, Yao, Jiewen, Paolo Bonzini, rfc@edk2.groups.io Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/23/19 17:25, Kinney, Michael D wrote: > Hi Jiewen, > > If a hot add CPU needs to run any code before the > first SMI, I would recommend is only executes code > from a write protected FLASH range without a stack > and then wait for the first SMI. "without a stack" looks very risky to me. Even if we manage to implement the guest code initially, we'll be trapped without a stack, should we ever need to add more complex stuff there. > For this OVMF use case, is any CPU init required > before the first SMI? I expressed a preference for that too: "I wish we could simply wake the new CPU [...] with an SMI". http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com > From Paolo's list of steps are steps (8a) and (8b) > really required? See again my message linked above -- just after the quoted sentence, I wrote, "IOW, if we could excise steps 07b, 08a, 08b". But, I obviously defer to Paolo and Igor on that. (I do believe we have a dilemma here. In QEMU, we probably prefer to emulate physical hardware as faithfully as possible. However, we do not have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other divergences from physical hardware too, such as waking just by virtue of an SMI?) > Can the SMI monarch use the Local > APIC to send a directed SMI to the hot added CPU? > The SMI monarch needs to know the APIC ID of the > hot added CPU. Do we also need to handle the case > where multiple CPUs are added at once? 
I think we > would need to serialize the use of 3000:8000 for the > SMM rebase operation on each hot added CPU. I agree this would be a huge help. > It would be simpler if we can guarantee that only > one CPU can be added or removed at a time and the > complete flow of adding a CPU to SMM and the OS > needs to be completed before another add/remove > event needs to be processed. I don't know if the QEMU monitor command in question can guarantee this serialization. I think such a request/response pattern is generally implementable between QEMU and guest code. But, AIUI, the "device-add" monitor command is quite generic, and used for hot-plugging a number of other (non-CPU) device models. I'm unsure if the pattern in question can be squeezed into "device-add". (It's not a dedicated command for CPU hotplug.) ... Apologies that I didn't add much information to the thread, just now. I'd like to keep the discussion going. Thanks Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-26 15:30 ` [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek @ 2019-08-27 16:23 ` Igor Mammedov 2019-08-27 20:11 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-08-27 16:23 UTC (permalink / raw) To: Laszlo Ersek Cc: Kinney, Michael D, Yao, Jiewen, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Mon, 26 Aug 2019 17:30:43 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 08/23/19 17:25, Kinney, Michael D wrote: > > Hi Jiewen, > > > > If a hot add CPU needs to run any code before the > > first SMI, I would recommend is only executes code > > from a write protected FLASH range without a stack > > and then wait for the first SMI. > > "without a stack" looks very risky to me. Even if we manage to implement > the guest code initially, we'll be trapped without a stack, should we > ever need to add more complex stuff there. Do we need anything complex in relocation handler, though? From what I'd imagine, minimum handler should 1: get address of TSEG, possibly read it from chipset 2: calculate its new SMBASE offset based on its APIC ID 3: save new SMBASE > > For this OVMF use case, is any CPU init required > > before the first SMI? > > I expressed a preference for that too: "I wish we could simply wake the > new CPU [...] with an SMI". > > http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com > > > > From Paolo's list of steps are steps (8a) and (8b) > > really required? 07b - implies 08b 8b could be trivial hlt loop and we most likely could skip 08a and signaling host CPU steps but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle pending SMI before handling SIPI (so behavior would follow SDM). 
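Spelling out the minimal handler steps above (1: get TSEG base, 2: compute SMBASE from APIC ID, 3: save it) as plain address arithmetic — a sketch only, written in Python for readability; the TSEG base and per-CPU stride below are placeholder assumptions, and the real handler would of course be real-mode assembly:

```python
# Sketch of the SMBASE relocation arithmetic (steps 1-3 above).
# TSEG_BASE and TILE_SIZE are placeholder assumptions for illustration.
TSEG_BASE = 0x7F800000   # step 1: assumed TSEG base (e.g. 2 GiB - 8 MiB)
TILE_SIZE = 0x2000       # assumed per-CPU stride inside TSEG

def new_smbase(apic_id):
    """Step 2: derive this CPU's private SMBASE inside TSEG."""
    return TSEG_BASE + apic_id * TILE_SIZE

def smbase_field(current_smbase):
    """Step 3: the SMBASE field sits in the SMM save state map at
    SMBASE + 0xFEF8 (per the SDM); the handler writes the new value
    there and executes RSM, so it takes effect on the next SMI."""
    return current_smbase + 0xFEF8

# A hot-added CPU still runs from the default SMBASE 0x30000:
assert smbase_field(0x30000) == 0x3FEF8
# Distinct APIC IDs get non-overlapping relocated SMBASEs:
assert len({new_smbase(i) for i in range(8)}) == 8
```

The point is only that the computation itself is tiny; the hard part, as discussed, is obtaining the TSEG base safely without a stack.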
> See again my message linked above -- just after the quoted sentence, I > wrote, "IOW, if we could excise steps 07b, 08a, 08b". > > But, I obviously defer to Paolo and Igor on that. > > (I do believe we have a dilemma here. In QEMU, we probably prefer to > emulate physical hardware as faithfully as possible. However, we do not > have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other > divergences from physical hardware too, such as waking just by virtue of > an SMI?) So far we should be able to implement it per spec (at least SDM one), but we would still need to invent chipset hardware i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it to non-SMM address space. (and I hope we could avoid adding "parked CPU" thingy) > > Can the SMI monarch use the Local > > APIC to send a directed SMI to the hot added CPU? > > The SMI monarch needs to know the APIC ID of the > > hot added CPU. Do we also need to handle the case > > where multiple CPUs are added at once? I think we > > would need to serialize the use of 3000:8000 for the > > SMM rebase operation on each hot added CPU. > > I agree this would be a huge help. We can serialize it (for normal hotplug flow) from ACPI handler in the guest (i.e. non enforced serialization). The only reason for serialization I see is not to allow a bunch of new CPU trample over default SMBASE save area at the same time. There is a consideration though, an OS level attacker could send broadcast SMI and INIT-SIPI-SIPI sequences to trigger a race, but I don't see it as a threat since the attacker shouldn't be able to exploit anything and in the worst case the guest OS would crash (taking into account that SMIs are privileged, an OS attacker has plenty of other means to kill itself). > > It would be simpler if we can guarantee that only > > one CPU can be added or removed at a time and the > > complete flow of adding a CPU to SMM and the OS > > needs to be completed before another add/remove > > event needs to be processed.
> > I don't know if the QEMU monitor command in question can guarantee this > serialization. I think such a request/response pattern is generally > implementable between QEMU and guest code. > > But, AIUI, the "device-add" monitor command is quite generic, and used > for hot-plugging a number of other (non-CPU) device models. I'm unsure > if the pattern in question can be squeezed into "device-add". (It's not > a dedicated command for CPU hotplug.) > > ... Apologies that I didn't add much information to the thread, just > now. I'd like to keep the discussion going. > > Thanks > Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-27 16:23 ` Igor Mammedov @ 2019-08-27 20:11 ` Laszlo Ersek 2019-08-28 12:01 ` Igor Mammedov 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-08-27 20:11 UTC (permalink / raw) To: Igor Mammedov Cc: Kinney, Michael D, Yao, Jiewen, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/27/19 18:23, Igor Mammedov wrote: > On Mon, 26 Aug 2019 17:30:43 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> On 08/23/19 17:25, Kinney, Michael D wrote: >>> Hi Jiewen, >>> >>> If a hot add CPU needs to run any code before the >>> first SMI, I would recommend is only executes code >>> from a write protected FLASH range without a stack >>> and then wait for the first SMI. >> >> "without a stack" looks very risky to me. Even if we manage to implement >> the guest code initially, we'll be trapped without a stack, should we >> ever need to add more complex stuff there. > > Do we need anything complex in relocation handler, though? > From what I'd imagine, minimum handler should > 1: get address of TSEG, possibly read it from chipset The TSEG base calculation is not trivial in this environment. The 32-bit RAM size needs to be read from the CMOS (IO port accesses). Then the extended TSEG size (if any) needs to be detected from PCI config space (IO port accesses). Both CMOS and PCI config space require IO port writes too (not just reads). Even if there are enough registers for the calculations, can we rely on these unprotected IO ports? Also, can we switch to 32-bit mode without a stack? I assume it would be necessary to switch to 32-bit mode for 32-bit arithmetic. Getting the initial APIC ID needs some CPUID instructions, which clobber EAX through EDX, if I understand correctly.
Given the register pressure, CPUID might have to be one of the first instructions to call. > 2: calculate its new SMBASE offset based on its APIC ID > 3: save new SMBASE > >>> For this OVMF use case, is any CPU init required >>> before the first SMI? >> >> I expressed a preference for that too: "I wish we could simply wake the >> new CPU [...] with an SMI". >> >> http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com >> >> >>> From Paolo's list of steps are steps (8a) and (8b) >>> really required? > > 07b - implies 08b I agree about that implication, yes. *If* we send an INIT/SIPI/SIPI to the new CPU, then the new CPU needs a HLT loop, I think. > 8b could be trivial hlt loop and we most likely could skip 08a and signaling host CPU steps > but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle pending SMI > before handling SIPI (so behavior would follow SDM). > > >> See again my message linked above -- just after the quoted sentence, I >> wrote, "IOW, if we could excise steps 07b, 08a, 08b". >> >> But, I obviously defer to Paolo and Igor on that. >> >> (I do believe we have a dilemma here. In QEMU, we probably prefer to >> emulate physical hardware as faithfully as possible. However, we do not >> have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other >> divergences from physical hardware too, such as waking just by virtue of >> an SMI?) > So far we should be able to implement it per spec (at least SDM one), > but we would still need to invent chipset hardware > i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it > to non-SMM address space. > (and I hope we could avoid adding "parked CPU" thingy) I think we'll need a separate QEMU tree for this. I'm quite in the dark -- I can't tell if I'll be able to do something in OVMF without actually trying it. And for that, we'll need some proposed QEMU code that is testable, but not upstream yet. (As I might realize that I'm unable to make it work in OVMF.) 
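To illustrate the TSEG base derivation I'm worried about, here it is as a rough sketch — the register semantics (CMOS 0x34/0x35 reporting low RAM above 16 MiB in 64 KiB units, TSEG size from ESMRAMC or the extended-TSEG register) are my assumptions for illustration and would have to be double-checked against QEMU:

```python
# Rough sketch of the TSEG base arithmetic on Q35.
# Register semantics below are assumptions for illustration only:
#   - CMOS 0x34/0x35: low RAM above 16 MiB, counted in 64 KiB units
#   - TSEG size: 1/2/8 MiB from ESMRAMC, or an extended size in MiB
MIB = 1024 * 1024

def tseg_base(cmos_64k_units, tseg_mbytes):
    """TSEG sits at the top of low RAM: base = TOLUD - TSEG size."""
    tolud = 16 * MIB + cmos_64k_units * 64 * 1024
    return tolud - tseg_mbytes * MIB

# Example: 2 GiB of low RAM, 8 MiB TSEG.
units = (2048 - 16) * 16          # (RAM - 16 MiB) in 64 KiB units
assert tseg_base(units, 8) == 0x7F800000
```

Each input requires IO port reads (and index-port writes), which is exactly the unprotected-IO concern above.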
>>> Can the SMI monarch use the Local >>> APIC to send a directed SMI to the hot added CPU? >>> The SMI monarch needs to know the APIC ID of the >>> hot added CPU. Do we also need to handle the case >>> where multiple CPUs are added at once? I think we >>> would need to serialize the use of 3000:8000 for the >>> SMM rebase operation on each hot added CPU. >> >> I agree this would be a huge help. > > We can serialize it (for normal hotplug flow) from ACPI handler > in the guest (i.e. non enforced serialization). > The only reason for serialization I see is not to allow > a bunch of new CPU trample over default SMBASE save area > at the same time. If the default SMBASE area is corrupted due to concurrent access, could that lead to invalid relocated SMBASE values? Possibly pointing into normal RAM? Thanks Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-27 20:11 ` Laszlo Ersek @ 2019-08-28 12:01 ` Igor Mammedov 2019-08-29 16:25 ` Laszlo Ersek 0 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-08-28 12:01 UTC (permalink / raw) To: Laszlo Ersek Cc: Kinney, Michael D, Yao, Jiewen, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Tue, 27 Aug 2019 22:11:15 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 08/27/19 18:23, Igor Mammedov wrote: > > On Mon, 26 Aug 2019 17:30:43 +0200 > > Laszlo Ersek <lersek@redhat.com> wrote: > > > >> On 08/23/19 17:25, Kinney, Michael D wrote: > >>> Hi Jiewen, > >>> > >>> If a hot add CPU needs to run any code before the > >>> first SMI, I would recommend is only executes code > >>> from a write protected FLASH range without a stack > >>> and then wait for the first SMI. > >> > >> "without a stack" looks very risky to me. Even if we manage to implement > >> the guest code initially, we'll be trapped without a stack, should we > >> ever need to add more complex stuff there. > > > > Do we need anything complex in relocation handler, though? > > From what I'd imagine, minimum handler should > > 1: get address of TSEG, possibly read it from chipset > > The TSEG base calculation is not trivial in this environment. The 32-bit > RAM size needs to be read from the CMOS (IO port accesses). Then the > extended TSEG size (if any) needs to be detected from PCI config space > (IO port accesses). Both CMOS and PCI config space requires IO port > writes too (not just reads). Even if there are enough registers for the > calculations, can we rely on these unprotected IO ports? > > Also, can we switch to 32-bit mode without a stack? I assume it would be > necessary to switch to 32-bit mode for 32-bit arithmetic. 
from SDM vol 3: " 34.5.1 Initial SMM Execution Environment After saving the current context of the processor, the processor initializes its core registers to the values shown in Table 34-4. Upon entering SMM, the PE and PG flags in control register CR0 are cleared, which places the processor in an environment similar to real-address mode. The differences between the SMM execution environment and the real-address mode execution environment are as follows: • The addressable address space ranges from 0 to FFFFFFFFH (4 GBytes). • The normal 64-KByte segment limit for real-address mode is increased to 4 GBytes. • The default operand and address sizes are set to 16 bits, which restricts the addressable SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode code. However, operand-size and address-size override prefixes can be used to access the address space beyond ^^^^^^^^ the 1-MByte. " > > Getting the initial APIC ID needs some CPUID instructions IIUC, which > clobber EAX through EDX, if I understand correctly. Given the register > pressure, CPUID might have to be one of the first instructions to call. we could map at 30000 not 64K required for save area but 128K and use 2nd half as secure RAM for stack and intermediate data. Firmware could put there pre-calculated pointer to TSEG after it's configured and locked down, this way relocation handler won't have to figure out TSEG address on its own. > > 2: calculate its new SMBASE offset based on its APIC ID > > 3: save new SMBASE > > > >>> For this OVMF use case, is any CPU init required > >>> before the first SMI? > >> > >> I expressed a preference for that too: "I wish we could simply wake the > >> new CPU [...] with an SMI". > >> > >> http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com > >> > >> > >>> From Paolo's list of steps are steps (8a) and (8b) > >>> really required? > > > > 07b - implies 08b > > I agree about that implication, yes. 
*If* we send an INIT/SIPI/SIPI to > the new CPU, then the new CPU needs a HLT loop, I think. It also could execute INIT reset, which leaves initialized SMM untouched but otherwise CPU would be inactive. > > > 8b could be trivial hlt loop and we most likely could skip 08a and signaling host CPU steps > > but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle pending SMI > > before handling SIPI (so behavior would follow SDM). > > > > > >> See again my message linked above -- just after the quoted sentence, I > >> wrote, "IOW, if we could excise steps 07b, 08a, 08b". > >> > >> But, I obviously defer to Paolo and Igor on that. > >> > >> (I do believe we have a dilemma here. In QEMU, we probably prefer to > >> emulate physical hardware as faithfully as possible. However, we do not > >> have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other > >> divergences from physical hardware too, such as waking just by virtue of > >> an SMI?) > > So far we should be able to implement it per spec (at least SDM one), > > but we would still need to invent chipset hardware > > i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it > > to non-SMM address space. > > (and I hope we could avoid adding "parked CPU" thingy) > > I think we'll need a separate QEMU tree for this. I'm quite in the dark > -- I can't tell if I'll be able to do something in OVMF without actually > trying it. And for that, we'll need some proposed QEMU code that is > testable, but not upstream yet. (As I might realize that I'm unable to > make it work in OVMF.) Let me prepare a QEMU branch with something usable for you. To avoid inventing mgmt API for configuring SMRAM at 30000, I'm suggesting to steal/alias top or bottom 128K of TSEG window to 30000. This way OVMF would be able to set SMI relocation handler modifying TSEG and pass TSEG base/other data to it as well. Would it work for you or should we try more elaborate approach? 
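To be concrete about the stolen 128K, I imagine the window at 30000 roughly like this (the split and offsets are tentative, my interpretation only, not a settled interface):

```python
# Sketch of the proposed 128 KiB window at 0x30000, aliased from TSEG.
# The split below is one possible interpretation, not a spec:
#   0x30000..0x3FFFF  default-SMBASE tile (SMI entry at +0x8000, save state)
#   0x40000..0x4FFFF  "secure RAM": stack + data pre-staged by firmware,
#                     e.g. the TSEG base, so the relocation handler does
#                     not have to probe CMOS/PCI config space itself
WINDOW_BASE, WINDOW_SIZE = 0x30000, 128 * 1024

tile   = (WINDOW_BASE, WINDOW_BASE + 0x10000)             # [start, end)
secure = (WINDOW_BASE + 0x10000, WINDOW_BASE + WINDOW_SIZE)

smi_entry = WINDOW_BASE + 0x8000      # default SMBASE handler entry point
tseg_ptr_slot = secure[0]             # hypothetical slot for the TSEG base

assert tile[0] <= smi_entry < tile[1]
assert secure[1] - tile[0] == WINDOW_SIZE
assert tseg_ptr_slot == 0x40000
```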
> >>> Can the SMI monarch use the Local > >>> APIC to send a directed SMI to the hot added CPU? > >>> The SMI monarch needs to know the APIC ID of the > >>> hot added CPU. Do we also need to handle the case > >>> where multiple CPUs are added at once? I think we > >>> would need to serialize the use of 3000:8000 for the > >>> SMM rebase operation on each hot added CPU. > >> > >> I agree this would be a huge help. > > > > We can serialize it (for normal hotplug flow) from ACPI handler > > in the guest (i.e. non enforced serialization). > > The only reason for serialization I see is not to allow > > a bunch of new CPU trample over default SMBASE save area > > at the same time. > > If the default SMBASE area is corrupted due to concurrent access, could > that lead to invalid relocated SMBASE values? Possibly pointing into > normal RAM? in case of broadcast SMI (btw does OVMF use broadcast SMIs?) several CPUs could end up with the same SMBASE within SMRAM 1: default one: in case the 2nd CPU enters SMM after the 1st CPU saved new SMBASE but before it's called RSM 2: duplicated SMBASE: where the 2nd CPU saves its new SMBASE before the 1st calls RSM while the 2nd could be counteracted with using locks, I don't see how 1st one could be avoided. May be host CPU can send 2nd SMI so just relocated CPU could send an ACK from relocated SMBASE/with new SMI handler? > > Thanks > Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-28 12:01 ` Igor Mammedov @ 2019-08-29 16:25 ` Laszlo Ersek 2019-08-30 13:49 ` [Qemu-devel] " Igor Mammedov 0 siblings, 1 reply; 69+ messages in thread From: Laszlo Ersek @ 2019-08-29 16:25 UTC (permalink / raw) To: Igor Mammedov Cc: Kinney, Michael D, Yao, Jiewen, Paolo Bonzini, rfc@edk2.groups.io, Alex Williamson, devel@edk2.groups.io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/28/19 14:01, Igor Mammedov wrote: > On Tue, 27 Aug 2019 22:11:15 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> On 08/27/19 18:23, Igor Mammedov wrote: >>> On Mon, 26 Aug 2019 17:30:43 +0200 >>> Laszlo Ersek <lersek@redhat.com> wrote: >>> >>>> On 08/23/19 17:25, Kinney, Michael D wrote: >>>>> Hi Jiewen, >>>>> >>>>> If a hot add CPU needs to run any code before the >>>>> first SMI, I would recommend is only executes code >>>>> from a write protected FLASH range without a stack >>>>> and then wait for the first SMI. >>>> >>>> "without a stack" looks very risky to me. Even if we manage to implement >>>> the guest code initially, we'll be trapped without a stack, should we >>>> ever need to add more complex stuff there. >>> >>> Do we need anything complex in relocation handler, though? >>> From what I'd imagine, minimum handler should >>> 1: get address of TSEG, possibly read it from chipset >> >> The TSEG base calculation is not trivial in this environment. The 32-bit >> RAM size needs to be read from the CMOS (IO port accesses). Then the >> extended TSEG size (if any) needs to be detected from PCI config space >> (IO port accesses). Both CMOS and PCI config space requires IO port >> writes too (not just reads). Even if there are enough registers for the >> calculations, can we rely on these unprotected IO ports? >> >> Also, can we switch to 32-bit mode without a stack? 
I assume it would be >> necessary to switch to 32-bit mode for 32-bit arithmetic. > from SDM vol 3: > " > 34.5.1 Initial SMM Execution Environment > After saving the current context of the processor, the processor initializes its core registers to the values shown in Table 34-4. Upon entering SMM, the PE and PG flags in control register CR0 are cleared, which places the processor in an environment similar to real-address mode. The differences between the SMM execution environment and the real-address mode execution environment are as follows: > • The addressable address space ranges from 0 to FFFFFFFFH (4 GBytes). > • The normal 64-KByte segment limit for real-address mode is increased to 4 GBytes. > • The default operand and address sizes are set to 16 bits, which restricts the addressable SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode code. However, operand-size and address-size override prefixes can be used to access the address space beyond > ^^^^^^^^ > the 1-MByte. > " That helps. Thanks for the quote! >> Getting the initial APIC ID needs some CPUID instructions IIUC, which >> clobber EAX through EDX, if I understand correctly. Given the register >> pressure, CPUID might have to be one of the first instructions to call. > > we could map at 30000 not 64K required for save area but 128K and use > 2nd half as secure RAM for stack and intermediate data. > > Firmware could put there pre-calculated pointer to TSEG after it's configured and locked down, > this way relocation handler won't have to figure out TSEG address on its own. Sounds like a great idea. >>> 2: calculate its new SMBASE offset based on its APIC ID >>> 3: save new SMBASE >>> >>>>> For this OVMF use case, is any CPU init required >>>>> before the first SMI? >>>> >>>> I expressed a preference for that too: "I wish we could simply wake the >>>> new CPU [...] with an SMI". 
>>>> >>>> http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com >>>> >>>> >>>>> From Paolo's list of steps are steps (8a) and (8b) >>>>> really required? >>> >>> 07b - implies 08b >> >> I agree about that implication, yes. *If* we send an INIT/SIPI/SIPI to >> the new CPU, then the new CPU needs a HLT loop, I think. > It also could execute INIT reset, which leaves initialized SMM untouched > but otherwise CPU would be inactive. > >> >>> 8b could be trivial hlt loop and we most likely could skip 08a and signaling host CPU steps >>> but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle pending SMI >>> before handling SIPI (so behavior would follow SDM). >>> >>> >>>> See again my message linked above -- just after the quoted sentence, I >>>> wrote, "IOW, if we could excise steps 07b, 08a, 08b". >>>> >>>> But, I obviously defer to Paolo and Igor on that. >>>> >>>> (I do believe we have a dilemma here. In QEMU, we probably prefer to >>>> emulate physical hardware as faithfully as possible. However, we do not >>>> have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other >>>> divergences from physical hardware too, such as waking just by virtue of >>>> an SMI?) >>> So far we should be able to implement it per spec (at least SDM one), >>> but we would still need to invent chipset hardware >>> i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it >>> to non-SMM address space. >>> (and I hope we could avoid adding "parked CPU" thingy) >> >> I think we'll need a separate QEMU tree for this. I'm quite in the dark >> -- I can't tell if I'll be able to do something in OVMF without actually >> trying it. And for that, we'll need some proposed QEMU code that is >> testable, but not upstream yet. (As I might realize that I'm unable to >> make it work in OVMF.) > > Let me prepare a QEMU branch with something usable for you. 
> > To avoid inventing mgmt API for configuring SMRAM at 30000, > I'm suggesting to steal/alias top or bottom 128K of TSEG window to 30000. > This way OVMF would be able to set SMI relocation handler modifying > TSEG and pass TSEG base/other data to it as well. > Would it work for you or should we try more elaborate approach? I believe this change may not be cross-compatible between QEMU and OVMF. OVMF platform code would have to hide the stolen part of the TSEG from core edk2 SMM code. If old OVMF were booted on new QEMU, I believe things could break -- the SMM core would be at liberty to use any part of the TSEG (advertized by OVMF platform code to the full extent), and the SMM core would continue expecting 0x30000 to be normal (and distinct) RAM. If QEMU suddenly aliased both ranges to the same contents (in System Management Mode), I think that would confuse the SMM core. We already negotiate (or at least, detect) two features in this area; "extended TSEG" and "broadcast SMI". I believe we need a CPU hotplug controller anyway -- is that still the case? If it is, we could use registers on that device, for managing the alias. >> If the default SMBASE area is corrupted due to concurrent access, could >> that lead to invalid relocated SMBASE values? Possibly pointing into >> normal RAM? > > in case of broadcast SMI (btw does OVMF use broadcast SMIs?) several CPUs could end up Broadcast SMI is very important for OVMF. The Platform Init spec basically defines an abstract interface for runtime UEFI drivers for submitting an "SMM request". Part of that is raising an SMI (also abstracted). *How* an SMI is raised is platform-dependent, and edk2 provides two implementations for synching APs in SMM (broadcast ("traditional") and relaxed).
In our testing on QEMU/KVM, the broadcast/traditional sync mode worked very robustly (with QEMU actually broadcasting the SMI in response to IO port 0xB2 writes), but the relaxed synch mode was unstable / brittle (in particular during S3 resume). Therefore broadcast SMI is negotiated by OVMF whenever it is available -- it makes a big difference in stability. Now, whether broadcast SMI needs to be part of CPU hotplug specifically, that's a different question. The CPU hotplug logic may not necessarily have to go through the same (standardized) interfaces that runtime UEFI drivers do. > with the same SMBASE within SMRAM > 1: default one: in case the 2nd CPU enters SMM after the 1st CPU saved new SMBASE but before it's called RSM > 2: duplicated SMBASE: where the 2nd CPU saves its new SMBASE before the 1st calls RSM > > while the 2nd could be counteracted with using locks, I don't see how 1st one could be avoided. > May be host CPU can send 2nd SMI so just relocated CPU could send an ACK from relocated SMBASE/with new SMI handler? I don't have any better idea. We could protect the default SMBASE with a semaphore (spinlock?) in SMRAM, but that would have to be released with the owning CPU executing code at the new SMBASE. Basically, what you say, just "ACK" meaning "release the spinlock". Thanks, Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-29 16:25 ` Laszlo Ersek @ 2019-08-30 13:49 ` Igor Mammedov 0 siblings, 0 replies; 69+ messages in thread From: Igor Mammedov @ 2019-08-30 13:49 UTC (permalink / raw) To: Laszlo Ersek Cc: Chen, Yingwen, devel@edk2.groups.io, Phillip Goerl, qemu devel list, Alex Williamson, Yao, Jiewen, Nakajima, Jun, Kinney, Michael D, Paolo Bonzini, Boris Ostrovsky, rfc@edk2.groups.io, Joao Marcal Lemos Martins On Thu, 29 Aug 2019 18:25:17 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 08/28/19 14:01, Igor Mammedov wrote: > > On Tue, 27 Aug 2019 22:11:15 +0200 > > Laszlo Ersek <lersek@redhat.com> wrote: > > > >> On 08/27/19 18:23, Igor Mammedov wrote: > >>> On Mon, 26 Aug 2019 17:30:43 +0200 > >>> Laszlo Ersek <lersek@redhat.com> wrote: > >>> > >>>> On 08/23/19 17:25, Kinney, Michael D wrote: > >>>>> Hi Jiewen, > >>>>> > >>>>> If a hot add CPU needs to run any code before the > >>>>> first SMI, I would recommend is only executes code > >>>>> from a write protected FLASH range without a stack > >>>>> and then wait for the first SMI. > >>>> > >>>> "without a stack" looks very risky to me. Even if we manage to implement > >>>> the guest code initially, we'll be trapped without a stack, should we > >>>> ever need to add more complex stuff there. > >>> > >>> Do we need anything complex in relocation handler, though? > >>> From what I'd imagine, minimum handler should > >>> 1: get address of TSEG, possibly read it from chipset > >> > >> The TSEG base calculation is not trivial in this environment. The 32-bit > >> RAM size needs to be read from the CMOS (IO port accesses). Then the > >> extended TSEG size (if any) needs to be detected from PCI config space > >> (IO port accesses). Both CMOS and PCI config space requires IO port > >> writes too (not just reads). Even if there are enough registers for the > >> calculations, can we rely on these unprotected IO ports? 
> >> > >> Also, can we switch to 32-bit mode without a stack? I assume it would be > >> necessary to switch to 32-bit mode for 32-bit arithmetic. > > from SDM vol 3: > > " > > 34.5.1 Initial SMM Execution Environment > > After saving the current context of the processor, the processor initializes its core registers to the values shown in Table 34-4. Upon entering SMM, the PE and PG flags in control register CR0 are cleared, which places the processor in an environment similar to real-address mode. The differences between the SMM execution environment and the real-address mode execution environment are as follows: > > • The addressable address space ranges from 0 to FFFFFFFFH (4 GBytes). > > • The normal 64-KByte segment limit for real-address mode is increased to 4 GBytes. > > • The default operand and address sizes are set to 16 bits, which restricts the addressable SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode code. However, operand-size and address-size override prefixes can be used to access the address space beyond > > ^^^^^^^^ > > the 1-MByte. > > " > > That helps. Thanks for the quote! > > >> Getting the initial APIC ID needs some CPUID instructions IIUC, which > >> clobber EAX through EDX, if I understand correctly. Given the register > >> pressure, CPUID might have to be one of the first instructions to call. > > > > we could map at 30000 not 64K required for save area but 128K and use > > 2nd half as secure RAM for stack and intermediate data. > > > > Firmware could put there pre-calculated pointer to TSEG after it's configured and locked down, > > this way relocation handler won't have to figure out TSEG address on its own. > > Sounds like a great idea. > > >>> 2: calculate its new SMBASE offset based on its APIC ID > >>> 3: save new SMBASE > >>> > >>>>> For this OVMF use case, is any CPU init required > >>>>> before the first SMI? 
> >>>> > >>>> I expressed a preference for that too: "I wish we could simply wake the > >>>> new CPU [...] with an SMI". > >>>> > >>>> http://mid.mail-archive.com/398b3327-0820-95af-a34d-1a4a1d50cf35@redhat.com > >>>> > >>>> > >>>>> From Paolo's list of steps are steps (8a) and (8b) > >>>>> really required? > >>> > >>> 07b - implies 08b > >> > >> I agree about that implication, yes. *If* we send an INIT/SIPI/SIPI to > >> the new CPU, then the new CPU needs a HLT loop, I think. > > It also could execute INIT reset, which leaves initialized SMM untouched > > but otherwise CPU would be inactive. > > > >> > >>> 8b could be trivial hlt loop and we most likely could skip 08a and signaling host CPU steps > >>> but we need INIT/SIPI/SIPI sequence to wake up AP so it could handle pending SMI > >>> before handling SIPI (so behavior would follow SDM). > >>> > >>> > >>>> See again my message linked above -- just after the quoted sentence, I > >>>> wrote, "IOW, if we could excise steps 07b, 08a, 08b". > >>>> > >>>> But, I obviously defer to Paolo and Igor on that. > >>>> > >>>> (I do believe we have a dilemma here. In QEMU, we probably prefer to > >>>> emulate physical hardware as faithfully as possible. However, we do not > >>>> have Cache-As-RAM (nor do we intend to, IIUC). Does that justify other > >>>> divergences from physical hardware too, such as waking just by virtue of > >>>> an SMI?) > >>> So far we should be able to implement it per spec (at least SDM one), > >>> but we would still need to invent chipset hardware > >>> i.e. like adding to Q35 non exiting SMRAM and means to map/unmap it > >>> to non-SMM address space. > >>> (and I hope we could avoid adding "parked CPU" thingy) > >> > >> I think we'll need a separate QEMU tree for this. I'm quite in the dark > >> -- I can't tell if I'll be able to do something in OVMF without actually > >> trying it. And for that, we'll need some proposed QEMU code that is > >> testable, but not upstream yet. 
(As I might realize that I'm unable to > >> make it work in OVMF.) > > > > Let me prepare a QEMU branch with something usable for you. > > > > To avoid inventing mgmt API for configuring SMRAM at 30000, > > I'm suggesting to steal/alias top or bottom 128K of TSEG window to 30000. > > This way OVMF would be able to set SMI relocation handler modifying > > TSEG and pass TSEG base/other data to it as well. > > Would it work for you or should we try more elaborate approach? > > I believe this this change may not be cross-compatible between QEMU and > OVMF. OVMF platform code would have to hide the stolen part of the TSEG > from core edk2 SMM code. > > If old OVMF were booted on new QEMU, I believe things could break -- the > SMM core would be at liberty to use any part of the TSEG (advertized by > OVMF platform code to the full extent), and the SMM core would continue > expecting 0x30000 to be normal (and distinct) RAM. If QEMU suddenly > aliased both ranges to the same contents (in System Management Mode), I > think that would confuse the SMM core. > > We already negotiate (or at least, detect) two features in this area; > "extended TSEG" and "broadcast SMI". I believe we need a CPU hotplug > controller anyway -- is that still the case? If it is, we could use > registers on that device, for managing the alias. Ok, let me check if we could cannibalize q35 pci-host for the task or it would be easier to extend MMIO cpu-hotplug interface. I'll probably come back with questions about how OVMF uses pci-host later. > >> If the default SMBASE area is corrupted due to concurrent access, could > >> that lead to invalid relocated SMBASE values? Possibly pointing into > >> normal RAM? > > > > in case of broadcast SMI (btw does OVMF use broadcast SMIs?) several CPUs could end up > > Broadcast SMI is very important for OVMF. > > The Platform Init spec basically defines an abstract interface for > runtime UEFI drivers for submitting an "SMM request". 
Part of that is > raising an SMI (also abstracted). > > *How* an SMI is raised is platform-dependent, and edk2 provides two > implementations for synching APs in SMM (broadcast ("traditional") and > relaxed). > > In our testing on QEMU/KVM, the broadcast/traditional sync mode worked > very robustly (with QEMU actually broadcasting the SMI in response to IO > port 0xB2 writes), but the relaxed synch mode was unstable / brittle (in > particular during S3 resume). Therefore broadcast SMI is negotiated by > OVMF whenever it is available -- it makes a big difference in stability. > > Now, whether broadcast SMI needs to be part of CPU hotplug specifically, > that's a different question. The CPU hotplug logic may not necessarily > have to go through the same (standardized) interfaces that runtime UEFI > drivers do. considering above we are pretty much stuck with broadcast SMI mode for standard UEFI interfaces. So for starters we can use it for CPU hotplug as well (I think it's not possible to trigger directed SMI from GPE handler and no nice way to implement it comes to my mind so far) Broadcast SMI by itself does not present any problems to normal hotplug flow. Even if there are several hotplugged CPUs, SMI# will be only pending on all of them and host CPU can serialize them by waking one CPU at a time by sending INIT-INIT-SIPI. Once one CPU is relocated, host CPU may wake up the next one the same way ... > > with the same SMBASE within SMRAM > > 1: default one: in case the 2nd CPU enters SMM after the 1st CPU saved new SMBASE but before it's called RSM > > 2: duplicated SMBASE: where the 2nd CPU saves its new SMBASE before the 1st calls RSM > > > > while the 2nd could be counteracted with using locks, I don't see how 1st one could be avoided. > > May be host CPU can send 2nd SMI so just relocated CPU could send an ACK from relocated SMBASE/with new SMI handler? > > I don't have any better idea. We could protect the default SMBASE with a > semaphore (spinlock?) 
in SMRAM, but that would have to be released with > the owning CPU executing code at the new SMBASE. Basically, what you > say, just "ACK" meaning "release the spinlock". Lets try it, if it won't work out we will invent 'parking' feature in QEMU. Considering that an attack scenario, it is probably fine to even avoid attempts to recover from collision if it happens and just do machine wide reset once detected that CPU is using not its own SMBASE. This way locking might be not needed. In case of 'parking' I see 2 possible ways: 1: on CPU hotplug inhibit other CPUs hotplug in QEMU (device_add cpu) and wait until firmware permits it. (relatively simple to implement without affecting CPU/KVM code) downside is that's not nice to upper layers as they will start getting transient errors while previously hotplugged CPU is being relocated. 2: implement parked 'CPU' feature, which as I think in practice means ignore or queue SIPI and process it only when allowed (which is out of spec behavior). That probably would require changes not only to QEMU but to KVM as well. > Thanks, > Laszlo > ^ permalink raw reply [flat|nested] 69+ messages in thread
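[Editorial note: Igor's idea above — mapping 128 KiB of SMRAM at the default SMBASE and having firmware deposit a precalculated TSEG pointer there — keeps the relocation handler trivial. A minimal C sketch of the resulting SMBASE computation (a real first-SMI handler would be hand-written 16/32-bit assembly); the mailbox layout, field names, and the per-CPU tile stride are assumptions for illustration, not an agreed ABI:]

```c
#include <stdint.h>

/* Hypothetical mailbox that firmware fills in before hotplug is permitted.
 * It would live in the second 64 KiB of the 128 KiB SMRAM window proposed
 * above at the default SMBASE (0x30000), so the relocation handler never
 * has to probe CMOS or PCI config space itself. */
typedef struct {
  uint32_t TsegBase;   /* precalculated TSEG base, written by OVMF */
  uint32_t TileSize;   /* per-CPU SMBASE stride chosen by the SMM core */
} RELOCATION_MAILBOX;

/* Compute the relocated SMBASE for one CPU.  In a real handler the result
 * would be stored into the SMBASE slot of the save state area before RSM. */
static uint32_t
NewSmbase (const RELOCATION_MAILBOX *Mailbox, uint32_t CpuIndex)
{
  return Mailbox->TsegBase + CpuIndex * Mailbox->TileSize;
}
```

[With a TSEG base of 0x7F000000 and a (purely illustrative) 0x2000 tile, CPU 2 would relocate to 0x7F004000; the point is only that, given the mailbox, steps 1–3 of the handler reduce to one multiply-add.]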
* Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-21 15:48 ` [edk2-rfc] " Michael D Kinney 2019-08-21 17:05 ` Paolo Bonzini @ 2019-08-22 17:53 ` Laszlo Ersek 1 sibling, 0 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-08-22 17:53 UTC (permalink / raw) To: Kinney, Michael D, rfc@edk2.groups.io, Yao, Jiewen, Paolo Bonzini Cc: Alex Williamson, devel@edk2.groups.io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/21/19 17:48, Kinney, Michael D wrote: > Perhaps there is a way to avoid the 3000:8000 startup > vector. > > If a CPU is added after a cold reset, it is already in a > different state because one of the active CPUs needs to > release it by interacting with the hot plug controller. > > Can the SMRR for CPUs in that state be pre-programmed to > match the SMRR in the rest of the active CPUs? > > For OVMF we expect all the active CPUs to use the same > SMRR value, so a check can be made to verify that all > the active CPUs have the same SMRR value. If they do, > then any CPU released through the hot plug controller > can have its SMRR pre-programmed and the initial SMI > will start within TSEG. Yes, that is what I proposed here: * http://mid.mail-archive.com/effa5e32-be1e-4703-4419-8866b7754e2d@redhat.com * https://edk2.groups.io/g/devel/message/45570 Namely: > When the SMM setup quiesces during normal firmware boot, OVMF could > use existent (finalized) SMBASE infomation to *pre-program* some > virtual QEMU hardware, with such state that would be expected, as > "final" state, of any new hotplugged CPU. Afterwards, if / when the > hotplug actually happens, QEMU could blanket-apply this state to the > new CPU, and broadcast a hardware SMI to all CPUs except the new one. (I know that Paolo didn't like it; I'm just confirming that I had the same, or at least a very similar, idea.) Thanks! Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
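[Editorial note: the precondition in Mike's proposal — that all active CPUs carry the same SMRR value before it is pre-applied to hot-added CPUs — is cheap to verify at the point where SMM setup quiesces. A hedged sketch; the per-CPU (Base, Mask) pairs are assumed to have been collected by platform code beforehand, and this is not an existing edk2 API:]

```c
#include <stdint.h>

typedef struct {
  uint64_t Base;   /* IA32_SMRR_PHYSBASE as read on one CPU */
  uint64_t Mask;   /* IA32_SMRR_PHYSMASK as read on the same CPU */
} SMRR_VALUE;

/* Return 1 if every active CPU programmed the identical SMRR pair, so the
 * same value can safely be pre-programmed into a CPU released through the
 * hot plug controller; return 0 otherwise. */
static int
SmrrUniform (const SMRR_VALUE *Values, unsigned Count)
{
  for (unsigned Index = 1; Index < Count; Index++) {
    if (Values[Index].Base != Values[0].Base ||
        Values[Index].Mask != Values[0].Mask) {
      return 0;
    }
  }
  return 1;
}
```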
* Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF 2019-08-15 16:21 ` Paolo Bonzini 2019-08-16 2:46 ` Yao, Jiewen @ 2019-08-16 20:00 ` Laszlo Ersek 1 sibling, 0 replies; 69+ messages in thread From: Laszlo Ersek @ 2019-08-16 20:00 UTC (permalink / raw) To: Paolo Bonzini, devel, Yao, Jiewen Cc: edk2-rfc-groups-io, qemu devel list, Igor Mammedov, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 08/15/19 18:21, Paolo Bonzini wrote: > On 15/08/19 17:00, Laszlo Ersek wrote: >> On 08/14/19 16:04, Paolo Bonzini wrote: >>> On 14/08/19 15:20, Yao, Jiewen wrote: >>>>> - Does this part require a new branch somewhere in the OVMF SEC code? >>>>> How do we determine whether the CPU executing SEC is BSP or >>>>> hot-plugged AP? >>>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction. >>>> There are some hardware specific registers can be used to determine if the CPU is new added. >>>> I don’t think this must be same as the real hardware. >>>> You are free to invent some registers in device model to be used in OVMF hot plug driver. >>> >>> Yes, this would be a new operation mode for QEMU, that only applies to >>> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in >>> fact it doesn't reply to anything at all. >>> >>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that >>>>> it should execute code at a particular pflash location.) >>>> [Jiewen] Same real mode reset vector at FFFF:FFF0. >>> >>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in >>> QEMU. The AP does not start execution at all when it is unplugged, so >>> no cache-as-RAM etc. >>> >>> We only need to modify QEMU so that hot-plugged APIs do not reply to >>> INIT/SIPI/SMI. >>> >>>> I don’t think there is problem for real hardware, who always has CAR. >>>> Can QEMU provide some CPU specific space, such as MMIO region? 
>>> >>> Why is a CPU-specific region needed if every other processor is in SMM >>> and thus trusted. >> >> I was going through the steps Jiewen and Yingwen recommended. >> >> In step (02), the new CPU is expected to set up RAM access. In step >> (03), the new CPU, executing code from flash, is expected to "send board >> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add >> message." For that action, the new CPU may need a stack (minimally if we >> want to use C function calls). >> >> Until step (03), there had been no word about any other (= pre-plugged) >> CPUs (more precisely, Jiewen even confirmed "No impact to other >> processors"), so I didn't assume that other CPUs had entered SMM. >> >> Paolo, I've attempted to read Jiewen's response, and yours, as carefully >> as I can. I'm still very confused. If you have a better understanding, >> could you please write up the 15-step process from the thread starter >> again, with all QEMU customizations applied? Such as, unnecessary steps >> removed, and platform specifics filled in. > > Sure. > > (01a) QEMU: create new CPU. The CPU already exists, but it does not > start running code until unparked by the CPU hotplug controller. > > (01b) QEMU: trigger SCI > > (02-03) no equivalent > > (04) Host CPU: (OS) execute GPE handler from DSDT > > (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU > will not enter CPU because SMI is disabled) > > (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > rebase code. (Could Intel open source code for this?) > (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable > new CPU > > (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. > > (08a) New CPU: (Low RAM) Enter protected mode. PCI DMA attack might be relevant (but yes, I see you've mentioned that too, down-thread) > > (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop. > > (09) Host CPU: (SMM) Send SMI to the new CPU only. 
> > (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to > TSEG. I wish we could simply wake the new CPU -- after step 07a -- with an SMI. IOW, if we could excise steps 07b, 08a, 08b. Our CPU hotplug controller, and the initial parked state in 01a for the new CPU, are going to be home-brewed anyway. On the other hand... > (11) Host CPU: (SMM) Restore 38000. > > (12) Host CPU: (SMM) Update located data structure to add the new CPU > information. (This step will involve CPU_SERVICE protocol) > > (13) New CPU: (Flash) do whatever other initialization is needed > > (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI. basically step 08b is the environment to which the new CPU returns in 13/14, after the RSM. Do we absolutely need low RAM for 08a (for entering protected mode)? we could execute from pflash, no? OTOH we'd still need RAM for the stack, and that could be attacked with PCI DMA similarly. I believe. > (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.. > > > In other words, the cache-as-RAM phase of 02-03 is replaced by the > INIT-SIPI-SIPI sequence of 07b-08a-08b. > > >>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT >>> to 0xB2 when hotplug happens. It could write a well-known value to >>> 0xB2, to be read by an SMI handler in edk2. >> >> I dislike involving QEMU's generated DSDT in anything SMM (even >> injecting the SMI), because the AML interpreter runs in the OS. >> >> If a malicious OS kernel is a bit too enlightened about the DSDT, it >> could willfully diverge from the process that we design. If QEMU >> broadcast the SMI internally, the guest OS could not interfere with that. >> >> If the purpose of the SMI is specifically to force all CPUs into SMM >> (and thereby force them into trusted state), then the OS would be >> explicitly counter-interested in carrying out the AML operations from >> QEMU's DSDT. 
> > But since the hotplug controller would only be accessible from SMM, > there would be no other way to invoke it than to follow the DSDT's > instruction and write to 0xB2. Right. > FWIW, real hardware also has plenty of > 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store > access). Thanks Laszlo ^ permalink raw reply [flat|nested] 69+ messages in thread
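[Editorial note: the "well-known value" written to port 0xB2 by the DSDT, mentioned above, is what lets the firmware's root SMI handler tell a hotplug SMI apart from other software SMIs. A sketch of that dispatch; the 0x04 command value and all names are invented for illustration and do not come from any spec or existing code:]

```c
#include <stdint.h>

#define APM_CNT_HOTPLUG  0x04u  /* hypothetical well-known value the DSDT
                                   GPE handler would write to port 0xB2 */

typedef enum { SMI_IGNORED, SMI_CPU_HOTPLUG } SMI_ACTION;

/* Decide what a broadcast SMI is asking for, based on the last value the
 * guest wrote to the APM control port (0xB2).  An OS that skips the DSDT's
 * OUT to 0xB2 simply never triggers the hotplug path. */
static SMI_ACTION
ClassifySmi (uint8_t ApmCntValue)
{
  return (ApmCntValue == APM_CNT_HOTPLUG) ? SMI_CPU_HOTPLUG : SMI_IGNORED;
}
```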
* Re: CPU hotplug using SMM with QEMU+OVMF 2019-08-14 14:04 ` Paolo Bonzini 2019-08-15 9:55 ` Yao, Jiewen 2019-08-15 15:00 ` [edk2-devel] " Laszlo Ersek @ 2019-08-15 16:07 ` Igor Mammedov 2019-08-15 16:24 ` Paolo Bonzini 2 siblings, 1 reply; 69+ messages in thread From: Igor Mammedov @ 2019-08-15 16:07 UTC (permalink / raw) To: Paolo Bonzini Cc: Yao, Jiewen, Laszlo Ersek, edk2-devel-groups-io, edk2-rfc-groups-io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Wed, 14 Aug 2019 16:04:50 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 14/08/19 15:20, Yao, Jiewen wrote: > >> - Does this part require a new branch somewhere in the OVMF SEC code? > >> How do we determine whether the CPU executing SEC is BSP or > >> hot-plugged AP? > > [Jiewen] I think this is blocked from hardware perspective, since the first instruction. > > There are some hardware specific registers can be used to determine if the CPU is new added. > > I don’t think this must be same as the real hardware. > > You are free to invent some registers in device model to be used in OVMF hot plug driver. > > Yes, this would be a new operation mode for QEMU, that only applies to > hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in > fact it doesn't reply to anything at all. > > >> - How do we tell the hot-plugged AP where to start execution? (I.e. that > >> it should execute code at a particular pflash location.) > > [Jiewen] Same real mode reset vector at FFFF:FFF0. > > You do not need a reset vector or INIT/SIPI/SIPI sequence at all in > QEMU. The AP does not start execution at all when it is unplugged, so > no cache-as-RAM etc. > > We only need to modify QEMU so that hot-plugged APIs do not reply to > INIT/SIPI/SMI. > > > I don’t think there is problem for real hardware, who always has CAR. > > Can QEMU provide some CPU specific space, such as MMIO region? 
> > Why is a CPU-specific region needed if every other processor is in SMM > and thus trusted. > > >> Does CPU hotplug apply only at the socket level? If the CPU is > >> multi-core, what is responsible for hot-plugging all cores present in > >> the socket? > > I can answer this: the SMM handler would interact with the hotplug > controller in the same way that ACPI DSDT does normally. This supports > multiple hotplugs already. > > Writes to the hotplug controller from outside SMM would be ignored. > > >>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI) > >>> -- I am waiting for hot-add message. > >> > >> Maybe we can simplify this in QEMU by broadcasting an SMI to existent > >> processors immediately upon plugging the new CPU. > > The QEMU DSDT could be modified (when secure boot is in effect) to OUT > to 0xB2 when hotplug happens. It could write a well-known value to > 0xB2, to be read by an SMI handler in edk2. > > > >> > >>> (NOTE: Host CPU can only > >> send > >>> instruction in SMM mode. -- The register is SMM only) > >> > >> Sorry, I don't follow -- what register are we talking about here, and > >> why is the BSP needed to send anything at all? What "instruction" do you > >> have in mind? > > [Jiewen] The new CPU does not enable SMI at reset. > > At some point of time later, the CPU need enable SMI, right? > > The "instruction" here means, the host CPUs need tell to CPU to enable SMI. > > Right, this would be a write to the CPU hotplug controller > > >>> (04) Host CPU: (OS) get message from board that a new CPU is added. > >>> (GPIO -> SCI) > >>> > >>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU > >>> will not enter CPU because SMI is disabled) > >> > >> I don't understand the OS involvement here. But, again, perhaps QEMU can > >> force all existent CPUs into SMM immediately upon adding the new CPU. > > [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment. > > See above. 
> > >>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM > >>> rebase code. > >>> > >>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI. > >> > >> Aha, so this is the SMM-only register you mention in step (03). Is the > >> register specified in the Intel SDM? > > [Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI. > > It is platform specific register. Not defined in SDM. > > You may invent one in device model. > > See above. > > >>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to > >>> TSEG. > >> > >> What code does the new CPU execute after it completes step (10)? Does it > >> halt? > > > > [Jiewen] The new CPU exits SMM and return to original place - where it is > > interrupted to enter SMM - running code on the flash. > > So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07). Looking at Q35 code and Seabios SMM relocation as example, if I see it right QEMU has: - SMRAM is aliased from DRAM at 0xa0000 - and TSEG steals from the top of low RAM when configured Now problem is that default SMBASE at 0x30000 isn't backed by anything in SMRAM address space and default SMI entry falls-through to the same location in System address space. The later is not trusted and entry into SMM mode will corrupt area + might jump to 'random' SMI handler (hence save/restore code in Seabios). Here is an idea, can we map a memory region at 0x30000 in SMRAM address space with relocation space/code reserved. It could be a part of TSEG (so we don't have to invent ABI to configure that)? In that case we do not have to care about System address space content anymore and un-trusted code shouldn't be able to supply rogue SMI handler. (that would cross out one of the reasons for inventing disabled-INIT/SMI state) > >>> (11) Host CPU: (SMM) Restore 38000. > >> > >> These steps (i.e., (06) through (11)) don't appear RAS-specific. 
The > >> only platform-specific feature seems to be SMI masking register, which > >> could be extracted into a new SmmCpuFeaturesLib API. > >> > >> Thus, would you please consider open sourcing firmware code for steps > >> (06) through (11)? > >> > >> Alternatively -- and in particular because the stack for step (01) > >> concerns me --, we could approach this from a high-level, functional > >> perspective. The states that really matter are the relocated SMBASE for > >> the new CPU, and the state of the full system, right at the end of step > >> (11). > >> > >> When the SMM setup quiesces during normal firmware boot, OVMF could > >> use > >> existent (finalized) SMBASE infomation to *pre-program* some virtual > >> QEMU hardware, with such state that would be expected, as "final" state, > >> of any new hotplugged CPU. Afterwards, if / when the hotplug actually > >> happens, QEMU could blanket-apply this state to the new CPU, and > >> broadcast a hardware SMI to all CPUs except the new one. > > I'd rather avoid this and stay as close as possible to real hardware. > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
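[Editorial note: Paolo's rule above — "Writes to the hotplug controller from outside SMM would be ignored" — amounts to a guard in the device's register-write path. A rough model of the QEMU-side check, with an invented single-register layout; the real interface would still have to be negotiated between QEMU and OVMF:]

```c
#include <stdint.h>

typedef struct {
  uint32_t SmiEnableMask;   /* bit N set => CPU N may receive SMIs */
} HOTPLUG_CTRL;

/* Model of the controller's write handler.  'InSmm' would come from the
 * vCPU state of the writing CPU; writes from outside SMM are dropped, so
 * only code already inside the trusted SMM environment can enable SMI on
 * a newly added CPU. */
static void
HotplugCtrlWrite (HOTPLUG_CTRL *Ctrl, int InSmm, uint32_t Value)
{
  if (!InSmm) {
    return;               /* silently ignored, as proposed in the thread */
  }
  Ctrl->SmiEnableMask |= Value;
}
```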
* Re: CPU hotplug using SMM with QEMU+OVMF 2019-08-15 16:07 ` Igor Mammedov @ 2019-08-15 16:24 ` Paolo Bonzini 2019-08-16 7:42 ` Igor Mammedov 0 siblings, 1 reply; 69+ messages in thread From: Paolo Bonzini @ 2019-08-15 16:24 UTC (permalink / raw) To: Igor Mammedov Cc: Yao, Jiewen, Laszlo Ersek, edk2-devel-groups-io, edk2-rfc-groups-io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On 15/08/19 18:07, Igor Mammedov wrote: > Looking at Q35 code and Seabios SMM relocation as example, if I see it > right QEMU has: > - SMRAM is aliased from DRAM at 0xa0000 > - and TSEG steals from the top of low RAM when configured > > Now problem is that default SMBASE at 0x30000 isn't backed by anything > in SMRAM address space and default SMI entry falls-through to the same > location in System address space. > > The later is not trusted and entry into SMM mode will corrupt area + might > jump to 'random' SMI handler (hence save/restore code in Seabios). > > Here is an idea, can we map a memory region at 0x30000 in SMRAM address > space with relocation space/code reserved. It could be a part of TSEG > (so we don't have to invent ABI to configure that)? No, there could be real mode code using it. What we _could_ do is initialize SMBASE to 0xa0000, but I think it's better to not deviate too much from processor behavior (even if it's admittedly a 20-years legacy that doesn't make any sense). Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: CPU hotplug using SMM with QEMU+OVMF 2019-08-15 16:24 ` Paolo Bonzini @ 2019-08-16 7:42 ` Igor Mammedov 0 siblings, 0 replies; 69+ messages in thread From: Igor Mammedov @ 2019-08-16 7:42 UTC (permalink / raw) To: Paolo Bonzini Cc: Yao, Jiewen, Laszlo Ersek, edk2-devel-groups-io, edk2-rfc-groups-io, qemu devel list, Chen, Yingwen, Nakajima, Jun, Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl On Thu, 15 Aug 2019 18:24:53 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 15/08/19 18:07, Igor Mammedov wrote: > > Looking at Q35 code and Seabios SMM relocation as example, if I see it > > right QEMU has: > > - SMRAM is aliased from DRAM at 0xa0000 > > - and TSEG steals from the top of low RAM when configured > > > > Now problem is that default SMBASE at 0x30000 isn't backed by anything > > in SMRAM address space and default SMI entry falls-through to the same > > location in System address space. > > > > The later is not trusted and entry into SMM mode will corrupt area + might > > jump to 'random' SMI handler (hence save/restore code in Seabios). > > > > Here is an idea, can we map a memory region at 0x30000 in SMRAM address > > space with relocation space/code reserved. It could be a part of TSEG > > (so we don't have to invent ABI to configure that)? > > No, there could be real mode code using it. My impression was that QEMU/KVM's SMM address space is accessible only from CPU in SMM mode, so SMM CPU should access in-depended SMRAM at 0x30000 in SMM address space while not SMM CPUs (including real mode) should access 0x30000 from normal system RAM. > What we _could_ do is > initialize SMBASE to 0xa0000, but I think it's better to not deviate too > much from processor behavior (even if it's admittedly a 20-years legacy > that doesn't make any sense). Agreed, it's better to follow spec, that's one of the reasons why I was toying with idea of using separate SMRAM at 0x30000 mapped only in SMM address space. 
Practically we would be following spec: SDM: 34.4 SMRAM " System logic can use the SMI acknowledge transaction or the assertion of the SMIACT# pin to decode accesses to the SMRAM and redirect them (if desired) to specific SMRAM memory. If a separate RAM memory is used for SMRAM, system logic should provide a programmable method of mapping the SMRAM into system memory space when the processor is not in SMM. This mechanism will enable start-up procedures to initialize the SMRAM space (that is, load the SMI handler) before executing the SMI handler during SMM. " Another benefit that gives us, is that we won't have to pull in all existing CPUs into SMM (essentially another stop_machine) to guarantee exclusive access to 0x30000 in normal RAM. > > Paolo ^ permalink raw reply [flat|nested] 69+ messages in thread
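[Editorial note: the semantics Igor describes — the same physical address resolving to different backing storage depending on whether the accessing CPU is in SMM — can be summarized as a small routing function. This sketches the intended behavior only (in QEMU it would be a MemoryRegion mapped into the SMM address space, not an address check); the 128 KiB window size and names are assumptions from the discussion above:]

```c
#include <stdint.h>

typedef enum { BACKING_SYSTEM_RAM, BACKING_SMRAM } BACKING;

#define DEFAULT_SMBASE   0x30000u
#define RELOC_WINDOW     0x20000u   /* proposed 128 KiB relocation window */

/* Which storage an access to 'Addr' reaches under the proposal: only a CPU
 * in SMM sees the dedicated SMRAM at the default-SMBASE window; everything
 * else, including real-mode code, keeps seeing normal system RAM there. */
static BACKING
Resolve (uint32_t Addr, int InSmm)
{
  if (InSmm && Addr >= DEFAULT_SMBASE &&
      Addr < DEFAULT_SMBASE + RELOC_WINDOW) {
    return BACKING_SMRAM;
  }
  return BACKING_SYSTEM_RAM;
}
```

[This also illustrates the benefit Igor notes: because non-SMM accesses never reach the SMRAM copy, no stop_machine-style rendezvous is needed to protect 0x30000 in normal RAM.]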
end of thread, other threads:[~2019-09-17 14:38 UTC | newest] Thread overview: 69+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-08-13 14:16 CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek 2019-08-13 16:09 ` Laszlo Ersek 2019-08-13 16:18 ` Laszlo Ersek 2019-08-14 13:20 ` Yao, Jiewen 2019-08-14 14:04 ` Paolo Bonzini 2019-08-15 9:55 ` Yao, Jiewen 2019-08-15 16:04 ` Paolo Bonzini 2019-08-15 15:00 ` [edk2-devel] " Laszlo Ersek 2019-08-15 16:16 ` Igor Mammedov 2019-08-15 16:21 ` Paolo Bonzini 2019-08-16 2:46 ` Yao, Jiewen 2019-08-16 7:20 ` Paolo Bonzini 2019-08-16 7:49 ` Yao, Jiewen 2019-08-16 20:15 ` Laszlo Ersek 2019-08-16 22:19 ` Alex Williamson 2019-08-17 0:20 ` Yao, Jiewen 2019-08-18 19:50 ` Paolo Bonzini 2019-08-18 23:00 ` Yao, Jiewen 2019-08-19 14:10 ` Paolo Bonzini 2019-08-21 12:07 ` Laszlo Ersek 2019-08-21 15:48 ` [edk2-rfc] " Michael D Kinney 2019-08-21 17:05 ` Paolo Bonzini 2019-08-21 17:25 ` Michael D Kinney 2019-08-21 17:39 ` Paolo Bonzini 2019-08-21 20:17 ` Michael D Kinney 2019-08-22 6:18 ` Paolo Bonzini 2019-08-22 18:29 ` Laszlo Ersek 2019-08-22 18:51 ` Paolo Bonzini 2019-08-23 14:53 ` Laszlo Ersek 2019-08-22 20:13 ` Michael D Kinney 2019-08-22 17:59 ` Laszlo Ersek 2019-08-22 18:43 ` Paolo Bonzini 2019-08-22 20:06 ` Michael D Kinney 2019-08-22 22:18 ` Paolo Bonzini 2019-08-22 22:32 ` Michael D Kinney 2019-08-22 23:11 ` Paolo Bonzini 2019-08-23 1:02 ` Michael D Kinney 2019-08-23 5:00 ` Yao, Jiewen 2019-08-23 15:25 ` Michael D Kinney 2019-08-24 1:48 ` Yao, Jiewen 2019-08-27 18:31 ` Igor Mammedov 2019-08-29 17:01 ` Laszlo Ersek 2019-08-30 14:48 ` Igor Mammedov 2019-08-30 18:46 ` Laszlo Ersek 2019-09-02 8:45 ` Igor Mammedov 2019-09-02 19:09 ` Laszlo Ersek 2019-09-03 14:53 ` [Qemu-devel] " Igor Mammedov 2019-09-03 17:20 ` Laszlo Ersek 2019-09-04 9:52 ` imammedo 2019-09-05 13:08 ` Laszlo Ersek 2019-09-05 15:45 ` Igor Mammedov 2019-09-05 15:49 ` [PATCH] q35: lpc: allow to lock down 128K RAM at default 
SMBASE address Igor Mammedov 2019-09-09 19:15 ` Laszlo Ersek 2019-09-09 19:20 ` Laszlo Ersek 2019-09-10 15:58 ` Igor Mammedov 2019-09-11 17:30 ` Laszlo Ersek 2019-09-17 13:11 ` [edk2-devel] " Igor Mammedov 2019-09-17 14:38 ` [staging/branch]: CdePkg - C Development Environment Package Minnow Ware 2019-08-26 15:30 ` [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek 2019-08-27 16:23 ` Igor Mammedov 2019-08-27 20:11 ` Laszlo Ersek 2019-08-28 12:01 ` Igor Mammedov 2019-08-29 16:25 ` Laszlo Ersek 2019-08-30 13:49 ` [Qemu-devel] " Igor Mammedov 2019-08-22 17:53 ` Laszlo Ersek 2019-08-16 20:00 ` Laszlo Ersek 2019-08-15 16:07 ` Igor Mammedov 2019-08-15 16:24 ` Paolo Bonzini 2019-08-16 7:42 ` Igor Mammedov