public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: "Yao, Jiewen" <jiewen.yao@intel.com>
Cc: "Kinney, Michael D" <michael.d.kinney@intel.com>,
	"Tian, Feng" <feng.tian@intel.com>,
	"Radim Kr?má?" <rkrcmar@redhat.com>,
	"edk2-devel@ml01.01.org" <edk2-devel@ml01.01.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Fan, Jeff" <jeff.fan@intel.com>,
	"Zeng, Star" <star.zeng@intel.com>
Subject: Re: [PATCH 0/6] Enable SMM page level protection.
Date: Fri, 4 Nov 2016 17:15:46 +0100	[thread overview]
Message-ID: <eae7a6e9-2517-fdf8-27a9-87b15896b96d@redhat.com> (raw)
In-Reply-To: <74D8A39837DF1E4DA445A8C0B3885C50386BDFD3@shsmsx102.ccr.corp.intel.com>

On 11/04/16 02:15, Yao, Jiewen wrote:
> Thank you, Mike.
> Yes, I reproduced the issue and found out what is wrong. Here is detail:
> 
> It seems OVMF never puts AP in SMM mode, even *before my patch series*.
> I rollback all my update and use e9d093.
> 
> I add some debug message in PerformRemainingTasks(), I found below:
> *(mSmmMpSyncData->CpuData[0].Present) = 1
> *(mSmmMpSyncData->CpuData[1].Present) = 0
> *(mSmmMpSyncData->CpuData[2].Present) = 0
> *(mSmmMpSyncData->CpuData[3].Present) = 0
> *(mSmmMpSyncData->CpuData[4].Present) = 0
> *(mSmmMpSyncData->CpuData[5].Present) = 0
> *(mSmmMpSyncData->CpuData[6].Present) = 0
> *(mSmmMpSyncData->CpuData[7].Present) = 0
> It is never exposed before, because there is no code to call SmmStartupAp.
> 
> I assume all APs should be present, so I added ASSERT.
> Based upon current OVMF situation, I will change ASSERT to a DEBUG_ERROR message.
> 
> Last but not least important, it is unclear to me, it is a *bug* or a *feature* on OVMF or QEMU.
> Anyone can confirm that?

Thank you very much for narrowing this down.

It is a QEMU platform peculiarity that raising an SMI, with
EFI_SMM_CONTROL2_PROTOCOL.Trigger(), puts *only* the processor that is
executing Trigger() into SMM. The other processors are not affected.

In PiSmmCpuDxeSmm, there is BSP code that waits for the APs to show up
in SMM, for a specific amount of time. (See PcdCpuSmmSyncMode and
PcdCpuSmmApSyncTimeout.) If the APs don't show up in the allotted time,
due to the original SMI, then the BSP pulls them in with directed LAPIC
writes, if I remember correctly.

(We discussed this at great length when we were enabling SMM in OVMF --
I think I may have forgotten most of the details by now.)

On the bare metal, the LAPIC writes are just a fallback, and the
timeouts are not expected in practice. On QEMU however, those timeouts
are the norm (because only the processor calling Trigger() enters SMM).
This caused very long waits when using the variable services (with SMM)
from the runtime OS, even if all of the related code (OS and firmware
alike) were executed only on the BSP (VCPU#0).

In order to mitigate this, we set PcdCpuSmmSyncMode to 1 (relaxed), in
the following patch:

commit 9b1e378811ff730fe4e5bbbba294ea74ef76a915
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Nov 30 18:46:46 2015 +0000

    OvmfPkg: use relaxed AP SMM synchronization mode

    Port 0xb2 on QEMU only sends an SMI to the currently executing processor.
    The SMI handler, however, and in particular SmmWaitForApArrival, currently
    expects that SmmControl2DxeTrigger triggers an SMI IPI on all processors
    rather than just the BSP.  Thus all SMM invocations loop for a second (the
    default value of PcdCpuSmmApSyncTimeout) before SmmWaitForApArrival sends
    another SMI IPI to the APs.

    With the default SmmCpuFeaturesLib, 32-bit machines must broadcast SMIs
    because 32-bit machines must reset the MTRRs on each entry to system
    management modes (they have no SMRRs).  However, our virtual platform
    does not have problems with cacheability of SMRAM, so we can use "directed"
    SMIs instead.  To do this, just set gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmSyncMode
    to 1 (aka SmmCpuSyncModeRelaxedAp).  This fixes SMM on multiprocessor virtual
    machines.

    Contributed-under: TianoCore Contribution Agreement 1.0
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>

    git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19058 6f19259b-4bc3-4df7-8a09-765794883524

Another problem remained though: if the SMI was raised by an AP in the
first place -- which is possible when the runtime OS calls the variable
services --, then the code still waited for the sync timeout to elapse.
This is very-very easy to reproduce with the following command in a
Linux guest:

  taskset -c 1 efibootmgr

The "efibootmgr" command issues a big bunch of variable service calls
(with the help of the guest kernel, of course), and the "taskset -c 1"
command ensures that all of the user space, kernel, and firmware code
involved in "efibootmgr" is executed exclusively on VCPU#1 (that is, an
AP).

(Side remark: invoking efibootmgr without any options causes it to
simply print the BootCurrent, Timeout, BootOrder and Boot#### variables.
Probably BootNext too, if it exists.)

Initially, this command would take up to 35-40 *seconds* to complete on
my laptop, even with Paolo's 9b1e378811ff in place. Count

- 1 second busy-looping per SMI raised,

- possibly several SMIs per variable service call (I don't know how many
  times the unprivileged part of the variable driver calls into the
  privileged part, per external service call),

- and several variable service calls issued by efibootmgr.

This was clearly intolerable, especially because in a multiprocessor
linux guest, the efibootmgr program (and the initial variable service
calls issued by the kernel during boot) can be scheduled to any VCPU --
the "taskset -c 1" command only synthesizes the worst case.

For mitigating *that*, we set PcdCpuSmmApSyncTimeout to 1/10th of a
second (from its original 1 second value), in:

commit bb0f18b0bce682f6780af997de45bcef146c11ca
Author: Laszlo Ersek <lersek@redhat.com>
Date:   Mon Nov 30 18:46:50 2015 +0000

    OvmfPkg: any AP in SMM should not wait for the BSP for more than 100 ms

    This patch complements the previous one, "OvmfPkg: use relaxed AP SMM
    synchronization mode". While that patch focuses on the case when the SMI
    is raised synchronously by the BSP, on the BSP:

      BSPHandler()             [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]
        SmmWaitForApArrival()  [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]
          IsSyncTimerTimeout() [UefiCpuPkg/PiSmmCpuDxeSmm/SyncTimer.c]

    this patch concerns itself with the case when it is one of the APs that
    raises (and sees delivered) the synchronous SMI:

      APHandler()            [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]
        IsSyncTimerTimeout() [UefiCpuPkg/PiSmmCpuDxeSmm/SyncTimer.c]

    Namely, in APHandler() the AP waits for the BSP to enter SMM regardless of
    PcdCpuSmmSyncMode, for PcdCpuSmmApSyncTimeout microseconds (the default
    value is 1 second). If the BSP doesn't show up in SMM within that
    interval, then the AP brings it in with a directed SMI, and waits for the
    BSP again for PcdCpuSmmApSyncTimeout microseconds.

    Although during boot services, SmmControl2DxeTrigger() is only called by
    the BSP, at runtime the OS can invoke runtime services from an AP (it can
    even be forced with "taskset -c 1 efibootmgr"). Because on QEMU
    SmmControl2DxeTrigger() only raises the SMI for the calling processor (BSP
    and AP alike), the first interval above times out invariably in such cases
    -- the BSP never shows up before the AP calls it in.

    In order to mitigate the performance penalty, decrease
    PcdCpuSmmApSyncTimeout to one tenth of its default value: 100 ms. (For
    comparison, Vlv2TbltDevicePkg sets 1 ms.)

    NOTE: once QEMU becomes capable of synchronous broadcast SMIs, this patch
    and the previous one ("OvmfPkg: use relaxed AP SMM synchronization mode")
    should be reverted, and SmmControl2DxeTrigger() should be adjusted
    instead.

    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Jordan Justen <jordan.l.justen@intel.com>
    Cc: Michael Kinney <michael.d.kinney@intel.com>
    Contributed-under: TianoCore Contribution Agreement 1.0
    Signed-off-by: Laszlo Ersek <lersek@redhat.com>
    Reviewed-by: Michael Kinney <michael.d.kinney@intel.com>

    git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@19059 6f19259b-4bc3-4df7-8a09-765794883524

With this patch in place, the "taskset -c 1 efibootmgr" command now
"only" takes ~4 seconds when executed in a guest running on my laptop.
(Again, for comparison, Vlv2TbltDevicePkg sets a hundred times stricter
timeout.)

This is the status we have to this day. The NOTE at the end of the
commit message, about QEMU gaining the ability to raise synchronous
broadcast SMIs, has not realized yet.

Prototyping that change for QEMU wasn't hard actually; I did it and
posted the patch for review:

http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html

See especially the second link for a (subjective) summary of the
opinions around the change. There was a lot of discussion on edk2-devel
too (I don't think I'll try to look up the messages now.)

Ultimately, we couldn't agree on a solution, so we continue to have only
unicast SMIs available for Trigger().

Regarding your question whether this is a bug or feature in QEMU... hard
to say. As long as only SeaBIOS was using QEMU's SMI, I guess it
qualified as a platform feature. For edk2's purposes, we could call it a
bug. The behavior could be changed in QEMU without regressing SeaBIOS's
usage pattern (for example by setting a specific APM_STS value from OVMF
that is known not to be used by SeaBIOS, before writing to APM_CNT), but
we couldn't agree on a solution.

I do see that I ultimately withdrew my QEMU patch:

http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html

so maybe Paolo or someone else convinced me that the approach was wrong.
I don't remember.

Thanks
Laszlo



  reply	other threads:[~2016-11-04 16:15 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-03  6:53 [PATCH 0/6] Enable SMM page level protection Jiewen Yao
2016-11-03  6:53 ` [PATCH 1/6] MdeModulePkg/Include: Add PiSmmMemoryAttributesTable.h Jiewen Yao
2016-11-03  6:53 ` [PATCH 2/6] MdeModulePkg/dec: Add gEdkiiPiSmmMemoryAttributesTableGuid Jiewen Yao
2016-11-03  6:53 ` [PATCH 3/6] MdeModulePkg/PiSmmCore: Add MemoryAttributes support Jiewen Yao
2016-11-03  6:53 ` [PATCH 4/6] UefiCpuPkg/dec: Add PcdCpuSmmStaticPageTable Jiewen Yao
2016-11-03  8:28   ` Laszlo Ersek
2016-11-03 10:46     ` Yao, Jiewen
2016-11-03 21:35       ` Laszlo Ersek
2016-11-03  6:53 ` [PATCH 5/6] UefiCpuPkg/PiSmmCpuDxeSmm: Add paging protection Jiewen Yao
2016-11-03  6:53 ` [PATCH 6/6] QuarkPlatformPkg/dsc: enable Smm " Jiewen Yao
2016-11-03  6:55 ` [PATCH 0/6] Enable SMM page level protection Yao, Jiewen
2016-11-03 21:43 ` Laszlo Ersek
2016-11-03 23:51   ` Yao, Jiewen
2016-11-04  0:09     ` Kinney, Michael D
2016-11-04  1:15       ` Yao, Jiewen
2016-11-04 16:15         ` Laszlo Ersek [this message]
2016-11-04 15:22     ` Laszlo Ersek
2016-11-04 15:29       ` Paolo Bonzini
2016-11-04 17:36         ` Laszlo Ersek
2016-11-04  9:35   ` Yao, Jiewen
2016-11-04 15:23     ` Laszlo Ersek
  -- strict thread matches above, loose matches on Subject: below --
2016-11-04  2:49 Kinney, Michael D
2016-11-04  2:55 ` Yao, Jiewen
2016-11-04  3:05   ` Kinney, Michael D
2016-11-04  3:12     ` Kinney, Michael D
2016-11-04  3:20       ` Yao, Jiewen
2016-11-04 11:34         ` Paolo Bonzini
2016-11-04 13:28           ` Yao, Jiewen
2016-11-04 13:50             ` Paolo Bonzini
2016-11-04 16:34               ` Laszlo Ersek
2016-11-04  3:14     ` Yao, Jiewen
2016-11-04 16:28       ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eae7a6e9-2517-fdf8-27a9-87b15896b96d@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox