public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Laszlo Ersek" <lersek@redhat.com>
To: devel@edk2.groups.io, ankur.a.arora@oracle.com
Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com,
	Jordan Justen <jordan.l.justen@intel.com>,
	Ard Biesheuvel <ard.biesheuvel@arm.com>,
	Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject
Date: Tue, 23 Feb 2021 22:39:01 +0100	[thread overview]
Message-ID: <0f373648-0d88-6151-f2f6-aa185041b576@redhat.com> (raw)
In-Reply-To: <20210222071928.1401820-10-ankur.a.arora@oracle.com>

On 02/22/21 08:19, Ankur Arora wrote:
> Add logic in EjectCpu() to do the actual the CPU ejection.
> 
> On the BSP, ejection happens by first selecting the CPU via
> its QemuSelector and then sending the QEMU "eject" command.
> QEMU in-turn signals the remote VCPU thread which context-switches
> the CPU out of the SMI handler.
> 
> Meanwhile the CPU being ejected, waits around in its holding
> area until it is context-switched out. Note that it is possible
> that a slow CPU gets ejected before it reaches the wait loop.
> However, this would never happen before it has executed the
> "AllCpusInSync" loop in SmiRendezvous().
> It can mean that an ejected CPU does not execute code after
> that point but given that the CPU state will be destroyed by
> QEMU, the missed cleanup is no great loss.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Aaron Young <aaron.young@oracle.com>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
> 
> Notes:
>     Addresses the following reviewing comments from v6:
>     (1) s/CpuEject/EjectCpu/g
>     (2,2a,2c) Get rid of eject-worker and related.
>     (2b,2d) Use the PlatformSmmBspElection() logic to find out IsBSP.
>     (3,3b) Use CPU_HOT_EJECT_DATA->QemuSelector instead of ApicIdMap to
>      do the actual ejection.
>     (4,5a,5b) Fix the format etc in the final unplugged log message
>     () Also as discussed elsewhere document the ordering requirements for
>      mCpuHotEjectData->QemuSelector[] and mCpuHotEjectData->Handler.
>     () [from patch 2] Move definition of QEMU_CPUHP_STAT_EJECTED to this
>      patch.
>     () s/QEMU_CPUHP_STAT_EJECTED/QEMU_CPUHP_STAT_EJECT/
> 
>  OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h  |   1 +
>  OvmfPkg/CpuHotplugSmm/CpuHotplug.c                 | 127 +++++++++++++++++++--
>  .../Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c  |  31 +++++
>  3 files changed, 152 insertions(+), 7 deletions(-)
> 
> diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> index 2ec7a107a64d..d0e83102c13f 100644
> --- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> +++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> @@ -34,6 +34,7 @@
>  #define QEMU_CPUHP_STAT_ENABLED                BIT0
>  #define QEMU_CPUHP_STAT_INSERT                 BIT1
>  #define QEMU_CPUHP_STAT_REMOVE                 BIT2
> +#define QEMU_CPUHP_STAT_EJECT                  BIT3
>  #define QEMU_CPUHP_STAT_FW_REMOVE              BIT4
>  
>  #define QEMU_CPUHP_RW_CMD_DATA               0x8
> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> index 851e2b28aad9..0484be8fe43c 100644
> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> @@ -18,6 +18,7 @@
>  #include <Pcd/CpuHotEjectData.h>             // CPU_HOT_EJECT_DATA
>  #include <Protocol/MmCpuIo.h>                // EFI_MM_CPU_IO_PROTOCOL
>  #include <Protocol/SmmCpuService.h>          // EFI_SMM_CPU_SERVICE_PROTOCOL
> +#include <Register/Intel/ArchitecturalMsr.h> // MSR_IA32_APIC_BASE_REGISTER
>  #include <Uefi/UefiBaseType.h>               // EFI_STATUS
>  
>  #include "ApicId.h"                          // APIC_ID
> @@ -191,12 +192,39 @@ RevokeNewSlot:
>  }
>  
>  /**
> +  EjectCpu needs to know the BSP at SMI exit at a point when
> +  some of the EFI_SMM_CPU_SERVICE_PROTOCOL state has been torn
> +  down.
> +  Reuse the logic from OvmfPkg::PlatformSmmBspElection() to
> +  do that.
> +
> +  @param[in] ProcessorNum      ProcessorNum denotes the processor handle number
> +                               in EFI_SMM_CPU_SERVICE_PROTOCOL.
> +**/
> +STATIC
> +BOOLEAN
> +CheckIfBsp (
> +  IN UINTN ProcessorNum
> +  )

(1a) Please remove the ProcessorNum parameter -- comment and parameter
list alike --; it's useless. In the parameter list, just replace the
line's contents with "VOID".

(1b) Additionally, please state:

  @retval TRUE   If the CPU executing this function is the BSP.

  @retval FALSE  If the CPU executing this function is an AP.




> +{
> +  MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
> +  BOOLEAN IsBsp;

(2) "IsBsp" should line up with "ApicBaseMsr".


> +
> +  ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
> +  IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
> +  return IsBsp;
> +}
> +
> +/**
>    CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit()
>    on each CPU at exit from SMM.
>  
> -  If, the executing CPU is not being ejected, nothing to be done.
> +  If, the executing CPU is neither the BSP, nor being ejected, nothing
> +  to be done.
>    If, the executing CPU is being ejected, wait in a halted loop
>    until ejected.
> +  If, the executing CPU is the BSP, set QEMU CPU status to eject
> +  for CPUs being ejected.
>  
>    @param[in] ProcessorNum      ProcessorNum denotes the CPU exiting SMM,
>                                 and will be used as an index into
> @@ -211,9 +239,99 @@ EjectCpu (
>    )
>  {
>    UINT64 QemuSelector;
> +  BOOLEAN IsBsp;
>  
> +  IsBsp = CheckIfBsp (ProcessorNum);
> +
> +  //
> +  // mCpuHotEjectData->QemuSelectorMap[ProcessorNum] is updated
> +  // on the BSP in the ongoing SMI iteration at two places:

(3) I feel "iteration" doesn't really apply; I'd simply drop that word.
"ongoing SMI" seems sufficient (or maybe "ongoing SMI handling").


> +  //
> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
> +  //   describes any such updates are guaranteed to be ordered-before the
> +  //   dereference below.
> +  //
> +  // - EjectCpu() on the BSP updates QemuSelectorMap[ProcessorNum] for
> +  //   CPUs after they have been hot-ejected.
> +  //
> +  //   The CPU under ejection: might be executing anywhere between the
> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
> +  //   dereference QemuSelectorMap[ProcessorNum].
> +  //   Given that the BSP ensures that this store only happens after the
> +  //   CPU has been ejected, this CPU would never see the after value.
> +  //   (Note that any CPU that is already executing the CpuSleep() loop
> +  //   below never raced any updates and always saw the before value.)
> +  //
> +  //   CPUs not-under ejection: never see any changes so they are fine.
> +  //
> +  //   Lastly, note that we are also guaranteed that any dereferencing
> +  //   CPU only sees the before or after value and not an intermediate
> +  //   value. This is because QemuSelectorMap[ProcessorNum] is aligned at
> +  //   a natural boundary.
> +  //

(4) Well, about the last paragraph -- when I saw that you used UINT64 in
QemuSelectorMap, to allow for a sentinel value, I immediately thought of
IA32. This module is built for IA32 too, and there I'm not so sure about
the atomicity of UINT64 accesses.

I didn't raise it at that point because I wasn't sure if we were
actually going to rely on the atomicity. And, I don't think we are.
Again, let me quote Paolo's example from <https://lwn.net/Articles/844224/>:

    thread 1                              thread 2
    --------------------------------      ------------------------
    a.x = 1;
    smp_wmb();
    WRITE_ONCE(message, &a);              datum = READ_ONCE(message);
                                          smp_rmb();
                                          if (datum != NULL)
                                            printk("%x\n", datum->x);

The access to object "x" is not restricted in any particular way. In
thread#1, we have an smp_wmb() which enforces both a compiler barrier
and a store-release fence, and then we have an atomic access to
"message". But the "a.x" assignment need not be atomic.

In our case, "a.x = 1" maps to (a) populating "QemuSelectorMap", and (b)
setting up the "Handler" function pointer, in UnplugCpus(). The
smp_wmb() is the "ReleaseMemoryFence" in UnplugCpus(). The WRITE_ONCE is
the final "AllCpusInSync = FALSE" assignment in BSPHandler()
[UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c].

Regarding thread#2, the READ_ONCE is matched by the "AllCpusInSync"
loop. smp_rmb() is the "AcquireMemoryFence" I called for, under PATCH v8
07/10, "OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler". And the
"datum" access happens (a) in SmmCpuFeaturesRendezvousExit(), where we
fetch/call Handler, and (b) here, where we read/modify "QemuSelectorMap".

So this seems to imply we can:

- drop the natural-alignment padding of "QemuSelectorMap", from PATCH v8
06/10, "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state",

- drop the last paragraph of the comment above.


>    QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
> -  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
> +  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID && !IsBsp) {
> +    return;
> +  }
> +
> +  if (IsBsp) {

(5) Can you reorder the high-level flow here? Such as:

  if (CheckIfBsp ()) {
    //
    // send the eject requests to QEMU here
    //
    return;
  }

  //
  // Reached only on APs:
  //
  QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
  if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
    return;
  }

  //
  // AP being hot-ejected; pen it here.
  //


> +    UINT32 Idx;
> +
> +    for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
> +      UINT64 QemuSelector;
> +
> +      QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx];
> +
> +      if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) {
> +        //
> +        // This to-be-ejected-CPU has already received the BSP's SMI exit
> +        // signal and, will execute SmmCpuFeaturesRendezvousExit()
> +        // followed by this callback or is already waiting in the
> +        // CpuSleep() loop below.
> +        //
> +        // Tell QEMU to context-switch it out.
> +        //
> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector);
> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT);
> +
> +        //
> +        // We need a compiler barrier here to ensure that the compiler
> +        // does not reorder the CpuStatus and QemuSelectorMap[Idx] stores.
> +        //
> +        // A store fence is not strictly necessary on x86 which has
> +        // TSO; however, both of these stores are in different address spaces
> +        // so also add a Store Fence here.
> +        //
> +        MemoryFence ();

(6) I wonder if this compiler barrier + comment block are helpful.
Paraphrasing your (ex-)colleague Liran, if MMIO and IO Port accessors
didn't contain built-in fences, all hell would break lose. We're using
EFI_MM_CPU_IO_PROTOCOL for IO Port accesses. I think we should be safe
ordering-wise, even without an explicit compiler barrier here.

To me personally, this particular fence only muddies the picture --
where we already have an acquire memory fence and a store memory fence
to couple with each other.

I'd recommend removing this. (If you disagree, I'm willing to listen to
arguments, of course!)


> +
> +        //
> +        // Clear the eject status for this CPU Idx to ensure that an invalid
> +        // SMI later does not end up trying to eject it or a newly
> +        // hotplugged CPU Idx does not go into the dead loop.
> +        //
> +        mCpuHotEjectData->QemuSelectorMap[Idx] =
> +          CPU_EJECT_QEMU_SELECTOR_INVALID;
> +
> +        DEBUG ((DEBUG_INFO, "%a: Unplugged ProcessorNum %u, "
> +          "QemuSelector 0x%Lx\n", __FUNCTION__, Idx, QemuSelector));

(7) Please log QemuSelector with "%Lu".


> +      }
> +    }
> +
> +    //
> +    // We are done until the next hot-unplug; clear the handler.
> +    //
> +    // By virtue of the MemoryFence() in the ejection loop above, the
> +    // following store is ordered-after all the ejections are done.
> +    // (We know that there is at least one CPU hot-eject handler if this
> +    // handler was installed.)
> +    //
> +    // As described in OvmfPkg::SmmCpuFeaturesRendezvousExit() this
> +    // means that the only CPUs which might dereference
> +    // mCpuHotEjectData->Handler are not under ejection, so we can
> +    // safely reset.
> +    //
> +    mCpuHotEjectData->Handler = NULL;
>      return;
>    }
>  
> @@ -496,11 +614,6 @@ CpuHotplugMmi (
>    if (EFI_ERROR (Status)) {
>      goto Fatal;
>    }
> -  if (ToUnplugCount > 0) {
> -    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
> -      __FUNCTION__));
> -    goto Fatal;
> -  }
>  
>    if (PluggedCount > 0) {
>      Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
> diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
> index 99988285b6a2..ddfef05ee6cf 100644
> --- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
> +++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
> @@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit (
>    // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed
>    // in this SMI exit (otherwise mCpuHotEjectData->Handler is not armed.)
>    //
> +  // mCpuHotEjectData itself is stable once setup so it can be
> +  // dereferenced without needing any synchronization,
> +  // but, mCpuHotEjectData->Handler is updated on the BSP in the
> +  // ongoing SMI iteration at two places:
> +  //
> +  // - UnplugCpus() where the BSP determines if a CPU is under ejection
> +  //   or not. As the comment where mCpuHotEjectData->Handler is set-up
> +  //   describes any such updates are guaranteed to be ordered-before the
> +  //   dereference below.
> +  //
> +  // - EjectCpu() (which is called via the Handler below), on the BSP
> +  //   updates mCpuHotEjectData->Handler once it is done with all ejections.
> +  //
> +  //   The CPU under ejection: might be executing anywhere between the
> +  //   "AllCpusInSync" exit loop in SmiRendezvous() to about to
> +  //   dereference the Handler field.
> +  //   Given that the BSP ensures that this store only happens after all
> +  //   CPUs under ejection have been ejected, this CPU would never see
> +  //   the after value.
> +  //   (Note that any CPU that is already executing the CpuSleep() loop
> +  //   below never raced any updates and always saw the before value.)
> +  //
> +  //   CPUs not-under ejection: might see either value of the Handler
> +  //   which is fine, because the Handler is a NOP for CPUs not-under
> +  //   ejection.
> +  //
> +  //   Lastly, note that we are also guaranteed that any dereferencing
> +  //   CPU only sees the before or after value and not an intermediate
> +  //   value. This is because mCpuHotEjectData->Handler is aligned at a
> +  //   natural boundary.
> +  //
>  
>    if (mCpuHotEjectData != NULL) {
>      CPU_HOT_EJECT_HANDLER Handler;
> 

(8) I can't really put my finger on it, I just feel that repeating
(open-coding) this wall of text here is not really productive.

Do you think that, after you add the "acquire memory fence" comment in
patch #7, we could avoid most of the text here? I think we should only
point out (in patch #7) the "release fence" that the logic here pairs with.

If you really want to present it all from both perspectives, I guess I'm
OK with that, but then I believe we should drop the last paragraph at
least (see point (4)).

Thanks
Laszlo


  reply	other threads:[~2021-02-23 21:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-22  7:19 [PATCH v8 00/10] support CPU hot-unplug Ankur Arora
2021-02-22  7:19 ` [PATCH v8 01/10] OvmfPkg/CpuHotplugSmm: refactor hotplug logic Ankur Arora
2021-02-22 11:49   ` [edk2-devel] " Laszlo Ersek
2021-02-22  7:19 ` [PATCH v8 02/10] OvmfPkg/CpuHotplugSmm: collect hot-unplug events Ankur Arora
2021-02-22 12:27   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:03     ` Ankur Arora
2021-02-23 16:44       ` Laszlo Ersek
2021-02-22  7:19 ` [PATCH v8 03/10] OvmfPkg/CpuHotplugSmm: add Qemu Cpu Status helper Ankur Arora
2021-02-22 12:31   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:22     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 04/10] OvmfPkg/CpuHotplugSmm: introduce UnplugCpus() Ankur Arora
2021-02-22 12:39   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:22     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 05/10] OvmfPkg/CpuHotplugSmm: define CPU_HOT_EJECT_DATA Ankur Arora
2021-02-22 13:06   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:33     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 06/10] OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state Ankur Arora
2021-02-22 14:19   ` [edk2-devel] " Laszlo Ersek
2021-02-23  7:37     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 07/10] OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler Ankur Arora
2021-02-22 14:53   ` [edk2-devel] " Laszlo Ersek
2021-02-23  7:37     ` Ankur Arora
2021-02-23 16:52       ` Laszlo Ersek
2021-02-23  7:45     ` Paolo Bonzini
2021-02-23 17:06       ` Laszlo Ersek
2021-02-23 17:18         ` Paolo Bonzini
2021-02-23 20:46           ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 08/10] OvmfPkg/CpuHotplugSmm: add EjectCpu() Ankur Arora
2021-02-23 20:36   ` [edk2-devel] " Laszlo Ersek
2021-02-23 20:51     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject Ankur Arora
2021-02-23 21:39   ` Laszlo Ersek [this message]
2021-02-24  3:44     ` [edk2-devel] " Ankur Arora
2021-02-25 19:22       ` Laszlo Ersek
2021-02-22  7:19 ` [PATCH v8 10/10] OvmfPkg/SmmControl2Dxe: negotiate CPU hot-unplug Ankur Arora
2021-02-23 21:52   ` [edk2-devel] " Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0f373648-0d88-6151-f2f6-aa185041b576@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox