public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Philippe Mathieu-Daudé" <philmd@redhat.com>
To: devel@edk2.groups.io, lersek@redhat.com
Cc: Anthony Perard <anthony.perard@citrix.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Igor Mammedov <imammedo@redhat.com>,
	Jordan Justen <jordan.l.justen@intel.com>,
	Julien Grall <julien.grall@arm.com>
Subject: Re: [edk2-devel] [PATCH v2 3/3] OvmfPkg/PlatformPei: rewrite MaxCpuCountInitialization() for CPU hotplug
Date: Thu, 24 Oct 2019 17:33:39 +0200	[thread overview]
Message-ID: <9826409d-43ff-f432-f894-9b5c2f9d5a2a@redhat.com> (raw)
In-Reply-To: <20191022221554.14963-4-lersek@redhat.com>

On 10/23/19 12:15 AM, Laszlo Ersek wrote:
> MaxCpuCountInitialization() currently handles the following options:
> 
> (1) QEMU does not report the boot CPU count (FW_CFG_NB_CPUS is 0)
> 
>      In this case, PlatformPei makes MpInitLib enumerate APs up to the
>      default PcdCpuMaxLogicalProcessorNumber value (64) minus 1, or until
>      the default PcdCpuApInitTimeOutInMicroSeconds (50,000) elapses.
>      (Whichever is reached first.)
> 
>      Time-limited AP enumeration had never been reliable on QEMU/KVM, which
>      is why commit 45a70db3c3a5 strated handling case (2) below, in OVMF.
> 
> (2) QEMU reports the boot CPU count (FW_CFG_NB_CPUS is nonzero)
> 
>      In this case, PlatformPei sets
> 
>      - PcdCpuMaxLogicalProcessorNumber to the reported boot CPU count
>        (FW_CFG_NB_CPUS, which exports "PCMachineState.boot_cpus"),
> 
>      - and PcdCpuApInitTimeOutInMicroSeconds to practically "infinity"
>        (MAX_UINT32, ~71 minutes).
> 
>      That causes MpInitLib to enumerate exactly the present (boot) APs.
> 
>      With CPU hotplug in mind, this method is not good enough. Because,
>      using QEMU terminology, UefiCpuPkg expects
>      PcdCpuMaxLogicalProcessorNumber to provide the "possible CPUs" count
>      ("MachineState.smp.max_cpus"), which includes present and not present
>      CPUs both (with not present CPUs being subject for hot-plugging).
>      FW_CFG_NB_CPUS does not include not present CPUs.
> 
> Rewrite MaxCpuCountInitialization() for handling the following cases:
> 
> (1) The behavior of case (1) does not change. (No UefiCpuPkg PCDs are set
>      to values different from the defaults.)
> 
> (2) QEMU reports the boot CPU count ("PCMachineState.boot_cpus", via
>      FW_CFG_NB_CPUS), but not the possible CPUs count
>      ("MachineState.smp.max_cpus").
> 
>      In this case, the behavior remains unchanged.
> 
>      The way MpInitLib is instructed to do the same differs however: we now
>      set the new PcdCpuBootLogicalProcessorNumber to the boot CPU count
>      (while continuing to set PcdCpuMaxLogicalProcessorNumber identically).
>      PcdCpuApInitTimeOutInMicroSeconds becomes irrelevant.
> 
> (3) QEMU reports both the boot CPU count ("PCMachineState.boot_cpus", via
>      FW_CFG_NB_CPUS), and the possible CPUs count
>      ("MachineState.smp.max_cpus").
> 
>      We tell UefiCpuPkg about the possible CPUs count through
>      PcdCpuMaxLogicalProcessorNumber. We also tell MpInitLib the boot CPU
>      count for precise and quick AP enumeration, via
>      PcdCpuBootLogicalProcessorNumber. PcdCpuApInitTimeOutInMicroSeconds is
>      irrelevant again.
> 
> This patch is a pre-requisite for enabling CPU hotplug with SMM_REQUIRE.
> As a side effect, the patch also enables S3 to work with CPU hotplug at
> once, *without* SMM_REQUIRE.
> 
> (Without the patch, S3 resume fails, if a CPU is hot-plugged at OS
> runtime, prior to suspend: the FW_CFG_NB_CPUS increase seen during resume
> causes PcdCpuMaxLogicalProcessorNumber to increase as well, which is not
> permitted.
> 
> With the patch, PcdCpuMaxLogicalProcessorNumber stays the same, namely
> "MachineState.smp.max_cpus". Therefore, the CPU structures allocated
> during normal boot can accommodate the CPUs at S3 resume that have been
> hotplugged prior to S3 suspend.)
> 
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
> 
> Notes:
>      v2:
>      
>      - use "possible CPUs" term in the code and the commit message [Igor]
>      
>      - add details about S3 to the commit message [Igor]
>      
>      - use QEMU's existent CPU hotplug register block, rather than a new
>        named file in fw_cfg [Igor]
>      
>      - tested on QEMU v2.6.2 (legacy-only CPU hotplug register block), v2.7.1
>        (modern register block, but buggy fw_cfg), v2.8.1.1 (no QEMU issues),
>        v4.0.0 (no QEMU issues)
> 
>   OvmfPkg/OvmfPkgIa32.dsc             |   2 +-
>   OvmfPkg/OvmfPkgIa32X64.dsc          |   2 +-
>   OvmfPkg/OvmfPkgX64.dsc              |   2 +-
>   OvmfPkg/PlatformPei/PlatformPei.inf |   2 +-
>   OvmfPkg/PlatformPei/Platform.c      | 172 +++++++++++++++++---
>   5 files changed, 151 insertions(+), 29 deletions(-)
> 
> diff --git a/OvmfPkg/OvmfPkgIa32.dsc b/OvmfPkg/OvmfPkgIa32.dsc
> index 4301e7821902..68d8a9fb9655 100644
> --- a/OvmfPkg/OvmfPkgIa32.dsc
> +++ b/OvmfPkg/OvmfPkgIa32.dsc
> @@ -553,11 +553,11 @@ [PcdsDynamicDefault]
>     gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
>     gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>   
>     # UefiCpuPkg PCDs related to initial AP bringup and general AP management.
>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>   
>     # Set memory encryption mask
>     gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>   
>   !if $(SMM_REQUIRE) == TRUE
> diff --git a/OvmfPkg/OvmfPkgIa32X64.dsc b/OvmfPkg/OvmfPkgIa32X64.dsc
> index 803fd74ae8e4..e5a6260b6088 100644
> --- a/OvmfPkg/OvmfPkgIa32X64.dsc
> +++ b/OvmfPkg/OvmfPkgIa32X64.dsc
> @@ -565,11 +565,11 @@ [PcdsDynamicDefault]
>     gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
>     gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>   
>     # UefiCpuPkg PCDs related to initial AP bringup and general AP management.
>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>   
>     # Set memory encryption mask
>     gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>   
>   !if $(SMM_REQUIRE) == TRUE
> diff --git a/OvmfPkg/OvmfPkgX64.dsc b/OvmfPkg/OvmfPkgX64.dsc
> index 5dbd1b793a90..f5d904945103 100644
> --- a/OvmfPkg/OvmfPkgX64.dsc
> +++ b/OvmfPkg/OvmfPkgX64.dsc
> @@ -564,11 +564,11 @@ [PcdsDynamicDefault]
>     gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
>     gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>   
>     # UefiCpuPkg PCDs related to initial AP bringup and general AP management.
>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>   
>     # Set memory encryption mask
>     gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>   
>   !if $(SMM_REQUIRE) == TRUE
> diff --git a/OvmfPkg/PlatformPei/PlatformPei.inf b/OvmfPkg/PlatformPei/PlatformPei.inf
> index d9fd9c8f05b3..30eaebdfae63 100644
> --- a/OvmfPkg/PlatformPei/PlatformPei.inf
> +++ b/OvmfPkg/PlatformPei/PlatformPei.inf
> @@ -96,11 +96,11 @@ [Pcd]
>     gEfiMdeModulePkgTokenSpaceGuid.PcdAcpiS3Enable
>     gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask
>     gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy
>     gUefiCpuPkgTokenSpaceGuid.PcdCpuLocalApicBaseAddress
>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber
> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber
>     gUefiCpuPkgTokenSpaceGuid.PcdCpuApStackSize
>   
>   [FixedPcd]
>     gEfiMdePkgTokenSpaceGuid.PcdPciExpressBaseAddress
>   
> diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
> index 3ba2459872d9..e5e8581752b5 100644
> --- a/OvmfPkg/PlatformPei/Platform.c
> +++ b/OvmfPkg/PlatformPei/Platform.c
> @@ -28,11 +28,14 @@
>   #include <Library/QemuFwCfgLib.h>
>   #include <Library/QemuFwCfgS3Lib.h>
>   #include <Library/ResourcePublicationLib.h>
>   #include <Guid/MemoryTypeInformation.h>
>   #include <Ppi/MasterBootMode.h>
> +#include <IndustryStandard/I440FxPiix4.h>
>   #include <IndustryStandard/Pci22.h>
> +#include <IndustryStandard/Q35MchIch9.h>
> +#include <IndustryStandard/QemuCpuHotplug.h>
>   #include <OvmfPlatforms.h>
>   
>   #include "Platform.h"
>   #include "Cmos.h"
>   
> @@ -562,47 +565,165 @@ S3Verification (
>   #endif
>   }
>   
>   
>   /**
> -  Fetch the number of boot CPUs from QEMU and expose it to UefiCpuPkg modules.
> -  Set the mMaxCpuCount variable.
> +  Fetch the boot CPU count and the possible CPU count from QEMU, and expose
> +  them to UefiCpuPkg modules. Set the mMaxCpuCount variable.
>   **/
>   VOID
>   MaxCpuCountInitialization (
>     VOID
>     )
>   {
> -  UINT16        ProcessorCount;
> +  UINT16        BootCpuCount;
>     RETURN_STATUS PcdStatus;
>   
> +  //
> +  // Try to fetch the boot CPU count.
> +  //
>     QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
> -  ProcessorCount = QemuFwCfgRead16 ();
> -  //
> -  // If the fw_cfg key or fw_cfg entirely is unavailable, load mMaxCpuCount
> -  // from the PCD default. No change to PCDs.
> -  //
> -  if (ProcessorCount == 0) {
> +  BootCpuCount = QemuFwCfgRead16 ();
> +  if (BootCpuCount == 0) {
> +    //
> +    // QEMU doesn't report the boot CPU count. (BootCpuCount == 0) will let
> +    // MpInitLib count APs up to (PcdCpuMaxLogicalProcessorNumber - 1), or
> +    // until PcdCpuApInitTimeOutInMicroSeconds elapses (whichever is reached
> +    // first).
> +    //
> +    DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
>       mMaxCpuCount = PcdGet32 (PcdCpuMaxLogicalProcessorNumber);
> -    return;
> +  } else {
> +    //
> +    // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to
> +    // (BootCpuCount - 1) precisely, regardless of timeout.
> +    //
> +    // Now try to fetch the possible CPU count.
> +    //
> +    UINTN CpuHpBase;
> +    UINT32 CmdData2;
> +
> +    CpuHpBase = ((mHostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID) ?
> +                 ICH9_CPU_HOTPLUG_BASE : PIIX4_CPU_HOTPLUG_BASE);
> +
> +    //
> +    // If only legacy mode is available in the CPU hotplug register block, or
> +    // the register block is completely missing, then the writes below are
> +    // no-ops.
> +    //
> +    // 1. Switch the hotplug register block to modern mode.
> +    //
> +    IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);
> +    //
> +    // 2. Select a valid CPU for deterministic reading of
> +    //    QEMU_CPUHP_R_CMD_DATA2.
> +    //
> +    //    CPU#0 is always valid; it is the always present and non-removable
> +    //    BSP.
> +    //
> +    IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);
> +    //
> +    // 3. Send a command after which QEMU_CPUHP_R_CMD_DATA2 is specified to
> +    //    read as zero, and which does not invalidate the selector. (The
> +    //    selector may change, but it must not become invalid.)
> +    //
> +    //    Send QEMU_CPUHP_CMD_GET_PENDING, as it will prove useful later.
> +    //
> +    IoWrite8 (CpuHpBase + QEMU_CPUHP_W_CMD, QEMU_CPUHP_CMD_GET_PENDING);
> +    //
> +    // 4. Read QEMU_CPUHP_R_CMD_DATA2.
> +    //
> +    //    If the register block is entirely missing, then this is an unassigned
> +    //    IO read, returning all-bits-one.
> +    //
> +    //    If only legacy mode is available, then bit#0 stands for CPU#0 in the
> +    //    "CPU present bitmap". CPU#0 is always present.
> +    //
> +    //    Otherwise, QEMU_CPUHP_R_CMD_DATA2 is either still reserved (returning
> +    //    all-bits-zero), or it is specified to read as zero after the above
> +    //    steps. Both cases confirm modern mode.
> +    //
> +    CmdData2 = IoRead32 (CpuHpBase + QEMU_CPUHP_R_CMD_DATA2);
> +    DEBUG ((DEBUG_VERBOSE, "%a: CmdData2=0x%x\n", __FUNCTION__, CmdData2));
> +    if (CmdData2 != 0) {
> +      //
> +      // QEMU doesn't support the modern CPU hotplug interface. Assume that the
> +      // possible CPU count equals the boot CPU count (precluding hotplug).
> +      //
> +      DEBUG ((DEBUG_WARN, "%a: modern CPU hotplug interface unavailable\n",
> +        __FUNCTION__));
> +      mMaxCpuCount = BootCpuCount;
> +    } else {
> +      //
> +      // Grab the possible CPU count from the modern CPU hotplug interface.
> +      //
> +      UINT32 Present, Possible, Selected;
> +
> +      Present = 0;
> +      Possible = 0;
> +
> +      //
> +      // We've sent QEMU_CPUHP_CMD_GET_PENDING last; this ensures
> +      // QEMU_CPUHP_RW_CMD_DATA can now be read usefully. However,
> +      // QEMU_CPUHP_CMD_GET_PENDING may have selected a CPU with actual pending
> +      // hotplug events; therefore, select CPU#0 forcibly.
> +      //
> +      IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);

Nitpicking, I find this easier to read (matches the comment):

          IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);

Rest very well documented, thanks.

Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>

> +
> +      do {
> +        UINT8 CpuStatus;
> +
> +        //
> +        // Read the status of the currently selected CPU. This will help with a
> +        // sanity check against "BootCpuCount".
> +        //
> +        CpuStatus = IoRead8 (CpuHpBase + QEMU_CPUHP_R_CPU_STAT);
> +        if ((CpuStatus & QEMU_CPUHP_STAT_ENABLED) != 0) {
> +          ++Present;
> +        }
> +        //
> +        // Attempt to select the next CPU.
> +        //
> +        ++Possible;
> +        IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);
> +        //
> +        // If the selection is successful, then the following read will return
> +        // the selector (which we know is positive at this point). Otherwise,
> +        // the read will return 0.
> +        //
> +        Selected = IoRead32 (CpuHpBase + QEMU_CPUHP_RW_CMD_DATA);
> +        ASSERT (Selected == Possible || Selected == 0);
> +      } while (Selected > 0);
> +
> +      //
> +      // Sanity check: fw_cfg and the modern CPU hotplug interface should
> +      // return the same boot CPU count.
> +      //
> +      if (BootCpuCount != Present) {
> +        DEBUG ((DEBUG_WARN, "%a: QEMU v2.7 reset bug: BootCpuCount=%d "
> +          "Present=%u\n", __FUNCTION__, BootCpuCount, Present));
> +        //
> +        // The handling of QemuFwCfgItemSmpCpuCount, across CPU hotplug plus
> +        // platform reset (including S3), was corrected in QEMU commit
> +        // e3cadac073a9 ("pc: fix FW_CFG_NB_CPUS to account for -device added
> +        // CPUs", 2016-11-16), part of release v2.8.0.
> +        //
> +        BootCpuCount = (UINT16)Present;
> +      }
> +
> +      mMaxCpuCount = Possible;
> +    }
>     }
> -  //
> -  // Otherwise, set mMaxCpuCount to the value reported by QEMU.
> -  //
> -  mMaxCpuCount = ProcessorCount;
> -  //
> -  // Additionally, tell UefiCpuPkg modules (a) the exact number of VCPUs, (b)
> -  // to wait, in the initial AP bringup, exactly as long as it takes for all of
> -  // the APs to report in. For this, we set the longest representable timeout
> -  // (approx. 71 minutes).
> -  //
> -  PcdStatus = PcdSet32S (PcdCpuMaxLogicalProcessorNumber, ProcessorCount);
> +
> +  DEBUG ((DEBUG_INFO, "%a: BootCpuCount=%d mMaxCpuCount=%u\n", __FUNCTION__,
> +    BootCpuCount, mMaxCpuCount));
> +  ASSERT (BootCpuCount <= mMaxCpuCount);
> +
> +  PcdStatus = PcdSet32S (PcdCpuBootLogicalProcessorNumber, BootCpuCount);
>     ASSERT_RETURN_ERROR (PcdStatus);
> -  PcdStatus = PcdSet32S (PcdCpuApInitTimeOutInMicroSeconds, MAX_UINT32);
> +  PcdStatus = PcdSet32S (PcdCpuMaxLogicalProcessorNumber, mMaxCpuCount);
>     ASSERT_RETURN_ERROR (PcdStatus);
> -  DEBUG ((DEBUG_INFO, "%a: QEMU reports %d processor(s)\n", __FUNCTION__,
> -    ProcessorCount));
>   }
>   
>   
>   /**
>     Perform Platform PEI initialization.
> @@ -636,17 +757,18 @@ InitializePlatform (
>     }
>   
>     S3Verification ();
>     BootModeInitialization ();
>     AddressWidthInitialization ();
> -  MaxCpuCountInitialization ();
>   
>     //
>     // Query Host Bridge DID
>     //
>     mHostBridgeDevId = PciRead16 (OVMF_HOSTBRIDGE_DID);
>   
> +  MaxCpuCountInitialization ();
> +
>     if (FeaturePcdGet (PcdSmmSmramRequire)) {
>       Q35TsegMbytesInitialization ();
>     }
>   
>     PublishPeiMemory ();
> 

  parent reply	other threads:[~2019-10-24 15:33 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-22 22:15 [PATCH v2 0/3] OvmfPkg: distinguish boot CPU count from possible CPU count Laszlo Ersek
2019-10-22 22:15 ` [PATCH v2 1/3] OvmfPkg/OvmfXen.dsc: remove PcdCpu* dynamic defaults Laszlo Ersek
2019-10-24 14:18   ` Anthony PERARD
2019-10-22 22:15 ` [PATCH v2 2/3] OvmfPkg/IndustryStandard: define macros for QEMU's CPU hotplug registers Laszlo Ersek
2019-10-23 12:05   ` [edk2-devel] " Philippe Mathieu-Daudé
2019-10-24 10:29     ` Laszlo Ersek
2019-10-24 15:12       ` Philippe Mathieu-Daudé
2019-10-25  8:21         ` Laszlo Ersek
2020-01-24 11:40         ` Laszlo Ersek
2020-01-29 14:43           ` Philippe Mathieu-Daudé
2020-01-29 17:30             ` Laszlo Ersek
2019-10-22 22:15 ` [PATCH v2 3/3] OvmfPkg/PlatformPei: rewrite MaxCpuCountInitialization() for CPU hotplug Laszlo Ersek
2019-10-24 14:27   ` Anthony PERARD
2019-10-24 15:28     ` Laszlo Ersek
2019-10-24 15:33   ` Philippe Mathieu-Daudé [this message]
2019-10-25  8:29     ` [edk2-devel] " Laszlo Ersek
2019-10-25  9:17       ` Philippe Mathieu-Daudé
2019-10-23  8:52 ` [PATCH v2 0/3] OvmfPkg: distinguish boot CPU count from possible CPU count Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9826409d-43ff-f432-f894-9b5c2f9d5a2a@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox