From: "Philippe Mathieu-Daudé" <philmd@redhat.com>
To: devel@edk2.groups.io, lersek@redhat.com
Cc: Anthony Perard <anthony.perard@citrix.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Igor Mammedov <imammedo@redhat.com>,
Jordan Justen <jordan.l.justen@intel.com>,
Julien Grall <julien.grall@arm.com>
Subject: Re: [edk2-devel] [PATCH v2 3/3] OvmfPkg/PlatformPei: rewrite MaxCpuCountInitialization() for CPU hotplug
Date: Thu, 24 Oct 2019 17:33:39 +0200 [thread overview]
Message-ID: <9826409d-43ff-f432-f894-9b5c2f9d5a2a@redhat.com> (raw)
In-Reply-To: <20191022221554.14963-4-lersek@redhat.com>
On 10/23/19 12:15 AM, Laszlo Ersek wrote:
> MaxCpuCountInitialization() currently handles the following options:
>
> (1) QEMU does not report the boot CPU count (FW_CFG_NB_CPUS is 0)
>
> In this case, PlatformPei makes MpInitLib enumerate APs up to the
> default PcdCpuMaxLogicalProcessorNumber value (64) minus 1, or until
> the default PcdCpuApInitTimeOutInMicroSeconds (50,000) elapses.
> (Whichever is reached first.)
>
> Time-limited AP enumeration had never been reliable on QEMU/KVM, which
> is why commit 45a70db3c3a5 strated handling case (2) below, in OVMF.
>
> (2) QEMU reports the boot CPU count (FW_CFG_NB_CPUS is nonzero)
>
> In this case, PlatformPei sets
>
> - PcdCpuMaxLogicalProcessorNumber to the reported boot CPU count
> (FW_CFG_NB_CPUS, which exports "PCMachineState.boot_cpus"),
>
> - and PcdCpuApInitTimeOutInMicroSeconds to practically "infinity"
> (MAX_UINT32, ~71 minutes).
>
> That causes MpInitLib to enumerate exactly the present (boot) APs.
>
> With CPU hotplug in mind, this method is not good enough. Because,
> using QEMU terminology, UefiCpuPkg expects
> PcdCpuMaxLogicalProcessorNumber to provide the "possible CPUs" count
> ("MachineState.smp.max_cpus"), which includes present and not present
> CPUs both (with not present CPUs being subject for hot-plugging).
> FW_CFG_NB_CPUS does not include not present CPUs.
>
> Rewrite MaxCpuCountInitialization() for handling the following cases:
>
> (1) The behavior of case (1) does not change. (No UefiCpuPkg PCDs are set
> to values different from the defaults.)
>
> (2) QEMU reports the boot CPU count ("PCMachineState.boot_cpus", via
> FW_CFG_NB_CPUS), but not the possible CPUs count
> ("MachineState.smp.max_cpus").
>
> In this case, the behavior remains unchanged.
>
> The way MpInitLib is instructed to do the same differs however: we now
> set the new PcdCpuBootLogicalProcessorNumber to the boot CPU count
> (while continuing to set PcdCpuMaxLogicalProcessorNumber identically).
> PcdCpuApInitTimeOutInMicroSeconds becomes irrelevant.
>
> (3) QEMU reports both the boot CPU count ("PCMachineState.boot_cpus", via
> FW_CFG_NB_CPUS), and the possible CPUs count
> ("MachineState.smp.max_cpus").
>
> We tell UefiCpuPkg about the possible CPUs count through
> PcdCpuMaxLogicalProcessorNumber. We also tell MpInitLib the boot CPU
> count for precise and quick AP enumeration, via
> PcdCpuBootLogicalProcessorNumber. PcdCpuApInitTimeOutInMicroSeconds is
> irrelevant again.
>
> This patch is a pre-requisite for enabling CPU hotplug with SMM_REQUIRE.
> As a side effect, the patch also enables S3 to work with CPU hotplug at
> once, *without* SMM_REQUIRE.
>
> (Without the patch, S3 resume fails, if a CPU is hot-plugged at OS
> runtime, prior to suspend: the FW_CFG_NB_CPUS increase seen during resume
> causes PcdCpuMaxLogicalProcessorNumber to increase as well, which is not
> permitted.
>
> With the patch, PcdCpuMaxLogicalProcessorNumber stays the same, namely
> "MachineState.smp.max_cpus". Therefore, the CPU structures allocated
> during normal boot can accommodate the CPUs at S3 resume that have been
> hotplugged prior to S3 suspend.)
>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>
> Notes:
> v2:
>
> - use "possible CPUs" term in the code and the commit message [Igor]
>
> - add details about S3 to the commit message [Igor]
>
> - use QEMU's existent CPU hotplug register block, rather than a new
> named file in fw_cfg [Igor]
>
> - tested on QEMU v2.6.2 (legacy-only CPU hotplug register block), v2.7.1
> (modern register block, but buggy fw_cfg), v2.8.1.1 (no QEMU issues),
> v4.0.0 (no QEMU issues)
>
> OvmfPkg/OvmfPkgIa32.dsc | 2 +-
> OvmfPkg/OvmfPkgIa32X64.dsc | 2 +-
> OvmfPkg/OvmfPkgX64.dsc | 2 +-
> OvmfPkg/PlatformPei/PlatformPei.inf | 2 +-
> OvmfPkg/PlatformPei/Platform.c | 172 +++++++++++++++++---
> 5 files changed, 151 insertions(+), 29 deletions(-)
>
> diff --git a/OvmfPkg/OvmfPkgIa32.dsc b/OvmfPkg/OvmfPkgIa32.dsc
> index 4301e7821902..68d8a9fb9655 100644
> --- a/OvmfPkg/OvmfPkgIa32.dsc
> +++ b/OvmfPkg/OvmfPkgIa32.dsc
> @@ -553,11 +553,11 @@ [PcdsDynamicDefault]
> gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
> gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>
> # UefiCpuPkg PCDs related to initial AP bringup and general AP management.
> gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
> - gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
> + gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>
> # Set memory encryption mask
> gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>
> !if $(SMM_REQUIRE) == TRUE
> diff --git a/OvmfPkg/OvmfPkgIa32X64.dsc b/OvmfPkg/OvmfPkgIa32X64.dsc
> index 803fd74ae8e4..e5a6260b6088 100644
> --- a/OvmfPkg/OvmfPkgIa32X64.dsc
> +++ b/OvmfPkg/OvmfPkgIa32X64.dsc
> @@ -565,11 +565,11 @@ [PcdsDynamicDefault]
> gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
> gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>
> # UefiCpuPkg PCDs related to initial AP bringup and general AP management.
> gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
> - gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
> + gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>
> # Set memory encryption mask
> gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>
> !if $(SMM_REQUIRE) == TRUE
> diff --git a/OvmfPkg/OvmfPkgX64.dsc b/OvmfPkg/OvmfPkgX64.dsc
> index 5dbd1b793a90..f5d904945103 100644
> --- a/OvmfPkg/OvmfPkgX64.dsc
> +++ b/OvmfPkg/OvmfPkgX64.dsc
> @@ -564,11 +564,11 @@ [PcdsDynamicDefault]
> gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
> gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>
> # UefiCpuPkg PCDs related to initial AP bringup and general AP management.
> gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
> - gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
> + gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>
> # Set memory encryption mask
> gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>
> !if $(SMM_REQUIRE) == TRUE
> diff --git a/OvmfPkg/PlatformPei/PlatformPei.inf b/OvmfPkg/PlatformPei/PlatformPei.inf
> index d9fd9c8f05b3..30eaebdfae63 100644
> --- a/OvmfPkg/PlatformPei/PlatformPei.inf
> +++ b/OvmfPkg/PlatformPei/PlatformPei.inf
> @@ -96,11 +96,11 @@ [Pcd]
> gEfiMdeModulePkgTokenSpaceGuid.PcdAcpiS3Enable
> gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask
> gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy
> gUefiCpuPkgTokenSpaceGuid.PcdCpuLocalApicBaseAddress
> gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber
> - gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
> + gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber
> gUefiCpuPkgTokenSpaceGuid.PcdCpuApStackSize
>
> [FixedPcd]
> gEfiMdePkgTokenSpaceGuid.PcdPciExpressBaseAddress
>
> diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
> index 3ba2459872d9..e5e8581752b5 100644
> --- a/OvmfPkg/PlatformPei/Platform.c
> +++ b/OvmfPkg/PlatformPei/Platform.c
> @@ -28,11 +28,14 @@
> #include <Library/QemuFwCfgLib.h>
> #include <Library/QemuFwCfgS3Lib.h>
> #include <Library/ResourcePublicationLib.h>
> #include <Guid/MemoryTypeInformation.h>
> #include <Ppi/MasterBootMode.h>
> +#include <IndustryStandard/I440FxPiix4.h>
> #include <IndustryStandard/Pci22.h>
> +#include <IndustryStandard/Q35MchIch9.h>
> +#include <IndustryStandard/QemuCpuHotplug.h>
> #include <OvmfPlatforms.h>
>
> #include "Platform.h"
> #include "Cmos.h"
>
> @@ -562,47 +565,165 @@ S3Verification (
> #endif
> }
>
>
> /**
> - Fetch the number of boot CPUs from QEMU and expose it to UefiCpuPkg modules.
> - Set the mMaxCpuCount variable.
> + Fetch the boot CPU count and the possible CPU count from QEMU, and expose
> + them to UefiCpuPkg modules. Set the mMaxCpuCount variable.
> **/
> VOID
> MaxCpuCountInitialization (
> VOID
> )
> {
> - UINT16 ProcessorCount;
> + UINT16 BootCpuCount;
> RETURN_STATUS PcdStatus;
>
> + //
> + // Try to fetch the boot CPU count.
> + //
> QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
> - ProcessorCount = QemuFwCfgRead16 ();
> - //
> - // If the fw_cfg key or fw_cfg entirely is unavailable, load mMaxCpuCount
> - // from the PCD default. No change to PCDs.
> - //
> - if (ProcessorCount == 0) {
> + BootCpuCount = QemuFwCfgRead16 ();
> + if (BootCpuCount == 0) {
> + //
> + // QEMU doesn't report the boot CPU count. (BootCpuCount == 0) will let
> + // MpInitLib count APs up to (PcdCpuMaxLogicalProcessorNumber - 1), or
> + // until PcdCpuApInitTimeOutInMicroSeconds elapses (whichever is reached
> + // first).
> + //
> + DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
> mMaxCpuCount = PcdGet32 (PcdCpuMaxLogicalProcessorNumber);
> - return;
> + } else {
> + //
> + // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to
> + // (BootCpuCount - 1) precisely, regardless of timeout.
> + //
> + // Now try to fetch the possible CPU count.
> + //
> + UINTN CpuHpBase;
> + UINT32 CmdData2;
> +
> + CpuHpBase = ((mHostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID) ?
> + ICH9_CPU_HOTPLUG_BASE : PIIX4_CPU_HOTPLUG_BASE);
> +
> + //
> + // If only legacy mode is available in the CPU hotplug register block, or
> + // the register block is completely missing, then the writes below are
> + // no-ops.
> + //
> + // 1. Switch the hotplug register block to modern mode.
> + //
> + IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);
> + //
> + // 2. Select a valid CPU for deterministic reading of
> + // QEMU_CPUHP_R_CMD_DATA2.
> + //
> + // CPU#0 is always valid; it is the always present and non-removable
> + // BSP.
> + //
> + IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);
> + //
> + // 3. Send a command after which QEMU_CPUHP_R_CMD_DATA2 is specified to
> + // read as zero, and which does not invalidate the selector. (The
> + // selector may change, but it must not become invalid.)
> + //
> + // Send QEMU_CPUHP_CMD_GET_PENDING, as it will prove useful later.
> + //
> + IoWrite8 (CpuHpBase + QEMU_CPUHP_W_CMD, QEMU_CPUHP_CMD_GET_PENDING);
> + //
> + // 4. Read QEMU_CPUHP_R_CMD_DATA2.
> + //
> + // If the register block is entirely missing, then this is an unassigned
> + // IO read, returning all-bits-one.
> + //
> + // If only legacy mode is available, then bit#0 stands for CPU#0 in the
> + // "CPU present bitmap". CPU#0 is always present.
> + //
> + // Otherwise, QEMU_CPUHP_R_CMD_DATA2 is either still reserved (returning
> + // all-bits-zero), or it is specified to read as zero after the above
> + // steps. Both cases confirm modern mode.
> + //
> + CmdData2 = IoRead32 (CpuHpBase + QEMU_CPUHP_R_CMD_DATA2);
> + DEBUG ((DEBUG_VERBOSE, "%a: CmdData2=0x%x\n", __FUNCTION__, CmdData2));
> + if (CmdData2 != 0) {
> + //
> + // QEMU doesn't support the modern CPU hotplug interface. Assume that the
> + // possible CPU count equals the boot CPU count (precluding hotplug).
> + //
> + DEBUG ((DEBUG_WARN, "%a: modern CPU hotplug interface unavailable\n",
> + __FUNCTION__));
> + mMaxCpuCount = BootCpuCount;
> + } else {
> + //
> + // Grab the possible CPU count from the modern CPU hotplug interface.
> + //
> + UINT32 Present, Possible, Selected;
> +
> + Present = 0;
> + Possible = 0;
> +
> + //
> + // We've sent QEMU_CPUHP_CMD_GET_PENDING last; this ensures
> + // QEMU_CPUHP_RW_CMD_DATA can now be read usefully. However,
> + // QEMU_CPUHP_CMD_GET_PENDING may have selected a CPU with actual pending
> + // hotplug events; therefore, select CPU#0 forcibly.
> + //
> + IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);
Nitpicking, I find this easier to read (matches the comment):
IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);
Rest very well documented, thanks.
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
> +
> + do {
> + UINT8 CpuStatus;
> +
> + //
> + // Read the status of the currently selected CPU. This will help with a
> + // sanity check against "BootCpuCount".
> + //
> + CpuStatus = IoRead8 (CpuHpBase + QEMU_CPUHP_R_CPU_STAT);
> + if ((CpuStatus & QEMU_CPUHP_STAT_ENABLED) != 0) {
> + ++Present;
> + }
> + //
> + // Attempt to select the next CPU.
> + //
> + ++Possible;
> + IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);
> + //
> + // If the selection is successful, then the following read will return
> + // the selector (which we know is positive at this point). Otherwise,
> + // the read will return 0.
> + //
> + Selected = IoRead32 (CpuHpBase + QEMU_CPUHP_RW_CMD_DATA);
> + ASSERT (Selected == Possible || Selected == 0);
> + } while (Selected > 0);
> +
> + //
> + // Sanity check: fw_cfg and the modern CPU hotplug interface should
> + // return the same boot CPU count.
> + //
> + if (BootCpuCount != Present) {
> + DEBUG ((DEBUG_WARN, "%a: QEMU v2.7 reset bug: BootCpuCount=%d "
> + "Present=%u\n", __FUNCTION__, BootCpuCount, Present));
> + //
> + // The handling of QemuFwCfgItemSmpCpuCount, across CPU hotplug plus
> + // platform reset (including S3), was corrected in QEMU commit
> + // e3cadac073a9 ("pc: fix FW_CFG_NB_CPUS to account for -device added
> + // CPUs", 2016-11-16), part of release v2.8.0.
> + //
> + BootCpuCount = (UINT16)Present;
> + }
> +
> + mMaxCpuCount = Possible;
> + }
> }
> - //
> - // Otherwise, set mMaxCpuCount to the value reported by QEMU.
> - //
> - mMaxCpuCount = ProcessorCount;
> - //
> - // Additionally, tell UefiCpuPkg modules (a) the exact number of VCPUs, (b)
> - // to wait, in the initial AP bringup, exactly as long as it takes for all of
> - // the APs to report in. For this, we set the longest representable timeout
> - // (approx. 71 minutes).
> - //
> - PcdStatus = PcdSet32S (PcdCpuMaxLogicalProcessorNumber, ProcessorCount);
> +
> + DEBUG ((DEBUG_INFO, "%a: BootCpuCount=%d mMaxCpuCount=%u\n", __FUNCTION__,
> + BootCpuCount, mMaxCpuCount));
> + ASSERT (BootCpuCount <= mMaxCpuCount);
> +
> + PcdStatus = PcdSet32S (PcdCpuBootLogicalProcessorNumber, BootCpuCount);
> ASSERT_RETURN_ERROR (PcdStatus);
> - PcdStatus = PcdSet32S (PcdCpuApInitTimeOutInMicroSeconds, MAX_UINT32);
> + PcdStatus = PcdSet32S (PcdCpuMaxLogicalProcessorNumber, mMaxCpuCount);
> ASSERT_RETURN_ERROR (PcdStatus);
> - DEBUG ((DEBUG_INFO, "%a: QEMU reports %d processor(s)\n", __FUNCTION__,
> - ProcessorCount));
> }
>
>
> /**
> Perform Platform PEI initialization.
> @@ -636,17 +757,18 @@ InitializePlatform (
> }
>
> S3Verification ();
> BootModeInitialization ();
> AddressWidthInitialization ();
> - MaxCpuCountInitialization ();
>
> //
> // Query Host Bridge DID
> //
> mHostBridgeDevId = PciRead16 (OVMF_HOSTBRIDGE_DID);
>
> + MaxCpuCountInitialization ();
> +
> if (FeaturePcdGet (PcdSmmSmramRequire)) {
> Q35TsegMbytesInitialization ();
> }
>
> PublishPeiMemory ();
>
next prev parent reply other threads:[~2019-10-24 15:33 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-22 22:15 [PATCH v2 0/3] OvmfPkg: distinguish boot CPU count from possible CPU count Laszlo Ersek
2019-10-22 22:15 ` [PATCH v2 1/3] OvmfPkg/OvmfXen.dsc: remove PcdCpu* dynamic defaults Laszlo Ersek
2019-10-24 14:18 ` Anthony PERARD
2019-10-22 22:15 ` [PATCH v2 2/3] OvmfPkg/IndustryStandard: define macros for QEMU's CPU hotplug registers Laszlo Ersek
2019-10-23 12:05 ` [edk2-devel] " Philippe Mathieu-Daudé
2019-10-24 10:29 ` Laszlo Ersek
2019-10-24 15:12 ` Philippe Mathieu-Daudé
2019-10-25 8:21 ` Laszlo Ersek
2020-01-24 11:40 ` Laszlo Ersek
2020-01-29 14:43 ` Philippe Mathieu-Daudé
2020-01-29 17:30 ` Laszlo Ersek
2019-10-22 22:15 ` [PATCH v2 3/3] OvmfPkg/PlatformPei: rewrite MaxCpuCountInitialization() for CPU hotplug Laszlo Ersek
2019-10-24 14:27 ` Anthony PERARD
2019-10-24 15:28 ` Laszlo Ersek
2019-10-24 15:33 ` Philippe Mathieu-Daudé [this message]
2019-10-25 8:29 ` [edk2-devel] " Laszlo Ersek
2019-10-25 9:17 ` Philippe Mathieu-Daudé
2019-10-23 8:52 ` [PATCH v2 0/3] OvmfPkg: distinguish boot CPU count from possible CPU count Ard Biesheuvel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9826409d-43ff-f432-f894-9b5c2f9d5a2a@redhat.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox