From: "Laszlo Ersek" <lersek@redhat.com>
To: Ankur Arora <ankur.a.arora@oracle.com>, devel@edk2.groups.io
Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com,
Jordan Justen <jordan.l.justen@intel.com>,
Ard Biesheuvel <ard.biesheuvel@arm.com>,
Aaron Young <aaron.young@oracle.com>
Subject: Re: [PATCH v6 7/9] OvmfPkg/CpuHotplugSmm: add CpuEject()
Date: Mon, 1 Feb 2021 17:11:24 +0100
Message-ID: <180a8efb-1a26-3bab-f50a-2d7aeff6d582@redhat.com>
In-Reply-To: <20210129005950.467638-8-ankur.a.arora@oracle.com>
On 01/29/21 01:59, Ankur Arora wrote:
> Add CpuEject(), which handles the CPU ejection, and provides a holding
> area for said CPUs. It is called via SmmCpuFeaturesRendezvousExit(),
> at the tail end of the SMI handling.
(1) The functions introduced thus far by this patch series are all named
"Verb + Object", which is great; so please call this function EjectCpu()
as well, rather than CpuEject().
Please apply the rename in all three places: the subject line, the commit
message, and the patch body.
>
> Also UnplugCpus() now stashes APIC IDs of CPUs which need to be
> ejected in CPU_HOT_EJECT_DATA.ApicIdMap. These are used by CpuEject()
> to identify such CPUs.
>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Aaron Young <aaron.young@oracle.com>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
> OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 109 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 105 insertions(+), 4 deletions(-)
>
> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> index 70d69f6ed65b..526f51faf070 100644
> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> @@ -14,6 +14,7 @@
> #include <Library/MmServicesTableLib.h> // gMmst
> #include <Library/PcdLib.h> // PcdGetBool()
> #include <Library/SafeIntLib.h> // SafeUintnSub()
> +#include <Library/CpuHotEjectData.h> // CPU_HOT_EJECT_DATA
> #include <Protocol/MmCpuIo.h> // EFI_MM_CPU_IO_PROTOCOL
> #include <Protocol/SmmCpuService.h> // EFI_SMM_CPU_SERVICE_PROTOCOL
> #include <Uefi/UefiBaseType.h> // EFI_STATUS
(2) This line will change anyway once the header file is moved, but
please keep the #include directives alphabetically sorted.
> @@ -32,11 +33,12 @@ STATIC EFI_MM_CPU_IO_PROTOCOL *mMmCpuIo;
> //
> STATIC EFI_SMM_CPU_SERVICE_PROTOCOL *mMmCpuService;
> //
> -// This structure is a communication side-channel between the
> +// These structures serve as communication side-channels between the
> // EFI_SMM_CPU_SERVICE_PROTOCOL consumer (i.e., this driver) and provider
> // (i.e., PiSmmCpuDxeSmm).
> //
> STATIC CPU_HOT_PLUG_DATA *mCpuHotPlugData;
> +STATIC CPU_HOT_EJECT_DATA *mCpuHotEjectData;
> //
> // SMRAM arrays for fetching the APIC IDs of processors with pending events (of
> // known event types), for the time of just one MMI.
> @@ -188,11 +190,53 @@ RevokeNewSlot:
> }
>
> /**
> + CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
> + on each CPU at exit from SMM.
> +
> + If, the executing CPU is not being ejected, nothing to be done.
> + If, the executing CPU is being ejected, wait in a CpuDeadLoop()
> + until ejected.
> +
> + @param[in] ProcessorNum Index of executing CPU.
> +
> +**/
> +VOID
> +EFIAPI
> +CpuEject (
> + IN UINTN ProcessorNum
> + )
> +{
> + //
> + // APIC ID is UINT32, but mCpuHotEjectData->ApicIdMap[] is UINT64
> + // so use UINT64 throughout.
> + //
> + UINT64 ApicId;
> +
> + ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum];
> + if (ApicId == CPU_EJECT_INVALID) {
> + return;
> + }
> +
> + //
> + // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit()
> + // after having been cleared to exit the SMI by the monarch and thus have
> + // no SMM processing remaining.
> + //
> + // Given that we cannot allow them to escape to the guest, we pen them
> + // here until the SMM monarch tells the HW to unplug them.
> + //
> + CpuDeadLoop ();
> +}
(3a) We can make this less resource-hungry (CpuDeadLoop() busy-waits) by
replacing CpuDeadLoop() with:
  for (;;) {
    DisableInterrupts ();
    CpuSleep ();
  }
This basically translates to a { CLI; HLT; } loop.
(Both functions come from BaseLib, which CpuHotplugSmm already consumes,
so there is no need to modify the #include directives or [LibraryClasses].)
(3b) Please refresh the CpuDeadLoop() reference in the function's
leading comment as well.
> +
> +/**
> Process to be hot-unplugged CPUs, per QemuCpuhpCollectApicIds().
>
> For each such CPU, report the CPU to PiSmmCpuDxeSmm via
> - EFI_SMM_CPU_SERVICE_PROTOCOL. If the to be hot-unplugged CPU is
> - unknown, skip it silently.
> + EFI_SMM_CPU_SERVICE_PROTOCOL and stash the APIC ID for later ejection.
> + If the to be hot-unplugged CPU is unknown, skip it silently.
> +
> + Additonally, if we do stash any APIC IDs, also install a CPU eject handler
> + which would handle the ejection.
>
> @param[in] ToUnplugApicIds The APIC IDs of the CPUs that are about to be
> hot-unplugged.
> @@ -216,9 +260,11 @@ UnplugCpus (
> {
> EFI_STATUS Status;
> UINT32 ToUnplugIdx;
> + UINT32 EjectCount;
> UINTN ProcessorNum;
>
> ToUnplugIdx = 0;
> + EjectCount = 0;
> while (ToUnplugIdx < ToUnplugCount) {
> APIC_ID RemoveApicId;
>
> @@ -255,13 +301,41 @@ UnplugCpus (
> DEBUG ((DEBUG_ERROR, "%a: RemoveProcessor(" FMT_APIC_ID "): %r\n",
> __FUNCTION__, RemoveApicId, Status));
> goto Fatal;
> + } else {
(Under patch v6 4/9, I request that the "goto" be replaced with a
"return" -- my point (4) below applies regardless:)
(4) Please don't add an "else" branch if the first branch of the "if"
ends with a jump statement: in that case, the code following the "if"
statement is not reachable from the first branch anyway.
So please just unnest the next part:
> + //
> + // Stash the APIC IDs so we can do the actual ejection later.
> + //
> + if (mCpuHotEjectData->ApicIdMap[ProcessorNum] != CPU_EJECT_INVALID) {
> + //
> + // Since ProcessorNum and APIC-ID map 1-1, so a valid
> + // mCpuHotEjectData->ApicIdMap[ProcessorNum] means something
> + // is horribly wrong.
> + //
(5) To be honest, I would replace this with:
  //
  // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is initialized to
  //   CPU_EJECT_INVALID when mCpuHotEjectData->ApicIdMap is allocated,
  //
  // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is restored to
  //   CPU_EJECT_INVALID when the subject processor is ejected,
  //
  // - mMmCpuService->RemoveProcessor(ProcessorNum) invalidates
  //   mCpuHotPlugData->ApicId[ProcessorNum], so a given ProcessorNum can
  //   never match more than one APIC ID in a single invocation of
  //   UnplugCpus().
  //
> + DEBUG ((DEBUG_ERROR, "%a: ProcessorNum %u maps to %llx, cannot "
> + "map to " FMT_APIC_ID "\n", __FUNCTION__, ProcessorNum,
> + mCpuHotEjectData->ApicIdMap[ProcessorNum], RemoveApicId));
(6a) The indentation of the 2nd and 3rd lines is incorrect.
(6b) For logging UINTN values (i.e., ProcessorNum) portably between IA32
and X64, %u is not correct. Instead:
- cast the UINTN value to UINT64 explicitly,
- use the %Lu or %Lx format specifier.
(6c) There is no "%llx" format string in edk2's PrintLib (no "ll" length
modifier, to be more precise). UINT64 values need to be printed with
"%lu" or "%lx", or -- identically -- with "%Lu" or "%Lx". I prefer the
latter, because standard C does not define the "L" size modifier for
integers, and that makes it clear that we're using an edk2-specific
feature. The "l" (ell) length modifier could be misunderstood as "long"
(which is something we don't use in edk2).
(6d) FMT_APIC_ID is defined as "0x%08x"; to remain consistent with that,
I would print the ApicIdMap element not just with "%Lx", but with
"0x%016Lx".
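For comparison, the same portability issue exists in standard C, where
size_t is 32 or 64 bits wide depending on the build, just as UINTN differs
between IA32 and X64. The host-side sketch below is a standard-C analogue,
not edk2 code: it uses the <inttypes.h> macros in place of PrintLib's
"%Lu"/"%Lx", and the FormatEjectConflict() helper is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

//
// Standard-C analogue of points (6b)-(6d), not edk2 code. The pointer-sized
// index is widened to uint64_t so that one format string works on both
// 32-bit and 64-bit builds. In edk2 PrintLib terms this corresponds to
// (UINT64)ProcessorNum with "%Lu", and "0x%016Lx" for the UINT64
// ApicIdMap element, next to FMT_APIC_ID ("0x%08x").
//
static int
FormatEjectConflict (
  char      *Buf,
  size_t    BufSize,
  size_t    ProcessorNum,
  uint64_t  MappedApicId,
  uint32_t  NewApicId
  )
{
  //
  // Two separate asserts rather than one compound condition, so a failure
  // pinpoints which precondition was violated.
  //
  assert (Buf != NULL);
  assert (BufSize > 0);

  return snprintf (
           Buf,
           BufSize,
           "ProcessorNum %" PRIu64 " maps to 0x%016" PRIx64
           ", cannot map to 0x%08" PRIx32,
           (uint64_t)ProcessorNum,
           MappedApicId,
           NewApicId
           );
}
```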
> +
> + Status = EFI_INVALID_PARAMETER;
> + goto Fatal;
(7a) Please just "return EFI_ALREADY_STARTED".
(7b) Please also modify the leading comment on the function -- the new
return value EFI_ALREADY_STARTED should be documented. I suggest:
  @retval EFI_ALREADY_STARTED  For the ProcessorNumber that
                               EFI_SMM_CPU_SERVICE_PROTOCOL had assigned to
                               one of the APIC IDs in ToUnplugApicIds,
                               mCpuHotEjectData->ApicIdMap already has an
                               APIC ID stashed. (This should never happen.)
> + }
> +
> + mCpuHotEjectData->ApicIdMap[ProcessorNum] = (UINT64)RemoveApicId;
> + EjectCount++;
> }
>
> ToUnplugIdx++;
> }
>
> + if (EjectCount != 0) {
> + //
> + // We have processors to be ejected; install the handler.
> + //
> + mCpuHotEjectData->Handler = CpuEject;
> + }
> +
(8) I suggest removing the "EjectCount" local variable, and setting the
"Handler" member where you currently increment "EjectCount".
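To make points (4), (7a) and (8) concrete, here is a host-side C model of
the stashing part of UnplugCpus(). It is a sketch under stated assumptions,
not the patch itself: the EJECT_DATA_MODEL layout, the all-ones value of
CPU_EJECT_INVALID, and the hypothetical RemoveKnownApicId() helper (which
stands in for the APIC ID lookup plus RemoveProcessor()) are assumptions,
and the EFI status codes are reduced to a two-value enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

//
// Assumptions: both "invalid" sentinels are all-ones values.
//
#define CPU_EJECT_INVALID      UINT64_MAX
#define MODEL_INVALID_APIC_ID  UINT32_MAX

typedef enum {
  StashSuccess,
  StashAlreadyStarted   // models EFI_ALREADY_STARTED
} STASH_STATUS;

typedef void (*EJECT_HANDLER) (size_t ProcessorNum);

//
// Hypothetical model of CPU_HOT_EJECT_DATA; the real layout may differ.
//
typedef struct {
  EJECT_HANDLER  Handler;
  size_t         ArrayLength;
  uint64_t       *ApicIdMap;
} EJECT_DATA_MODEL;

static void
EjectCpuModel (
  size_t  ProcessorNum
  )
{
  (void)ProcessorNum;   // the firmware version would pen the CPU here
}

//
// Models the lookup plus RemoveProcessor(): find the processor number for
// ApicId and invalidate that slot. Returns Count if ApicId is unknown.
//
static size_t
RemoveKnownApicId (
  uint32_t  *KnownApicIds,
  size_t    Count,
  uint32_t  ApicId
  )
{
  size_t  Idx;

  for (Idx = 0; Idx < Count; Idx++) {
    if (KnownApicIds[Idx] == ApicId) {
      KnownApicIds[Idx] = MODEL_INVALID_APIC_ID;
      return Idx;
    }
  }
  return Count;
}

static STASH_STATUS
StashForEjection (
  EJECT_DATA_MODEL  *Data,
  uint32_t          *KnownApicIds,
  const uint32_t    *ToUnplugApicIds,
  size_t            ToUnplugCount
  )
{
  size_t  Idx;

  assert (Data != NULL);
  assert (KnownApicIds != NULL);

  for (Idx = 0; Idx < ToUnplugCount; Idx++) {
    size_t  ProcessorNum;

    ProcessorNum = RemoveKnownApicId (KnownApicIds, Data->ArrayLength,
                     ToUnplugApicIds[Idx]);
    if (ProcessorNum == Data->ArrayLength) {
      continue;   // unknown CPU: skip it silently
    }

    //
    // Point (4): no "else" after a jump; point (7a): return directly.
    //
    if (Data->ApicIdMap[ProcessorNum] != CPU_EJECT_INVALID) {
      return StashAlreadyStarted;
    }

    Data->ApicIdMap[ProcessorNum] = (uint64_t)ToUnplugApicIds[Idx];
    Data->Handler = EjectCpuModel;   // point (8): no EjectCount needed
  }
  return StashSuccess;
}
```

Because RemoveKnownApicId() invalidates the slot it matches, a repeated
APIC ID in ToUnplugApicIds is skipped as unknown, mirroring the invariants
listed under (5); the "already started" path is reachable only if an
ApicIdMap entry was never restored to CPU_EJECT_INVALID.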
> //
> - // We've removed this set of APIC IDs from SMM data structures.
> + // We've removed this set of APIC IDs from SMM data structures and
> + // have installed an ejection handler if needed.
> //
> return EFI_SUCCESS;
>
> @@ -458,7 +532,13 @@ CpuHotplugEntry (
> // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm
> // has pointed PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM.
> //
> + // Additionally, CPU Hot-unplug is available only if CPU Hotplug is, so
> + // the same DEPEX also guarantees that PcdCpuHotEjectDataAddress points
> + // to CPU_HOT_EJECT_DATA in SMRAM.
> + //
(9) I don't see the relevance of "hot-unplug depends on hot-plug" here.
I recommend the following comment instead:
  //
  // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm
  // has pointed:
  // - PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM,
  // - PcdCpuHotEjectDataAddress to CPU_HOT_EJECT_DATA in SMRAM, if the
  //   possible CPU count is greater than 1.
  //
> mCpuHotPlugData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotPlugDataAddress);
> + mCpuHotEjectData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotEjectDataAddress);
> +
> if (mCpuHotPlugData == NULL) {
> Status = EFI_NOT_FOUND;
> DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_PLUG_DATA: %r\n", __FUNCTION__, Status));
> @@ -470,6 +550,9 @@ CpuHotplugEntry (
> if (mCpuHotPlugData->ArrayLength == 1) {
> return EFI_UNSUPPORTED;
> }
> + ASSERT (mCpuHotEjectData &&
> + (mCpuHotPlugData->ArrayLength == mCpuHotEjectData->ArrayLength));
> +
> //
> // Allocate the data structures that depend on the possible CPU count.
> //
(10) To remain consistent with the check performed on "mCpuHotPlugData",
please do:
  if (mCpuHotEjectData == NULL) {
    Status = EFI_NOT_FOUND;
  } else if (mCpuHotPlugData->ArrayLength != mCpuHotEjectData->ArrayLength) {
    Status = EFI_INVALID_PARAMETER;
  } else {
    Status = EFI_SUCCESS;
  }
  if (EFI_ERROR (Status)) {
    DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_EJECT_DATA: %r\n", __FUNCTION__, Status));
    goto Fatal;
  }
(
As a digression, I'll make some comments on the ASSERT() too:
- Given ASSERT ((C1) && (C2)), it is best to express the same as
ASSERT (C1); ASSERT (C2); -- the effect is the same, but the error
messages have finer granularity.
- Checking a pointer against NULL must be explicit at all times, in
edk2. IOW, ASSERT (mCpuHotEjectData) should be spelled
ASSERT (mCpuHotEjectData != NULL).
)
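A host-side model of the check suggested in (10), which also follows the
explicit NULL-comparison rule from the digression, may help. The struct
names and the CHECK_STATUS enum are hypothetical stand-ins for
CPU_HOT_PLUG_DATA, CPU_HOT_EJECT_DATA and the EFI status codes. Unlike the
ASSERT() in the patch, which can be compiled out (for example in RELEASE
builds), this check rejects a missing or mismatched CPU_HOT_EJECT_DATA at
runtime.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  CheckSuccess,
  CheckNotFound,          // models EFI_NOT_FOUND
  CheckInvalidParameter   // models EFI_INVALID_PARAMETER
} CHECK_STATUS;

//
// Hypothetical stand-ins for CPU_HOT_PLUG_DATA and CPU_HOT_EJECT_DATA;
// only the ArrayLength member matters for this check.
//
typedef struct {
  size_t  ArrayLength;
} HOT_PLUG_DATA_MODEL;

typedef struct {
  size_t  ArrayLength;
} HOT_EJECT_DATA_MODEL;

static CHECK_STATUS
CheckHotEjectData (
  const HOT_PLUG_DATA_MODEL   *HotPlugData,
  const HOT_EJECT_DATA_MODEL  *HotEjectData
  )
{
  //
  // The caller has already rejected a NULL HotPlugData. Note the explicit
  // comparison against NULL, rather than assert (HotPlugData).
  //
  assert (HotPlugData != NULL);

  if (HotEjectData == NULL) {
    return CheckNotFound;
  }
  if (HotPlugData->ArrayLength != HotEjectData->ArrayLength) {
    return CheckInvalidParameter;
  }
  return CheckSuccess;
}
```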
> @@ -552,6 +635,24 @@ CpuHotplugEntry (
> //
> SmbaseInstallFirstSmiHandler ();
>
> + if (mCpuHotEjectData) {
(11) This condition is guaranteed to evaluate to TRUE; see the ASSERT()
above.
Anyway, ignore this...
> + UINT32 Idx;
(12) Incorrect indentation, but ignore this too...
> + //
> + // For CPU ejection we need to map ProcessorNum -> APIC_ID. By the time
> + // we need the mapping, however, the Processor's APIC ID has already been
> + // removed from SMM data structures. So we will maintain a local map
> + // in mCpuHotEjectData->ApicIdMap.
> + //
> + for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
> + mCpuHotEjectData->ApicIdMap[Idx] = CPU_EJECT_INVALID;
> + }
(13) ... because this init loop should be moved to patch #6 (subject
"OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state"), as I mentioned
there...
> +
> + //
> + // Wait to init the handler until an ejection is warranted
> + //
> + mCpuHotEjectData->Handler = NULL;
(14) ... and because this nulling is performed by patch #6 already
(subject "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state").
Therefore, this whole conditional block should be removed please.
Thanks!
Laszlo
> + }
> +
> return EFI_SUCCESS;
>
> ReleasePostSmmPen:
>