From: "Ankur Arora" <ankur.a.arora@oracle.com>
To: Laszlo Ersek <lersek@redhat.com>, devel@edk2.groups.io
Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com,
	Jordan Justen <jordan.l.justen@intel.com>,
	Ard Biesheuvel <ard.biesheuvel@arm.com>,
	Aaron Young <aaron.young@oracle.com>
Subject: Re: [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection
Date: Mon, 1 Feb 2021 11:21:28 -0800	[thread overview]
Message-ID: <f8c88396-94aa-5647-ec2b-9d982c559f36@oracle.com> (raw)
In-Reply-To: <2c25a9d0-38dc-754d-fcef-b372e1211a92@redhat.com>

On 2021-02-01 9:22 a.m., Laszlo Ersek wrote:
> On 01/29/21 01:59, Ankur Arora wrote:
>> Designate a worker CPU (we use the one executing the root MMI
>> handler), which will do the actual ejection via QEMU in CpuEject().
>>
>> CpuEject(), on the worker CPU, ejects each marked CPU by first
>> selecting its APIC ID and then sending the QEMU "eject" command.
>> QEMU in turn signals the remote VCPU thread, which context-switches
>> it out of the SMI.
>>
>> CpuEject(), on the CPU being ejected, spins in its holding area
>> until this final context-switch. This does mean that some CPU state
>> that would ordinarily be restored (in SmiRendezvous() and in
>> SmiEntry.nasm::CommonHandler) is no longer restored. This skipped
>> state includes the FPU state, CET enable, the stuffing of the RSB,
>> and the final RSM. Since the CPU state is destroyed by QEMU, this
>> should be okay.
>>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Aaron Young <aaron.young@oracle.com>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>   OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73 ++++++++++++++++++++++++++++++++++----
>>   1 file changed, 67 insertions(+), 6 deletions(-)
> 
> (1) s/CpuEject/EjectCpu/g, per previous request (affects commit message
> and code too).
> 
> 
>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> index 526f51faf070..bf91344eef9c 100644
>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> @@ -193,9 +193,12 @@ RevokeNewSlot:
>>     CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>>     on each CPU at exit from SMM.
>>
>> -  If, the executing CPU is not being ejected, nothing to be done.
>> +  If the executing CPU is neither a worker nor being ejected, there
>> +  is nothing to be done.
>>     If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>>     until ejected.
>> +  If the executing CPU is a worker CPU, set the QEMU CPU status to
>> +  "eject" for each CPU being ejected.
>>
>>     @param[in] ProcessorNum      Index of executing CPU.
>>
>> @@ -217,6 +220,56 @@ CpuEject (
>>       return;
>>     }
>>
>> +  if (ApicId == CPU_EJECT_WORKER) {
> 
> (2) The CPU_EJECT_WORKER approach is needlessly complicated (speculative
> generality). I wish I understood this idea earlier in the patch set.
> 
> (2a) In patch #5 (subject "OvmfPkg/CpuHotplugSmm: define
> CPU_HOT_EJECT_DATA"), the CPU_EJECT_WORKER macro definition should be
> dropped.
> 
> (2b) In this patch, the question whether the executing CPU is the BSP or
> not, should be decided with the same logic that is visible in
> PlatformSmmBspElection()
> [OvmfPkg/Library/SmmCpuPlatformHookLibQemu/SmmCpuPlatformHookLibQemu.c]:
> 
>    MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>    BOOLEAN                     IsBsp;
> 
>    ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>    IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
> 
> (2c) Point (2b) obviates the explicit "mark as worker" logic entirely,
> in UnplugCpus() below.
> 
> (2d) The "is worker" language (in comments etc) should be replaced with
> direct "is BSP" language.
> 
> 
>> +    UINT32 CpuIndex;
>> +
>> +    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength; CpuIndex++) {
>> +      UINT64 RemoveApicId;
>> +
>> +      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
>> +
>> +      if (RemoveApicId != CPU_EJECT_INVALID &&
>> +          RemoveApicId != CPU_EJECT_WORKER) {
>> +        //
>> +        // This to-be-ejected-CPU has already received the BSP's SMI exit
>> +        // signal and will execute SmmCpuFeaturesSmiRendezvousExit()
>> +        // followed by this callback, or is already waiting in the
>> +        // CpuDeadLoop() below.
>> +        //
>> +        // Tell QEMU to context-switch it out.
>> +        //
>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
> 
> (3) While the QEMU CPU selector value *usually* matches the APIC ID,
> it's not an invariant. APIC IDs have an internal structure, composed of
> bit-fields, where each bit-field accommodates one hierarchy level in the
> CPU topology (thread, core, die (maybe), and socket).
> 
> However, this mapping need not be surjective. QEMU lets you create
> "pathological" CPU topologies, for example one with:
> - 3 threads/core,
> - 5 cores/socket,
> - (say) 2 sockets.
> 
> Under that example, the bit-field standing for the "thread number" level
> would have 2 bits, theoretically permitting *4* threads/core, and the
> bit-field standing for the "core number" level would have 3 bits,
> theoretically allowing for *8* cores/socket.
> 
> Considering the fully populated topology, you'd see the CPU selector
> range from 0 to (3*5*2-1)=29, inclusive (corresponding to 30 logical
> processors in total). However, the APIC ID *image* of this CPU selector
> *domain* would not be "contiguous" -- the APIC ID space, with the
> above-described structure, would accommodate 4*8*2=64 logical
> processors. For example, each APIC ID that stood for the nonexistent
> "thread#3" on a particular core would be left unused (no CPU selector
> would map to it).
> 
> All in all, you can't write the APIC ID to the CPU selector register,
> for ejection. You need to select the CPU whose APIC ID is the APIC ID
> you want to eject, and then initiate ejection.

Yeah, this is a clear bug. Should have seen it earlier. Thanks for
pointing it out.
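
To make the selector/APIC-ID gap concrete, with the 3*5*2 topology
above (assuming the usual round-up-to-a-power-of-2 field widths):

   ThreadBits = 2;    /* ceil(log2(3 threads/core)) */
   CoreBits   = 3;    /* ceil(log2(5 cores/socket)) */

   ApicId = (Socket << (CoreBits + ThreadBits)) |
            (Core   << ThreadBits)              |
            Thread;

so the last logical processor (socket 1, core 4, thread 2) has QEMU CPU
selector 1*15 + 4*3 + 2 = 29, but APIC ID (1 << 5) | (4 << 2) | 2 = 50.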

> 
> This requires one of two alternatives:
> 
> 
> (3a) The first option is to keep the change local to this patch.
> 
> This alternative is the more CPU-hungry (and uglier) one.
> 
> The idea is to perform a QEMU_CPUHP_CMD_GET_ARCH_ID loop over all
> possible CPUs, somewhat similarly to QemuCpuhpCollectApicIds(). At every
> CPU, knowing the APIC ID, try to find the APIC ID in "ApicIdMap". If
> there is a match, eject.
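
Something like the following sketch, I take it -- with PossibleCpuCount
standing in for however we derive the number of possible CPUs, and
assuming the same QemuCpuhp accessors that QemuCpuhpCollectApicIds()
already uses:

   UINT32 Selector;

   for (Selector = 0; Selector < PossibleCpuCount; Selector++) {
     UINT64 CandidateId;
     UINT32 CpuIndex;

     QemuCpuhpWriteCpuSelector (mMmCpuIo, Selector);
     QemuCpuhpWriteCommand (mMmCpuIo, QEMU_CPUHP_CMD_GET_ARCH_ID);
     CandidateId = QemuCpuhpReadCommandData (mMmCpuIo);

     //
     // Eject the selected CPU if its APIC ID is marked in ApicIdMap.
     //
     for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength; CpuIndex++) {
       if (mCpuHotEjectData->ApicIdMap[CpuIndex] == CandidateId) {
         QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
         break;
       }
     }
   }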
> 
> 
> (3b) The second option is much more elegant (and it's faster too), but
> it requires a much more intrusive update to the patch set.
> 
> First, the *element type* of the arrays that QemuCpuhpCollectApicIds()
> operates on, has to be changed from APIC_ID to a structure type that
> pairs APIC_ID with the QEMU CPU selector. [*]
> 
> Second, whenever QemuCpuhpCollectApicIds() outputs an APIC_ID, it should
> also save the "CurrentSelector" value (in the other field of the output
> array element structure).
> 
> Third, the element type of CPU_HOT_EJECT_DATA.ApicIdMap should be
> replaced with a structure type similar (or identical) to the one
> described at [*]. The ProcessorNumber lookup in UnplugCpus() would still
> be based upon the APIC ID, but CPU_HOT_EJECT_DATA should remember both
> the QEMU selector for that processor, and the APIC ID.
> 
> Fourth, the actual ejection should use the selector.
> 
> Fifth, the debug message (below) should continue logging the APIC ID, to
> mirror the DEBUG_INFO message in ProcessHotAddedCpus().
> 
> 
> Would you be willing to implement (3b)?

3b is clearly the better solution. However, is there enough value in
the print message containing the APIC ID to justify having
CPU_HOT_EJECT_DATA.ApicIdMap carry both the cpu-selector and the APIC ID?

As you say, the ejection itself just needs the ProcessorNum -> QEMU cpu-selector
mapping.
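
I.e., the element type for the arrays QemuCpuhpCollectApicIds()
operates on would become a pair, roughly (names hypothetical):

   typedef struct {
     APIC_ID ApicId;
     UINT32  QemuSelector;
   } CPU_HOT_EJECT_INFO;

while CPU_HOT_EJECT_DATA.ApicIdMap could shrink to a plain
ProcessorNum -> QemuSelector map, if we don't mind the unplug message
logging the selector instead of the APIC ID.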

Ankur

> 
> 
>> +
>> +        //
>> +        // Compiler barrier to ensure the next store isn't reordered
>> +        //
>> +        MemoryFence ();
>> +
>> +        //
>> +        // Clear the eject status for CpuIndex, to ensure that an invalid
>> +        // SMI later does not end up trying to eject it, and that a newly
>> +        // hotplugged CpuIndex does not go into the dead loop.
>> +        //
>> +        mCpuHotEjectData->ApicIdMap[CpuIndex] = CPU_EJECT_INVALID;
>> +
>> +        DEBUG ((DEBUG_INFO, "%a: Unplugged CPU %u -> " FMT_APIC_ID "\n",
>> +               __FUNCTION__, CpuIndex, RemoveApicId));
> 
> (4) The DEBUG_INFO log message is in the right place (and uses the right
> debug mask), but it is afflicted by the usual warts (indentation, format
> specifiers etc). Please reapply the comments I made elsewhere.
> 
> 
> (5a) Please replace "CPU" with "ProcessorNumber" (so that we know it's
> the protocol-assigned number, not the QEMU selector).
> 
> (5b) Please replace the arrow " -> " with the string " APIC ID ".
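
With (4), (5a) and (5b) folded in, that would read something like:

   DEBUG ((DEBUG_INFO,
     "%a: Unplugged ProcessorNumber %u APIC ID " FMT_APIC_ID "\n",
     __FUNCTION__, CpuIndex, RemoveApicId));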
> 
> 
> Thanks!
> Laszlo
> 
>> +      }
>> +    }
>> +
>> +    //
>> +    // Clear our own worker status.
>> +    //
>> +    mCpuHotEjectData->ApicIdMap[ProcessorNum] = CPU_EJECT_INVALID;
>> +
>> +    //
>> +    // We are done until the next hot-unplug; clear the handler.
>> +    //
>> +    mCpuHotEjectData->Handler = NULL;
>> +    return;
>> +  }
>> +
>>     //
>>     // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit()
>>     // after having been cleared to exit the SMI by the monarch and thus have
>> @@ -327,6 +380,19 @@ UnplugCpus (
>>     }
>>
>>     if (EjectCount != 0) {
>> +    UINTN  Worker;
>> +
>> +    Status = mMmCpuService->WhoAmI (mMmCpuService, &Worker);
>> +    ASSERT_EFI_ERROR (Status);
>> +    //
>> +    // UnplugCpus() is called via the root MMI handler and thus we are
>> +    // executing in the BSP context.
>> +    //
>> +    // Mark ourselves as the worker CPU.
>> +    //
>> +    ASSERT (mCpuHotEjectData->ApicIdMap[Worker] == CPU_EJECT_INVALID);
>> +    mCpuHotEjectData->ApicIdMap[Worker] = CPU_EJECT_WORKER;
>> +
>>       //
>>       // We have processors to be ejected; install the handler.
>>       //
>> @@ -451,11 +517,6 @@ CpuHotplugMmi (
>>     if (EFI_ERROR (Status)) {
>>       goto Fatal;
>>     }
>> -  if (ToUnplugCount > 0) {
>> -    DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
>> -      __FUNCTION__));
>> -    goto Fatal;
>> -  }
>>
>>     if (PluggedCount > 0) {
>>       Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
>>
> 
