public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Ankur Arora" <ankur.a.arora@oracle.com>
To: Laszlo Ersek <lersek@redhat.com>, devel@edk2.groups.io
Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com,
	Jordan Justen <jordan.l.justen@intel.com>,
	Ard Biesheuvel <ard.biesheuvel@arm.com>,
	Aaron Young <aaron.young@oracle.com>
Subject: Re: [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection
Date: Tue, 2 Feb 2021 21:41:03 -0800	[thread overview]
Message-ID: <905bc1e6-c499-6ae5-9e61-ca12c61a5c94@oracle.com> (raw)
In-Reply-To: <ef1bda82-eb37-b16c-d9ba-5b3226bc1ee7@redhat.com>

On 2021-02-02 5:23 a.m., Laszlo Ersek wrote:
> On 02/01/21 20:21, Ankur Arora wrote:
>> On 2021-02-01 9:22 a.m., Laszlo Ersek wrote:
>>> On 01/29/21 01:59, Ankur Arora wrote:
>>>> Designate a worker CPU (we use the one executing the root MMI
>>>> handler), which will do the actual ejection via QEMU in CpuEject().
>>>>
>>>> CpuEject(), on the worker CPU, ejects each marked CPU by first
>>>> selecting its APIC ID and then sending the QEMU "eject" command.
>>>> QEMU in-turn signals the remote VCPU thread which context-switches
>>>> it out of the SMI.
>>>>
>>>> CpuEject(), on the CPU being ejected, spins around in its holding
>>>> area until this final context-switch. This does mean that there is
>>>> some CPU state that would ordinarily be restored (in SmiRendezvous()
>>>> and in SmiEntry.nasm::CommonHandler), but will not be anymore.
>>>> This unrestored state includes FPU state, CET enable, stuffing of
>>>> RSB and the final RSM. Since the CPU state is destroyed by QEMU,
>>>> this should be okay.
>>>>
>>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>>>> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>> Cc: Aaron Young <aaron.young@oracle.com>
>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>> ---
>>>>    OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73
>>>> ++++++++++++++++++++++++++++++++++----
>>>>    1 file changed, 67 insertions(+), 6 deletions(-)
>>>
>>> (1) s/CpuEject/EjectCpu/g, per previous request (affects commit message
>>> and code too).
>>>
>>>
>>>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> index 526f51faf070..bf91344eef9c 100644
>>>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> @@ -193,9 +193,12 @@ RevokeNewSlot:
>>>>      CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>>>>      on each CPU at exit from SMM.
>>>>
>>>> -  If, the executing CPU is not being ejected, nothing to be done.
>>>> +  If, the executing CPU is neither a worker, nor being ejected, nothing
>>>> +  to be done.
>>>>      If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>>>>      until ejected.
>>>> +  If, the executing CPU is a worker CPU, set QEMU CPU status to eject
>>>> +  for CPUs being ejected.
>>>>
>>>>      @param[in] ProcessorNum      Index of executing CPU.
>>>>
>>>> @@ -217,6 +220,56 @@ CpuEject (
>>>>        return;
>>>>      }
>>>>
>>>> +  if (ApicId == CPU_EJECT_WORKER) {
>>>
>>> (2) The CPU_EJECT_WORKER approach is needlessly complicated (speculative
>>> generality). I wish I understood this idea earlier in the patch set.
>>>
>>> (2a) In patch #5 (subject "OvmfPkg/CpuHotplugSmm: define
>>> CPU_HOT_EJECT_DATA"), the CPU_EJECT_WORKER macro definition should be
>>> dropped.
>>>
>>> (2b) In this patch, the question whether the executing CPU is the BSP or
>>> not, should be decided with the same logic that is visible in
>>> PlatformSmmBspElection()
>>> [OvmfPkg/Library/SmmCpuPlatformHookLibQemu/SmmCpuPlatformHookLibQemu.c]:
>>>
>>>     MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>>>     BOOLEAN                     IsBsp;
>>>
>>>     ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>>>     IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
>>>
>>> (2c) Point (2b) obviates the explicit "mark as worker" logic entirely,
>>> in UnplugCpus() below.
>>>
>>> (2d) The "is worker" language (in comments etc) should be replaced with
>>> direct "is BSP" language.
>>>
>>>
>>>> +    UINT32 CpuIndex;
>>>> +
>>>> +    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength;
>>>> CpuIndex++) {
>>>> +      UINT64 RemoveApicId;
>>>> +
>>>> +      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
>>>> +
>>>> +      if ((RemoveApicId != CPU_EJECT_INVALID &&
>>>> +           RemoveApicId != CPU_EJECT_WORKER)) {
>>>> +        //
>>>> +        // This to-be-ejected-CPU has already received the BSP's SMI
>>>> exit
>>>> +        // signal and, will execute SmmCpuFeaturesSmiRendezvousExit()
>>>> +        // followed by this callback or is already waiting in the
>>>> +        // CpuDeadLoop() below.
>>>> +        //
>>>> +        // Tell QEMU to context-switch it out.
>>>> +        //
>>>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
>>>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
>>>
>>> (3) While the QEMU CPU selector value *usually* matches the APIC ID,
>>> it's not an invariant. APIC IDs have an internal structure, composed of
>>> bit-fields, where each bit-field accommodates one hierarchy level in the
>>> CPU topology (thread, core, die (maybe), and socket).
>>>
>>> However, this mapping need not be surjective. QEMU lets you create
>>> "pathological" CPU topologies, for example one with:
>>> - 3 threads/core,
>>> - 5 cores/socket,
>>> - (say) 2 sockets.
>>>
>>> Under that example, the bit-field standing for the "thread number" level
>>> would have 2 bits, theoretically permitting *4* threads/core, and the
>>> bit-field standing for the "core number" level would have 3 bits,
>>> theoretically allowing for *8* cores/socket.
>>>
>>> Considering the fully populated topology, you'd see the CPU selector
>>> range from 0 to (3*5*2-1)=29, inclusive (corresponding to 30 logical
>>> processors in total). However, the APIC ID *image* of this CPU selector
>>> *domain* would not be "contiguous" -- the APIC ID space, with the
>>> above-described structure, would accommodate 4*8*2=64 logical
>>> processors. For example, each APIC ID that stood for the nonexistent
>>> "thread#3" on a particular core would be left unused (no CPU selector
>>> would map to it).
>>>
>>> All in all, you can't write the APIC ID to the CPU selector register,
>>> for ejection. You need to select the CPU whose APIC ID is the APIC ID
>>> you want to eject, and then initiate ejection.
>>
>> Yeah, this is a clear bug. Should have seen it earlier. Thanks for
>> pointing it out.
>>
>>>
>>> This requires one of two alternatives:
>>>
>>>
>>> (3a) The first option is to keep the change local to this patch.
>>>
>>> This alternative is the more CPU-hungry (and uglier) one.
>>>
>>> The idea is to perform a QEMU_CPUHP_CMD_GET_ARCH_ID loop over all
>>> possible CPUs, somewhat similarly to QemuCpuhpCollectApicIds(). At every
>>> CPU, knowing the APIC ID, try to find the APIC ID in "ApicIdMap". If
>>> there is a match, eject.
>>>
>>>
>>> (3b) The second option is much more elegant (and it's faster too), but
>>> it requires a much more intrusive update to the patch set.
>>>
>>> First, the *element type* of the arrays that QemuCpuhpCollectApicIds()
>>> operates on, has to be changed from APIC_ID to a structure type that
>>> pairs APIC_ID with the QEMU CPU selector. [*]
>>>
>>> Second, whenever QemuCpuhpCollectApicIds() outputs an APIC_ID, it should
>>> also save the "CurrentSelector" value (in the other field of the output
>>> array element structure).
>>>
>>> Third, the element type of CPU_HOT_EJECT_DATA.ApicIdMap should be
>>> replaced with a structure type similar (or identical) to the one
>>> described at [*]. The ProcessorNumber lookup in UnplugCpus() would still
>>> be based upon the APIC ID, but CPU_HOT_EJECT_DATA should remember both
>>> the QEMU selector for that processor, and the APIC ID.
>>>
>>> Fourth, the actual ejection should use the selector.
>>>
>>> Fifth, the debug message (below) should continue logging the APIC ID, to
>>> mirror the DEBUG_INFO message in ProcessHotAddedCpus().
>>>
>>>
>>> Would you be willing to implement (3b)?
>>
>> 3b is clearly the better solution. However, is there enough value in
>> the print message containing APIC ID, that CPU_HOT_EJECT_DATA.ApicIdMap
>> carry both the cpu-selector and APIC ID?
>>
>> As you say, the ejection itself just needs the ProcessorNum -> QEMU
>> cpu-selector mapping.
> 
> Good question, and I'm torn.
> 
> The default DEBUG build of OVMF enables INFO messages, and more severe
> ones. It does not enable VERBOSE messages.
> 
> In such a DEBUG build (i.e., with otherwise default settings), the log
> entries that relate to hot-plug do not report selectors. Conversely, on
> hot-unplug, the log would report selectors *only* (not APIC IDs). I feel
> this makes it more difficult to get useful OVMF debug logs in bug
> reports, especially from users of distro-provided OVMF builds.
> 
> If we can ask the user to rebuild with VERBOSE enabled and re-run the
> test, that's always great, but frequently users don't know how to
> rebuild OVMF, plus usually OVMF is hidden so deep in their virt stack
> that they have no idea how to make that stack use a custom OVMF build.
> 
> ... How about this: in UnplugCpus(), we could emit an INFO message about
> the ProcessorNumber <-> APIC ID <-> Selector correspondence, and when
> the eject happens, it would suffice to log (at INFO level) only
> ProcessorNumber <-> Selector.

This makes sense to me. Will do.

Thanks
Ankur

> 
> Thanks,
> Laszlo
> 

  reply	other threads:[~2021-02-03  5:41 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-29  0:59 [PATCH v6 0/9] support CPU hot-unplug Ankur Arora
2021-01-29  0:59 ` [PATCH v6 1/9] OvmfPkg/CpuHotplugSmm: refactor hotplug logic Ankur Arora
2021-01-30  1:15   ` [edk2-devel] " Laszlo Ersek
2021-02-02  6:19     ` Ankur Arora
2021-02-01  2:59   ` Laszlo Ersek
2021-01-29  0:59 ` [PATCH v6 2/9] OvmfPkg/CpuHotplugSmm: collect hot-unplug events Ankur Arora
2021-01-30  2:18   ` Laszlo Ersek
2021-01-30  2:23     ` Laszlo Ersek
2021-02-02  6:03     ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 3/9] OvmfPkg/CpuHotplugSmm: add Qemu Cpu Status helper Ankur Arora
2021-01-30  2:36   ` Laszlo Ersek
2021-02-02  6:04     ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 4/9] OvmfPkg/CpuHotplugSmm: introduce UnplugCpus() Ankur Arora
2021-01-30  2:37   ` Laszlo Ersek
2021-02-01  3:13   ` Laszlo Ersek
2021-02-03  4:28     ` Ankur Arora
2021-02-03 19:20       ` Laszlo Ersek
2021-01-29  0:59 ` [PATCH v6 5/9] OvmfPkg/CpuHotplugSmm: define CPU_HOT_EJECT_DATA Ankur Arora
2021-02-01  4:53   ` Laszlo Ersek
2021-02-02  6:15     ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 6/9] OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state Ankur Arora
2021-02-01 13:36   ` Laszlo Ersek
2021-02-03  5:20     ` Ankur Arora
2021-02-03 20:36       ` Laszlo Ersek
2021-02-04  2:58         ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 7/9] OvmfPkg/CpuHotplugSmm: add CpuEject() Ankur Arora
2021-02-01 16:11   ` Laszlo Ersek
2021-02-01 19:08   ` Laszlo Ersek
2021-02-01 20:12     ` Ankur Arora
2021-02-02 14:00       ` Laszlo Ersek
2021-02-02 14:15         ` Laszlo Ersek
2021-02-03  6:45           ` Ankur Arora
2021-02-03 20:58             ` Laszlo Ersek
2021-02-04  2:49               ` Ankur Arora
2021-02-04  8:58                 ` Laszlo Ersek
2021-02-05 16:06                 ` [edk2-devel] " Laszlo Ersek
2021-02-08  5:04                   ` Ankur Arora
2021-02-03  6:13         ` Ankur Arora
2021-02-03 20:55           ` Laszlo Ersek
2021-02-04  2:57             ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection Ankur Arora
2021-02-01 17:22   ` Laszlo Ersek
2021-02-01 19:21     ` Ankur Arora
2021-02-02 13:23       ` Laszlo Ersek
2021-02-03  5:41         ` Ankur Arora [this message]
2021-01-29  0:59 ` [PATCH v6 9/9] OvmfPkg/SmmControl2Dxe: negotiate CPU hot-unplug Ankur Arora
2021-02-01 17:37   ` Laszlo Ersek
2021-02-01 17:40     ` Laszlo Ersek
2021-02-01 17:48       ` Laszlo Ersek
2021-02-03  5:46     ` Ankur Arora
2021-02-03 20:45       ` Laszlo Ersek
2021-02-04  3:04         ` Ankur Arora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=905bc1e6-c499-6ae5-9e61-ca12c61a5c94@oracle.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox