public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Ankur Arora" <ankur.a.arora@oracle.com>
To: Laszlo Ersek <lersek@redhat.com>, devel@edk2.groups.io
Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com,
	Jordan Justen <jordan.l.justen@intel.com>,
	Ard Biesheuvel <ard.biesheuvel@arm.com>,
	Aaron Young <aaron.young@oracle.com>
Subject: Re: [edk2-devel] [PATCH v8 02/10] OvmfPkg/CpuHotplugSmm: collect hot-unplug events
Date: Mon, 22 Feb 2021 14:03:02 -0800	[thread overview]
Message-ID: <1fc423a8-1230-bcc9-2544-bdec8cd682bf@oracle.com> (raw)
In-Reply-To: <90b4d42f-20fd-b04c-cdf1-774fa19c1052@redhat.com>

On 2021-02-22 4:27 a.m., Laszlo Ersek wrote:
> On 02/22/21 08:19, Ankur Arora wrote:
>> Process fw_remove events in QemuCpuhpCollectApicIds() and collect
>> corresponding APIC IDs for CPUs that are being hot-unplugged.
> 
> (1) We also collect selectors for those; please mention the fact here.
> 
> 
>>
>> In addition, we now ignore CPUs which only have remove set. These
>> CPUs haven't been processed by OSPM yet.
>>
>> This is based on the QEMU hot-unplug protocol documented here:
>>    https://lore.kernel.org/qemu-devel/20201204170939.1815522-3-imammedo@redhat.com/
>>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Aaron Young <aaron.young@oracle.com>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>
>> Notes:
>>      Addresses the following review comments from v6:
>>       (1,4) Move (and also rename) QEMU_CPUHP_STAT_EJECTED to patch 8,
>>        where we actually use it.
>>       (2) Downgrade debug mask from DEBUG_INFO to DEBUG_VERBOSE.
>>       (3a,3b,3c) Keep the CurrentSelector increment operation at
>>        the tail of the loop.
>>       () As discussed elsewhere we also need to get the CpuSelector while
>>        collecting ApicIds in QemuCpuhpCollectApicIds(). This patch adds a
>>        separate parameter for the CpuSelector values, because that works
>>        better alongside the hotplug ExtendIds logic.
>>
>>   OvmfPkg/CpuHotplugSmm/QemuCpuhp.h                 |  1 +
>>   OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h |  1 +
>>   OvmfPkg/CpuHotplugSmm/CpuHotplug.c                | 21 +++++-
>>   OvmfPkg/CpuHotplugSmm/QemuCpuhp.c                 | 84 ++++++++++++++++-------
>>   4 files changed, 79 insertions(+), 28 deletions(-)
>>
>> diff --git a/OvmfPkg/CpuHotplugSmm/QemuCpuhp.h b/OvmfPkg/CpuHotplugSmm/QemuCpuhp.h
>> index 8adaa0ad91f0..1e23b150910e 100644
>> --- a/OvmfPkg/CpuHotplugSmm/QemuCpuhp.h
>> +++ b/OvmfPkg/CpuHotplugSmm/QemuCpuhp.h
>> @@ -55,6 +55,7 @@ QemuCpuhpCollectApicIds (
>>     OUT APIC_ID                      *PluggedApicIds,
>>     OUT UINT32                       *PluggedCount,
>>     OUT APIC_ID                      *ToUnplugApicIds,
>> +  OUT UINT32                       *ToUnplugSelector,
>>     OUT UINT32                       *ToUnplugCount
>>     );
>>   
> 
> (2) For consistency with the parameter names "PluggedApicIds" (plural)
> and "ToUnplugApicIds" (plural), please rename this parameter to
> "ToUnplugSelectors".
> 
> 
>> diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> index a34a6d3fae61..2ec7a107a64d 100644
>> --- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> +++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> @@ -34,6 +34,7 @@
>>   #define QEMU_CPUHP_STAT_ENABLED                BIT0
>>   #define QEMU_CPUHP_STAT_INSERT                 BIT1
>>   #define QEMU_CPUHP_STAT_REMOVE                 BIT2
>> +#define QEMU_CPUHP_STAT_FW_REMOVE              BIT4
>>   
>>   #define QEMU_CPUHP_RW_CMD_DATA               0x8
>>   
>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> index bf68fcd42914..3192bfea1f15 100644
>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> @@ -52,6 +52,7 @@ STATIC CPU_HOT_PLUG_DATA *mCpuHotPlugData;
>>   //
>>   STATIC APIC_ID *mPluggedApicIds;
>>   STATIC APIC_ID *mToUnplugApicIds;
>> +STATIC UINT32  *mToUnplugSelector;
> 
> (3) For consistency with the other two global variable identifiers,
> please call this "mToUnplugSelector" (plural), too.
> 
> (4) The comment above these global variables only talks about APIC IDs;
> please update the comment with a reference / explanation of the selectors.
> 
> 
>>   //
>>   // Address of the non-SMRAM reserved memory page that contains the Post-SMM Pen
>>   // for hot-added CPUs.
>> @@ -289,6 +290,7 @@ CpuHotplugMmi (
>>                mPluggedApicIds,
>>                &PluggedCount,
>>                mToUnplugApicIds,
>> +             mToUnplugSelector,
>>                &ToUnplugCount
>>                );
>>     if (EFI_ERROR (Status)) {
>> @@ -333,7 +335,9 @@ CpuHotplugEntry (
>>     )
>>   {
>>     EFI_STATUS Status;
>> +  UINTN      Len;
>>     UINTN      Size;
>> +  UINTN      SizeSel;
>>   
>>     //
>>     // This module should only be included when SMM support is required.
>> @@ -387,8 +391,9 @@ CpuHotplugEntry (
>>     //
>>     // Allocate the data structures that depend on the possible CPU count.
>>     //
>> -  if (RETURN_ERROR (SafeUintnSub (mCpuHotPlugData->ArrayLength, 1, &Size)) ||
>> -      RETURN_ERROR (SafeUintnMult (sizeof (APIC_ID), Size, &Size))) {
>> +  if (RETURN_ERROR (SafeUintnSub (mCpuHotPlugData->ArrayLength, 1, &Len)) ||
>> +      RETURN_ERROR (SafeUintnMult (sizeof (APIC_ID), Len, &Size))||
> 
> (5) Missing space character before "||".
> 
> 
>> +      RETURN_ERROR (SafeUintnMult (sizeof (UINT32), Len, &SizeSel))) {
>>       Status = EFI_ABORTED;
>>       DEBUG ((DEBUG_ERROR, "%a: invalid CPU_HOT_PLUG_DATA\n", __FUNCTION__));
>>       goto Fatal;
>> @@ -405,6 +410,12 @@ CpuHotplugEntry (
>>       DEBUG ((DEBUG_ERROR, "%a: MmAllocatePool(): %r\n", __FUNCTION__, Status));
>>       goto ReleasePluggedApicIds;
>>     }
>> +  Status = gMmst->MmAllocatePool (EfiRuntimeServicesData, SizeSel,
>> +                    (VOID **)&mToUnplugSelector);
>> +  if (EFI_ERROR (Status)) {
>> +    DEBUG ((DEBUG_ERROR, "%a: MmAllocatePool(): %r\n", __FUNCTION__, Status));
>> +    goto ReleaseToUnplugApicIds;
>> +  }
>>   
>>     //
>>     // Allocate the Post-SMM Pen for hot-added CPUs.
>> @@ -412,7 +423,7 @@ CpuHotplugEntry (
>>     Status = SmbaseAllocatePostSmmPen (&mPostSmmPenAddress,
>>                SystemTable->BootServices);
>>     if (EFI_ERROR (Status)) {
>> -    goto ReleaseToUnplugApicIds;
>> +    goto ReleaseToUnplugSelector;
>>     }
>>   
>>     //
>> @@ -472,6 +483,10 @@ ReleasePostSmmPen:
>>     SmbaseReleasePostSmmPen (mPostSmmPenAddress, SystemTable->BootServices);
>>     mPostSmmPenAddress = 0;
>>   
>> +ReleaseToUnplugSelector:
> 
> (6) Please call this label "ReleaseToUnplugSelectors" (plural).
> 
> 
>> +  gMmst->MmFreePool (mToUnplugSelector);
>> +  mToUnplugSelector = NULL;
>> +
>>   ReleaseToUnplugApicIds:
>>     gMmst->MmFreePool (mToUnplugApicIds);
>>     mToUnplugApicIds = NULL;
>> diff --git a/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c b/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c
>> index 8d4a6693c8d6..36372a5e6193 100644
>> --- a/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c
>> +++ b/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c
>> @@ -164,6 +164,9 @@ QemuCpuhpWriteCommand (
>>     @param[out] ToUnplugApicIds  The APIC IDs of the CPUs that are about to be
>>                                  hot-unplugged.
>>   
>> +  @param[out] ToUnplugSelector The QEMU Selectors of the CPUs that are about to
>> +                               be hot-unplugged.
>> +
>>     @param[out] ToUnplugCount    The number of filled-in APIC IDs in
>>                                  ToUnplugApicIds.
>>   
> 
Acking the comments above.

> (7) Please (a) call the parameter "ToUnplugSelectors" (plural), and (b)
> make sure that there are two space characters between the variable name
> "column" and the documentation "column". (All in all, please move the
> RHS column to the right by two spaces.)

That would make the RHS of ToUnplugSelectors not line up with the other
two out params. (Even though this mail does not seem to show that, they
do line up in the code.) Is that okay?

> 
>> @@ -187,6 +190,7 @@ QemuCpuhpCollectApicIds (
>>     OUT APIC_ID                      *PluggedApicIds,
>>     OUT UINT32                       *PluggedCount,
>>     OUT APIC_ID                      *ToUnplugApicIds,
>> +  OUT UINT32                       *ToUnplugSelector,
>>     OUT UINT32                       *ToUnplugCount
>>     )
>>   {
>> @@ -204,6 +208,7 @@ QemuCpuhpCollectApicIds (
>>       UINT32  PendingSelector;
>>       UINT8   CpuStatus;
>>       APIC_ID *ExtendIds;
>> +    UINT32  *ExtendSel;
> 
> (8) Please use plural -- "ExtendSels" --, consistently with "ExtendIds".
> 
> 
>>       UINT32  *ExtendCount;
>>       APIC_ID NewApicId;
>>   
>> @@ -245,10 +250,10 @@ QemuCpuhpCollectApicIds (
>>       if ((CpuStatus & QEMU_CPUHP_STAT_INSERT) != 0) {
>>         //
>>         // The "insert" event guarantees the "enabled" status; plus it excludes
>> -      // the "remove" event.
>> +      // the "fw_remove" event.
>>         //
>>         if ((CpuStatus & QEMU_CPUHP_STAT_ENABLED) == 0 ||
>> -          (CpuStatus & QEMU_CPUHP_STAT_REMOVE) != 0) {
>> +          (CpuStatus & QEMU_CPUHP_STAT_FW_REMOVE) != 0) {
>>           DEBUG ((DEBUG_ERROR, "%a: CurrentSelector=%u CpuStatus=0x%x: "
>>             "inconsistent CPU status\n", __FUNCTION__, CurrentSelector,
>>             CpuStatus));
>> @@ -259,40 +264,69 @@ QemuCpuhpCollectApicIds (
>>           CurrentSelector));
>>   
>>         ExtendIds   = PluggedApicIds;
>> +      ExtendSel   = NULL;
>>         ExtendCount = PluggedCount;
>> -    } else if ((CpuStatus & QEMU_CPUHP_STAT_REMOVE) != 0) {
>> -      DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: remove\n", __FUNCTION__,
>> -        CurrentSelector));
>> +    } else if ((CpuStatus & QEMU_CPUHP_STAT_FW_REMOVE) != 0) {
>> +      //
>> +      // "fw_remove" event guarantees "enabled".
>> +      //
>> +      if ((CpuStatus & QEMU_CPUHP_STAT_ENABLED) == 0) {
>> +        DEBUG ((DEBUG_ERROR, "%a: CurrentSelector=%u CpuStatus=0x%x: "
>> +          "inconsistent CPU status\n", __FUNCTION__, CurrentSelector,
>> +          CpuStatus));
>> +        return EFI_PROTOCOL_ERROR;
>> +      }
>> +
>> +      DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: fw_remove\n",
>> +        __FUNCTION__, CurrentSelector));
>>   
>>         ExtendIds   = ToUnplugApicIds;
>> +      ExtendSel   = ToUnplugSelector;
>>         ExtendCount = ToUnplugCount;
>> +    } else if ((CpuStatus & QEMU_CPUHP_STAT_REMOVE) != 0) {
>> +      //
>> +      // Let the OSPM deal with the "remove" event.
>> +      //
>> +      DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: remove (ignored)\n",
>> +        __FUNCTION__, CurrentSelector));
>> +
>> +      ExtendIds   = NULL;
>> +      ExtendSel   = NULL;
>> +      ExtendCount = NULL;
>>       } else {
>>         DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: no event\n",
>>           __FUNCTION__, CurrentSelector));
>>         break;
>>       }
>>   
>> -    //
>> -    // Save the APIC ID of the CPU with the pending event, to the corresponding
>> -    // APIC ID array.
>> -    //
>> -    if (*ExtendCount == ApicIdCount) {
>> -      DEBUG ((DEBUG_ERROR, "%a: APIC ID array too small\n", __FUNCTION__));
>> -      return EFI_BUFFER_TOO_SMALL;
>> -    }
>> -    QemuCpuhpWriteCommand (MmCpuIo, QEMU_CPUHP_CMD_GET_ARCH_ID);
>> -    NewApicId = QemuCpuhpReadCommandData (MmCpuIo);
>> -    DEBUG ((DEBUG_VERBOSE, "%a: ApicId=" FMT_APIC_ID "\n", __FUNCTION__,
>> -      NewApicId));
>> -    ExtendIds[(*ExtendCount)++] = NewApicId;
>> +    ASSERT ((ExtendIds == NULL) == (ExtendCount == NULL));
> 
> (9) Please follow this ASSERT with another ASSERT(), on a separate line:
> 
>      ASSERT (ExtendSels == NULL || ExtendIds != NULL);
> 
> Explanation:
> 
> - I'd like us to capture: if ExtendSels is not NULL, then ExtendIds is
> also not NULL.
> 
> - Furthermore, express the A-->B implication using its equivalent
> formulation (!A)||B.

This is a good check. Will add.

> 
> 
>> +    if (ExtendIds != NULL) {
>> +      //
>> +      // Save the APIC ID of the CPU with the pending event, to the
>> +      // corresponding APIC ID array.
>> +      // For unplug events, also save the CurrentSelector.
>> +      //
>> +      if (*ExtendCount == ApicIdCount) {
>> +        DEBUG ((DEBUG_ERROR, "%a: APIC ID array too small\n", __FUNCTION__));
>> +        return EFI_BUFFER_TOO_SMALL;
>> +      }
>> +      QemuCpuhpWriteCommand (MmCpuIo, QEMU_CPUHP_CMD_GET_ARCH_ID);
>> +      NewApicId = QemuCpuhpReadCommandData (MmCpuIo);
>> +      DEBUG ((DEBUG_VERBOSE, "%a: ApicId=" FMT_APIC_ID "\n", __FUNCTION__,
>> +        NewApicId));
>> +      if (ExtendSel != NULL) {
>> +        ExtendSel[(*ExtendCount)] = CurrentSelector;
>> +      }
>> +      ExtendIds[(*ExtendCount)++] = NewApicId;
> 
> OK thus far...
> 
>>   
>> -    //
>> -    // We've processed the CPU with (known) pending events, but we must never
>> -    // clear events. Therefore we need to advance past this CPU manually;
>> -    // otherwise, QEMU_CPUHP_CMD_GET_PENDING would stick to the currently
>> -    // selected CPU.
>> -    //
>> -    CurrentSelector++;
>> +      //
>> +      // We've processed the CPU with (known) pending events, but we must never
>> +      // clear events. Therefore we need to advance past this CPU manually;
>> +      // otherwise, QEMU_CPUHP_CMD_GET_PENDING would stick to the currently
>> +      // selected CPU.
>> +      //
>> +      CurrentSelector++;
>> +    }
> 
> (10) ... but this is a logic bug: the comment and the CurrentSelector
> increment must not depend on (ExtendIds != NULL).
> 
> As written, if we ever see a QEMU_CPUHP_STAT_REMOVE event (which we're
> supposed to ignore), we'll never move past that CPU, but will be stuck
> in an infinite loop. The QEMU_CPUHP_CMD_GET_PENDING command will keep
> returning the CPU with the QEMU_CPUHP_STAT_REMOVE event pending.
> 
> If this was unclear from my previous feedback, then I apologize, I
> should have been clearer.

No, my bad. Don't quite know how I missed that -- that was the whole
reason why I had a CurrentSelector++ before the continue statement
in v6.

Will fix.

Thanks
Ankur

> 
> Thanks,
> Laszlo
> 
>>     } while (CurrentSelector < PossibleCpuCount);
>>   
>>     DEBUG ((DEBUG_VERBOSE, "%a: PluggedCount=%u ToUnplugCount=%u\n",
>>
> 

  reply	other threads:[~2021-02-22 22:03 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-22  7:19 [PATCH v8 00/10] support CPU hot-unplug Ankur Arora
2021-02-22  7:19 ` [PATCH v8 01/10] OvmfPkg/CpuHotplugSmm: refactor hotplug logic Ankur Arora
2021-02-22 11:49   ` [edk2-devel] " Laszlo Ersek
2021-02-22  7:19 ` [PATCH v8 02/10] OvmfPkg/CpuHotplugSmm: collect hot-unplug events Ankur Arora
2021-02-22 12:27   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:03     ` Ankur Arora [this message]
2021-02-23 16:44       ` Laszlo Ersek
2021-02-22  7:19 ` [PATCH v8 03/10] OvmfPkg/CpuHotplugSmm: add Qemu Cpu Status helper Ankur Arora
2021-02-22 12:31   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:22     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 04/10] OvmfPkg/CpuHotplugSmm: introduce UnplugCpus() Ankur Arora
2021-02-22 12:39   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:22     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 05/10] OvmfPkg/CpuHotplugSmm: define CPU_HOT_EJECT_DATA Ankur Arora
2021-02-22 13:06   ` [edk2-devel] " Laszlo Ersek
2021-02-22 22:33     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 06/10] OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state Ankur Arora
2021-02-22 14:19   ` [edk2-devel] " Laszlo Ersek
2021-02-23  7:37     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 07/10] OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler Ankur Arora
2021-02-22 14:53   ` [edk2-devel] " Laszlo Ersek
2021-02-23  7:37     ` Ankur Arora
2021-02-23 16:52       ` Laszlo Ersek
2021-02-23  7:45     ` Paolo Bonzini
2021-02-23 17:06       ` Laszlo Ersek
2021-02-23 17:18         ` Paolo Bonzini
2021-02-23 20:46           ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 08/10] OvmfPkg/CpuHotplugSmm: add EjectCpu() Ankur Arora
2021-02-23 20:36   ` [edk2-devel] " Laszlo Ersek
2021-02-23 20:51     ` Ankur Arora
2021-02-22  7:19 ` [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject Ankur Arora
2021-02-23 21:39   ` [edk2-devel] " Laszlo Ersek
2021-02-24  3:44     ` Ankur Arora
2021-02-25 19:22       ` Laszlo Ersek
2021-02-22  7:19 ` [PATCH v8 10/10] OvmfPkg/SmmControl2Dxe: negotiate CPU hot-unplug Ankur Arora
2021-02-23 21:52   ` [edk2-devel] " Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1fc423a8-1230-bcc9-2544-bdec8cd682bf@oracle.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox