public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Laszlo Ersek" <lersek@redhat.com>
To: Ankur Arora <ankur.a.arora@oracle.com>, devel@edk2.groups.io
Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com,
	Jordan Justen <jordan.l.justen@intel.com>,
	Ard Biesheuvel <ard.biesheuvel@arm.com>,
	Aaron Young <aaron.young@oracle.com>
Subject: Re: [PATCH v6 7/9] OvmfPkg/CpuHotplugSmm: add CpuEject()
Date: Tue, 2 Feb 2021 15:00:46 +0100	[thread overview]
Message-ID: <c8c3a9f2-2b69-0ca0-86a8-7f69b2d44382@redhat.com> (raw)
In-Reply-To: <14068eb6-dc43-c244-5985-709d685fc750@oracle.com>

On 02/01/21 21:12, Ankur Arora wrote:
> On 2021-02-01 11:08 a.m., Laszlo Ersek wrote:
>> apologies, I've got more comments here:
>>
>> On 01/29/21 01:59, Ankur Arora wrote:
>>
>>>   /**
>>> +  CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>>> +  on each CPU at exit from SMM.
>>> +
>>> +  If, the executing CPU is not being ejected, nothing to be done.
>>> +  If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>>> +  until ejected.
>>> +
>>> +  @param[in] ProcessorNum      Index of executing CPU.
>>> +
>>> +**/
>>> +VOID
>>> +EFIAPI
>>> +CpuEject (
>>> +  IN UINTN ProcessorNum
>>> +  )
>>> +{
>>> +  //
>>> +  // APIC ID is UINT32, but mCpuHotEjectData->ApicIdMap[] is UINT64
>>> +  // so use UINT64 throughout.
>>> +  //
>>> +  UINT64 ApicId;
>>> +
>>> +  ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum];
>>> +  if (ApicId == CPU_EJECT_INVALID) {
>>> +    return;
>>> +  }
>>> +
>>> +  //
>>> +  // CPU(s) being unplugged get here from
>>> SmmCpuFeaturesSmiRendezvousExit()
>>> +  // after having been cleared to exit the SMI by the monarch and
>>> thus have
>>> +  // no SMM processing remaining.
>>> +  //
>>> +  // Given that we cannot allow them to escape to the guest, we pen
>>> them
>>> +  // here until the SMM monarch tells the HW to unplug them.
>>> +  //
>>> +  CpuDeadLoop ();
>>> +}
>>
>> (15) There is no such function as SmmCpuFeaturesSmiRendezvousExit() --
>> it's SmmCpuFeaturesRendezvousExit().
>>
>> (16) This function uses a data structure for communication between BSP
>> and APs -- mCpuHotEjectData->ApicIdMap is modified in UnplugCpus() on
>> the BSP, and checked above by the APs (too).
>>
>> What guarantees the visibility of mCpuHotEjectData->ApicIdMap?
>
> I was banking on SmiRendezvous() explicitly signalling that all
> processing on the BSP was done before any AP will look at
> mCpuHotEjectData in SmmCpuFeaturesRendezvousExit().
>
> 1716     //
> 1717     // Wait for BSP's signal to exit SMI
> 1718     //
> 1719     while (*mSmmMpSyncData->AllCpusInSync) {
> 1720       CpuPause ();
> 1721     }
> 1722   }
> 1723
> 1724 Exit:
> 1725   SmmCpuFeaturesRendezvousExit (CpuIndex);

Right; it's a general pattern in edk2: volatile UINT8 (aka BOOLEAN)
objects are considered atomic. (See
SMM_DISPATCHER_MP_SYNC_DATA.AllCpusInSync -- it's a pointer to a
volatile BOOLEAN.)

But our UINT64 values are neither volatile nor UINT8, and I got suddenly
doubtful about "AllCpusInSync" working as a multiprocessor barrier.

(I could be unjustifiedly worried, as a bunch of other fields in
SMM_DISPATCHER_MP_SYNC_DATA are volatile, wider than UINT8, and *not*
accessed with InterlockedCompareExchageXx().)


>
>>
>> I think we might want to use InterlockedCompareExchange64() in both
>> EjectCpu() and UnplugCpus() (and make "ApicIdMap" volatile, in
>> addition). InterlockedCompareExchange64() can be used just for
>> comparison as well, by passing ExchangeValue=CompareValue.
>
>
> Speaking specifically about the ApicIdMap, I'm not sure I fully
> agree (assuming my comment just above is correct.)
>
>
> The only AP (reader) ApicIdMap deref is here:
>
> CpuEject():
> 218   ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum];
>
> For the to-be-ejected-AP, this value can only move from
>    valid-APIC-ID (=> wait in CpuDeadLoop()) -> CPU_EJECT_INVALID.
>
> Given that, by the time the worker does the write on line 254, this
> AP is guaranteed to be dead already, I don't think there's any
> scenario where the to-be-ejected-AP can see anything other than
> a valid-APIC-ID.

The scenario I had in mind was different: what guarantees that the
effect of

   375        mCpuHotEjectData->ApicIdMap[ProcessorNum] = (UINT64)RemoveApicId;

which is performed by the BSP in UnplugCpus(), is visible by the AP on
line 218 (see your quote above)?

What if the AP gets to line 218 before the BSP's write on line 375
*propagates* sufficiently?

There's no question that the BSP writes before the AP reads, but I'm
uncertain if that suffices for the *effect* of the write to be visible
to the AP. My concern is not whether the AP sees a partial vs. a settled
update; my concern is if the AP could see an entirely *stale* value.

The consequence of that problem would be that an AP that the BSP were
about to eject would return from CpuEject() to
SmmCpuFeaturesRendezvousExit() to SmiRendezvous().

... I guess that volatile-qualifying both CPU_HOT_EJECT_DATA, and the
array pointed-to by CPU_HOT_EJECT_DATA.ApicIdMap, should suffice. In
combination with the sync-up point that you quoted. This seems to match
existing practice in PiSmmCpuDxeSmm -- there are no concurrent accesses,
so atomicity is not a concern, and serializing the instruction streams
coarsely, with the sync-up, in combination with volatile accesses,
should presumably guarantee visibility (on x86 anyway).

Thanks
Laszlo


>
> 241         QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
> 242         QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
> 243
> 244         //
> 245         // Compiler barrier to ensure the next store isn't reordered
> 246         //
> 247         MemoryFence ();
> 248
> 249         //
> 250         // Clear the eject status for CpuIndex to ensure that an
> invalid
> 251         // SMI later does not end up trying to eject it or a newly
> 252         // hotplugged CpuIndex does not go into the dead loop.
> 253         //
> 254         mCpuHotEjectData->ApicIdMap[CpuIndex] = CPU_EJECT_INVALID;
>   For APs that are not being ejected, they will always see
> CPU_EJECT_INVALID
> since the writer never changes that.
>
> The one scenario in which bad things could happen is if entries in the
> ApicIdMap are unaligned (or if the compiler or cpu-arch tears aligned
> writes).
>
>>
>> (17) I think a similar observation applies to the "Handler" field too,
>> as APs call it, while the BSP keeps flipping it between NULL and a real
>> function address. We might have to turn that field into an
> From a real function address, to NULL is the problem part right?
>
> (Same argument as above for the transition in UnplugCpus() from
> NULL -> function-address.)
>
>
>> EFI_PHYSICAL_ADDRESS (just a fancy name for UINT64), and use
>> InterlockedCompareExchange64() again.
>
> AFAICS, these are the problematic derefs:
>
> SmmCpuFeaturesRendezvousExit():
>
> 450   if (mCpuHotEjectData == NULL ||
> 451       mCpuHotEjectData->Handler == NULL) {
> 452     return;
>
> and problematic assignments:
>
> 266     //
> 267     // We are done until the next hot-unplug; clear the handler.
> 268     //
> 269     mCpuHotEjectData->Handler = NULL;
> 270     return;
> 271   }
>
> Here as well, I've been banking on aligned writes such that the APs would
> only see the before or after value not an intermediate value.
>
> Thanks
> Ankur
>
>>
>> Thanks
>> Laszlo
>>
>


  reply	other threads:[~2021-02-02 14:02 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-29  0:59 [PATCH v6 0/9] support CPU hot-unplug Ankur Arora
2021-01-29  0:59 ` [PATCH v6 1/9] OvmfPkg/CpuHotplugSmm: refactor hotplug logic Ankur Arora
2021-01-30  1:15   ` [edk2-devel] " Laszlo Ersek
2021-02-02  6:19     ` Ankur Arora
2021-02-01  2:59   ` Laszlo Ersek
2021-01-29  0:59 ` [PATCH v6 2/9] OvmfPkg/CpuHotplugSmm: collect hot-unplug events Ankur Arora
2021-01-30  2:18   ` Laszlo Ersek
2021-01-30  2:23     ` Laszlo Ersek
2021-02-02  6:03     ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 3/9] OvmfPkg/CpuHotplugSmm: add Qemu Cpu Status helper Ankur Arora
2021-01-30  2:36   ` Laszlo Ersek
2021-02-02  6:04     ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 4/9] OvmfPkg/CpuHotplugSmm: introduce UnplugCpus() Ankur Arora
2021-01-30  2:37   ` Laszlo Ersek
2021-02-01  3:13   ` Laszlo Ersek
2021-02-03  4:28     ` Ankur Arora
2021-02-03 19:20       ` Laszlo Ersek
2021-01-29  0:59 ` [PATCH v6 5/9] OvmfPkg/CpuHotplugSmm: define CPU_HOT_EJECT_DATA Ankur Arora
2021-02-01  4:53   ` Laszlo Ersek
2021-02-02  6:15     ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 6/9] OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state Ankur Arora
2021-02-01 13:36   ` Laszlo Ersek
2021-02-03  5:20     ` Ankur Arora
2021-02-03 20:36       ` Laszlo Ersek
2021-02-04  2:58         ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 7/9] OvmfPkg/CpuHotplugSmm: add CpuEject() Ankur Arora
2021-02-01 16:11   ` Laszlo Ersek
2021-02-01 19:08   ` Laszlo Ersek
2021-02-01 20:12     ` Ankur Arora
2021-02-02 14:00       ` Laszlo Ersek [this message]
2021-02-02 14:15         ` Laszlo Ersek
2021-02-03  6:45           ` Ankur Arora
2021-02-03 20:58             ` Laszlo Ersek
2021-02-04  2:49               ` Ankur Arora
2021-02-04  8:58                 ` Laszlo Ersek
2021-02-05 16:06                 ` [edk2-devel] " Laszlo Ersek
2021-02-08  5:04                   ` Ankur Arora
2021-02-03  6:13         ` Ankur Arora
2021-02-03 20:55           ` Laszlo Ersek
2021-02-04  2:57             ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection Ankur Arora
2021-02-01 17:22   ` Laszlo Ersek
2021-02-01 19:21     ` Ankur Arora
2021-02-02 13:23       ` Laszlo Ersek
2021-02-03  5:41         ` Ankur Arora
2021-01-29  0:59 ` [PATCH v6 9/9] OvmfPkg/SmmControl2Dxe: negotiate CPU hot-unplug Ankur Arora
2021-02-01 17:37   ` Laszlo Ersek
2021-02-01 17:40     ` Laszlo Ersek
2021-02-01 17:48       ` Laszlo Ersek
2021-02-03  5:46     ` Ankur Arora
2021-02-03 20:45       ` Laszlo Ersek
2021-02-04  3:04         ` Ankur Arora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c8c3a9f2-2b69-0ca0-86a8-7f69b2d44382@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox