From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by mx.groups.io with SMTP id smtpd.web10.3366.1614116351169776100 for ; Tue, 23 Feb 2021 13:39:11 -0800 Authentication-Results: mx.groups.io; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=He0FAm2f; spf=pass (domain: redhat.com, ip: 216.205.24.124, mailfrom: lersek@redhat.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1614116350; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aeBI+MfpJ3ZsNuq3crvRU69OPf3R5VVuvIHtsmLejAo=; b=He0FAm2fIP/PtchzRlgp/s4g1XHz6vdPT7VVBTeE7Tsjjb6P6DEkkw56jRQcuq+SAuoP5M ZZDno4Iy9j+h6anpD7mtdcXb+1fDcJFmRzvSxgYA0utstIaBrLasUzdJJOy1r82NONr29F 6Ou1iR57GV/vdKWonEiMLj51JaR2qeo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-499-AbRNPwfnNUmm243mrXxdFQ-1; Tue, 23 Feb 2021 16:39:06 -0500 X-MC-Unique: AbRNPwfnNUmm243mrXxdFQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EC29B80197D; Tue, 23 Feb 2021 21:39:04 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-113-109.ams2.redhat.com [10.36.113.109]) by smtp.corp.redhat.com (Postfix) with ESMTP id DCE0C5C8BE; Tue, 23 Feb 2021 21:39:02 +0000 (UTC) Subject: Re: [edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject To: devel@edk2.groups.io, ankur.a.arora@oracle.com Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com, Jordan Justen , Ard Biesheuvel , Aaron Young References: <20210222071928.1401820-1-ankur.a.arora@oracle.com> <20210222071928.1401820-10-ankur.a.arora@oracle.com> From: "Laszlo Ersek" Message-ID: <0f373648-0d88-6151-f2f6-aa185041b576@redhat.com> Date: Tue, 23 Feb 2021 22:39:01 +0100 MIME-Version: 1.0 In-Reply-To: <20210222071928.1401820-10-ankur.a.arora@oracle.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=lersek@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit On 02/22/21 08:19, Ankur Arora wrote: > Add logic in EjectCpu() to do the actual the CPU ejection. > > On the BSP, ejection happens by first selecting the CPU via > its QemuSelector and then sending the QEMU "eject" command. > QEMU in-turn signals the remote VCPU thread which context-switches > the CPU out of the SMI handler. > > Meanwhile the CPU being ejected, waits around in its holding > area until it is context-switched out. Note that it is possible > that a slow CPU gets ejected before it reaches the wait loop. > However, this would never happen before it has executed the > "AllCpusInSync" loop in SmiRendezvous(). > It can mean that an ejected CPU does not execute code after > that point but given that the CPU state will be destroyed by > QEMU, the missed cleanup is no great loss. > > Cc: Laszlo Ersek > Cc: Jordan Justen > Cc: Ard Biesheuvel > Cc: Igor Mammedov > Cc: Boris Ostrovsky > Cc: Aaron Young > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132 > Signed-off-by: Ankur Arora > --- > > Notes: > Addresses the following reviewing comments from v6: > (1) s/CpuEject/EjectCpu/g > (2,2a,2c) Get rid of eject-worker and related. > (2b,2d) Use the PlatformSmmBspElection() logic to find out IsBSP. > (3,3b) Use CPU_HOT_EJECT_DATA->QemuSelector instead of ApicIdMap to > do the actual ejection. > (4,5a,5b) Fix the format etc in the final unplugged log message > () Also as discussed elsewhere document the ordering requirements for > mCpuHotEjectData->QemuSelector[] and mCpuHotEjectData->Handler. > () [from patch 2] Move definition of QEMU_CPUHP_STAT_EJECTED to this > patch. > () s/QEMU_CPUHP_STAT_EJECTED/QEMU_CPUHP_STAT_EJECT/ > > OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h | 1 + > OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 127 +++++++++++++++++++-- > .../Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c | 31 +++++ > 3 files changed, 152 insertions(+), 7 deletions(-) > > diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h > index 2ec7a107a64d..d0e83102c13f 100644 > --- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h > +++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h > @@ -34,6 +34,7 @@ > #define QEMU_CPUHP_STAT_ENABLED BIT0 > #define QEMU_CPUHP_STAT_INSERT BIT1 > #define QEMU_CPUHP_STAT_REMOVE BIT2 > +#define QEMU_CPUHP_STAT_EJECT BIT3 > #define QEMU_CPUHP_STAT_FW_REMOVE BIT4 > > #define QEMU_CPUHP_RW_CMD_DATA 0x8 > diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > index 851e2b28aad9..0484be8fe43c 100644 > --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > @@ -18,6 +18,7 @@ > #include // CPU_HOT_EJECT_DATA > #include // EFI_MM_CPU_IO_PROTOCOL > #include // EFI_SMM_CPU_SERVICE_PROTOCOL > +#include // MSR_IA32_APIC_BASE_REGISTER > #include // EFI_STATUS > > #include "ApicId.h" // APIC_ID > @@ -191,12 +192,39 @@ RevokeNewSlot: > } > > /** > + EjectCpu needs to know the BSP at SMI exit at a point when > + some of the EFI_SMM_CPU_SERVICE_PROTOCOL state has been torn > + down. > + Reuse the logic from OvmfPkg::PlatformSmmBspElection() to > + do that. > + > + @param[in] ProcessorNum ProcessorNum denotes the processor handle number > + in EFI_SMM_CPU_SERVICE_PROTOCOL. > +**/ > +STATIC > +BOOLEAN > +CheckIfBsp ( > + IN UINTN ProcessorNum > + ) (1a) Please remove the ProcessorNum parameter -- comment and parameter list alike --; it's useless. In the parameter list, just replace the line's contents with "VOID". (1b) Additionally, please state: @retval TRUE If the CPU executing this function is the BSP. @retval FALSE If the CPU executing this function is an AP. > +{ > + MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr; > + BOOLEAN IsBsp; (2) "IsBsp" should line up with "ApicBaseMsr". > + > + ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE); > + IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1); > + return IsBsp; > +} > + > +/** > CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit() > on each CPU at exit from SMM. > > - If, the executing CPU is not being ejected, nothing to be done. > + If, the executing CPU is neither the BSP, nor being ejected, nothing > + to be done. > If, the executing CPU is being ejected, wait in a halted loop > until ejected. > + If, the executing CPU is the BSP, set QEMU CPU status to eject > + for CPUs being ejected. > > @param[in] ProcessorNum ProcessorNum denotes the CPU exiting SMM, > and will be used as an index into > @@ -211,9 +239,99 @@ EjectCpu ( > ) > { > UINT64 QemuSelector; > + BOOLEAN IsBsp; > > + IsBsp = CheckIfBsp (ProcessorNum); > + > + // > + // mCpuHotEjectData->QemuSelectorMap[ProcessorNum] is updated > + // on the BSP in the ongoing SMI iteration at two places: (3) I feel "iteration" doesn't really apply; I'd simply drop that word. "ongoing SMI" seems sufficient (or maybe "ongoing SMI handling"). > + // > + // - UnplugCpus() where the BSP determines if a CPU is under ejection > + // or not. As the comment where mCpuHotEjectData->Handler is set-up > + // describes any such updates are guaranteed to be ordered-before the > + // dereference below. > + // > + // - EjectCpu() on the BSP updates QemuSelectorMap[ProcessorNum] for > + // CPUs after they have been hot-ejected. > + // > + // The CPU under ejection: might be executing anywhere between the > + // "AllCpusInSync" exit loop in SmiRendezvous() to about to > + // dereference QemuSelectorMap[ProcessorNum]. > + // Given that the BSP ensures that this store only happens after the > + // CPU has been ejected, this CPU would never see the after value. > + // (Note that any CPU that is already executing the CpuSleep() loop > + // below never raced any updates and always saw the before value.) > + // > + // CPUs not-under ejection: never see any changes so they are fine. > + // > + // Lastly, note that we are also guaranteed that any dereferencing > + // CPU only sees the before or after value and not an intermediate > + // value. This is because QemuSelectorMap[ProcessorNum] is aligned at > + // a natural boundary. > + // (4) Well, about the last paragraph -- when I saw that you used UINT64 in QemuSelectorMap, to allow for a sentinel value, I immediately thought of IA32. This module is built for IA32 too, and there I'm not so sure about the atomicity of UINT64 accesses. I didn't raise it at that point because I wasn't sure if we were actually going to rely on the atomicity. And, I don't think we are. Again, let me quote Paolo's example from : thread 1 thread 2 -------------------------------- ------------------------ a.x = 1; smp_wmb(); WRITE_ONCE(message, &a); datum = READ_ONCE(message); smp_rmb(); if (datum != NULL) printk("%x\n", datum->x); The access to object "x" is not restricted in any particular way. In thread#1, we have an smp_wmb() which enforces both a compiler barrier and a store-release fence, and then we have an atomic access to "message". But the "a.x" assignment need not be atomic. In our case, "a.x = 1" maps to (a) populating "QemuSelectorMap", and (b) setting up the "Handler" function pointer, in UnplugCpus(). The smp_wmb() is the "ReleaseMemoryFence" in UnplugCpus(). The WRITE_ONCE is the final "AllCpusInSync = FALSE" assignment in BSPHandler() [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]. Regarding thread#2, the READ_ONCE is matched by the "AllCpusInSync" loop. smp_rmb() is the "AcquireMemoryFence" I called for, under PATCH v8 07/10, "OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler". And the "datum" access happens (a) in SmmCpuFeaturesRendezvousExit(), where we fetch/call Handler, and (b) here, where we read/modify "QemuSelectorMap". So this seems to imply we can: - drop the natural-alignment padding of "QemuSelectorMap", from PATCH v8 06/10, "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state", - drop the last paragraph of the comment above. > QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum]; > - if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) { > + if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID && !IsBsp) { > + return; > + } > + > + if (IsBsp) { (5) Can you reorder the high-level flow here? Such as: if (CheckIfBsp ()) { // // send the eject requests to QEMU here // return; } // // Reached only on APs: // QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum]; if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) { return; } // // AP being hot-ejected; pen it here. // > + UINT32 Idx; > + > + for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) { > + UINT64 QemuSelector; > + > + QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx]; > + > + if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) { > + // > + // This to-be-ejected-CPU has already received the BSP's SMI exit > + // signal and, will execute SmmCpuFeaturesRendezvousExit() > + // followed by this callback or is already waiting in the > + // CpuSleep() loop below. > + // > + // Tell QEMU to context-switch it out. > + // > + QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector); > + QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT); > + > + // > + // We need a compiler barrier here to ensure that the compiler > + // does not reorder the CpuStatus and QemuSelectorMap[Idx] stores. > + // > + // A store fence is not strictly necessary on x86 which has > + // TSO; however, both of these stores are in different address spaces > + // so also add a Store Fence here. > + // > + MemoryFence (); (6) I wonder if this compiler barrier + comment block are helpful. Paraphrasing your (ex-)colleague Liran, if MMIO and IO Port accessors didn't contain built-in fences, all hell would break lose. We're using EFI_MM_CPU_IO_PROTOCOL for IO Port accesses. I think we should be safe ordering-wise, even without an explicit compiler barrier here. To me personally, this particular fence only muddies the picture -- where we already have an acquire memory fence and a store memory fence to couple with each other. I'd recommend removing this. (If you disagree, I'm willing to listen to arguments, of course!) > + > + // > + // Clear the eject status for this CPU Idx to ensure that an invalid > + // SMI later does not end up trying to eject it or a newly > + // hotplugged CPU Idx does not go into the dead loop. > + // > + mCpuHotEjectData->QemuSelectorMap[Idx] = > + CPU_EJECT_QEMU_SELECTOR_INVALID; > + > + DEBUG ((DEBUG_INFO, "%a: Unplugged ProcessorNum %u, " > + "QemuSelector 0x%Lx\n", __FUNCTION__, Idx, QemuSelector)); (7) Please log QemuSelector with "%Lu". > + } > + } > + > + // > + // We are done until the next hot-unplug; clear the handler. > + // > + // By virtue of the MemoryFence() in the ejection loop above, the > + // following store is ordered-after all the ejections are done. > + // (We know that there is at least one CPU hot-eject handler if this > + // handler was installed.) > + // > + // As described in OvmfPkg::SmmCpuFeaturesRendezvousExit() this > + // means that the only CPUs which might dereference > + // mCpuHotEjectData->Handler are not under ejection, so we can > + // safely reset. > + // > + mCpuHotEjectData->Handler = NULL; > return; > } > > @@ -496,11 +614,6 @@ CpuHotplugMmi ( > if (EFI_ERROR (Status)) { > goto Fatal; > } > - if (ToUnplugCount > 0) { > - DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n", > - __FUNCTION__)); > - goto Fatal; > - } > > if (PluggedCount > 0) { > Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount); > diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c > index 99988285b6a2..ddfef05ee6cf 100644 > --- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c > +++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c > @@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit ( > // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed > // in this SMI exit (otherwise mCpuHotEjectData->Handler is not armed.) > // > + // mCpuHotEjectData itself is stable once setup so it can be > + // dereferenced without needing any synchronization, > + // but, mCpuHotEjectData->Handler is updated on the BSP in the > + // ongoing SMI iteration at two places: > + // > + // - UnplugCpus() where the BSP determines if a CPU is under ejection > + // or not. As the comment where mCpuHotEjectData->Handler is set-up > + // describes any such updates are guaranteed to be ordered-before the > + // dereference below. > + // > + // - EjectCpu() (which is called via the Handler below), on the BSP > + // updates mCpuHotEjectData->Handler once it is done with all ejections. > + // > + // The CPU under ejection: might be executing anywhere between the > + // "AllCpusInSync" exit loop in SmiRendezvous() to about to > + // dereference the Handler field. > + // Given that the BSP ensures that this store only happens after all > + // CPUs under ejection have been ejected, this CPU would never see > + // the after value. > + // (Note that any CPU that is already executing the CpuSleep() loop > + // below never raced any updates and always saw the before value.) > + // > + // CPUs not-under ejection: might see either value of the Handler > + // which is fine, because the Handler is a NOP for CPUs not-under > + // ejection. > + // > + // Lastly, note that we are also guaranteed that any dereferencing > + // CPU only sees the before or after value and not an intermediate > + // value. This is because mCpuHotEjectData->Handler is aligned at a > + // natural boundary. > + // > > if (mCpuHotEjectData != NULL) { > CPU_HOT_EJECT_HANDLER Handler; > (8) I can't really put my finger on it, I just feel that repeating (open-coding) this wall of text here is not really productive. Do you think that, after you add the "acquire memory fence" comment in patch #7, we could avoid most of the text here? I think we should only point out (in patch #7) the "release fence" that the logic here pairs with. If you really want to present it all from both perspectives, I guess I'm OK with that, but then I believe we should drop the last paragraph at least (see point (4)). Thanks Laszlo