From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by mx.groups.io with SMTP id smtpd.web08.36389.1612195895263788773 for ; Mon, 01 Feb 2021 08:11:35 -0800 Authentication-Results: mx.groups.io; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=O0j5djiL; spf=pass (domain: redhat.com, ip: 216.205.24.124, mailfrom: lersek@redhat.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1612195894; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=F4h8pgu1C7HqhsZ7FjNsztrzslzeRFrf/DF52jjyOok=; b=O0j5djiLr//Y2mfWeByD+ivLMLTQi37sthfZL8Gwq5kSwJJy3V5bvIs1/gpGcCoblQzTCp 1nRGgzMF9caa2ItYUcrN218KN9YqKHrbB6gHCDc7GBzYaLnfO3TIednauy42Ony8lvSEcd jytLAlv+qoOkKhoBPihMLCv1YNfNAdY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-79-xGeHgOvGMfukiZ2tRtTwqw-1; Mon, 01 Feb 2021 11:11:29 -0500 X-MC-Unique: xGeHgOvGMfukiZ2tRtTwqw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CEBBA107ACE3; Mon, 1 Feb 2021 16:11:27 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-114-253.ams2.redhat.com [10.36.114.253]) by smtp.corp.redhat.com (Postfix) with ESMTP id ABA2F60C5F; Mon, 1 Feb 2021 16:11:25 +0000 (UTC) Subject: Re: [PATCH v6 7/9] OvmfPkg/CpuHotplugSmm: add CpuEject() To: Ankur Arora , devel@edk2.groups.io Cc: imammedo@redhat.com, boris.ostrovsky@oracle.com, Jordan Justen , Ard Biesheuvel , Aaron Young References: <20210129005950.467638-1-ankur.a.arora@oracle.com> <20210129005950.467638-8-ankur.a.arora@oracle.com> From: "Laszlo Ersek" Message-ID: <180a8efb-1a26-3bab-f50a-2d7aeff6d582@redhat.com> Date: Mon, 1 Feb 2021 17:11:24 +0100 MIME-Version: 1.0 In-Reply-To: <20210129005950.467638-8-ankur.a.arora@oracle.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=lersek@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit On 01/29/21 01:59, Ankur Arora wrote: > Add CpuEject(), which handles the CPU ejection, and provides a holding > area for said CPUs. It is called via SmmCpuFeaturesRendezvousExit(), > at the tail end of the SMI handling. (1) The functions introduced thus far by this patch series are all named "Verb + Object", which is great; so please call this function EjectCpu() as well, rather than CpuEject(). Modify all three of: subject line, commit message, patch body; please. > > Also UnplugCpus() now stashes APIC IDs of CPUs which need to be > ejected in CPU_HOT_EJECT_DATA.ApicIdMap. These are used by CpuEject() > to identify such CPUs. > > Cc: Laszlo Ersek > Cc: Jordan Justen > Cc: Ard Biesheuvel > Cc: Igor Mammedov > Cc: Boris Ostrovsky > Cc: Aaron Young > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132 > Signed-off-by: Ankur Arora > --- > OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 109 +++++++++++++++++++++++++++++++++++-- > 1 file changed, 105 insertions(+), 4 deletions(-) > > diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > index 70d69f6ed65b..526f51faf070 100644 > --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c > @@ -14,6 +14,7 @@ > #include // gMmst > #include // PcdGetBool() > #include // SafeUintnSub() > +#include // CPU_HOT_EJECT_DATA > #include // EFI_MM_CPU_IO_PROTOCOL > #include // EFI_SMM_CPU_SERVICE_PROTOCOL > #include // EFI_STATUS (2) This will change due to the movement of the header file, but: please keep the #include directive list alphabetically sorted. > @@ -32,11 +33,12 @@ STATIC EFI_MM_CPU_IO_PROTOCOL *mMmCpuIo; > // > STATIC EFI_SMM_CPU_SERVICE_PROTOCOL *mMmCpuService; > // > -// This structure is a communication side-channel between the > +// These structures serve as communication side-channels between the > // EFI_SMM_CPU_SERVICE_PROTOCOL consumer (i.e., this driver) and provider > // (i.e., PiSmmCpuDxeSmm). > // > STATIC CPU_HOT_PLUG_DATA *mCpuHotPlugData; > +STATIC CPU_HOT_EJECT_DATA *mCpuHotEjectData; > // > // SMRAM arrays for fetching the APIC IDs of processors with pending events (of > // known event types), for the time of just one MMI. > @@ -188,11 +190,53 @@ RevokeNewSlot: > } > > /** > + CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(), > + on each CPU at exit from SMM. > + > + If, the executing CPU is not being ejected, nothing to be done. > + If, the executing CPU is being ejected, wait in a CpuDeadLoop() > + until ejected. > + > + @param[in] ProcessorNum Index of executing CPU. > + > +**/ > +VOID > +EFIAPI > +CpuEject ( > + IN UINTN ProcessorNum > + ) > +{ > + // > + // APIC ID is UINT32, but mCpuHotEjectData->ApicIdMap[] is UINT64 > + // so use UINT64 throughout. > + // > + UINT64 ApicId; > + > + ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum]; > + if (ApicId == CPU_EJECT_INVALID) { > + return; > + } > + > + // > + // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit() > + // after having been cleared to exit the SMI by the monarch and thus have > + // no SMM processing remaining. > + // > + // Given that we cannot allow them to escape to the guest, we pen them > + // here until the SMM monarch tells the HW to unplug them. > + // > + CpuDeadLoop (); > +} (3a) We can make this less resource-hungry, by replacing CpuDeadLoop() with: for (;;) { DisableInterrupts (); CpuSleep (); } This basically translates to a { CLI; HLT; } loop. (Both functions come from BaseLib, which CpuHotplugSmm already consumes, thus there is no need to modify #include's or [LibraryClasses].) (3b) Please refresh the CpuDeadLoop() reference in the function's leading comment as well. > + > +/** > Process to be hot-unplugged CPUs, per QemuCpuhpCollectApicIds(). > > For each such CPU, report the CPU to PiSmmCpuDxeSmm via > - EFI_SMM_CPU_SERVICE_PROTOCOL. If the to be hot-unplugged CPU is > - unknown, skip it silently. > + EFI_SMM_CPU_SERVICE_PROTOCOL and stash the APIC ID for later ejection. > + If the to be hot-unplugged CPU is unknown, skip it silently. > + > + Additonally, if we do stash any APIC IDs, also install a CPU eject handler > + which would handle the ejection. > > @param[in] ToUnplugApicIds The APIC IDs of the CPUs that are about to be > hot-unplugged. > @@ -216,9 +260,11 @@ UnplugCpus ( > { > EFI_STATUS Status; > UINT32 ToUnplugIdx; > + UINT32 EjectCount; > UINTN ProcessorNum; > > ToUnplugIdx = 0; > + EjectCount = 0; > while (ToUnplugIdx < ToUnplugCount) { > APIC_ID RemoveApicId; > > @@ -255,13 +301,41 @@ UnplugCpus ( > DEBUG ((DEBUG_ERROR, "%a: RemoveProcessor(" FMT_APIC_ID "): %r\n", > __FUNCTION__, RemoveApicId, Status)); > goto Fatal; > + } else { (Under patch v6 4/9, I request that the "goto" be replaced with a "return" -- my point (4) below applies regardless:) (4) Please don't add an "else" branch, if the first branch of the "if" ends with a jump statement. Because, in that case, the code that follows the "if" statement is not reachable after the first branch anyway. So please just unnest the next part: > + // > + // Stash the APIC IDs so we can do the actual ejection later. > + // > + if (mCpuHotEjectData->ApicIdMap[ProcessorNum] != CPU_EJECT_INVALID) { > + // > + // Since ProcessorNum and APIC-ID map 1-1, so a valid > + // mCpuHotEjectData->ApicIdMap[ProcessorNum] means something > + // is horribly wrong. > + // (5) To be honest, I would replace this with: // // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is initialized to // CPU_EJECT_INVALID when mCpuHotEjectData->ApicIdMap is allocated, // // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is restored to // CPU_EJECT_INVALID when the subject processor is ejected, // // - mMmCpuService->RemoveProcessor(ProcessorNum) invalidates // mCpuHotPlugData->ApicId[ProcessorNum], so a given ProcessorNum can // never match more than one APIC ID in a single invocation of // UnplugCpus(). // > + DEBUG ((DEBUG_ERROR, "%a: ProcessorNum %u maps to %llx, cannot " > + "map to " FMT_APIC_ID "\n", __FUNCTION__, ProcessorNum, > + mCpuHotEjectData->ApicIdMap[ProcessorNum], RemoveApicId)); (6a) The indentation of the 2nd and 3rd lines is incorrect. (6b) For logging UINTN values (i.e., ProcessorNum) portably between IA32 and X64, %u is not correct. Instead: - cast the UINTN value to UINT64 explicitly, - use the %Lu or %Lx format specifier. (6c) There is no "%llx" format string in edk2's PrintLib (no "ll" length modifier, to be more precise). UINT64 values need to be printed with "%lu" or "%lx", or -- identically -- with "%Lu" or "%Lx". I prefer the latter, because standard C does not define the "L" size modifier for integers, and that makes it clear that we're using an edk2-specific feature. The "l" (ell) length modifier could be misunderstood as "long" (which is something we don't use in edk2). (6d) FMT_APIC_ID is defined as "0x%08x"; to remain consistent with that, I would print the ApicIdMap element not just with "%Lx", but with "0x%016Lx". > + > + Status = EFI_INVALID_PARAMETER; > + goto Fatal; (7a) Please just "return EFI_ALREADY_STARTED". (7b) Please also modify the leading comment on the function -- the new return value EFI_ALREADY_STARTED should be documented. I suggest: @retval EFI_ALREADY_STARTED For the ProcessorNumber that EFI_SMM_CPU_SERVICE_PROTOCOL had assigned to one of the APIC ID in ToUnplugApicIds, mCpuHotEjectData->ApicIdMap already has an APIC ID stashed. (This should never happen.) > + } > + > + mCpuHotEjectData->ApicIdMap[ProcessorNum] = (UINT64)RemoveApicId; > + EjectCount++; > } > > ToUnplugIdx++; > } > > + if (EjectCount != 0) { > + // > + // We have processors to be ejected; install the handler. > + // > + mCpuHotEjectData->Handler = CpuEject; > + } > + (8) I suggest removing the "EjectCount" local variable, and setting the "Handler" member where you currently increment "EjectCount". > // > - // We've removed this set of APIC IDs from SMM data structures. > + // We've removed this set of APIC IDs from SMM data structures and > + // have installed an ejection handler if needed. > // > return EFI_SUCCESS; > > @@ -458,7 +532,13 @@ CpuHotplugEntry ( > // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm > // has pointed PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM. > // > + // Additionally, CPU Hot-unplug is available only if CPU Hotplug is, so > + // the same DEPEX also guarantees that PcdCpuHotEjectDataAddress points > + // to CPU_HOT_EJECT_DATA in SMRAM. > + // (9) I don't see the relevance of "hot-unplug depends on hot-plug" here. I recommend the following comment instead: // // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm // has pointed: // - PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM, // - PcdCpuHotEjectDataAddress to CPU_HOT_EJECT_DATA in SMRAM, if the // possible CPU count is greater than 1. // > mCpuHotPlugData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotPlugDataAddress); > + mCpuHotEjectData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotEjectDataAddress); > + > if (mCpuHotPlugData == NULL) { > Status = EFI_NOT_FOUND; > DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_PLUG_DATA: %r\n", __FUNCTION__, Status)); > @@ -470,6 +550,9 @@ CpuHotplugEntry ( > if (mCpuHotPlugData->ArrayLength == 1) { > return EFI_UNSUPPORTED; > } > + ASSERT (mCpuHotEjectData && > + (mCpuHotPlugData->ArrayLength == mCpuHotEjectData->ArrayLength)); > + > // > // Allocate the data structures that depend on the possible CPU count. > // (10) To remain consistent with the check performed on "mCpuHotPlugData", please do: if (mCpuHotEjectData == NULL) { Status = EFI_NOT_FOUND; } else if (mCpuHotPlugData->ArrayLength != mCpuHotEjectData->ArrayLength) { Status = EFI_INVALID_PARAMETER; } else { Status = EFI_SUCCESS; } if (EFI_ERROR (Status)) { DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_EJECT_DATA: %r\n", __FUNCTION__, Status)); goto Fatal; } ( As a digression, I'll make some comments on the ASSERT() too: - Given ASSERT ((C1) && (C2)), it is best to express the same as ASSERT (C1); ASSERT (C2); -- the effect is the same, but the error messages have finer granularity. - Checking a pointer against NULL must be explicit at all times, in edk2. IOW, ASSERT (mCpuHotEjectData) should be spelled ASSERT (mCpuHotEjectData != NULL). ) > @@ -552,6 +635,24 @@ CpuHotplugEntry ( > // > SmbaseInstallFirstSmiHandler (); > > + if (mCpuHotEjectData) { (11) This condition is guaranteed to evaluate to TRUE; see the ASSERT() above. Anyway, ignore this... > + UINT32 Idx; (12) Incorrect indentation, but ignore this too... > + // > + // For CPU ejection we need to map ProcessorNum -> APIC_ID. By the time > + // we need the mapping, however, the Processor's APIC ID has already been > + // removed from SMM data structures. So we will maintain a local map > + // in mCpuHotEjectData->ApicIdMap. > + // > + for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) { > + mCpuHotEjectData->ApicIdMap[Idx] = CPU_EJECT_INVALID; > + } (13) ... because this init loop should be moved to patch #6 (subject "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state"), as I mentioned there... > + > + // > + // Wait to init the handler until an ejection is warranted > + // > + mCpuHotEjectData->Handler = NULL; (14) ... and because this nulling is performed by patch #6 already (subject "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state"). Therefore, this whole conditional block should be removed please. Thanks! Laszlo > + } > + > return EFI_SUCCESS; > > ReleasePostSmmPen: >