From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mx.groups.io with SMTP id smtpd.web11.168276.1673880497201302045 for ; Mon, 16 Jan 2023 06:48:17 -0800 Authentication-Results: mx.groups.io; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=bWWrldtM; spf=pass (domain: redhat.com, ip: 170.10.129.124, mailfrom: lersek@redhat.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673880495; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xz+1CIouEonEH3NLgz30CjKLDP04QbzfrTcOUthRCyQ=; b=bWWrldtMaxnpA/sU/Di2G9o5naLgttIRM6MOGdirfKI34fRz/DWHjrkWtQMoce7W856GgQ MrhugHEpVxMOHH9TLKFLmH3OOzIpfIeB4QozH7D2cco6cnESzbbJpkcYTnDVTH0SmUKT7A X8kuBp5cGYexP3BJ1oy7RP91SlCNJic= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-627-kplXxRURMC2fVN10EBfTFQ-1; Mon, 16 Jan 2023 09:48:14 -0500 X-MC-Unique: kplXxRURMC2fVN10EBfTFQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 981CD19705CB; Mon, 16 Jan 2023 14:48:11 +0000 (UTC) Received: from [10.39.195.128] (unknown [10.39.195.128]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C5213492B10; Mon, 16 Jan 2023 14:48:08 +0000 (UTC) Message-ID: <9141ad66-f868-762c-7ea5-d88753466fa6@redhat.com> Date: Mon, 16 Jan 2023 15:48:06 +0100 MIME-Version: 1.0 Subject: Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression To: Gerd Hoffmann Cc: devel@edk2.groups.io, Michael Brown , Ard Biesheuvel , Brijesh Singh , Erdem Aktas , James Bottomley , Jiewen Yao , Jordan Justen , Min Xu , Oliver Steffen , Sebastien Boeuf , Tom Lendacky References: <20230112082845.128463-1-lersek@redhat.com> <01020185a568604c-e16d8581-963a-4ff3-8566-bf0640ad327d-000000@eu-west-1.amazonses.com> <407c5cee-7a6c-cbc8-35cc-8f2c2724914c@redhat.com> <01020185a6bda78a-05d82180-4d1a-4af4-9a9b-ac78088d11ed-000000@eu-west-1.amazonses.com> <49e4e8bb-3bbd-0ca8-ee59-e75560deffa7@redhat.com> <20230113060354.siony3rjwpgzd5tk@sirius.home.kraxel.org> <20230113093205.oh7euprqlmp26wpu@sirius.home.kraxel.org> <20230113122246.uabdhut4ziwerivm@sirius.home.kraxel.org> From: "Laszlo Ersek" In-Reply-To: <20230113122246.uabdhut4ziwerivm@sirius.home.kraxel.org> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 1/13/23 13:22, Gerd Hoffmann wrote: > On Fri, Jan 13, 2023 at 11:10:54AM +0100, Laszlo Ersek wrote: >> On 1/13/23 10:32, Gerd Hoffmann wrote: >>> On Fri, Jan 13, 2023 at 07:03:54AM +0100, Gerd Hoffmann wrote: >>>> Hi, >>>> >>>>> - QEMU can be configured with other compat properties on the >>>>> command line so that "CPU hotplug with SMI" and "CPU hot-unplug >>>>> with SMI" *not* be offered to the firmware. Then QEMU will reject >>>>> hotplug attempts, and the SMM hotplug code in edk2 will not be >>>>> triggered by the (virtual) hardware. >>>> >>>> Can we have edk2 print instructions for that in the error message? >>> >>> This seems to be: >>> >>> qemu -M q35 \ >>> -global ICH9-LPC.x-smi-cpu-hotplug=off \ >>> -global ICH9-LPC.x-smi-cpu-hotunplug=off >> >> Yes, those are the flags. >> >>> But it appears to not work. >> >> They should work, but they take effect in QEMU, and not in the >> firmware. These knobs control what CPU hot(un)plug+SMI features QEMU >> exposes to the guest fw, via fw_cfg, > > Ok, I see, only the SMM code actually checks that. > >> In particular the firmware makes no further decisions based on >> whether QEMU advertized some of these features. > > I was thinking the other way around: When cpu hotplug is disabled in > qemu it should be safe to skip the whole cpu hotplug checking dance. > See test patch below. > > That would give us a config switch (turn off cpu hotplug support) > which would allow edk2 run on qemu versions with broken cpu hotplug. > > Does the idea look sane or do I miss something? > > take care, > Gerd > > commit bd2e36eba35268ab46c0125d2b9125391ea6f9fc > Author: Gerd Hoffmann > Date: Fri Jan 13 13:07:36 2023 +0100 > > skip cpu present checking when hotplug is off > > diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c > index 13348afb4890..2b0f0c836f85 100644 > --- a/OvmfPkg/Library/PlatformInitLib/Platform.c > +++ b/OvmfPkg/Library/PlatformInitLib/Platform.c > @@ -415,8 +415,9 @@ PlatformMaxCpuCountInitialization ( > IN OUT EFI_HOB_PLATFORM_INFO *PlatformInfoHob > ) > { > - UINT16 BootCpuCount = 0; > - UINT32 MaxCpuCount; > + UINT16 BootCpuCount = 0; > + UINT32 MaxCpuCount; > + BOOLEAN CpuHotplugSupported = FALSE; > > // > // Try to fetch the boot CPU count. > @@ -424,6 +425,31 @@ PlatformMaxCpuCountInitialization ( > if (QemuFwCfgIsAvailable ()) { > QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount); > BootCpuCount = QemuFwCfgRead16 (); > + DEBUG ((DEBUG_INFO, "%a: BootCpuCount: %d\n", __FUNCTION__, BootCpuCount)); > + } > + > + { > + FIRMWARE_CONFIG_ITEM SupportedFeaturesItem; > + UINTN SupportedFeaturesSize; > + UINT64 mSmiFeatures; > + EFI_STATUS Status; > + > + Status = QemuFwCfgFindFile ( > + "etc/smi/supported-features", > + &SupportedFeaturesItem, > + &SupportedFeaturesSize > + ); > + > + if (EFI_ERROR (Status)) { > + DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: %r\n", __FUNCTION__, Status)); > + } else { > + QemuFwCfgSelectItem (SupportedFeaturesItem); > + QemuFwCfgReadBytes (sizeof mSmiFeatures, &mSmiFeatures); > + DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: 0x%x\n", __FUNCTION__, mSmiFeatures)); > + if (mSmiFeatures & (BIT1 /* hotplug */ | BIT2 /* hotunplug */)) { > + CpuHotplugSupported = TRUE; > + } > + } > } > > if (BootCpuCount == 0) { > @@ -435,6 +461,9 @@ PlatformMaxCpuCountInitialization ( > // > DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__)); > MaxCpuCount = PlatformInfoHob->DefaultMaxCpuNumber; > + } else if (!CpuHotplugSupported) { > + DEBUG ((DEBUG_INFO, "%a: CPU hotplug support not available\n", __FUNCTION__)); > + MaxCpuCount = BootCpuCount; > } else { > // > // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to > This would be wrong. Let me provide a summary of CPU hotplug with SMM. It consists of the following patch sets, in edk2. Note that these sets were implemented and merged in dependency order, and I'm listing them likewise. (E1) a7e2d20193e8..778832bcad33, for TianoCore#1515 (E2) c8b8157e126a..83357313dd67, for TianoCore#1515 (E3) 422da35375c6..75839f977d37, for TianoCore#1512 (E4) 61d3b2d4279e..1158fc8e2c7b, for TianoCore#1512 (E5) 5ba203b54e59, no BZ (E6) 63d92674d240..cbccf995920a, for TianoCore#2929 [not really relevant here, just for completeness] (I'm ignoring hot-unplug now, which came later. See TianoCore#3132; the commit range is noted there too.) On the QEMU side, some relevant series are (again in dependency order): (Q1) be9612e8cbb4..3a61c8db9d25 (Q2) fd9b0830b0be [just for completeness] (Q3) 2d69eba5fe52..6e2e2e8a4220 I recommend reading through all those commits. *** Series E1+E2 allow OVMF to expose the precise boot CPU count (=present) and the max CPU count (=possible) to UefiCpuPkg/MpInitLib, and the rest of the firmware. Such that the initial enumeration is both precise and quick, and that all components in the firmware can distinguish these two concepts. Note in particular the final patch in series E2. That patch makes CPU hotplug work with S3 resume *without* SMM at all in the picture, for exmaple on i440fx too. Masking the "present at boot" vs "possible" CPU count difference based on "etc/smi/supported-features" would break that functionality (again, totally unrelated to SMM). There is another (more serious) arguments against keying this workaround off of "etc/smi/supported-features", and I'm going to get to that as well, but this is one reason already. The QEMU regression breaks the present vs. possible logic in series E2. Assume we're on i440fx, or else on Q35 with an OVMF binary that does not include the SMM driver stack. In this case, the firmware need not take notice of CPU hotplugging in real time, but during S3 resume, it must be able to see an increased CPU count. Even though the firmware fails to use the modern regblock, the firmware's enablement of the modern regblock *may not* be necessary for the guest OS to still hotplug CPUs (using QEMU's ACPI methods). There is no firmware<->OS dependency in this case. So such an S3 failure could lead to data loss. I'm against trying to come up with an extremely sophisticated condition under which this present vs. possible masking would be safe. (The QEMU-side counterpart is a *sub-series* of Q1.) *** Series E3 is the "SMRAM at default SMBASE" implementation in OVMF. The QEMU-side counterpart is the *other sub-series* of Q1. This feature is the core of the security of CPU hotplug, when SMM is enabled. The threat model is the following: the malicious guest OS places attack code at the default SMBASE, in RAM. Later, the VM admin on the host side hotplugs a CPU, eventually. The malicious guest OS immediately sends the new CPU an SMI followed by an INIT-SIPI-SIPI. This means that the new CPU starts executing the OS's attack code at the default SMBASE, in SMM, and with access to all SMRAM. So it can attack the firmware's SMM machinery. This feature prevents the attack by replacing guest OS writeable RAM at the default SMBASE with SMRAM, and locks it down during boot. The feature is negotiated by PlatformPei in Q35SmramAtDefaultSmbaseInitialization(), and is represented by PcdQ35SmramAtDefaultSmbase; it orchestrates multiple drivers. The QEMU-side property is called "mch.smbase-smram". Now here I want to point out something very important. Note that QEMU series Q1 implements *both* the "SMRAM at default SMBASE" feature *and* the "Command data 2" register in the CPU hotplug register block. The "Command data 2" register is used for two purposes: negotiating the "modern vs. legacy" operation of the register block, and for getting the APIC ID of a particular CPU. Different locations in OVMF use "Command data 2" for these different purposes, but what's important here is that the successful negotiation of the "SMRAM at default SMBASE" feature *implies* that the "Command data 2" register *works*. In other words, "SMRAM at default SMBASE", if negotiated, can be used as a proxy for "Command data 2". Both of these features would be released in QEMU v5.0.0, coming from the same one Q1 patch series, one would be negotiable, the other not. Therefore, locations in OVMF that want *just* the "Command data 2" register, go through the legacy vs. modern regblock negotiation, using "Command data 2". Other locations in OVMF, namely those that want the "SMRAM at default SMBASE" feature only, or want *both* features, would only check for "SMRAM at default SMBASE". *** Series E4 implements the actual hotplug (w/ SMM) feature in OVMF. Note that in commit 17efae27acaf ("OvmfPkg/CpuHotplugSmm: introduce skeleton for CPU Hotplug SMM driver", 2020-03-04), PcdQ35SmramAtDefaultSmbase==FALSE causes the driver to exit gracefully with EFI_UNSUPPORTED. However, if the feature has been negotiated, all further steps must complete fine, or the driver hangs intentionally. In commit f668e7887165 ("OvmfPkg/CpuHotplugSmm: define the QEMU_CPUHP_CMD_GET_ARCH_ID macro", 2020-03-04), the "Command data 2" register is sanity checked -- *enforced*. That's based exactly on the above-noted "proxying"; see the big code comment in the commit: + // + // Sanity-check the CPU hotplug interface. + // + // Both of the following features are part of QEMU 5.0, introduced primarily + // in commit range 3e08b2b9cb64..3a61c8db9d25: + // + // (a) the QEMU_CPUHP_CMD_GET_ARCH_ID command of the modern CPU hotplug + // interface, + // + // (b) the "SMRAM at default SMBASE" feature. + // + // From these, (b) is restricted to 5.0+ machine type versions, while (a) + // does not depend on machine type version. Because we ensured the stricter + // condition (b) through PcdQ35SmramAtDefaultSmbase above, the (a) + // QEMU_CPUHP_CMD_GET_ARCH_ID command must now be available too. While we + // can't verify the presence of precisely that command, we can still verify + // (sanity-check) that the modern interface is active, at least. + // + // Consult the "Typical usecases | Detecting and enabling modern CPU hotplug + // interface" section in QEMU's "docs/specs/acpi_cpu_hotplug.txt", on the + // following. + // The same implication / "proxy nature" is used in commit 1158fc8e2c7b ("OvmfPkg/CpuS3DataDxe: enable S3 resume after CPU hotplug", 2020-03-04), as well. So this shows why the QEMU regression is so nasty; it doesn't only break the explicit negotiations involving "Command data 2", but breaks *other* code locations that deduce the availability of "Command data 2" from a stricter, and separately negotiated, feature. *** Patch E5 is what introduces the "etc/smi/supported-features" stuff in OVMF. The QEMU counterpart is the Q3 series. Note that for OVMF, this is effectively an afterthought. It is not needed for securing the firmware from a malicious OS, it is also not needed for functionality, as far as the firmware is concerned. It basically communicates OVMF's features to QEMU. This stuff in OVMF enables two things: - it enables QEMU to protect a *non-malicious* OS from crashing if the firmware underneath supports SMI broadcast but is unaware of CPU hotplug, - it enables QEMU to generate ACPI content for a *non-malicious* OS so that the OS inform the firmware (via raising an SMI) about a CPU hotplug event. Note that there is zero security impact from this; that was covered by series E3 (= the "SMRAM at default SMBASE" feature). This is the other (more serious) reason that we shouldn't use BIT1 (ICH9_LPC_SMI_F_CPU_HOTPLUG) and BIT2 (ICH9_LPC_SMI_F_CPU_HOT_UNPLUG) in "etc/smi/supported-features" for driving anything in the firmware. They were never meant to *steer* the firmware. BIT0 (ICH9_LPC_SMI_F_BROADCAST) was meant to steer both the firmware and QEMU, but BIT1 and BIT2 only steer QEMU. In summary, to keep OVMF plus the guest OS "healthy" in face of the QEMU regression, at least three things are necessary: (a) Do *not* extend the role of SMI feature negotiation to steer the workaround; keep it intact. With time, the QEMU regression will lose its significance, but we're going to be left with a horrible confusion (design violation) of BIT1 and BIT2 in the SMI features bitmask. (Case in point: the QEMU-v2.7 reset bug is totally irrelevant nowadays, but we still have the workaround in place for it, and that workaround *worsens* the symptoms of the present QEMU regression!) (b) Avoid potential guest OS data loss by breaking *basic* S3 resume functionality (with SMM entirely out of the picture). This means that *whenever* the guest OS is capable of hotplugging a CPU (including that QEMU allows such an operation to be initiated over QMP), the S3 resume code in OVMF must be able to deal with an increased number of CPUs, relative to the first (cold) boot. In other words, the difference between "present" and "possible" must not be masked, whenever the guest OS is capable of hotplugging a CPU (even without SMM in the picture), using the ACPI code generated by QEMU. (In the beginning, I had proposed for QEMU to expose a new fw_cfg file, for "max cpus", but Igor (justifiedly) suggested to just collect the "possible" count from the modern regblock: .) (c) *No location* in the code that deduces "Command data 2 register works" from "PcdQ35SmramAtDefaultSmbase == TRUE" must be reached, because this implication is broken by the QEMU regression. Note that my proposal (hard hang upon "Command data 2 register not working") satisfies all three, and with time, becomes applicable. If pressure needs to be exterted somewhere to expedite that, then it's the QEMU stable release process. Note that clearing "mch.smbase-smram" on the QEMU command line, in itself, would be very wrong. It would disable the security from series E3, like it used to be on 4.2 machine types and before; however, at the same time, it would not prevent OVMF from acknowledging the ICH9_LPC_SMI_F_CPU_HOTPLUG feature bit via fw_cfg, which appeared with 5.2 machine types. In other words, it would create a machine type that never existed, and OVMF never expected to see -- the firmware wouldn't protect itself from a malicious guest OS, and QEMU would permit the admin to hotplug CPUs. If you simply go back to 4.2 machine types ("-M" switch), disabling both the "SMRAM at default SMBASE" and ICH9_LPC_SMI_F_CPU_HOTPLUG features, the hotplug regblock's negotiation still malfunctions (it is not versioned in QEMU). Assuming you hang OVMF if *either one* of PcdQ35SmramAtDefaultSmbase and ICH9_LPC_SMI_F_CPU_HOTPLUG is set, and allow the boot to progress if both are clear, the present<->possible counts would still be wrong, and S3 resume might cause OS data loss if the OS still managed to hotplug a CPU. I do not claim that a logical predicate "does not exist" that covers precisely those situations that need to be caught *for the purpose* of papering them over. (I guess the predicate and/or the action would somehow involve "PcdQ35SmramAtDefaultSmbase", *BUT* even then, the order of calls to MaxCpuCountInitialization() and Q35SmramAtDefaultSmbaseInitialization() is wrong!!!) What I'm stating is that I'm *unable to reason* about such a predicate with any sense of safety, due to the predicate's complexity, and due to it compensating for the violation of a *foundational* invariant. On my part, I wouldn't like to continue this discussion. I'm really sorry. (I'm only responding on-list because Gerd's question so closely follows my good-bye from the other thread that I think he may not have seen that until sending the question. But, with this, I have no more input on the topic, or anything for the list, really.) Laszlo