From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by mx.groups.io with SMTP id smtpd.web11.64242.1673548498253167465
 for ; Thu, 12 Jan 2023 10:34:58 -0800
Authentication-Results: mx.groups.io;
 dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=FH8Zd0ks;
 spf=pass (domain: redhat.com, ip: 170.10.129.124, mailfrom: lersek@redhat.com)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1673548497;
 h=from:from:reply-to:reply-to:subject:subject:date:date:
 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
 content-type:content-type:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=cnXc9/YEVCoN09emFJBK35lrY55GY+jzcp2Xdfudym8=;
 b=FH8Zd0kst2s/uW75dx9QrhExXQ9Duk7ozx+JyMe+PCcLsKKiyLWzajv+XA/nU3I1kOmlfk
 88npd8HtqIrlp8WMyRui9GuFNFZky4dbTHX0BnrCUT+zKgQZz2vWboDqQREt38F8kgaHlf
 zhHPB+NAerURoOFL7ayeILi/iMxAaTE=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88])
 by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
 id us-mta-9-Sv5J16zqNMGg9sJ1mR7JRw-1; Thu, 12 Jan 2023 13:34:54 -0500
X-MC-Unique: Sv5J16zqNMGg9sJ1mR7JRw-1
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7F936802D1A;
 Thu, 12 Jan 2023 18:34:53 +0000 (UTC)
Received: from [10.39.192.93] (unknown [10.39.192.93])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id 4CFED1759E;
 Thu, 12 Jan 2023 18:34:51 +0000 (UTC)
Message-ID: <8f9592c3-08b7-3e8d-47c4-7ae78f0b8c36@redhat.com>
Date: Thu, 12 Jan 2023 19:34:50 +0100
MIME-Version: 1.0
Subject: Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
From: "Laszlo Ersek"
To: devel@edk2.groups.io
Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
 James Bottomley, Jiewen Yao, Jordan Justen, Min Xu, Oliver Steffen,
 Sebastien Boeuf, Tom Lendacky
Reply-To: devel@edk2.groups.io, lersek@redhat.com
References: <20230112082845.128463-1-lersek@redhat.com>
In-Reply-To: <20230112082845.128463-1-lersek@redhat.com>
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 1/12/23 09:28, Laszlo Ersek wrote:
> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
> protocol is (effectively) broken such that it suggests that switching from
> the legacy interface to the modern interface works, but in reality the
> switch never happens. The symptom has been witnessed when using TCG
> acceleration; KVM seems to mask the issue. The issue persists with the
> following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
> there is no stable release that addresses the problem.
>
> The QEMU bug confuses the Present and Possible counting in function
> PlatformMaxCpuCountInitialization(), in
> "OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
> Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
> firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
> SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
> the privilege level of SMM) is not that great.
>
> Detect the issue in PlatformMaxCpuCountInitialization(), and print an
> error message and *hang* if the issue is present.
>
> The problem was originally reported by Ard [0]. We analyzed it at [1] and
> [2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
> ("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
> block", 2023-01-08), to be included in QEMU v8.0.0.
>
> [0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2
>
> [1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3
>
> [2] IO port write width clamping differs between TCG and KVM
>     http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
>     https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html
>
> [3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
>     http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
>     https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html
>
> NOTE: PlatformInitLib is used in the following platform DSCs:
>
>   OvmfPkg/AmdSev/AmdSevX64.dsc
>   OvmfPkg/CloudHv/CloudHvX64.dsc
>   OvmfPkg/IntelTdx/IntelTdxX64.dsc
>   OvmfPkg/Microvm/MicrovmX64.dsc
>   OvmfPkg/OvmfPkgIa32.dsc
>   OvmfPkg/OvmfPkgIa32X64.dsc
>   OvmfPkg/OvmfPkgX64.dsc
>
> but I can only test this change with the last three platforms, running on
> QEMU.
>
> Test results:
>
>   TCG  QEMU     OVMF     result
>        patched  patched
>   ---  -------  -------  -------------------------------------------------
>   0    0        0        CPU counts OK (KVM masks the QEMU bug)
>   0    0        1        CPU counts OK (KVM masks the QEMU bug)
>   0    1        0        CPU counts OK (QEMU fix, but KVM masks the QEMU
>                          bug anyway)
>   0    1        1        CPU counts OK (QEMU fix, but KVM masks the QEMU
>                          bug anyway)
>   1    0        0        boot with broken CPU counts (original QEMU bug)
>   1    0        1        broken CPU count caught (boot hangs)
>   1    1        0        CPU counts OK (QEMU fix)
>   1    1        1        CPU counts OK (QEMU fix)
>
> Cc: Ard Biesheuvel
> Cc: Brijesh Singh
> Cc: Erdem Aktas
> Cc: Gerd Hoffmann
> Cc: James Bottomley
> Cc: Jiewen Yao
> Cc: Jordan Justen
> Cc: Min Xu
> Cc: Oliver Steffen
> Cc: Sebastien Boeuf
> Cc: Tom Lendacky
> Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
> Reviewed-by: Gerd Hoffmann
> Signed-off-by: Laszlo Ersek
> ---
>
> Notes:
>     v2:
>
>     - V1 was at
>       .
>
>     - Repo: , branch:
>       cpuhp-reg-catch-4250-v2
>
>     - Remove KVM as a proposed workaround from the error message, because in
>       the QEMU discussion, we had found that the KVM accelerator's behavior
>       in QEMU (masking the problem) was not right, and that a fix for that
>       had been in progress for quite some time.
>
>     - Add the QEMU commit hash to the commit message, the code comment, and
>       the error message.
>
>     - Pick up Gerd's R-b; add Oliver to the Cc list.
>
>  OvmfPkg/Library/PlatformInitLib/Platform.c | 35 ++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
> index 3e13c5d4b34f..13348afb4890 100644
> --- a/OvmfPkg/Library/PlatformInitLib/Platform.c
> +++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
> @@ -541,6 +541,41 @@ PlatformMaxCpuCountInitialization (
>        ASSERT (Selected == Possible || Selected == 0);
>      } while (Selected > 0);
>
> +    //
> +    // Sanity check: we need at least 1 present CPU (CPU#0 is always present).
> +    //
> +    // The legacy-to-modern switching of the CPU hotplug register block got
> +    // broken (for TCG) in QEMU v5.1.0. Refer to "IO port write width clamping
> +    // differs between TCG and KVM" at
> +    //
> +    // or at
> +    // .
> +    //
> +    // QEMU received the fix in commit dab30fbef389 ("acpi: cpuhp: fix
> +    // guest-visible maximum access size to the legacy reg block",
> +    // 2023-01-08), to be included in QEMU v8.0.0.
> +    //
> +    // If we're affected by this QEMU bug, then we must not continue: it
> +    // confuses the multiprocessing in UefiCpuPkg/Library/MpInitLib, and
> +    // breaks CPU hot(un)plug with SMI in OvmfPkg/CpuHotplugSmm.
> +    //
> +    if (Present == 0) {
> +      DEBUG ((
> +        DEBUG_ERROR,
> +        "%a: Broken CPU hotplug register block: Present=%u Possible=%u.\n"
> +        "%a: Update QEMU to v8, or to stable with dab30fbef389 backported.\n"
> +        "%a: Refer to "
> +        ".\n",
> +        __FUNCTION__,
> +        Present,
> +        Possible,
> +        __FUNCTION__,
> +        __FUNCTION__
> +        ));
> +      ASSERT (FALSE);
> +      CpuDeadLoop ();
> +    }
> +
>      //
>      // Sanity check: fw_cfg and the modern CPU hotplug interface should
>      // return the same boot CPU count.
>

please do with this what you will
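
As a reader's aid, the test matrix in the quoted commit message can be restated as a small standalone C sketch. The names Outcome, ExpectedOutcome and CheckMatrix are made up for illustration; they are not part of edk2 or QEMU. The sketch only encodes the eight rows of the table, not the firmware logic itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not edk2 or QEMU identifiers. */
typedef enum {
  COUNTS_OK,       /* Present and Possible are sane */
  BROKEN_BOOT,     /* boot continues with Present=0 Possible=1 */
  CAUGHT_AND_HANG  /* patched OVMF logs the error and hangs */
} Outcome;

/* Restates the "Test results" table from the commit message. */
Outcome ExpectedOutcome(bool Tcg, bool QemuPatched, bool OvmfPatched)
{
  /* Per the table, the bug is only guest-visible with TCG on an unfixed QEMU;
     with KVM it is masked regardless of the other two columns. */
  bool BugVisible = Tcg && !QemuPatched;

  if (!BugVisible) {
    return COUNTS_OK;
  }
  return OvmfPatched ? CAUGHT_AND_HANG : BROKEN_BOOT;
}

/* Walks all eight rows; bit 2 = TCG, bit 1 = QEMU patched, bit 0 = OVMF patched. */
void CheckMatrix(void)
{
  for (int Row = 0; Row < 8; Row++) {
    bool Tcg         = (Row & 4) != 0;
    bool QemuPatched = (Row & 2) != 0;
    bool OvmfPatched = (Row & 1) != 0;
    Outcome Got      = ExpectedOutcome(Tcg, QemuPatched, OvmfPatched);

    if (Row == 4) {
      assert(Got == BROKEN_BOOT);      /* 1 0 0: original QEMU bug */
    } else if (Row == 5) {
      assert(Got == CAUGHT_AND_HANG);  /* 1 0 1: caught by the OVMF check */
    } else {
      assert(Got == COUNTS_OK);        /* every other row */
    }
  }
}
```

Calling CheckMatrix() from any test harness walks all eight combinations; only the two TCG rows with an unpatched QEMU differ from "CPU counts OK".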