public inbox for devel@edk2.groups.io
From: "Ard Biesheuvel" <ardb@kernel.org>
To: Laszlo Ersek <lersek@redhat.com>
Cc: Oliver Steffen <osteffen@redhat.com>,
	devel@edk2.groups.io,  Ard Biesheuvel <ardb+tianocore@kernel.org>,
	Brijesh Singh <brijesh.singh@amd.com>,
	 Erdem Aktas <erdemaktas@google.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	 James Bottomley <jejb@linux.ibm.com>,
	Jiewen Yao <jiewen.yao@intel.com>,
	 Jordan Justen <jordan.l.justen@intel.com>,
	Michael Brown <mcb30@ipxe.org>, Min Xu <min.m.xu@intel.com>,
	 Sebastien Boeuf <sebastien.boeuf@intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [PATCH v3 2/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
Date: Fri, 20 Jan 2023 10:10:14 +0100
Message-ID: <CAMj1kXEWEVLxjM7ubtN5-wr_H=ScC4BpX-wgrHWi_xYtZmxGVQ@mail.gmail.com>
In-Reply-To: <c5d191ad-cac6-9172-b6be-791aa0510a4b@redhat.com>

On Fri, 20 Jan 2023 at 09:50, Laszlo Ersek <lersek@redhat.com> wrote:
>
> a couple of requests to Oliver below:
>
> On 1/19/23 12:27, Ard Biesheuvel wrote:
> > On Thu, 19 Jan 2023 at 12:01, Laszlo Ersek <lersek@redhat.com> wrote:
> >>
> >> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
> >> protocol is (effectively) broken such that it suggests that switching from
> >> the legacy interface to the modern interface works, but in reality the
> >> switch never happens. The symptom has been witnessed when using TCG
> >> acceleration; KVM seems to mask the issue. The issue persists with the
> >> following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
> >> there is no stable release that addresses the problem.
> >>
> >> The QEMU bug confuses the Present and Possible counting in function
> >> PlatformMaxCpuCountInitialization(), in
> >> "OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
> >> Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
> >> firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
> >> SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
> >> the privilege level of SMM) is not that great.
> >>
> >> Detect the issue in PlatformCpuCountBugCheck(), and print an error message
> >> and *hang* if the issue is present.
> >>
> >> Users willing to take risks can override the hang with the experimental
> >> QEMU command line option
> >>
> >>   -fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes
> >>
> >> (The "-fw_cfg" QEMU option itself is not experimental; its above argument,
> >> as far as it concerns the firmware, is experimental.)
> >>
> >> The problem was originally reported by Ard [0]. We analyzed it at [1] and
> >> [2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
> >> ("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
> >> block", 2023-01-08), to be included in QEMU v8.0.0.
> >>
> >> [0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2
> >>
> >> [1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3
> >>
> >> [2] IO port write width clamping differs between TCG and KVM
> >>     http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
> >>     https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html
> >>
> >> [3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
> >>     http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
> >>     https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html
> >>
> >> NOTE: PlatformInitLib is used in the following platform DSCs:
> >>
> >>   OvmfPkg/AmdSev/AmdSevX64.dsc
> >>   OvmfPkg/CloudHv/CloudHvX64.dsc
> >>   OvmfPkg/IntelTdx/IntelTdxX64.dsc
> >>   OvmfPkg/Microvm/MicrovmX64.dsc
> >>   OvmfPkg/OvmfPkgIa32.dsc
> >>   OvmfPkg/OvmfPkgIa32X64.dsc
> >>   OvmfPkg/OvmfPkgX64.dsc
> >>
> >> but I can only test this change with the last three platforms, running on
> >> QEMU.
> >>
> >> Test results:
> >>
> >>   TCG  QEMU     OVMF     override  result
> >>        patched  patched
> >>   ---  -------  -------  --------  --------------------------------------
> >>   0    0        0        0         CPU counts OK (KVM masks the QEMU bug)
> >>   0    0        1        0         CPU counts OK (KVM masks the QEMU bug)
> >>   0    1        0        0         CPU counts OK (QEMU fix, but KVM masks
> >>                                    the QEMU bug anyway)
> >>   0    1        1        0         CPU counts OK (QEMU fix, but KVM masks
> >>                                    the QEMU bug anyway)
> >>   1    0        0        0         boot with broken CPU counts (original
> >>                                    QEMU bug)
> >>   1    0        1        0         broken CPU count caught (boot hangs)
> >>   1    0        1        1         broken CPU count caught, bug check
> >>                                    overridden, boot continues
> >>   1    1        0        0         CPU counts OK (QEMU fix)
> >>   1    1        1        0         CPU counts OK (QEMU fix)
> >>
> >> Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
> >> Cc: Brijesh Singh <brijesh.singh@amd.com>
> >> Cc: Erdem Aktas <erdemaktas@google.com>
> >> Cc: Gerd Hoffmann <kraxel@redhat.com>
> >> Cc: James Bottomley <jejb@linux.ibm.com>
> >> Cc: Jiewen Yao <jiewen.yao@intel.com>
> >> Cc: Jordan Justen <jordan.l.justen@intel.com>
> >> Cc: Michael Brown <mcb30@ipxe.org>
> >> Cc: Min Xu <min.m.xu@intel.com>
> >> Cc: Oliver Steffen <osteffen@redhat.com>
> >> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
> >> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> >> Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
> >> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> >
> > Thanks a lot for taking the time and investing the effort. I'm quite
> > happy that we have this 'escape hatch' now, which we could arguably
> > use temporarily in the VS2019 platform CI until its QEMU binary gets
> > updated, right?
>
> Yes, I have to agree there.
>
> Right now, because those QEMU binaries are affected by the regression,
> and because they use TCG, OVMF already sees Present=0 Possible=1. Due to
> the interference of Present=0 with the QEMU v2.7 reset bug workaround,
> we also get BootCpuCount=0. Furthermore, MaxCpuCount gets set to 1, from
> Possible. Thus, we exit PlatformMaxCpuCountInitialization() with
> PcdCpuBootLogicalProcessorNumber=0 (from BootCpuCount) and
> PcdCpuMaxLogicalProcessorNumber=1 (from MaxCpuCount).
>
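
To make the arithmetic above easier to follow, here is a minimal sketch
of how the counts end up in the two PCDs. It is a simplified model built
only from the description in this thread, not the code in Platform.c:
ModelCpuCounts(), CPU_COUNT_MODEL and the parameter names are
hypothetical, and the order of the bug check relative to the reset bug
workaround is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; these names are not the edk2 API. */
typedef struct {
  uint16_t BootCpuCount; /* would feed PcdCpuBootLogicalProcessorNumber */
  uint32_t MaxCpuCount;  /* would feed PcdCpuMaxLogicalProcessorNumber */
  bool     Hang;         /* true if the bug check would stop the boot */
} CPU_COUNT_MODEL;

CPU_COUNT_MODEL
ModelCpuCounts (
  uint16_t FwCfgBootCpuCount, /* boot CPU count as reported via fw_cfg */
  uint32_t Present,           /* CPUs counted as present via the reg block */
  uint32_t Possible,          /* CPUs counted as possible via the reg block */
  bool     BugCheck,          /* firmware contains the new bug check */
  bool     Override           /* the -fw_cfg override is set to "yes" */
  )
{
  CPU_COUNT_MODEL Result = { FwCfgBootCpuCount, 0, false };

  /* Assumed position of the check: Present=0 is the symptom of the bug. */
  if (BugCheck && (Present == 0)) {
    if (!Override) {
      Result.Hang = true;
      return Result;
    }

    /* With the override, pretend to be a sane uniprocessor system. */
    Present             = 1;
    Possible            = 1;
    Result.BootCpuCount = 1;
  }

  assert (Present <= Possible);

  /* QEMU v2.7 reset bug workaround: trust Present over the fw_cfg value. */
  if (Result.BootCpuCount != Present) {
    Result.BootCpuCount = (uint16_t)Present;
  }

  Result.MaxCpuCount = Possible;
  return Result;
}
```

In this model, ModelCpuCounts (1, 0, 1, false, false) reproduces the
state the Windows CI sees today, and passing true for both BugCheck and
Override reproduces the state expected once the patch and the -fw_cfg
switch are both in place.
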
> Then, in the "predictable subset" of consequences of the QEMU
> regression, we can say that MpInitLib interprets the above PCD values as
> "uniprocessor system with the boot CPU count not exposed by the
> platform". This (i.e., *just this*) does not fall outside of MpInitLib's
> domain (again, note my qualification "predictable subset").
>
> Now, if we apply the patch and also add the -fw_cfg switch to the
> Windows CI, *and* we also don't add any -smp flags (as far as I can
> tell, no -smp flag is used now), then the new PCD state will be
>
> PcdCpuBootLogicalProcessorNumber=1 (changed from zero)
> PcdCpuMaxLogicalProcessorNumber=1 (stays the same)
>
> As far as I can tell, *right now* this change should have no effect *in
> MpInitLib*, IOW nothing gets worse or better there. Namely,
> PcdCpuBootLogicalProcessorNumber is only consumed in WakeUpAP(), and
> only when InitFlag == ApInitConfig. InitFlag is set like that only in
> CollectProcessorCount(). However, CollectProcessorCount() is only called
> if PcdCpuMaxLogicalProcessorNumber is >1 (see MaxLogicalProcessorNumber
> in MpInitLibInitialize()). Meaning in effect that
> PcdCpuMaxLogicalProcessorNumber=1 makes PcdCpuBootLogicalProcessorNumber
> irrelevant, so its change from 0 to 1 is invisible *to MpInitLib*.
>
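
The call chain described above can be condensed into a small sketch. It
only models the gating as stated in this thread and does not reproduce
MpInitLib: ModelBootCountIsConsulted(), ModelMpInitLibView() and
BOOT_COUNT_NOT_CONSULTED are hypothetical names introduced for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: the boot CPU count is never read. */
#define BOOT_COUNT_NOT_CONSULTED  UINT32_MAX

/* Models whether PcdCpuBootLogicalProcessorNumber would be read at all. */
bool
ModelBootCountIsConsulted (
  uint32_t MaxLogicalProcessorNumber
  )
{
  /* CollectProcessorCount() is only called when more than one CPU is possible. */
  bool CollectProcessorCountCalled = MaxLogicalProcessorNumber > 1;

  /* InitFlag is set to ApInitConfig only in CollectProcessorCount(). */
  bool InitFlagIsApInitConfig = CollectProcessorCountCalled;

  /* WakeUpAP() reads the boot CPU count only when InitFlag == ApInitConfig. */
  return InitFlagIsApInitConfig;
}

/* Models what this part of MpInitLib can observe of the boot CPU count. */
uint32_t
ModelMpInitLibView (
  uint32_t BootLogicalProcessorNumber,
  uint32_t MaxLogicalProcessorNumber
  )
{
  assert (MaxLogicalProcessorNumber >= 1);

  if (!ModelBootCountIsConsulted (MaxLogicalProcessorNumber)) {
    return BOOT_COUNT_NOT_CONSULTED;
  }

  return BootLogicalProcessorNumber;
}
```

With a maximum of 1, ModelMpInitLibView (0, 1) and
ModelMpInitLibView (1, 1) return the same value, which is the sense in
which the change from 0 to 1 is invisible in this model.
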
> Oliver:
>
> (1) can you please post a patch for the Windows CI so that the following
> option be passed to QEMU:
>
>   -fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes
>
> (This option is harmless when the firmware does not detect the QEMU
> bug, so it can be passed in advance; it will have no consequence at all.)
>
> In the patch, please reference
>
>   https://bugzilla.tianocore.org/show_bug.cgi?id=4250
>

Can I take the above as an ack on

https://edk2.groups.io/g/devel/message/98899

?

> (2) Please file a separate TianoCore BZ for *backing out* the change (=
> for removing the -fw_cfg switch), and assign it to yourself :)
>
> Once the Windows CI advances to a fixed QEMU binary, the "escape hatch"
> should be welded shut.
>
> (3) Please give me a hint when the CI patch (1) has been merged; then I
> can go ahead and merge this v3 series as well.
>

I'll merge the whole lot once you're happy with the CI patch.
