public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
@ 2023-01-12  8:28 Laszlo Ersek
  2023-01-12  9:55 ` [edk2-devel] " Michael Brown
  2023-01-12 18:34 ` Laszlo Ersek
  0 siblings, 2 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-12  8:28 UTC (permalink / raw)
  To: devel
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
protocol is (effectively) broken such that it suggests that switching from
the legacy interface to the modern interface works, but in reality the
switch never happens. The symptom has been witnessed when using TCG
acceleration; KVM seems to mask the issue. The issue persists with the
following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
there is no stable release that addresses the problem.

The QEMU bug confuses the Present and Possible counting in function
PlatformMaxCpuCountInitialization(), in
"OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
the privilege level of SMM) is not that great.

Detect the issue in PlatformMaxCpuCountInitialization(), and print an
error message and *hang* if the issue is present.

The problem was originally reported by Ard [0]. We analyzed it at [1] and
[2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
block", 2023-01-08), to be included in QEMU v8.0.0.

[0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3

[2] IO port write width clamping differs between TCG and KVM
    http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
    https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html

[3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
    http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
    https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html

NOTE: PlatformInitLib is used in the following platform DSCs:

  OvmfPkg/AmdSev/AmdSevX64.dsc
  OvmfPkg/CloudHv/CloudHvX64.dsc
  OvmfPkg/IntelTdx/IntelTdxX64.dsc
  OvmfPkg/Microvm/MicrovmX64.dsc
  OvmfPkg/OvmfPkgIa32.dsc
  OvmfPkg/OvmfPkgIa32X64.dsc
  OvmfPkg/OvmfPkgX64.dsc

but I can only test this change with the last three platforms, running on
QEMU.

Test results:

  TCG  QEMU     OVMF     result
       patched  patched
  ---  -------  -------  -------------------------------------------------
  0    0        0        CPU counts OK (KVM masks the QEMU bug)
  0    0        1        CPU counts OK (KVM masks the QEMU bug)
  0    1        0        CPU counts OK (QEMU fix, but KVM masks the QEMU
                         bug anyway)
  0    1        1        CPU counts OK (QEMU fix, but KVM masks the QEMU
                         bug anyway)
  1    0        0        boot with broken CPU counts (original QEMU bug)
  1    0        1        broken CPU count caught (boot hangs)
  1    1        0        CPU counts OK (QEMU fix)
  1    1        1        CPU counts OK (QEMU fix)

Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Oliver Steffen <osteffen@redhat.com>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---

Notes:
    v2:
    
    - V1 was at
      <http://mid.mail-archive.com/20230104151234.286030-1-lersek@redhat.com>.
    
    - Repo: <https://pagure.io/lersek/edk2.git>, branch:
      cpuhp-reg-catch-4250-v2
    
    - Remove KVM as a proposed workaround from the error message, because in
      the QEMU discussion, we had found that the KVM accelerator's behavior
      in QEMU (masking the problem) was not right, and that a fix for that
      had been in progress for quite some time.
    
    - Add the QEMU commit hash to the commit message, the code comment, and
      the error message.
    
    - Pick up Gerd's R-b; add Oliver to the Cc list.

 OvmfPkg/Library/PlatformInitLib/Platform.c | 35 ++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index 3e13c5d4b34f..13348afb4890 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -541,6 +541,41 @@ PlatformMaxCpuCountInitialization (
         ASSERT (Selected == Possible || Selected == 0);
       } while (Selected > 0);
 
+      //
+      // Sanity check: we need at least 1 present CPU (CPU#0 is always present).
+      //
+      // The legacy-to-modern switching of the CPU hotplug register block got
+      // broken (for TCG) in QEMU v5.1.0. Refer to "IO port write width clamping
+      // differs between TCG and KVM" at
+      // <http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com>
+      // or at
+      // <https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html>.
+      //
+      // QEMU received the fix in commit dab30fbef389 ("acpi: cpuhp: fix
+      // guest-visible maximum access size to the legacy reg block",
+      // 2023-01-08), to be included in QEMU v8.0.0.
+      //
+      // If we're affected by this QEMU bug, then we must not continue: it
+      // confuses the multiprocessing in UefiCpuPkg/Library/MpInitLib, and
+      // breaks CPU hot(un)plug with SMI in OvmfPkg/CpuHotplugSmm.
+      //
+      if (Present == 0) {
+        DEBUG ((
+          DEBUG_ERROR,
+          "%a: Broken CPU hotplug register block: Present=%u Possible=%u.\n"
+          "%a: Update QEMU to v8, or to stable with dab30fbef389 backported.\n"
+          "%a: Refer to "
+          "<https://bugzilla.tianocore.org/show_bug.cgi?id=4250>.\n",
+          __FUNCTION__,
+          Present,
+          Possible,
+          __FUNCTION__,
+          __FUNCTION__
+          ));
+        ASSERT (FALSE);
+        CpuDeadLoop ();
+      }
+
       //
       // Sanity check: fw_cfg and the modern CPU hotplug interface should
       // return the same boot CPU count.

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-01-18 13:25 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-12  8:28 [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression Laszlo Ersek
2023-01-12  9:55 ` [edk2-devel] " Michael Brown
2023-01-12 10:09   ` Ard Biesheuvel
2023-01-12 13:31     ` Laszlo Ersek
2023-01-12 13:22   ` Laszlo Ersek
2023-01-12 16:08     ` Michael Brown
2023-01-12 17:58       ` Laszlo Ersek
2023-01-12 18:22         ` Laszlo Ersek
2023-01-12 22:49           ` Michael Brown
2023-01-13  6:03         ` Gerd Hoffmann
2023-01-13  9:32           ` Gerd Hoffmann
2023-01-13 10:10             ` Laszlo Ersek
2023-01-13 12:22               ` Gerd Hoffmann
2023-01-16 14:42                 ` Ard Biesheuvel
2023-01-16 14:48                 ` Laszlo Ersek
2023-01-17 12:37                   ` Gerd Hoffmann
2023-01-17 16:43                     ` Ard Biesheuvel
2023-01-18  7:25                       ` Gerd Hoffmann
2023-01-18 11:50                         ` Laszlo Ersek
2023-01-18 13:10                           ` Gerd Hoffmann
2023-01-18 13:25                             ` Laszlo Ersek
2023-01-18 13:10                           ` Ard Biesheuvel
2023-01-18 13:21                             ` Laszlo Ersek
2023-01-12 18:34 ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox