public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
@ 2023-01-12  8:28 Laszlo Ersek
  2023-01-12  9:55 ` [edk2-devel] " Michael Brown
  2023-01-12 18:34 ` Laszlo Ersek
  0 siblings, 2 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-12  8:28 UTC (permalink / raw)
  To: devel
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
protocol is (effectively) broken such that it suggests that switching from
the legacy interface to the modern interface works, but in reality the
switch never happens. The symptom has been witnessed when using TCG
acceleration; KVM seems to mask the issue. The issue persists with the
following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
there is no stable release that addresses the problem.

The QEMU bug confuses the Present and Possible counting in function
PlatformMaxCpuCountInitialization(), in
"OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
the privilege level of SMM) is not that great.

Detect the issue in PlatformMaxCpuCountInitialization(), and print an
error message and *hang* if the issue is present.

The problem was originally reported by Ard [0]. We analyzed it at [1] and
[2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
block", 2023-01-08), to be included in QEMU v8.0.0.

[0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3

[2] IO port write width clamping differs between TCG and KVM
    http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
    https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html

[3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
    http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
    https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html

NOTE: PlatformInitLib is used in the following platform DSCs:

  OvmfPkg/AmdSev/AmdSevX64.dsc
  OvmfPkg/CloudHv/CloudHvX64.dsc
  OvmfPkg/IntelTdx/IntelTdxX64.dsc
  OvmfPkg/Microvm/MicrovmX64.dsc
  OvmfPkg/OvmfPkgIa32.dsc
  OvmfPkg/OvmfPkgIa32X64.dsc
  OvmfPkg/OvmfPkgX64.dsc

but I can only test this change with the last three platforms, running on
QEMU.

Test results:

  TCG  QEMU     OVMF     result
       patched  patched
  ---  -------  -------  -------------------------------------------------
  0    0        0        CPU counts OK (KVM masks the QEMU bug)
  0    0        1        CPU counts OK (KVM masks the QEMU bug)
  0    1        0        CPU counts OK (QEMU fix, but KVM masks the QEMU
                         bug anyway)
  0    1        1        CPU counts OK (QEMU fix, but KVM masks the QEMU
                         bug anyway)
  1    0        0        boot with broken CPU counts (original QEMU bug)
  1    0        1        broken CPU count caught (boot hangs)
  1    1        0        CPU counts OK (QEMU fix)
  1    1        1        CPU counts OK (QEMU fix)

Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Oliver Steffen <osteffen@redhat.com>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---

Notes:
    v2:
    
    - V1 was at
      <http://mid.mail-archive.com/20230104151234.286030-1-lersek@redhat.com>.
    
    - Repo: <https://pagure.io/lersek/edk2.git>, branch:
      cpuhp-reg-catch-4250-v2
    
    - Remove KVM as a proposed workaround from the error message, because in
      the QEMU discussion, we had found that the KVM accelerator's behavior
      in QEMU (masking the problem) was not right, and that a fix for that
      had been in progress for quite some time.
    
    - Add the QEMU commit hash to the commit message, the code comment, and
      the error message.
    
    - Pick up Gerd's R-b; add Oliver to the Cc list.

 OvmfPkg/Library/PlatformInitLib/Platform.c | 35 ++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index 3e13c5d4b34f..13348afb4890 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -541,6 +541,41 @@ PlatformMaxCpuCountInitialization (
         ASSERT (Selected == Possible || Selected == 0);
       } while (Selected > 0);
 
+      //
+      // Sanity check: we need at least 1 present CPU (CPU#0 is always present).
+      //
+      // The legacy-to-modern switching of the CPU hotplug register block got
+      // broken (for TCG) in QEMU v5.1.0. Refer to "IO port write width clamping
+      // differs between TCG and KVM" at
+      // <http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com>
+      // or at
+      // <https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html>.
+      //
+      // QEMU received the fix in commit dab30fbef389 ("acpi: cpuhp: fix
+      // guest-visible maximum access size to the legacy reg block",
+      // 2023-01-08), to be included in QEMU v8.0.0.
+      //
+      // If we're affected by this QEMU bug, then we must not continue: it
+      // confuses the multiprocessing in UefiCpuPkg/Library/MpInitLib, and
+      // breaks CPU hot(un)plug with SMI in OvmfPkg/CpuHotplugSmm.
+      //
+      if (Present == 0) {
+        DEBUG ((
+          DEBUG_ERROR,
+          "%a: Broken CPU hotplug register block: Present=%u Possible=%u.\n"
+          "%a: Update QEMU to v8, or to stable with dab30fbef389 backported.\n"
+          "%a: Refer to "
+          "<https://bugzilla.tianocore.org/show_bug.cgi?id=4250>.\n",
+          __FUNCTION__,
+          Present,
+          Possible,
+          __FUNCTION__,
+          __FUNCTION__
+          ));
+        ASSERT (FALSE);
+        CpuDeadLoop ();
+      }
+
       //
       // Sanity check: fw_cfg and the modern CPU hotplug interface should
       // return the same boot CPU count.

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12  8:28 [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression Laszlo Ersek
@ 2023-01-12  9:55 ` Michael Brown
  2023-01-12 10:09   ` Ard Biesheuvel
  2023-01-12 13:22   ` Laszlo Ersek
  2023-01-12 18:34 ` Laszlo Ersek
  1 sibling, 2 replies; 24+ messages in thread
From: Michael Brown @ 2023-01-12  9:55 UTC (permalink / raw)
  To: devel, lersek
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 12/01/2023 08:28, Laszlo Ersek wrote:
> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
> protocol is (effectively) broken such that it suggests that switching from
> the legacy interface to the modern interface works, but in reality the
> switch never happens. The symptom has been witnessed when using TCG
> acceleration; KVM seems to mask the issue. The issue persists with the
> following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
> there is no stable release that addresses the problem.
> 
> The QEMU bug confuses the Present and Possible counting in function
> PlatformMaxCpuCountInitialization(), in
> "OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
> Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
> firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
> SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
> the privilege level of SMM) is not that great.
> 
> Detect the issue in PlatformMaxCpuCountInitialization(), and print an
> error message and *hang* if the issue is present.

Would this mean that OVMF would refuse to start with all current distro 
versions of qemu (when not using KVM), or am I misunderstanding?

Thanks,

Michael


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12  9:55 ` [edk2-devel] " Michael Brown
@ 2023-01-12 10:09   ` Ard Biesheuvel
  2023-01-12 13:31     ` Laszlo Ersek
  2023-01-12 13:22   ` Laszlo Ersek
  1 sibling, 1 reply; 24+ messages in thread
From: Ard Biesheuvel @ 2023-01-12 10:09 UTC (permalink / raw)
  To: devel, mcb30
  Cc: lersek, Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On Thu, 12 Jan 2023 at 10:56, Michael Brown <mcb30@ipxe.org> wrote:
>
> On 12/01/2023 08:28, Laszlo Ersek wrote:
> > In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
> > protocol is (effectively) broken such that it suggests that switching from
> > the legacy interface to the modern interface works, but in reality the
> > switch never happens. The symptom has been witnessed when using TCG
> > acceleration; KVM seems to mask the issue. The issue persists with the
> > following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
> > there is no stable release that addresses the problem.
> >
> > The QEMU bug confuses the Present and Possible counting in function
> > PlatformMaxCpuCountInitialization(), in
> > "OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
> > Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
> > firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
> > SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
> > the privilege level of SMM) is not that great.
> >
> > Detect the issue in PlatformMaxCpuCountInitialization(), and print an
> > error message and *hang* if the issue is present.
>
> Would this mean that OVMF would refuse to start with all current distro
> versions of qemu (when not using KVM), or am I misunderstanding?
>

I had the same question, actually. This would be less than ideal,
especially given the fact that I didn't even notice the breakage, and
Linux happily booted on all cores.

If this is a security issue that affects SMM, could we make this
behavior dependent on REQUIRE_SMM perhaps?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12  9:55 ` [edk2-devel] " Michael Brown
  2023-01-12 10:09   ` Ard Biesheuvel
@ 2023-01-12 13:22   ` Laszlo Ersek
  2023-01-12 16:08     ` Michael Brown
  1 sibling, 1 reply; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-12 13:22 UTC (permalink / raw)
  To: Michael Brown, devel
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/12/23 10:55, Michael Brown wrote:
> On 12/01/2023 08:28, Laszlo Ersek wrote:
>> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the
>> negotiation
>> protocol is (effectively) broken such that it suggests that switching
>> from
>> the legacy interface to the modern interface works, but in reality the
>> switch never happens. The symptom has been witnessed when using TCG
>> acceleration; KVM seems to mask the issue. The issue persists with the
>> following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0.
>> Currently
>> there is no stable release that addresses the problem.
>>
>> The QEMU bug confuses the Present and Possible counting in function
>> PlatformMaxCpuCountInitialization(), in
>> "OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
>> Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
>> firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug
>> with
>> SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
>> the privilege level of SMM) is not that great.
>>
>> Detect the issue in PlatformMaxCpuCountInitialization(), and print an
>> error message and *hang* if the issue is present.
> 
> Would this mean that OVMF would refuse to start with all current distro
> versions of qemu (when not using KVM), or am I misunderstanding?

Your understanding is correct.

I'm not in a rush to get this merged, but eventually, it should be.

On qemu-devel, I asked for the fix to be applied to all stable branches
starting with v5.*. I'm not sure how far "back" qemu-stable is
maintained though.

Laszlo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12 10:09   ` Ard Biesheuvel
@ 2023-01-12 13:31     ` Laszlo Ersek
  0 siblings, 0 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-12 13:31 UTC (permalink / raw)
  To: Ard Biesheuvel, devel, mcb30
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky, Michael Brown

On 1/12/23 11:09, Ard Biesheuvel wrote:
> On Thu, 12 Jan 2023 at 10:56, Michael Brown <mcb30@ipxe.org> wrote:
>>
>> On 12/01/2023 08:28, Laszlo Ersek wrote:
>>> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
>>> protocol is (effectively) broken such that it suggests that switching from
>>> the legacy interface to the modern interface works, but in reality the
>>> switch never happens. The symptom has been witnessed when using TCG
>>> acceleration; KVM seems to mask the issue. The issue persists with the
>>> following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
>>> there is no stable release that addresses the problem.
>>>
>>> The QEMU bug confuses the Present and Possible counting in function
>>> PlatformMaxCpuCountInitialization(), in
>>> "OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
>>> Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
>>> firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
>>> SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
>>> the privilege level of SMM) is not that great.
>>>
>>> Detect the issue in PlatformMaxCpuCountInitialization(), and print an
>>> error message and *hang* if the issue is present.
>>
>> Would this mean that OVMF would refuse to start with all current distro
>> versions of qemu (when not using KVM), or am I misunderstanding?
>>
> 
> I had the same question, actually. This would be less than ideal,
> especially given the fact that I didn't even notice the breakage, and
> Linux happily booted on all cores.
> 
> If this is a security issue that affects SMM, could we make this
> behavior dependent on REQUIRE_SMM perhaps?

Yes, we could restrict this with FeaturePcdGet (PcdSmmSmramRequire), but
then all multi-processing (the MP PPI from CpuMpPei, and the MP protocol
from CpuDxe) remains broken (or at least "misconfigured") in OVMF.

That can have OS visible effects; for example, MSR_IA32_FEATURE_CONTROL
is supposed to be configured identically on all processors, and
PlatformPei uses the MP PPI (from CpuMpPei) to do that;
<https://tianocore.acgmultimedia.com/show_bug.cgi?id=86>.

Is it common that people run bleeding edge OVMF (built from git
checkout) on distro QEMU?

Edk2's release cycle is much shorter than QEMU's, so I guess we should
delay merging this until after QEMU v8 is out, at the least. But, if
there won't be another stable release for, say, QEMU v6, is that reason
enough to abandon this patch in OVMF?

Laszlo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12 13:22   ` Laszlo Ersek
@ 2023-01-12 16:08     ` Michael Brown
  2023-01-12 17:58       ` Laszlo Ersek
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Brown @ 2023-01-12 16:08 UTC (permalink / raw)
  To: devel, lersek
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 12/01/2023 13:22, Laszlo Ersek wrote:
>>> Detect the issue in PlatformMaxCpuCountInitialization(), and
>>> print an error message and *hang* if the issue is present.
>> 
>> Would this mean that OVMF would refuse to start with all current
>> distro versions of qemu (when not using KVM), or am I
>> misunderstanding?
> 
> Your understanding is correct.

I apologise in advance if this is a stupid question, but: given that we
can detect the issue (as per this patch), and given also:

>> On 12/01/2023 08:28, Laszlo Ersek wrote:
>>> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the 
>>> negotiation protocol is (effectively) broken such that it
>>> suggests that switching from the legacy interface to the modern
>>> interface works, but in reality the switch never happens.
...would it work to detect the issue and treat it as "modern interface 
is not supported: continue to use the legacy interface"?  IOW, to treat 
"Present=0" as indicating that the modern interface is not supported.

Thanks,

Michael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12 16:08     ` Michael Brown
@ 2023-01-12 17:58       ` Laszlo Ersek
  2023-01-12 18:22         ` Laszlo Ersek
  2023-01-13  6:03         ` Gerd Hoffmann
  0 siblings, 2 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-12 17:58 UTC (permalink / raw)
  To: Michael Brown, devel
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/12/23 17:08, Michael Brown wrote:
> On 12/01/2023 13:22, Laszlo Ersek wrote:
>>>> Detect the issue in PlatformMaxCpuCountInitialization(), and
>>>> print an error message and *hang* if the issue is present.
>>>
>>> Would this mean that OVMF would refuse to start with all current
>>> distro versions of qemu (when not using KVM), or am I
>>> misunderstanding?
>>
>> Your understanding is correct.
>
> I apologise in advance if this is a stupid question, but: given that
> we can detect the issue (as per this patch), and given also:
>
>>> On 12/01/2023 08:28, Laszlo Ersek wrote:
>>>> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the
>>>> negotiation protocol is (effectively) broken such that it
>>>> suggests that switching from the legacy interface to the modern
>>>> interface works, but in reality the switch never happens.
> ...would it work to detect the issue and treat it as "modern interface
> is not supported: continue to use the legacy interface"?  IOW, to
> treat "Present=0" as indicating that the modern interface is not
> supported.

I considered that. It's not a bad idea at all, but it's more complicated
than that.

The code in PlatformPei (or more precisely, nowadays in PlatformInitLib)
could be rearranged. It would be a really ugly patch, because now, after
CmdData2 reads as 0, we're simply not prepared to renege on that
determination. So the branches would have to be rearranged in a quite
ugly manner. *But* it would be doable of course.

However, in the CpuHotplugSmm driver, we also read the same register
(QEMU_CPUHP_R_CMD_DATA2), for determining feature availability. See the
QemuCpuhpReadCommandData2() call in
"OvmfPkg/CpuHotplugSmm/CpuHotplug.c", function CpuHotplugEntry(). And
the outcome of *that* check is final. No counting is done there anymore,
because it's unnecessary in the first place. If the check succeeds, we
proceed, if the check fails, we hang hard. That sanity check is also
misled by the QEMU bug, and tweaking *that* check is out of question.

And I didn't want to post a patch that would make PlatformPei deal with
the QEMU bug one way, and CpuHotplugSmm another way.

Technically, the following should be possible: in PlatformInitLib,
rearrange the branches such that we can still do "something" if the QEMU
bug is detected. Then, if PcdSmmSmramRequire is TRUE, hang hard, and if
PcdSmmSmramRequire is FALSE, then just pretend the modern hotplug
register block is not available.

Unfortunately, I really dislike this. It will cause the same hang on
distros that build OVMF -- possibly their *only* OVMF binary -- with -D
SMM_REQUIRE. Then the request will morph into "can we refine this even
further". And sure, you can refine it even further:

- the CpuHotplugSmm driver exits gracefully if QEMU does not negotiate
the "SMRAM at default SMBASE" feature (see edk2 commit d74d56fcfa31
("OvmfPkg: introduce PcdQ35SmramAtDefaultSmbase", 2020-02-05), and QEMU
commit f404220e279c ("q35: implement 128K SMRAM at default SMBASE
address", 2020-01-22)). This can be disabled through the
"mch.smbase-smram" property on the QEMU command line.

- QEMU can be configured with other compat properties on the command
line so that "CPU hotplug with SMI" and "CPU hot-unplug with SMI" *not*
be offered to the firmware. Then QEMU will reject hotplug attempts, and
the SMM hotplug code in edk2 will not be triggered by the (virtual)
hardware.

The case is that both QEMU and edk2 check for each other's supported
features. It's a complex interwoven feature set with security impact,
which is exactly why we added feature negotiation at every step --
effectively mutual negotiation wherever necessary. I cannot claim I
remember every part of it, and playing tricks around feature negotiation
with SMM impact makes me *extremely uncomfortable*. I absolutely don't
want to author an OVMF patch, briefly before I disappear again (for
good!), that "looks good" now, and then becomes a horrible SMM CVE in a
year or two. I want to go for "obviously no bug", rather than "no
obvious bug".

In brief, my counter-argument is: once we start refining the hang, it
will never be nuanced and polished enough for users, and we'll just heap
on more complexity, until we introduce an obscure but nasty bug. I don't
want to start down that path.

I can live with this patch being rejected altogether (that's consistent
with me having exited edk2 -- I can pretend I don't know about it; it's
not my job anymore). I will not take responsibility for relaxing the
proposed hang however, not even as a reviewer ACKing it. If someone can
rearrange the code, drawing a *practical* but also secure boundary for
the fallback, that's great, the patch can go in without me "being in the
know". I want none of that responsibility.

I don't want to force my hard-liner view on QEMU/OVMF users here; I need
not participate in a relaxed approach either, however. This reads like a
re-run of the "old grub in guest versus non-executable UEFI heap"
discussion, and it's giving me the shivers.

Your proposal is entirely justified, from a practical / user
perspective, but I'm not the right person for it.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12 17:58       ` Laszlo Ersek
@ 2023-01-12 18:22         ` Laszlo Ersek
  2023-01-12 22:49           ` Michael Brown
  2023-01-13  6:03         ` Gerd Hoffmann
  1 sibling, 1 reply; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-12 18:22 UTC (permalink / raw)
  To: Michael Brown, devel
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/12/23 18:58, Laszlo Ersek wrote:

> Your proposal is entirely justified, from a practical / user
> perspective, but I'm not the right person for it.

This should do what you propose (untested), but I hate the code myself.

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index 3e13c5d4b34f..b4c0b47c8bb5 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -541,6 +541,22 @@ PlatformMaxCpuCountInitialization (
         ASSERT (Selected == Possible || Selected == 0);
       } while (Selected > 0);
 
+      if (Present == 0) {
+        if (FeaturePcdGet (PcdSmmSmramRequire)) {
+          ASSERT (FALSE);
+          CpuDeadLoop();
+        }
+        //
+        // The bug is in QEMU v5.1.0+, where we're not affected by the QEMU v2.7
+        // reset bug, so BootCpuCount from fw_cfg is reliable. Assume a fully
+        // populated topology, precluding hotplug, like when the modern CPU
+        // hotplug interface is unavailable. This will satisfy the QEMU v2.7
+        // reset bug sanity check below as well.
+        //
+        Present  = BootCpuCount;
+        Possible = BootCpuCount;
+      }
+
       //
       // Sanity check: fw_cfg and the modern CPU hotplug interface should
       // return the same boot CPU count.


I *really* don't like this. With this, you might get through the firmware and reach the guest OS, and pre-boot, the MP PPI and protocol might even work. But, because the QEMU register block is broken, I don't have the slightest idea what's going to happen if you attempt a CPU hot-*unplug* at OS runtime (unplug because in effect you start with a fully populated topology). OVMF does not participate in that (because of no SMM), but the ACPI content generated by QEMU does, and that content (AML methods) do massage the register block.

In a sense, with the register block negotiation broken, even without SMM in the picture, the guest OS could consider OVMF doing it a *service* by hanging, so that guest OS-level data is not lost when the admin attempts an unplug, and the guest OS crashes. (I've not reproduced such a crash to be clear, I just think it could be possible.)

There's got to be a limit to how far we try to compensate for broken (virtual) hardware. :( The right thing to do is to wait for the QEMU patch to reach as many as possible stable branches, let the distros pick up the new stable releases, and then merge the hardliner hang.

Laszlo


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12  8:28 [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression Laszlo Ersek
  2023-01-12  9:55 ` [edk2-devel] " Michael Brown
@ 2023-01-12 18:34 ` Laszlo Ersek
  1 sibling, 0 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-12 18:34 UTC (permalink / raw)
  To: devel
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/12/23 09:28, Laszlo Ersek wrote:
> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
> protocol is (effectively) broken such that it suggests that switching from
> the legacy interface to the modern interface works, but in reality the
> switch never happens. The symptom has been witnessed when using TCG
> acceleration; KVM seems to mask the issue. The issue persists with the
> following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
> there is no stable release that addresses the problem.
> 
> The QEMU bug confuses the Present and Possible counting in function
> PlatformMaxCpuCountInitialization(), in
> "OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
> Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
> firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
> SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
> the privilege level of SMM) is not that great.
> 
> Detect the issue in PlatformMaxCpuCountInitialization(), and print an
> error message and *hang* if the issue is present.
> 
> The problem was originally reported by Ard [0]. We analyzed it at [1] and
> [2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
> ("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
> block", 2023-01-08), to be included in QEMU v8.0.0.
> 
> [0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2
> 
> [1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3
> 
> [2] IO port write width clamping differs between TCG and KVM
>     http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
>     https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html
> 
> [3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
>     http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
>     https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html
> 
> NOTE: PlatformInitLib is used in the following platform DSCs:
> 
>   OvmfPkg/AmdSev/AmdSevX64.dsc
>   OvmfPkg/CloudHv/CloudHvX64.dsc
>   OvmfPkg/IntelTdx/IntelTdxX64.dsc
>   OvmfPkg/Microvm/MicrovmX64.dsc
>   OvmfPkg/OvmfPkgIa32.dsc
>   OvmfPkg/OvmfPkgIa32X64.dsc
>   OvmfPkg/OvmfPkgX64.dsc
> 
> but I can only test this change with the last three platforms, running on
> QEMU.
> 
> Test results:
> 
>   TCG  QEMU     OVMF     result
>        patched  patched
>   ---  -------  -------  -------------------------------------------------
>   0    0        0        CPU counts OK (KVM masks the QEMU bug)
>   0    0        1        CPU counts OK (KVM masks the QEMU bug)
>   0    1        0        CPU counts OK (QEMU fix, but KVM masks the QEMU
>                          bug anyway)
>   0    1        1        CPU counts OK (QEMU fix, but KVM masks the QEMU
>                          bug anyway)
>   1    0        0        boot with broken CPU counts (original QEMU bug)
>   1    0        1        broken CPU count caught (boot hangs)
>   1    1        0        CPU counts OK (QEMU fix)
>   1    1        1        CPU counts OK (QEMU fix)
> 
> Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: Erdem Aktas <erdemaktas@google.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: James Bottomley <jejb@linux.ibm.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Min Xu <min.m.xu@intel.com>
> Cc: Oliver Steffen <osteffen@redhat.com>
> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
> 
> Notes:
>     v2:
>     
>     - V1 was at
>       <http://mid.mail-archive.com/20230104151234.286030-1-lersek@redhat.com>.
>     
>     - Repo: <https://pagure.io/lersek/edk2.git>, branch:
>       cpuhp-reg-catch-4250-v2
>     
>     - Remove KVM as a proposed workaround from the error message, because in
>       the QEMU discussion, we had found that the KVM accelerator's behavior
>       in QEMU (masking the problem) was not right, and that a fix for that
>       had been in progress for quite some time.
>     
>     - Add the QEMU commit hash to the commit message, the code comment, and
>       the error message.
>     
>     - Pick up Gerd's R-b; add Oliver to the Cc list.
> 
>  OvmfPkg/Library/PlatformInitLib/Platform.c | 35 ++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
> index 3e13c5d4b34f..13348afb4890 100644
> --- a/OvmfPkg/Library/PlatformInitLib/Platform.c
> +++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
> @@ -541,6 +541,41 @@ PlatformMaxCpuCountInitialization (
>          ASSERT (Selected == Possible || Selected == 0);
>        } while (Selected > 0);
>  
> +      //
> +      // Sanity check: we need at least 1 present CPU (CPU#0 is always present).
> +      //
> +      // The legacy-to-modern switching of the CPU hotplug register block got
> +      // broken (for TCG) in QEMU v5.1.0. Refer to "IO port write width clamping
> +      // differs between TCG and KVM" at
> +      // <http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com>
> +      // or at
> +      // <https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html>.
> +      //
> +      // QEMU received the fix in commit dab30fbef389 ("acpi: cpuhp: fix
> +      // guest-visible maximum access size to the legacy reg block",
> +      // 2023-01-08), to be included in QEMU v8.0.0.
> +      //
> +      // If we're affected by this QEMU bug, then we must not continue: it
> +      // confuses the multiprocessing in UefiCpuPkg/Library/MpInitLib, and
> +      // breaks CPU hot(un)plug with SMI in OvmfPkg/CpuHotplugSmm.
> +      //
> +      if (Present == 0) {
> +        DEBUG ((
> +          DEBUG_ERROR,
> +          "%a: Broken CPU hotplug register block: Present=%u Possible=%u.\n"
> +          "%a: Update QEMU to v8, or to stable with dab30fbef389 backported.\n"
> +          "%a: Refer to "
> +          "<https://bugzilla.tianocore.org/show_bug.cgi?id=4250>.\n",
> +          __FUNCTION__,
> +          Present,
> +          Possible,
> +          __FUNCTION__,
> +          __FUNCTION__
> +          ));
> +        ASSERT (FALSE);
> +        CpuDeadLoop ();
> +      }
> +
>        //
>        // Sanity check: fw_cfg and the modern CPU hotplug interface should
>        // return the same boot CPU count.
> 

please do with this what you will


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12 18:22         ` Laszlo Ersek
@ 2023-01-12 22:49           ` Michael Brown
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Brown @ 2023-01-12 22:49 UTC (permalink / raw)
  To: devel, lersek
  Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 12/01/2023 17:58, Laszlo Ersek wrote:
> The case is that both QEMU and edk2 check for each other's supported 
> features. It's a complex interwoven feature set with security
> impact, which is exactly why we added feature negotiation at every
> step -- effectively mutual negotiation wherever necessary. I cannot
> claim I remember every part of it, and playing tricks around feature
> negotiation with SMM impact makes me *extremely uncomfortable*. I
> absolutely don't want to author an OVMF patch, briefly before I
> disappear again (for good!), that "looks good" now, and then becomes
> a horrible SMM CVE in a year or two. I want to go for "obviously no
> bug", rather than "no obvious bug".

I'm definitely not sufficiently familiar with all of the QEMU and OVMF
historical quirks to safely author or review a patch to cover all of 
this, so I will very definitely defer to your judgement on this.

On 12/01/2023 18:22, Laszlo Ersek wrote:
> There's got to be a limit to how far we try to compensate for broken
> (virtual) hardware. :( The right thing to do is to wait for the QEMU
> patch to reach as many as possible stable branches, let the distros
> pick up the new stable releases, and then merge the hardliner hang.

I concur.

Thanks for considering the suggestion!  :)

Michael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-12 17:58       ` Laszlo Ersek
  2023-01-12 18:22         ` Laszlo Ersek
@ 2023-01-13  6:03         ` Gerd Hoffmann
  2023-01-13  9:32           ` Gerd Hoffmann
  1 sibling, 1 reply; 24+ messages in thread
From: Gerd Hoffmann @ 2023-01-13  6:03 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Michael Brown, devel, Ard Biesheuvel, Brijesh Singh, Erdem Aktas,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

  Hi,

> - QEMU can be configured with other compat properties on the command
> line so that "CPU hotplug with SMI" and "CPU hot-unplug with SMI" *not*
> be offered to the firmware. Then QEMU will reject hotplug attempts, and
> the SMM hotplug code in edk2 will not be triggered by the (virtual)
> hardware.

Can we have edk2 print instructions for that in the error message?
We also might switch edk2 CI to run qemu that way.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-13  6:03         ` Gerd Hoffmann
@ 2023-01-13  9:32           ` Gerd Hoffmann
  2023-01-13 10:10             ` Laszlo Ersek
  0 siblings, 1 reply; 24+ messages in thread
From: Gerd Hoffmann @ 2023-01-13  9:32 UTC (permalink / raw)
  To: devel
  Cc: Laszlo Ersek, Michael Brown, Ard Biesheuvel, Brijesh Singh,
	Erdem Aktas, James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On Fri, Jan 13, 2023 at 07:03:54AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > - QEMU can be configured with other compat properties on the command
> > line so that "CPU hotplug with SMI" and "CPU hot-unplug with SMI" *not*
> > be offered to the firmware. Then QEMU will reject hotplug attempts, and
> > the SMM hotplug code in edk2 will not be triggered by the (virtual)
> > hardware.
> 
> Can we have edk2 print instructions for that in the error message?

This seems to be:

    qemu -M q35 \
        -global ICH9-LPC.x-smi-cpu-hotplug=off \
        -global ICH9-LPC.x-smi-cpu-hotunplug=off

But it appears to not work.

Or did you mean something else?

take care,
  Gerd


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-13  9:32           ` Gerd Hoffmann
@ 2023-01-13 10:10             ` Laszlo Ersek
  2023-01-13 12:22               ` Gerd Hoffmann
  0 siblings, 1 reply; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-13 10:10 UTC (permalink / raw)
  To: Gerd Hoffmann, devel
  Cc: Michael Brown, Ard Biesheuvel, Brijesh Singh, Erdem Aktas,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/13/23 10:32, Gerd Hoffmann wrote:
> On Fri, Jan 13, 2023 at 07:03:54AM +0100, Gerd Hoffmann wrote:
>>   Hi,
>>
>>> - QEMU can be configured with other compat properties on the command
>>> line so that "CPU hotplug with SMI" and "CPU hot-unplug with SMI" *not*
>>> be offered to the firmware. Then QEMU will reject hotplug attempts, and
>>> the SMM hotplug code in edk2 will not be triggered by the (virtual)
>>> hardware.
>>
>> Can we have edk2 print instructions for that in the error message?
> 
> This seems to be:
> 
>     qemu -M q35 \
>         -global ICH9-LPC.x-smi-cpu-hotplug=off \
>         -global ICH9-LPC.x-smi-cpu-hotunplug=off

Yes, those are the flags.

> But it appears to not work.

They should work, but they take effect in QEMU, and not in the firmware.
These knobs control what CPU hot(un)plug+SMI features QEMU exposes to
the guest fw, via fw_cfg, then the guest confirms those that it knows,
and even locks the feature set down (IIRC). The same is replayed with an
S3 bootscript snippet during S3 resume.

Then, when you try to use a QMP command with QEMU for hotplugging or
hot-unplugging a CPU (equiv. virsh commands that translate to such QMP
commands), QEMU will allow or reject those commands based on the
firmware's prior confirmation. That is, the knobs don't influence the
firmware's behavior, the firmware just confirms to QEMU whether it can
handle what. This makes it possible to run {old, new} firmware x { old,
new } QEMU.

In particular the firmware makes no further decisions based on whether
QEMU advertized some of these features.

Laszlo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-13 10:10             ` Laszlo Ersek
@ 2023-01-13 12:22               ` Gerd Hoffmann
  2023-01-16 14:42                 ` Ard Biesheuvel
  2023-01-16 14:48                 ` Laszlo Ersek
  0 siblings, 2 replies; 24+ messages in thread
From: Gerd Hoffmann @ 2023-01-13 12:22 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: devel, Michael Brown, Ard Biesheuvel, Brijesh Singh, Erdem Aktas,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On Fri, Jan 13, 2023 at 11:10:54AM +0100, Laszlo Ersek wrote:
> On 1/13/23 10:32, Gerd Hoffmann wrote:
> > On Fri, Jan 13, 2023 at 07:03:54AM +0100, Gerd Hoffmann wrote:
> >>   Hi,
> >>
> >>> - QEMU can be configured with other compat properties on the command
> >>> line so that "CPU hotplug with SMI" and "CPU hot-unplug with SMI" *not*
> >>> be offered to the firmware. Then QEMU will reject hotplug attempts, and
> >>> the SMM hotplug code in edk2 will not be triggered by the (virtual)
> >>> hardware.
> >>
> >> Can we have edk2 print instructions for that in the error message?
> > 
> > This seems to be:
> > 
> >     qemu -M q35 \
> >         -global ICH9-LPC.x-smi-cpu-hotplug=off \
> >         -global ICH9-LPC.x-smi-cpu-hotunplug=off
> 
> Yes, those are the flags.
> 
> > But it appears to not work.
> 
> They should work, but they take effect in QEMU, and not in the firmware.
> These knobs control what CPU hot(un)plug+SMI features QEMU exposes to
> the guest fw, via fw_cfg,

Ok, I see, only the SMM code actually checks that.

> In particular the firmware makes no further decisions based on whether
> QEMU advertized some of these features.

I was thinking the other way around:  When cpu hotplug is disabled in
qemu it should be safe to skip the whole cpu hotplug checking dance.
See test patch below.

That would give us a config switch (turn off cpu hotplug support) which
would allow edk2 run on qemu versions with broken cpu hotplug.

Does the idea look sane or do I miss something?

take care,
  Gerd

commit bd2e36eba35268ab46c0125d2b9125391ea6f9fc
Author: Gerd Hoffmann <kraxel@redhat.com>
Date:   Fri Jan 13 13:07:36 2023 +0100

    skip cpu present checking when hotplug is off

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index 13348afb4890..2b0f0c836f85 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -415,8 +415,9 @@ PlatformMaxCpuCountInitialization (
   IN OUT EFI_HOB_PLATFORM_INFO  *PlatformInfoHob
   )
 {
-  UINT16  BootCpuCount = 0;
-  UINT32  MaxCpuCount;
+  UINT16   BootCpuCount = 0;
+  UINT32   MaxCpuCount;
+  BOOLEAN  CpuHotplugSupported = FALSE;
 
   //
   // Try to fetch the boot CPU count.
@@ -424,6 +425,31 @@ PlatformMaxCpuCountInitialization (
   if (QemuFwCfgIsAvailable ()) {
     QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
     BootCpuCount = QemuFwCfgRead16 ();
+    DEBUG ((DEBUG_INFO, "%a: BootCpuCount: %d\n", __FUNCTION__, BootCpuCount));
+  }
+
+  {
+    FIRMWARE_CONFIG_ITEM  SupportedFeaturesItem;
+    UINTN                 SupportedFeaturesSize;
+    UINT64                mSmiFeatures;
+    EFI_STATUS            Status;
+
+    Status = QemuFwCfgFindFile (
+               "etc/smi/supported-features",
+               &SupportedFeaturesItem,
+               &SupportedFeaturesSize
+               );
+
+    if (EFI_ERROR (Status)) {
+      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: %r\n", __FUNCTION__, Status));
+    } else {
+      QemuFwCfgSelectItem (SupportedFeaturesItem);
+      QemuFwCfgReadBytes (sizeof mSmiFeatures, &mSmiFeatures);
+      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: 0x%x\n", __FUNCTION__, mSmiFeatures));
+      if (mSmiFeatures & (BIT1 /* hotplug */ | BIT2 /* hotunplug */)) {
+        CpuHotplugSupported = TRUE;
+      }
+    }
   }
 
   if (BootCpuCount == 0) {
@@ -435,6 +461,9 @@ PlatformMaxCpuCountInitialization (
     //
     DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
     MaxCpuCount = PlatformInfoHob->DefaultMaxCpuNumber;
+  } else if (!CpuHotplugSupported) {
+    DEBUG ((DEBUG_INFO, "%a: CPU hotplug support not available\n", __FUNCTION__));
+    MaxCpuCount = BootCpuCount;
   } else {
     //
     // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-13 12:22               ` Gerd Hoffmann
@ 2023-01-16 14:42                 ` Ard Biesheuvel
  2023-01-16 14:48                 ` Laszlo Ersek
  1 sibling, 0 replies; 24+ messages in thread
From: Ard Biesheuvel @ 2023-01-16 14:42 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Laszlo Ersek, devel, Michael Brown, Ard Biesheuvel, Brijesh Singh,
	Erdem Aktas, James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On Fri, 13 Jan 2023 at 13:22, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> On Fri, Jan 13, 2023 at 11:10:54AM +0100, Laszlo Ersek wrote:
> > On 1/13/23 10:32, Gerd Hoffmann wrote:
> > > On Fri, Jan 13, 2023 at 07:03:54AM +0100, Gerd Hoffmann wrote:
> > >>   Hi,
> > >>
> > >>> - QEMU can be configured with other compat properties on the command
> > >>> line so that "CPU hotplug with SMI" and "CPU hot-unplug with SMI" *not*
> > >>> be offered to the firmware. Then QEMU will reject hotplug attempts, and
> > >>> the SMM hotplug code in edk2 will not be triggered by the (virtual)
> > >>> hardware.
> > >>
> > >> Can we have edk2 print instructions for that in the error message?
> > >
> > > This seems to be:
> > >
> > >     qemu -M q35 \
> > >         -global ICH9-LPC.x-smi-cpu-hotplug=off \
> > >         -global ICH9-LPC.x-smi-cpu-hotunplug=off
> >
> > Yes, those are the flags.
> >
> > > But it appears to not work.
> >
> > They should work, but they take effect in QEMU, and not in the firmware.
> > These knobs control what CPU hot(un)plug+SMI features QEMU exposes to
> > the guest fw, via fw_cfg,
>
> Ok, I see, only the SMM code actually checks that.
>
> > In particular the firmware makes no further decisions based on whether
> > QEMU advertized some of these features.
>
> I was thinking the other way around:  When cpu hotplug is disabled in
> qemu it should be safe to skip the whole cpu hotplug checking dance.
> See test patch below.
>
> That would give us a config switch (turn off cpu hotplug support) which
> would allow edk2 run on qemu versions with broken cpu hotplug.
>
> Does the idea look sane or do I miss something?
>

I cannot review the actual patch due to lack of x86/smm/qemu/hotplug
knowledge, but if this allows us to merge Laszlo's fix without
breaking all current QEMU/x86 TCG users, I'm all for it.


>
> commit bd2e36eba35268ab46c0125d2b9125391ea6f9fc
> Author: Gerd Hoffmann <kraxel@redhat.com>
> Date:   Fri Jan 13 13:07:36 2023 +0100
>
>     skip cpu present checking when hotplug is off
>
> diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
> index 13348afb4890..2b0f0c836f85 100644
> --- a/OvmfPkg/Library/PlatformInitLib/Platform.c
> +++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
> @@ -415,8 +415,9 @@ PlatformMaxCpuCountInitialization (
>    IN OUT EFI_HOB_PLATFORM_INFO  *PlatformInfoHob
>    )
>  {
> -  UINT16  BootCpuCount = 0;
> -  UINT32  MaxCpuCount;
> +  UINT16   BootCpuCount = 0;
> +  UINT32   MaxCpuCount;
> +  BOOLEAN  CpuHotplugSupported = FALSE;
>
>    //
>    // Try to fetch the boot CPU count.
> @@ -424,6 +425,31 @@ PlatformMaxCpuCountInitialization (
>    if (QemuFwCfgIsAvailable ()) {
>      QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
>      BootCpuCount = QemuFwCfgRead16 ();
> +    DEBUG ((DEBUG_INFO, "%a: BootCpuCount: %d\n", __FUNCTION__, BootCpuCount));
> +  }
> +
> +  {
> +    FIRMWARE_CONFIG_ITEM  SupportedFeaturesItem;
> +    UINTN                 SupportedFeaturesSize;
> +    UINT64                mSmiFeatures;
> +    EFI_STATUS            Status;
> +
> +    Status = QemuFwCfgFindFile (
> +               "etc/smi/supported-features",
> +               &SupportedFeaturesItem,
> +               &SupportedFeaturesSize
> +               );
> +
> +    if (EFI_ERROR (Status)) {
> +      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: %r\n", __FUNCTION__, Status));
> +    } else {
> +      QemuFwCfgSelectItem (SupportedFeaturesItem);
> +      QemuFwCfgReadBytes (sizeof mSmiFeatures, &mSmiFeatures);
> +      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: 0x%x\n", __FUNCTION__, mSmiFeatures));
> +      if (mSmiFeatures & (BIT1 /* hotplug */ | BIT2 /* hotunplug */)) {
> +        CpuHotplugSupported = TRUE;
> +      }
> +    }
>    }
>
>    if (BootCpuCount == 0) {
> @@ -435,6 +461,9 @@ PlatformMaxCpuCountInitialization (
>      //
>      DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
>      MaxCpuCount = PlatformInfoHob->DefaultMaxCpuNumber;
> +  } else if (!CpuHotplugSupported) {
> +    DEBUG ((DEBUG_INFO, "%a: CPU hotplug support not available\n", __FUNCTION__));
> +    MaxCpuCount = BootCpuCount;
>    } else {
>      //
>      // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-13 12:22               ` Gerd Hoffmann
  2023-01-16 14:42                 ` Ard Biesheuvel
@ 2023-01-16 14:48                 ` Laszlo Ersek
  2023-01-17 12:37                   ` Gerd Hoffmann
  1 sibling, 1 reply; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-16 14:48 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: devel, Michael Brown, Ard Biesheuvel, Brijesh Singh, Erdem Aktas,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/13/23 13:22, Gerd Hoffmann wrote:
> On Fri, Jan 13, 2023 at 11:10:54AM +0100, Laszlo Ersek wrote:
>> On 1/13/23 10:32, Gerd Hoffmann wrote:
>>> On Fri, Jan 13, 2023 at 07:03:54AM +0100, Gerd Hoffmann wrote:
>>>>   Hi,
>>>>
>>>>> - QEMU can be configured with other compat properties on the
>>>>> command line so that "CPU hotplug with SMI" and "CPU hot-unplug
>>>>> with SMI" *not* be offered to the firmware. Then QEMU will reject
>>>>> hotplug attempts, and the SMM hotplug code in edk2 will not be
>>>>> triggered by the (virtual) hardware.
>>>>
>>>> Can we have edk2 print instructions for that in the error message?
>>>
>>> This seems to be:
>>>
>>>     qemu -M q35 \
>>>         -global ICH9-LPC.x-smi-cpu-hotplug=off \
>>>         -global ICH9-LPC.x-smi-cpu-hotunplug=off
>>
>> Yes, those are the flags.
>>
>>> But it appears to not work.
>>
>> They should work, but they take effect in QEMU, and not in the
>> firmware. These knobs control what CPU hot(un)plug+SMI features QEMU
>> exposes to the guest fw, via fw_cfg,
>
> Ok, I see, only the SMM code actually checks that.
>
>> In particular the firmware makes no further decisions based on
>> whether QEMU advertized some of these features.
>
> I was thinking the other way around:  When cpu hotplug is disabled in
> qemu it should be safe to skip the whole cpu hotplug checking dance.
> See test patch below.
>
> That would give us a config switch (turn off cpu hotplug support)
> which would allow edk2 run on qemu versions with broken cpu hotplug.
>
> Does the idea look sane or do I miss something?
>
> take care,
>   Gerd
>
> commit bd2e36eba35268ab46c0125d2b9125391ea6f9fc
> Author: Gerd Hoffmann <kraxel@redhat.com>
> Date:   Fri Jan 13 13:07:36 2023 +0100
>
>     skip cpu present checking when hotplug is off
>
> diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
> index 13348afb4890..2b0f0c836f85 100644
> --- a/OvmfPkg/Library/PlatformInitLib/Platform.c
> +++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
> @@ -415,8 +415,9 @@ PlatformMaxCpuCountInitialization (
>    IN OUT EFI_HOB_PLATFORM_INFO  *PlatformInfoHob
>    )
>  {
> -  UINT16  BootCpuCount = 0;
> -  UINT32  MaxCpuCount;
> +  UINT16   BootCpuCount = 0;
> +  UINT32   MaxCpuCount;
> +  BOOLEAN  CpuHotplugSupported = FALSE;
>
>    //
>    // Try to fetch the boot CPU count.
> @@ -424,6 +425,31 @@ PlatformMaxCpuCountInitialization (
>    if (QemuFwCfgIsAvailable ()) {
>      QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
>      BootCpuCount = QemuFwCfgRead16 ();
> +    DEBUG ((DEBUG_INFO, "%a: BootCpuCount: %d\n", __FUNCTION__, BootCpuCount));
> +  }
> +
> +  {
> +    FIRMWARE_CONFIG_ITEM  SupportedFeaturesItem;
> +    UINTN                 SupportedFeaturesSize;
> +    UINT64                mSmiFeatures;
> +    EFI_STATUS            Status;
> +
> +    Status = QemuFwCfgFindFile (
> +               "etc/smi/supported-features",
> +               &SupportedFeaturesItem,
> +               &SupportedFeaturesSize
> +               );
> +
> +    if (EFI_ERROR (Status)) {
> +      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: %r\n", __FUNCTION__, Status));
> +    } else {
> +      QemuFwCfgSelectItem (SupportedFeaturesItem);
> +      QemuFwCfgReadBytes (sizeof mSmiFeatures, &mSmiFeatures);
> +      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: 0x%x\n", __FUNCTION__, mSmiFeatures));
> +      if (mSmiFeatures & (BIT1 /* hotplug */ | BIT2 /* hotunplug */)) {
> +        CpuHotplugSupported = TRUE;
> +      }
> +    }
>    }
>
>    if (BootCpuCount == 0) {
> @@ -435,6 +461,9 @@ PlatformMaxCpuCountInitialization (
>      //
>      DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
>      MaxCpuCount = PlatformInfoHob->DefaultMaxCpuNumber;
> +  } else if (!CpuHotplugSupported) {
> +    DEBUG ((DEBUG_INFO, "%a: CPU hotplug support not available\n", __FUNCTION__));
> +    MaxCpuCount = BootCpuCount;
>    } else {
>      //
>      // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to
>

This would be wrong.

Let me provide a summary of CPU hotplug with SMM. It consists of the
following patch sets, in edk2. Note that these sets were implemented and
merged in dependency order, and I'm listing them likewise.

(E1) a7e2d20193e8..778832bcad33, for TianoCore#1515

(E2) c8b8157e126a..83357313dd67, for TianoCore#1515

(E3) 422da35375c6..75839f977d37, for TianoCore#1512

(E4) 61d3b2d4279e..1158fc8e2c7b, for TianoCore#1512

(E5) 5ba203b54e59, no BZ

(E6) 63d92674d240..cbccf995920a, for TianoCore#2929 [not really relevant
here, just for completeness]

(I'm ignoring hot-unplug now, which came later. See TianoCore#3132; the
commit range is noted there too.)

On the QEMU side, some relevant series are (again in dependency order):

(Q1) be9612e8cbb4..3a61c8db9d25

(Q2) fd9b0830b0be [just for completeness]

(Q3) 2d69eba5fe52..6e2e2e8a4220

I recommend reading through all those commits.


*** Series E1+E2 allow OVMF to expose the precise boot CPU count
(=present) and the max CPU count (=possible) to UefiCpuPkg/MpInitLib,
and the rest of the firmware. Such that the initial enumeration is both
precise and quick, and that all components in the firmware can
distinguish these two concepts.

Note in particular the final patch in series E2. That patch makes CPU
hotplug work with S3 resume *without* SMM at all in the picture, for
exmaple on i440fx too. Masking the "present at boot" vs "possible" CPU
count difference based on "etc/smi/supported-features" would break that
functionality (again, totally unrelated to SMM). There is another (more
serious) arguments against keying this workaround off of
"etc/smi/supported-features", and I'm going to get to that as well, but
this is one reason already.

The QEMU regression breaks the present vs. possible logic in series E2.
Assume we're on i440fx, or else on Q35 with an OVMF binary that does not
include the SMM driver stack. In this case, the firmware need not take
notice of CPU hotplugging in real time, but during S3 resume, it must be
able to see an increased CPU count. Even though the firmware fails to
use the modern regblock, the firmware's enablement of the modern
regblock *may not* be necessary for the guest OS to still hotplug CPUs
(using QEMU's ACPI methods). There is no firmware<->OS dependency in
this case. So such an S3 failure could lead to data loss.

I'm against trying to come up with an extremely sophisticated condition
under which this present vs. possible masking would be safe.

(The QEMU-side counterpart is a *sub-series* of Q1.)


*** Series E3 is the "SMRAM at default SMBASE" implementation in OVMF.
The QEMU-side counterpart is the *other sub-series* of Q1.

This feature is the core of the security of CPU hotplug, when SMM is
enabled.

The threat model is the following: the malicious guest OS places attack
code at the default SMBASE, in RAM. Later, the VM admin on the host side
hotplugs a CPU, eventually. The malicious guest OS immediately sends the
new CPU an SMI followed by an INIT-SIPI-SIPI. This means that the new
CPU starts executing the OS's attack code at the default SMBASE, in SMM,
and with access to all SMRAM. So it can attack the firmware's SMM
machinery.

This feature prevents the attack by replacing guest OS writeable RAM at
the default SMBASE with SMRAM, and locks it down during boot. The
feature is negotiated by PlatformPei in
Q35SmramAtDefaultSmbaseInitialization(), and is represented by
PcdQ35SmramAtDefaultSmbase; it orchestrates multiple drivers. The
QEMU-side property is called "mch.smbase-smram".


Now here I want to point out something very important. Note that QEMU
series Q1 implements *both* the "SMRAM at default SMBASE" feature *and*
the "Command data 2" register in the CPU hotplug register block. The
"Command data 2" register is used for two purposes: negotiating the
"modern vs. legacy" operation of the register block, and for getting the
APIC ID of a particular CPU. Different locations in OVMF use "Command
data 2" for these different purposes, but what's important here is that
the successful negotiation of the "SMRAM at default SMBASE" feature
*implies* that the "Command data 2" register *works*. In other words,
"SMRAM at default SMBASE", if negotiated, can be used as a proxy for
"Command data 2". Both of these features would be released in QEMU
v5.0.0, coming from the same one Q1 patch series, one would be
negotiable, the other not.

Therefore, locations in OVMF that want *just* the "Command data 2"
register, go through the legacy vs. modern regblock negotiation, using
"Command data 2". Other locations in OVMF, namely those that want the
"SMRAM at default SMBASE" feature only, or want *both* features, would
only check for "SMRAM at default SMBASE".


*** Series E4 implements the actual hotplug (w/ SMM) feature in OVMF.

Note that in commit 17efae27acaf ("OvmfPkg/CpuHotplugSmm: introduce
skeleton for CPU Hotplug SMM driver", 2020-03-04),
PcdQ35SmramAtDefaultSmbase==FALSE causes the driver to exit gracefully
with EFI_UNSUPPORTED. However, if the feature has been negotiated, all
further steps must complete fine, or the driver hangs intentionally.

In commit f668e7887165 ("OvmfPkg/CpuHotplugSmm: define the
QEMU_CPUHP_CMD_GET_ARCH_ID macro", 2020-03-04), the "Command data 2"
register is sanity checked -- *enforced*. That's based exactly on the
above-noted "proxying"; see the big code comment in the commit:

+  //
+  // Sanity-check the CPU hotplug interface.
+  //
+  // Both of the following features are part of QEMU 5.0, introduced primarily
+  // in commit range 3e08b2b9cb64..3a61c8db9d25:
+  //
+  // (a) the QEMU_CPUHP_CMD_GET_ARCH_ID command of the modern CPU hotplug
+  //     interface,
+  //
+  // (b) the "SMRAM at default SMBASE" feature.
+  //
+  // From these, (b) is restricted to 5.0+ machine type versions, while (a)
+  // does not depend on machine type version. Because we ensured the stricter
+  // condition (b) through PcdQ35SmramAtDefaultSmbase above, the (a)
+  // QEMU_CPUHP_CMD_GET_ARCH_ID command must now be available too. While we
+  // can't verify the presence of precisely that command, we can still verify
+  // (sanity-check) that the modern interface is active, at least.
+  //
+  // Consult the "Typical usecases | Detecting and enabling modern CPU hotplug
+  // interface" section in QEMU's "docs/specs/acpi_cpu_hotplug.txt", on the
+  // following.
+  //

The same implication / "proxy nature" is used in commit 1158fc8e2c7b
("OvmfPkg/CpuS3DataDxe: enable S3 resume after CPU hotplug",
2020-03-04), as well.

So this shows why the QEMU regression is so nasty; it doesn't only break
the explicit negotiations involving "Command data 2", but breaks *other*
code locations that deduce the availability of "Command data 2" from a
stricter, and separately negotiated, feature.


*** Patch E5 is what introduces the "etc/smi/supported-features" stuff
in OVMF. The QEMU counterpart is the Q3 series.

Note that for OVMF, this is effectively an afterthought. It is not
needed for securing the firmware from a malicious OS, it is also not
needed for functionality, as far as the firmware is concerned. It
basically communicates OVMF's features to QEMU.

This stuff in OVMF enables two things:

- it enables QEMU to protect a *non-malicious* OS from crashing if the
firmware underneath supports SMI broadcast but is unaware of CPU
hotplug,

- it enables QEMU to generate ACPI content for a *non-malicious* OS so
that the OS inform the firmware (via raising an SMI) about a CPU hotplug
event.

Note that there is zero security impact from this; that was covered by
series E3 (= the "SMRAM at default SMBASE" feature).

This is the other (more serious) reason that we shouldn't use BIT1
(ICH9_LPC_SMI_F_CPU_HOTPLUG) and BIT2 (ICH9_LPC_SMI_F_CPU_HOT_UNPLUG) in
"etc/smi/supported-features" for driving anything in the firmware. They
were never meant to *steer* the firmware. BIT0
(ICH9_LPC_SMI_F_BROADCAST) was meant to steer both the firmware and
QEMU, but BIT1 and BIT2 only steer QEMU.


In summary, to keep OVMF plus the guest OS "healthy" in face of the QEMU
regression, at least three things are necessary:

(a) Do *not* extend the role of SMI feature negotiation to steer the
workaround; keep it intact. With time, the QEMU regression will lose its
significance, but we're going to be left with a horrible confusion
(design violation) of BIT1 and BIT2 in the SMI features bitmask. (Case
in point: the QEMU-v2.7 reset bug is totally irrelevant nowadays, but we
still have the workaround in place for it, and that workaround *worsens*
the symptoms of the present QEMU regression!)

(b) Avoid potential guest OS data loss by breaking *basic* S3 resume
functionality (with SMM entirely out of the picture). This means that
*whenever* the guest OS is capable of hotplugging a CPU (including that
QEMU allows such an operation to be initiated over QMP), the S3 resume
code in OVMF must be able to deal with an increased number of CPUs,
relative to the first (cold) boot. In other words, the difference
between "present" and "possible" must not be masked, whenever the guest
OS is capable of hotplugging a CPU (even without SMM in the picture),
using the ACPI code generated by QEMU.

(In the beginning, I had proposed for QEMU to expose a new fw_cfg file,
for "max cpus", but Igor (justifiedly) suggested to just collect the
"possible" count from the modern regblock:
<http://mid.mail-archive.com/20191008175931.483af366@redhat.com>.)

(c) *No location* in the code that deduces "Command data 2 register
works" from "PcdQ35SmramAtDefaultSmbase == TRUE" must be reached,
because this implication is broken by the QEMU regression.

Note that my proposal (hard hang upon "Command data 2 register not
working") satisfies all three, and with time, becomes applicable. If
pressure needs to be exterted somewhere to expedite that, then it's the
QEMU stable release process.

Note that clearing "mch.smbase-smram" on the QEMU command line, in
itself, would be very wrong. It would disable the security from series
E3, like it used to be on 4.2 machine types and before; however, at the
same time, it would not prevent OVMF from acknowledging the
ICH9_LPC_SMI_F_CPU_HOTPLUG feature bit via fw_cfg, which appeared with
5.2 machine types. In other words, it would create a machine type that
never existed, and OVMF never expected to see -- the firmware wouldn't
protect itself from a malicious guest OS, and QEMU would permit the
admin to hotplug CPUs.

If you simply go back to 4.2 machine types ("-M" switch), disabling both
the "SMRAM at default SMBASE" and ICH9_LPC_SMI_F_CPU_HOTPLUG features,
the hotplug regblock's negotiation still malfunctions (it is not
versioned in QEMU). Assuming you hang OVMF if *either one* of
PcdQ35SmramAtDefaultSmbase and ICH9_LPC_SMI_F_CPU_HOTPLUG is set, and
allow the boot to progress if both are clear, the present<->possible
counts would still be wrong, and S3 resume might cause OS data loss if
the OS still managed to hotplug a CPU.

I do not claim that a logical predicate "does not exist" that covers
precisely those situations that need to be caught *for the purpose* of
papering them over. (I guess the predicate and/or the action would
somehow involve "PcdQ35SmramAtDefaultSmbase", *BUT* even then, the order
of calls to MaxCpuCountInitialization() and
Q35SmramAtDefaultSmbaseInitialization() is wrong!!!)

What I'm stating is that I'm *unable to reason* about such a predicate
with any sense of safety, due to the predicate's complexity, and due to
it compensating for the violation of a *foundational* invariant.

On my part, I wouldn't like to continue this discussion. I'm really
sorry.

(I'm only responding on-list because Gerd's question so closely follows
my good-bye from the other thread that I think he may not have seen that
until sending the question. But, with this, I have no more input on the
topic, or anything for the list, really.)

Laszlo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-16 14:48                 ` Laszlo Ersek
@ 2023-01-17 12:37                   ` Gerd Hoffmann
  2023-01-17 16:43                     ` Ard Biesheuvel
  0 siblings, 1 reply; 24+ messages in thread
From: Gerd Hoffmann @ 2023-01-17 12:37 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: devel, Michael Brown, Ard Biesheuvel, Brijesh Singh, Erdem Aktas,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

  Hi,

> >> In particular the firmware makes no further decisions based on
> >> whether QEMU advertized some of these features.
> >
> > I was thinking the other way around:  When cpu hotplug is disabled in
> > qemu it should be safe to skip the whole cpu hotplug checking dance.
> > See test patch below.
> >
> > That would give us a config switch (turn off cpu hotplug support)
> > which would allow edk2 run on qemu versions with broken cpu hotplug.
> >
> > Does the idea look sane or do I miss something?

> This would be wrong.
> 
> [ detailed description snipped here (but stored for later reference,
>   thanks for all the details) ]

So, the tl;dr version:  cpu hotplug is older than smi feature
negotiation, so smi hotplug feature bit being off doesn't imply
qemu wouldn't hotplug cpus.

So, no easy way out.  Luckily this affects tcg only.

For edk2 ci doing (tcg) efi shell test boots switching to Oliver's
latest containers with fixed qemu included should handle things
(latest series just posted).  So once this is in we should be able to
merge this patch without breaking CI.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-17 12:37                   ` Gerd Hoffmann
@ 2023-01-17 16:43                     ` Ard Biesheuvel
  2023-01-18  7:25                       ` Gerd Hoffmann
  0 siblings, 1 reply; 24+ messages in thread
From: Ard Biesheuvel @ 2023-01-17 16:43 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Laszlo Ersek, devel, Michael Brown, Ard Biesheuvel, Brijesh Singh,
	Erdem Aktas, James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On Tue, 17 Jan 2023 at 13:37, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > >> In particular the firmware makes no further decisions based on
> > >> whether QEMU advertized some of these features.
> > >
> > > I was thinking the other way around:  When cpu hotplug is disabled in
> > > qemu it should be safe to skip the whole cpu hotplug checking dance.
> > > See test patch below.
> > >
> > > That would give us a config switch (turn off cpu hotplug support)
> > > which would allow edk2 run on qemu versions with broken cpu hotplug.
> > >
> > > Does the idea look sane or do I miss something?
>
> > This would be wrong.
> >
> > [ detailed description snipped here (but stored for later reference,
> >   thanks for all the details) ]
>
> So, the tl;dr version:  cpu hotplug is older than smi feature
> negotiation, so smi hotplug feature bit being off doesn't imply
> qemu wouldn't hotplug cpus.
>
> So, no easy way out.  Luckily this affects tcg only.
>
> For edk2 ci doing (tcg) efi shell test boots switching to Oliver's
> latest containers with fixed qemu included should handle things
> (latest series just posted).  So once this is in we should be able to
> merge this patch without breaking CI.
>

My head is spinning.

What about running QEMU with only a single CPU, and without any of
these features? Is there really no way we can make that work without
turning OVMF into the timebomb that Laszlo describes?

It's just very annoying that on a non-KVM host and a given QEMU
binary, you might simply be out of luck entirely, and there is no way
you can run OVMF with the fix applied. I would like to avoid that if
possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-17 16:43                     ` Ard Biesheuvel
@ 2023-01-18  7:25                       ` Gerd Hoffmann
  2023-01-18 11:50                         ` Laszlo Ersek
  0 siblings, 1 reply; 24+ messages in thread
From: Gerd Hoffmann @ 2023-01-18  7:25 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Laszlo Ersek, devel, Michael Brown, Ard Biesheuvel, Brijesh Singh,
	Erdem Aktas, James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On Tue, Jan 17, 2023 at 05:43:53PM +0100, Ard Biesheuvel wrote:
> On Tue, 17 Jan 2023 at 13:37, Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> >   Hi,
> >
> > > >> In particular the firmware makes no further decisions based on
> > > >> whether QEMU advertized some of these features.
> > > >
> > > > I was thinking the other way around:  When cpu hotplug is disabled in
> > > > qemu it should be safe to skip the whole cpu hotplug checking dance.
> > > > See test patch below.
> > > >
> > > > That would give us a config switch (turn off cpu hotplug support)
> > > > which would allow edk2 run on qemu versions with broken cpu hotplug.
> > > >
> > > > Does the idea look sane or do I miss something?
> >
> > > This would be wrong.
> > >
> > > [ detailed description snipped here (but stored for later reference,
> > >   thanks for all the details) ]
> >
> > So, the tl;dr version:  cpu hotplug is older than smi feature
> > negotiation, so smi hotplug feature bit being off doesn't imply
> > qemu wouldn't hotplug cpus.
> >
> > So, no easy way out.  Luckily this affects tcg only.
> >
> > For edk2 ci doing (tcg) efi shell test boots switching to Oliver's
> > latest containers with fixed qemu included should handle things
> > (latest series just posted).  So once this is in we should be able to
> > merge this patch without breaking CI.
> 
> My head is spinning.
> 
> What about running QEMU with only a single CPU, and without any of
> these features? Is there really no way we can make that work without
> turning OVMF into the timebomb that Laszlo describes?

I can't see any way :(

ovmf seeing only a single cpu does not imply cpu hotplug can't happen,
it could be "qemu -smp cpus=1,maxcpus=4".  Figuring the maxcpus number
depends on the broken cpu hotplug registers.

> It's just very annoying that on a non-KVM host and a given QEMU
> binary, you might simply be out of luck entirely, and there is no way
> you can run OVMF with the fix applied. I would like to avoid that if
> possible.

Indeed.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-18  7:25                       ` Gerd Hoffmann
@ 2023-01-18 11:50                         ` Laszlo Ersek
  2023-01-18 13:10                           ` Gerd Hoffmann
  2023-01-18 13:10                           ` Ard Biesheuvel
  0 siblings, 2 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-18 11:50 UTC (permalink / raw)
  To: Gerd Hoffmann, Ard Biesheuvel
  Cc: devel, Michael Brown, Ard Biesheuvel, Brijesh Singh, Erdem Aktas,
	James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/18/23 08:25, Gerd Hoffmann wrote:
> On Tue, Jan 17, 2023 at 05:43:53PM +0100, Ard Biesheuvel wrote:
>> On Tue, 17 Jan 2023 at 13:37, Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>
>>>   Hi,
>>>
>>>>>> In particular the firmware makes no further decisions based on
>>>>>> whether QEMU advertized some of these features.
>>>>>
>>>>> I was thinking the other way around:  When cpu hotplug is disabled in
>>>>> qemu it should be safe to skip the whole cpu hotplug checking dance.
>>>>> See test patch below.
>>>>>
>>>>> That would give us a config switch (turn off cpu hotplug support)
>>>>> which would allow edk2 run on qemu versions with broken cpu hotplug.
>>>>>
>>>>> Does the idea look sane or do I miss something?
>>>
>>>> This would be wrong.
>>>>
>>>> [ detailed description snipped here (but stored for later reference,
>>>>   thanks for all the details) ]
>>>
>>> So, the tl;dr version:  cpu hotplug is older than smi feature
>>> negotiation, so smi hotplug feature bit being off doesn't imply
>>> qemu wouldn't hotplug cpus.
>>>
>>> So, no easy way out.  Luckily this affects tcg only.
>>>
>>> For edk2 ci doing (tcg) efi shell test boots switching to Oliver's
>>> latest containers with fixed qemu included should handle things
>>> (latest series just posted).  So once this is in we should be able to
>>> merge this patch without breaking CI.
>>
>> My head is spinning.
>>
>> What about running QEMU with only a single CPU, and without any of
>> these features? Is there really no way we can make that work without
>> turning OVMF into the timebomb that Laszlo describes?
> 
> I can't see any way :(
> 
> ovmf seeing only a single cpu does not imply cpu hotplug can't happen,
> it could be "qemu -smp cpus=1,maxcpus=4".  Figuring the maxcpus number
> depends on the broken cpu hotplug registers.
> 
>> It's just very annoying that on a non-KVM host and a given QEMU
>> binary, you might simply be out of luck entirely, and there is no way
>> you can run OVMF with the fix applied. I would like to avoid that if
>> possible.
> 
> Indeed.

... you could introduce a new fw_cfg boolean switch (and explain it in
the hang message) that meant: "I know what this QEMU bug is, I
understand its consequences are obscure, risky, and far-reaching in
OVMF, I've been warned, I know what I'm doing". That's a relatively
small addition to this patch, and then the risk is assumed by the user.
It resolves "being out of luck *entirely*".

Laszlo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-18 11:50                         ` Laszlo Ersek
@ 2023-01-18 13:10                           ` Gerd Hoffmann
  2023-01-18 13:25                             ` Laszlo Ersek
  2023-01-18 13:10                           ` Ard Biesheuvel
  1 sibling, 1 reply; 24+ messages in thread
From: Gerd Hoffmann @ 2023-01-18 13:10 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Ard Biesheuvel, devel, Michael Brown, Ard Biesheuvel,
	Brijesh Singh, Erdem Aktas, James Bottomley, Jiewen Yao,
	Jordan Justen, Min Xu, Oliver Steffen, Sebastien Boeuf,
	Tom Lendacky

  Hi,

> ... you could introduce a new fw_cfg boolean switch (and explain it in
> the hang message) that meant: "I know what this QEMU bug is, I
> understand its consequences are obscure, risky, and far-reaching in
> OVMF, I've been warned, I know what I'm doing". That's a relatively
> small addition to this patch, and then the risk is assumed by the user.
> It resolves "being out of luck *entirely*".

Using -fw_cfg would work on old qemu versions indeed.

take care,
  Gerd

>From 65a4f683eaf94f82693811ce9b2393586a15afd7 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Fri, 13 Jan 2023 13:07:36 +0100
Subject: [PATCH 1/1] OvmfPkg/PlatformInitLib: add switch to disable cpu
 hotplug support.

Add a fw_cfg-based switch to disable cpu hotplug support in OVMF.
This allows to boot OVMF on known-broken qemu versions.

This is only safe to do in case hotplug is not used.  Taking care
of that is left to the user.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 OvmfPkg/Library/PlatformInitLib/Platform.c | 30 ++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index c3d5f5eeb375..15811a4ff726 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -414,8 +414,12 @@ PlatformMaxCpuCountInitialization (
   IN OUT EFI_HOB_PLATFORM_INFO  *PlatformInfoHob
   )
 {
-  UINT16  BootCpuCount = 0;
-  UINT32  MaxCpuCount;
+  FIRMWARE_CONFIG_ITEM  SupportedFeaturesItem;
+  UINTN                 SupportedFeaturesSize;
+  EFI_STATUS            Status;
+  UINT16                BootCpuCount = 0;
+  UINT32                MaxCpuCount;
+  BOOLEAN               CpuHotplugDisabled = FALSE;
 
   //
   // Try to fetch the boot CPU count.
@@ -425,6 +429,15 @@ PlatformMaxCpuCountInitialization (
     BootCpuCount = QemuFwCfgRead16 ();
   }
 
+  Status = QemuFwCfgFindFile (
+             "opt/org.tianocore/nocpuhotplug",
+             &SupportedFeaturesItem,
+             &SupportedFeaturesSize
+             );
+  if (!EFI_ERROR (Status)) {
+    CpuHotplugDisabled = TRUE;
+  }
+
   if (BootCpuCount == 0) {
     //
     // QEMU doesn't report the boot CPU count. (BootCpuCount == 0) will let
@@ -434,6 +447,9 @@ PlatformMaxCpuCountInitialization (
     //
     DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
     MaxCpuCount = PlatformInfoHob->DefaultMaxCpuNumber;
+  } else if (CpuHotplugDisabled) {
+    DEBUG ((DEBUG_INFO, "%a: CPU hotplug support disabled via opt/org.tianocore/nocpuhotplug\n", __FUNCTION__));
+    MaxCpuCount = BootCpuCount;
   } else {
     //
     // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to
@@ -563,12 +579,22 @@ PlatformMaxCpuCountInitialization (
           DEBUG_ERROR,
           "%a: Broken CPU hotplug register block: Present=%u Possible=%u.\n"
           "%a: Update QEMU to v8, or to stable with dab30fbef389 backported.\n"
+          "\n"
+          "%a: Alternatively start qemu with:\n"
+          "%a:     -fw_cfg name=opt/org.tianocore/nocpuhotplug,string=1\n"
+          "%a: to disable OVMF cpu hotplug support.  Note that you must\n"
+          "%a: not ask qemu to hotplug CPUs then\n"
+          "\n"
           "%a: Refer to "
           "<https://bugzilla.tianocore.org/show_bug.cgi?id=4250>.\n",
           __FUNCTION__,
           Present,
           Possible,
           __FUNCTION__,
+          __FUNCTION__,
+          __FUNCTION__,
+          __FUNCTION__,
+          __FUNCTION__,
           __FUNCTION__
           ));
         ASSERT (FALSE);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-18 11:50                         ` Laszlo Ersek
  2023-01-18 13:10                           ` Gerd Hoffmann
@ 2023-01-18 13:10                           ` Ard Biesheuvel
  2023-01-18 13:21                             ` Laszlo Ersek
  1 sibling, 1 reply; 24+ messages in thread
From: Ard Biesheuvel @ 2023-01-18 13:10 UTC (permalink / raw)
  To: devel, lersek
  Cc: Gerd Hoffmann, Michael Brown, Ard Biesheuvel, Brijesh Singh,
	Erdem Aktas, James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On Wed, 18 Jan 2023 at 12:50, Laszlo Ersek <lersek@redhat.com> wrote:
>
> On 1/18/23 08:25, Gerd Hoffmann wrote:
> > On Tue, Jan 17, 2023 at 05:43:53PM +0100, Ard Biesheuvel wrote:
> >> On Tue, 17 Jan 2023 at 13:37, Gerd Hoffmann <kraxel@redhat.com> wrote:
> >>>
> >>>   Hi,
> >>>
> >>>>>> In particular the firmware makes no further decisions based on
> >>>>>> whether QEMU advertized some of these features.
> >>>>>
> >>>>> I was thinking the other way around:  When cpu hotplug is disabled in
> >>>>> qemu it should be safe to skip the whole cpu hotplug checking dance.
> >>>>> See test patch below.
> >>>>>
> >>>>> That would give us a config switch (turn off cpu hotplug support)
> >>>>> which would allow edk2 run on qemu versions with broken cpu hotplug.
> >>>>>
> >>>>> Does the idea look sane or do I miss something?
> >>>
> >>>> This would be wrong.
> >>>>
> >>>> [ detailed description snipped here (but stored for later reference,
> >>>>   thanks for all the details) ]
> >>>
> >>> So, the tl;dr version:  cpu hotplug is older than smi feature
> >>> negotiation, so smi hotplug feature bit being off doesn't imply
> >>> qemu wouldn't hotplug cpus.
> >>>
> >>> So, no easy way out.  Luckily this affects tcg only.
> >>>
> >>> For edk2 ci doing (tcg) efi shell test boots switching to Oliver's
> >>> latest containers with fixed qemu included should handle things
> >>> (latest series just posted).  So once this is in we should be able to
> >>> merge this patch without breaking CI.
> >>
> >> My head is spinning.
> >>
> >> What about running QEMU with only a single CPU, and without any of
> >> these features? Is there really no way we can make that work without
> >> turning OVMF into the timebomb that Laszlo describes?
> >
> > I can't see any way :(
> >
> > ovmf seeing only a single cpu does not imply cpu hotplug can't happen,
> > it could be "qemu -smp cpus=1,maxcpus=4".  Figuring the maxcpus number
> > depends on the broken cpu hotplug registers.
> >
> >> It's just very annoying that on a non-KVM host and a given QEMU
> >> binary, you might simply be out of luck entirely, and there is no way
> >> you can run OVMF with the fix applied. I would like to avoid that if
> >> possible.
> >
> > Indeed.
>
> ... you could introduce a new fw_cfg boolean switch (and explain it in
> the hang message) that meant: "I know what this QEMU bug is, I
> understand its consequences are obscure, risky, and far-reaching in
> OVMF, I've been warned, I know what I'm doing". That's a relatively
> small addition to this patch, and then the risk is assumed by the user.
> It resolves "being out of luck *entirely*".
>

You mean the kind of fw_cfg vairiable that is arbitrarily settable
from the QEMU command line, right? Yeah, that would at least provide a
way out.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-18 13:10                           ` Ard Biesheuvel
@ 2023-01-18 13:21                             ` Laszlo Ersek
  0 siblings, 0 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-18 13:21 UTC (permalink / raw)
  To: Ard Biesheuvel, devel
  Cc: Gerd Hoffmann, Michael Brown, Ard Biesheuvel, Brijesh Singh,
	Erdem Aktas, James Bottomley, Jiewen Yao, Jordan Justen, Min Xu,
	Oliver Steffen, Sebastien Boeuf, Tom Lendacky

On 1/18/23 14:10, Ard Biesheuvel wrote:
> On Wed, 18 Jan 2023 at 12:50, Laszlo Ersek <lersek@redhat.com> wrote:
>>
>> On 1/18/23 08:25, Gerd Hoffmann wrote:
>>> On Tue, Jan 17, 2023 at 05:43:53PM +0100, Ard Biesheuvel wrote:
>>>> On Tue, 17 Jan 2023 at 13:37, Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>>>
>>>>>   Hi,
>>>>>
>>>>>>>> In particular the firmware makes no further decisions based on
>>>>>>>> whether QEMU advertized some of these features.
>>>>>>>
>>>>>>> I was thinking the other way around:  When cpu hotplug is disabled in
>>>>>>> qemu it should be safe to skip the whole cpu hotplug checking dance.
>>>>>>> See test patch below.
>>>>>>>
>>>>>>> That would give us a config switch (turn off cpu hotplug support)
>>>>>>> which would allow edk2 run on qemu versions with broken cpu hotplug.
>>>>>>>
>>>>>>> Does the idea look sane or do I miss something?
>>>>>
>>>>>> This would be wrong.
>>>>>>
>>>>>> [ detailed description snipped here (but stored for later reference,
>>>>>>   thanks for all the details) ]
>>>>>
>>>>> So, the tl;dr version:  cpu hotplug is older than smi feature
>>>>> negotiation, so smi hotplug feature bit being off doesn't imply
>>>>> qemu wouldn't hotplug cpus.
>>>>>
>>>>> So, no easy way out.  Luckily this affects tcg only.
>>>>>
>>>>> For edk2 ci doing (tcg) efi shell test boots switching to Oliver's
>>>>> latest containers with fixed qemu included should handle things
>>>>> (latest series just posted).  So once this is in we should be able to
>>>>> merge this patch without breaking CI.
>>>>
>>>> My head is spinning.
>>>>
>>>> What about running QEMU with only a single CPU, and without any of
>>>> these features? Is there really no way we can make that work without
>>>> turning OVMF into the timebomb that Laszlo describes?
>>>
>>> I can't see any way :(
>>>
>>> ovmf seeing only a single cpu does not imply cpu hotplug can't happen,
>>> it could be "qemu -smp cpus=1,maxcpus=4".  Figuring the maxcpus number
>>> depends on the broken cpu hotplug registers.
>>>
>>>> It's just very annoying that on a non-KVM host and a given QEMU
>>>> binary, you might simply be out of luck entirely, and there is no way
>>>> you can run OVMF with the fix applied. I would like to avoid that if
>>>> possible.
>>>
>>> Indeed.
>>
>> ... you could introduce a new fw_cfg boolean switch (and explain it in
>> the hang message) that meant: "I know what this QEMU bug is, I
>> understand its consequences are obscure, risky, and far-reaching in
>> OVMF, I've been warned, I know what I'm doing". That's a relatively
>> small addition to this patch, and then the risk is assumed by the user.
>> It resolves "being out of luck *entirely*".
>>
> 
> You mean the kind of fw_cfg vairiable that is arbitrarily settable
> from the QEMU command line, right? Yeah, that would at least provide a
> way out.
> 

yes.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
  2023-01-18 13:10                           ` Gerd Hoffmann
@ 2023-01-18 13:25                             ` Laszlo Ersek
  0 siblings, 0 replies; 24+ messages in thread
From: Laszlo Ersek @ 2023-01-18 13:25 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Ard Biesheuvel, devel, Michael Brown, Ard Biesheuvel,
	Brijesh Singh, Erdem Aktas, James Bottomley, Jiewen Yao,
	Jordan Justen, Min Xu, Oliver Steffen, Sebastien Boeuf,
	Tom Lendacky

On 1/18/23 14:10, Gerd Hoffmann wrote:
>   Hi,
> 
>> ... you could introduce a new fw_cfg boolean switch (and explain it in
>> the hang message) that meant: "I know what this QEMU bug is, I
>> understand its consequences are obscure, risky, and far-reaching in
>> OVMF, I've been warned, I know what I'm doing". That's a relatively
>> small addition to this patch, and then the risk is assumed by the user.
>> It resolves "being out of luck *entirely*".
> 
> Using -fw_cfg would work on old qemu versions indeed.
> 
> take care,
>   Gerd
> 
> From 65a4f683eaf94f82693811ce9b2393586a15afd7 Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Fri, 13 Jan 2023 13:07:36 +0100
> Subject: [PATCH 1/1] OvmfPkg/PlatformInitLib: add switch to disable cpu
>  hotplug support.
> 
> Add a fw_cfg-based switch to disable cpu hotplug support in OVMF.
> This allows to boot OVMF on known-broken qemu versions.
> 
> This is only safe to do in case hotplug is not used.  Taking care
> of that is left to the user.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  OvmfPkg/Library/PlatformInitLib/Platform.c | 30 ++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
> index c3d5f5eeb375..15811a4ff726 100644
> --- a/OvmfPkg/Library/PlatformInitLib/Platform.c
> +++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
> @@ -414,8 +414,12 @@ PlatformMaxCpuCountInitialization (
>    IN OUT EFI_HOB_PLATFORM_INFO  *PlatformInfoHob
>    )
>  {
> -  UINT16  BootCpuCount = 0;
> -  UINT32  MaxCpuCount;
> +  FIRMWARE_CONFIG_ITEM  SupportedFeaturesItem;
> +  UINTN                 SupportedFeaturesSize;
> +  EFI_STATUS            Status;
> +  UINT16                BootCpuCount = 0;
> +  UINT32                MaxCpuCount;
> +  BOOLEAN               CpuHotplugDisabled = FALSE;
>  
>    //
>    // Try to fetch the boot CPU count.
> @@ -425,6 +429,15 @@ PlatformMaxCpuCountInitialization (
>      BootCpuCount = QemuFwCfgRead16 ();
>    }
>  
> +  Status = QemuFwCfgFindFile (
> +             "opt/org.tianocore/nocpuhotplug",
> +             &SupportedFeaturesItem,
> +             &SupportedFeaturesSize
> +             );
> +  if (!EFI_ERROR (Status)) {
> +    CpuHotplugDisabled = TRUE;
> +  }
> +
>    if (BootCpuCount == 0) {
>      //
>      // QEMU doesn't report the boot CPU count. (BootCpuCount == 0) will let
> @@ -434,6 +447,9 @@ PlatformMaxCpuCountInitialization (
>      //
>      DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
>      MaxCpuCount = PlatformInfoHob->DefaultMaxCpuNumber;
> +  } else if (CpuHotplugDisabled) {
> +    DEBUG ((DEBUG_INFO, "%a: CPU hotplug support disabled via opt/org.tianocore/nocpuhotplug\n", __FUNCTION__));
> +    MaxCpuCount = BootCpuCount;
>    } else {
>      //
>      // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to
> @@ -563,12 +579,22 @@ PlatformMaxCpuCountInitialization (
>            DEBUG_ERROR,
>            "%a: Broken CPU hotplug register block: Present=%u Possible=%u.\n"
>            "%a: Update QEMU to v8, or to stable with dab30fbef389 backported.\n"
> +          "\n"
> +          "%a: Alternatively start qemu with:\n"
> +          "%a:     -fw_cfg name=opt/org.tianocore/nocpuhotplug,string=1\n"
> +          "%a: to disable OVMF cpu hotplug support.  Note that you must\n"
> +          "%a: not ask qemu to hotplug CPUs then\n"
> +          "\n"
>            "%a: Refer to "
>            "<https://bugzilla.tianocore.org/show_bug.cgi?id=4250>.\n",
>            __FUNCTION__,
>            Present,
>            Possible,
>            __FUNCTION__,
> +          __FUNCTION__,
> +          __FUNCTION__,
> +          __FUNCTION__,
> +          __FUNCTION__,
>            __FUNCTION__
>            ));
>          ASSERT (FALSE);

... let me post a v3 in this direction, later.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-01-18 13:25 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-12  8:28 [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression Laszlo Ersek
2023-01-12  9:55 ` [edk2-devel] " Michael Brown
2023-01-12 10:09   ` Ard Biesheuvel
2023-01-12 13:31     ` Laszlo Ersek
2023-01-12 13:22   ` Laszlo Ersek
2023-01-12 16:08     ` Michael Brown
2023-01-12 17:58       ` Laszlo Ersek
2023-01-12 18:22         ` Laszlo Ersek
2023-01-12 22:49           ` Michael Brown
2023-01-13  6:03         ` Gerd Hoffmann
2023-01-13  9:32           ` Gerd Hoffmann
2023-01-13 10:10             ` Laszlo Ersek
2023-01-13 12:22               ` Gerd Hoffmann
2023-01-16 14:42                 ` Ard Biesheuvel
2023-01-16 14:48                 ` Laszlo Ersek
2023-01-17 12:37                   ` Gerd Hoffmann
2023-01-17 16:43                     ` Ard Biesheuvel
2023-01-18  7:25                       ` Gerd Hoffmann
2023-01-18 11:50                         ` Laszlo Ersek
2023-01-18 13:10                           ` Gerd Hoffmann
2023-01-18 13:25                             ` Laszlo Ersek
2023-01-18 13:10                           ` Ard Biesheuvel
2023-01-18 13:21                             ` Laszlo Ersek
2023-01-12 18:34 ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox