public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
@ 2023-05-03  7:19 Gerd Hoffmann
  2023-05-05 14:10 ` [edk2-devel] " Michael Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Gerd Hoffmann @ 2023-05-03  7:19 UTC (permalink / raw)
  To: devel
  Cc: Oliver Steffen, Gerd Hoffmann, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen, Michael Brown, Laszlo Ersek

OVMF can't guarantee that the ASSERT() doesn't happen.  Misbehaving
EFI applications can trigger this.  So log a warning instead and try
to continue.

Reproducer: Fetch windows 11 22H2 iso image, boot it in qemu with OVMF.

Traced to BootServices->Stall() being called with IPL=TPL_HIGH_LEVEL
and Interrupts /enabled/ while windows is booting.

Cc: Michael Brown <mcb30@ipxe.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
index e19d98878eb7..fdd7d15c4ba8 100644
--- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
+++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
@@ -39,7 +39,9 @@ NestedInterruptRaiseTPL (
   //
   ASSERT (GetInterruptState () == FALSE);
   InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);
-  ASSERT (InterruptedTPL < TPL_HIGH_LEVEL);
+  if (InterruptedTPL >= TPL_HIGH_LEVEL) {
+    DEBUG ((DEBUG_WARN, "%a: Called at IPL %d\n", __func__, InterruptedTPL));
+  }
 
   return InterruptedTPL;
 }
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
  2023-05-03  7:19 [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged Gerd Hoffmann
@ 2023-05-05 14:10 ` Michael Brown
  2023-05-05 18:56   ` Laszlo Ersek
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Brown @ 2023-05-05 14:10 UTC (permalink / raw)
  To: devel, kraxel
  Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel,
	Jordan Justen, Laszlo Ersek

On 03/05/2023 08:19, Gerd Hoffmann wrote:
> OVMF can't guarantee that the ASSERT() doesn't happen.  Misbehaving
> EFI applications can trigger this.  So log a warning instead and try
> to continue.
> 
> Reproducer: Fetch windows 11 22H2 iso image, boot it in qemu with OVMF.
> 
> Traced to BootServices->Stall() being called with IPL=TPL_HIGH_LEVEL
> and Interrupts /enabled/ while windows is booting.
> 
> Cc: Michael Brown <mcb30@ipxe.org>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>   OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
> index e19d98878eb7..fdd7d15c4ba8 100644
> --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
> +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
> @@ -39,7 +39,9 @@ NestedInterruptRaiseTPL (
>     //
>     ASSERT (GetInterruptState () == FALSE);
>     InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);
> -  ASSERT (InterruptedTPL < TPL_HIGH_LEVEL);
> +  if (InterruptedTPL >= TPL_HIGH_LEVEL) {
> +    DEBUG ((DEBUG_WARN, "%a: Called at IPL %d\n", __func__, InterruptedTPL));
> +  }
>   
>     return InterruptedTPL;
>   }

While https://bugzilla.redhat.com/show_bug.cgi?id=2189136 continues to 
track the underlying Windows bug that leads to this assertion being 
triggered: I suspect that this patch will allow people to boot these 
buggy versions of Windows in OVMF, and I don't think it will make things 
any worse.

I would probably suggest changing DEBUG_WARN to DEBUG_ERROR since this 
represents a serious invariant violation being detected.  With that change:

   Reviewed-by: Michael Brown <mcb30@ipxe.org>

Thanks,

Michael


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
  2023-05-05 14:10 ` [edk2-devel] " Michael Brown
@ 2023-05-05 18:56   ` Laszlo Ersek
  2023-05-05 23:27     ` Michael Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Laszlo Ersek @ 2023-05-05 18:56 UTC (permalink / raw)
  To: Michael Brown, devel, kraxel
  Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel,
	Jordan Justen

On 5/5/23 16:10, Michael Brown wrote:
> On 03/05/2023 08:19, Gerd Hoffmann wrote:
>> OVMF can't guarantee that the ASSERT() doesn't happen.  Misbehaving
>> EFI applications can trigger this.  So log a warning instead and try
>> to continue.
>>
>> Reproducer: Fetch windows 11 22H2 iso image, boot it in qemu with OVMF.
>>
>> Traced to BootServices->Stall() being called with IPL=TPL_HIGH_LEVEL
>> and Interrupts /enabled/ while windows is booting.
>>
>> Cc: Michael Brown <mcb30@ipxe.org>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
>> ---
>>   OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
>> b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
>> index e19d98878eb7..fdd7d15c4ba8 100644
>> --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
>> +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
>> @@ -39,7 +39,9 @@ NestedInterruptRaiseTPL (
>>     //
>>     ASSERT (GetInterruptState () == FALSE);
>>     InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);
>> -  ASSERT (InterruptedTPL < TPL_HIGH_LEVEL);
>> +  if (InterruptedTPL >= TPL_HIGH_LEVEL) {
>> +    DEBUG ((DEBUG_WARN, "%a: Called at IPL %d\n", __func__,
>> InterruptedTPL));
>> +  }
>>       return InterruptedTPL;
>>   }
> 
> While https://bugzilla.redhat.com/show_bug.cgi?id=2189136 continues to
> track the underlying Windows bug that leads to this assertion being
> triggered: I suspect that this patch will allow people to boot these
> buggy versions of Windows in OVMF, and I don't think it will make things
> any worse.
> 
> I would probably suggest changing DEBUG_WARN to DEBUG_ERROR since this
> represents a serious invariant violation being detected.  With that change:
> 
>   Reviewed-by: Michael Brown <mcb30@ipxe.org>

I don't like the patch. For two reasons:

(1) It papers over the actual issue. The problem should be fixed where
it is, if possible.

(2) With the patch applied, NestedInterruptRaiseTPL() can return
TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
"InterruptedTPL".

I believe that this in turn may invalidate at least one comment in
NestedInterruptRestoreTPL():

    //
    // Call RestoreTPL() to allow event notifications to be
    // dispatched.  This will implicitly re-enable interrupts.
    //
    gBS->RestoreTPL (InterruptedTPL);

Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.


I wouldn't like OVMF to stick with yet another workaround / yet more
internal inconsistency. We should just wait until fixed Windows
installer media gets released.

Here's an alternative:

(a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
NestedInterruptTplLib is technically correct in all circumstances, but
in practice it happens to be too strict.)

(b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
variant that effectively has commits a086f4a63bc0 and a24fbd606125
reverted. (We should keep 9bf473da4c1d.) This returns us to
pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
does not try to be smart about nested interrupts, therefore one that is
much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
but will never run on Xen. (Only the OVMF Xen platform is supposed to be
launched on Xen.)

Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
  2023-05-05 18:56   ` Laszlo Ersek
@ 2023-05-05 23:27     ` Michael Brown
  2023-05-05 23:57       ` Ard Biesheuvel
  2023-05-08  6:38       ` Laszlo Ersek
  0 siblings, 2 replies; 18+ messages in thread
From: Michael Brown @ 2023-05-05 23:27 UTC (permalink / raw)
  To: devel, lersek, kraxel
  Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel,
	Jordan Justen

On 05/05/2023 19:56, Laszlo Ersek wrote:
> I don't like the patch. For two reasons:
> 
> (1) It papers over the actual issue. The problem should be fixed where
> it is, if possible.

Agreed, but (as you have shown in 
https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in 
Windows code rather than in EDK2 code.  If the goal is to allow these 
buggy Windows builds to still be used with OVMF, then the only option is 
to paper over the issue.  We should do this only if it can be proven 
safe to do so, of course.

> (2) With the patch applied, NestedInterruptRaiseTPL() can return
> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
> "InterruptedTPL".
> 
> I believe that this in turn may invalidate at least one comment in
> NestedInterruptRestoreTPL():
> 
>      //
>      // Call RestoreTPL() to allow event notifications to be
>      // dispatched.  This will implicitly re-enable interrupts.
>      //
>      gBS->RestoreTPL (InterruptedTPL);
> 
> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.

I agree that the comment is invalidated, but as far as I can tell the 
logic remains safe.

I will put together a patch to update the comments in 
NestedInterruptTplLib to address the possibility of an interrupt 
occurring (illegally) at TPL_HIGH_LEVEL.

> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
> NestedInterruptTplLib is technically correct in all circumstances, but
> in practice it happens to be too strict.)
> 
> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
> variant that effectively has commits a086f4a63bc0 and a24fbd606125
> reverted. (We should keep 9bf473da4c1d.) This returns us to
> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
> does not try to be smart about nested interrupts, therefore one that is
> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
> launched on Xen.)

I'm less keen on this because it reduces the runtime exposure of a very 
complex piece of code, and will effectively cause that code to become 
unmaintained.

It's also satisfying (to me) that NestedInterruptTplLib provides a 
provable upper bound on stack consumption due to interrupts, which can't 
be guaranteed by the simpler pre-239b50a86370 scheme.

Could we defer judgement until after I've fully reasoned through (and 
documented) how NestedInterruptTplLib will work in the presence of 
interrupts occurring at TPL_HIGH_LEVEL?

Thanks,

Michael


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
  2023-05-05 23:27     ` Michael Brown
@ 2023-05-05 23:57       ` Ard Biesheuvel
  2023-05-08  6:45         ` Laszlo Ersek
  2023-05-08  6:38       ` Laszlo Ersek
  1 sibling, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2023-05-05 23:57 UTC (permalink / raw)
  To: Michael Brown
  Cc: devel, lersek, kraxel, Oliver Steffen, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen

On Sat, 6 May 2023 at 01:27, Michael Brown <mcb30@ipxe.org> wrote:
>
> On 05/05/2023 19:56, Laszlo Ersek wrote:
> > I don't like the patch. For two reasons:
> >
> > (1) It papers over the actual issue. The problem should be fixed where
> > it is, if possible.
>
> Agreed, but (as you have shown in
> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
> Windows code rather than in EDK2 code.  If the goal is to allow these
> buggy Windows builds to still be used with OVMF, then the only option is
> to paper over the issue.  We should do this only if it can be proven
> safe to do so, of course.
>
> > (2) With the patch applied, NestedInterruptRaiseTPL() can return
> > TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
> > TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
> > may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
> > "InterruptedTPL".
> >
> > I believe that this in turn may invalidate at least one comment in
> > NestedInterruptRestoreTPL():
> >
> >      //
> >      // Call RestoreTPL() to allow event notifications to be
> >      // dispatched.  This will implicitly re-enable interrupts.
> >      //
> >      gBS->RestoreTPL (InterruptedTPL);
> >
> > Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.
>
> I agree that the comment is invalidated, but as far as I can tell the
> logic remains safe.
>
> I will put together a patch to update the comments in
> NestedInterruptTplLib to address the possibility of an interrupt
> occurring (illegally) at TPL_HIGH_LEVEL.
>
> > (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
> > platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
> > NestedInterruptTplLib is technically correct in all circumstances, but
> > in practice it happens to be too strict.)
> >
> > (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
> > variant that effectively has commits a086f4a63bc0 and a24fbd606125
> > reverted. (We should keep 9bf473da4c1d.) This returns us to
> > pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
> > does not try to be smart about nested interrupts, therefore one that is
> > much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
> > violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
> > but will never run on Xen. (Only the OVMF Xen platform is supposed to be
> > launched on Xen.)
>
> I'm less keen on this because it reduces the runtime exposure of a very
> complex piece of code, and will effectively cause that code to become
> unmaintained.
>
> It's also satisfying (to me) that NestedInterruptTplLib provides a
> provable upper bound on stack consumption due to interrupts, which can't
> be guaranteed by the simpler pre-239b50a86370 scheme.
>
> Could we defer judgement until after I've fully reasoned through (and
> documented) how NestedInterruptTplLib will work in the presence of
> interrupts occurring at TPL_HIGH_LEVEL?
>

Would it be feasible for our firmware implementation to disable the
timer interrupt at the timer end as well?

E.g.,

RaiseTPL(HIGH)::

CLI
disarm timer


RestoreTPL::

<complain if HIGH and interrupts enabled at CPU side>
re-arm timer
STI

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
  2023-05-05 23:27     ` Michael Brown
  2023-05-05 23:57       ` Ard Biesheuvel
@ 2023-05-08  6:38       ` Laszlo Ersek
  2023-05-08 21:31         ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown
       [not found]         ` <20230508213100.3949708-1-mcb30@ipxe.org>
  1 sibling, 2 replies; 18+ messages in thread
From: Laszlo Ersek @ 2023-05-08  6:38 UTC (permalink / raw)
  To: Michael Brown, devel, kraxel
  Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel,
	Jordan Justen

On 5/6/23 01:27, Michael Brown wrote:
> On 05/05/2023 19:56, Laszlo Ersek wrote:
>> I don't like the patch. For two reasons:
>>
>> (1) It papers over the actual issue. The problem should be fixed where
>> it is, if possible.
> 
> Agreed, but (as you have shown in
> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
> Windows code rather than in EDK2 code.  If the goal is to allow these
> buggy Windows builds to still be used with OVMF, then the only option is
> to paper over the issue.  We should do this only if it can be proven
> safe to do so, of course.
> 
>> (2) With the patch applied, NestedInterruptRaiseTPL() can return
>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
>> "InterruptedTPL".
>>
>> I believe that this in turn may invalidate at least one comment in
>> NestedInterruptRestoreTPL():
>>
>>      //
>>      // Call RestoreTPL() to allow event notifications to be
>>      // dispatched.  This will implicitly re-enable interrupts.
>>      //
>>      gBS->RestoreTPL (InterruptedTPL);
>>
>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally
>> anyways.
> 
> I agree that the comment is invalidated, but as far as I can tell the
> logic remains safe.
> 
> I will put together a patch to update the comments in
> NestedInterruptTplLib to address the possibility of an interrupt
> occurring (illegally) at TPL_HIGH_LEVEL.
> 
>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
>> NestedInterruptTplLib is technically correct in all circumstances, but
>> in practice it happens to be too strict.)
>>
>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
>> variant that effectively has commits a086f4a63bc0 and a24fbd606125
>> reverted. (We should keep 9bf473da4c1d.) This returns us to
>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
>> does not try to be smart about nested interrupts, therefore one that is
>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
>> launched on Xen.)
> 
> I'm less keen on this because it reduces the runtime exposure of a very
> complex piece of code, and will effectively cause that code to become
> unmaintained.
> 
> It's also satisfying (to me) that NestedInterruptTplLib provides a
> provable upper bound on stack consumption due to interrupts, which can't
> be guaranteed by the simpler pre-239b50a86370 scheme.
> 
> Could we defer judgement until after I've fully reasoned through (and
> documented) how NestedInterruptTplLib will work in the presence of
> interrupts occurring at TPL_HIGH_LEVEL?

Sure, absolutely!

As I wrote elsewhere: if you can revalidate the code with a new, less
strict set of invariants, and update the comments, I think that would be
the perfect workaround.

Thank you,
Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
  2023-05-05 23:57       ` Ard Biesheuvel
@ 2023-05-08  6:45         ` Laszlo Ersek
  2023-05-09  9:13           ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Laszlo Ersek @ 2023-05-08  6:45 UTC (permalink / raw)
  To: Ard Biesheuvel, Michael Brown
  Cc: devel, kraxel, Oliver Steffen, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen

On 5/6/23 01:57, Ard Biesheuvel wrote:
> On Sat, 6 May 2023 at 01:27, Michael Brown <mcb30@ipxe.org> wrote:
>>
>> On 05/05/2023 19:56, Laszlo Ersek wrote:
>>> I don't like the patch. For two reasons:
>>>
>>> (1) It papers over the actual issue. The problem should be fixed where
>>> it is, if possible.
>>
>> Agreed, but (as you have shown in
>> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
>> Windows code rather than in EDK2 code.  If the goal is to allow these
>> buggy Windows builds to still be used with OVMF, then the only option is
>> to paper over the issue.  We should do this only if it can be proven
>> safe to do so, of course.
>>
>>> (2) With the patch applied, NestedInterruptRaiseTPL() can return
>>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
>>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
>>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
>>> "InterruptedTPL".
>>>
>>> I believe that this in turn may invalidate at least one comment in
>>> NestedInterruptRestoreTPL():
>>>
>>>      //
>>>      // Call RestoreTPL() to allow event notifications to be
>>>      // dispatched.  This will implicitly re-enable interrupts.
>>>      //
>>>      gBS->RestoreTPL (InterruptedTPL);
>>>
>>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.
>>
>> I agree that the comment is invalidated, but as far as I can tell the
>> logic remains safe.
>>
>> I will put together a patch to update the comments in
>> NestedInterruptTplLib to address the possibility of an interrupt
>> occurring (illegally) at TPL_HIGH_LEVEL.
>>
>>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
>>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
>>> NestedInterruptTplLib is technically correct in all circumstances, but
>>> in practice it happens to be too strict.)
>>>
>>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
>>> variant that effectively has commits a086f4a63bc0 and a24fbd606125
>>> reverted. (We should keep 9bf473da4c1d.) This returns us to
>>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
>>> does not try to be smart about nested interrupts, therefore one that is
>>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
>>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
>>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
>>> launched on Xen.)
>>
>> I'm less keen on this because it reduces the runtime exposure of a very
>> complex piece of code, and will effectively cause that code to become
>> unmaintained.
>>
>> It's also satisfying (to me) that NestedInterruptTplLib provides a
>> provable upper bound on stack consumption due to interrupts, which can't
>> be guaranteed by the simpler pre-239b50a86370 scheme.
>>
>> Could we defer judgement until after I've fully reasoned through (and
>> documented) how NestedInterruptTplLib will work in the presence of
>> interrupts occurring at TPL_HIGH_LEVEL?
>>
> 
> Would it be feasible for our firmware implementation to disable the
> timer interrupt at the timer end as well?
> 
> E.g.,
> 
> RaiseTPL(HIGH)::
> 
> CLI
> disarm timer
> 
> 
> RestoreTPL::
> 
> <complain if HIGH and interrupts enabled at CPU side>
> re-arm timer
> STI
> 

I can be entirely wrong here, but:

- we looked for a solution (or workaround) to the original problem that
stays within the boundaries of OvmfPkg, so sinking tweaks into the core
TPL manipulation functions isn't ideal

- regarding the TimerInterruptHandler() function(s) that do live in
OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I
understand as sort of equivalent to your suggestion, as unless/until you
signal EOI, no more interrupts will be *generated*), but those had not
helped. The EOI was either too early and so we got the unbounded
nesting, or it was too late, and no interrupts were generated while (for
example) TPL_CALLBACK code would depend on timers with CheckEvent. See
bug 4162 -- that was what prompted Michael to revert the EOI placement
tweak and to implement NestedInterruptLib.

Apologies if there are further interpretations of disarming the timer
that I'm missing!

Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-08  6:38       ` Laszlo Ersek
@ 2023-05-08 21:31         ` Michael Brown
  2023-05-09  7:05           ` Gerd Hoffmann
  2023-05-09  8:43           ` Laszlo Ersek
       [not found]         ` <20230508213100.3949708-1-mcb30@ipxe.org>
  1 sibling, 2 replies; 18+ messages in thread
From: Michael Brown @ 2023-05-08 21:31 UTC (permalink / raw)
  To: devel
  Cc: Michael Brown, Laszlo Ersek, Gerd Hoffmann, Oliver Steffen,
	Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen

At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI
specification) and so we should never encounter a situation in which
an interrupt occurs at TPL_HIGH_LEVEL. The specification also
restricts usage of TPL_HIGH_LEVEL to the firmware itself.

However, nothing prevents a rogue UEFI application from illegally
calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating
the invariant by enabling interrupts via the STI or equivalent
instruction. Some versions of the Microsoft Windows bootloader are
known to do this.

NestedInterruptTplLib maintains the invariant that interrupts are
disabled at TPL_HIGH_LEVEL (even when performing the dark art of
deliberately manipulating the stack so that IRET will return with
interrupts still disabled), but does not itself rely on external code
maintaining this invariant.

Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL
to an error message, to allow rogue UEFI applications such as the
Microsoft Windows bootloader to continue to function.

Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Oliver Steffen <osteffen@redhat.com>
Cc: Pawel Polawski <ppolawsk@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>

Michael Brown (2):
  OvmfPkg: Clarify invariants for NestedInterruptTplLib
  OvmfPkg: Relax assertion that interrupts do not occur at
    TPL_HIGH_LEVEL

 OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 31 +++++++++++++++++----
 1 file changed, 26 insertions(+), 5 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] OvmfPkg: Clarify invariants for NestedInterruptTplLib
       [not found]         ` <20230508213100.3949708-1-mcb30@ipxe.org>
@ 2023-05-08 21:31           ` Michael Brown
  2023-05-08 21:31           ` [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Michael Brown @ 2023-05-08 21:31 UTC (permalink / raw)
  To: devel; +Cc: Michael Brown

NestedInterruptTplLib relies on CPU interrupts being disabled to
guarantee exclusive (and hence atomic) access to the shared state in
IsrState.  Nothing in the calling interrupt handler should have
re-enabled interrupts before calling NestedInterruptRestoreTPL(), and
the loop in NestedInterruptRestoreTPL() itself maintains the invariant
that interrupts are disabled at the start of each iteration.

Add assertions to clarify this invariant, and expand the comments
around the calls to RestoreTPL() and DisableInterrupts() to clarify
the expectations around enabling and disabling interrupts.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
---
 OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
index e19d98878eb7..e921a09c5599 100644
--- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
+++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
@@ -104,6 +104,7 @@ NestedInterruptRestoreTPL (
   // defer our call to RestoreTPL() to the in-progress outer instance
   // of the same interrupt handler.
   //
+  ASSERT (GetInterruptState () == FALSE);
   if (InterruptedTPL == IsrState->InProgressRestoreTPL) {
     //
     // Trigger outer instance of this interrupt handler to perform the
@@ -153,6 +154,7 @@ NestedInterruptRestoreTPL (
     //
     // Check shared state loop invariants.
     //
+    ASSERT (GetInterruptState () == FALSE);
     ASSERT (IsrState->InProgressRestoreTPL < InterruptedTPL);
     ASSERT (IsrState->DeferredRestoreTPL == FALSE);
 
@@ -167,13 +169,17 @@ NestedInterruptRestoreTPL (
 
     //
     // Call RestoreTPL() to allow event notifications to be
-    // dispatched.  This will implicitly re-enable interrupts.
+    // dispatched.  This will implicitly re-enable interrupts (if
+    // InterruptedTPL is below TPL_HIGH_LEVEL), even though we are
+    // still inside the interrupt handler.
     //
     gBS->RestoreTPL (InterruptedTPL);
 
     //
     // Re-disable interrupts after the call to RestoreTPL() to ensure
-    // that we have exclusive access to the shared state.
+    // that we have exclusive access to the shared state.  Interrupts
+    // will be re-enabled by the IRET or equivalent instruction when
+    // we subsequently return from the interrupt handler.
     //
     DisableInterrupts ();
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
       [not found]         ` <20230508213100.3949708-1-mcb30@ipxe.org>
  2023-05-08 21:31           ` [PATCH 1/2] OvmfPkg: Clarify invariants for NestedInterruptTplLib Michael Brown
@ 2023-05-08 21:31           ` Michael Brown
  2023-05-09  8:35             ` Laszlo Ersek
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Brown @ 2023-05-08 21:31 UTC (permalink / raw)
  To: devel; +Cc: Michael Brown, Gerd Hoffmann, Laszlo Ersek

At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI
specification) and so we should never encounter a situation in which
an interrupt occurs at TPL_HIGH_LEVEL.  The specification also
restricts usage of TPL_HIGH_LEVEL to the firmware itself.

However, nothing prevents a rogue UEFI application from illegally
calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating
the invariant by enabling interrupts via the STI or equivalent
instruction.  Some versions of the Microsoft Windows bootloader are
known to do this.

NestedInterruptTplLib maintains the invariant that interrupts are
disabled at TPL_HIGH_LEVEL (even when performing the dark art of
deliberately manipulating the stack so that IRET will return with
interrupts still disabled), but does not itself rely on external code
maintaining this invariant.

Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL
to an error message, to allow rogue UEFI applications such as the
Microsoft Windows bootloader to continue to function.

Debugged-by: Gerd Hoffmann <kraxel@redhat.com>
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136
Signed-off-by: Michael Brown <mcb30@ipxe.org>
---
 OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
index e921a09c5599..a91f2d3cb8c7 100644
--- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
+++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
@@ -34,12 +34,27 @@ NestedInterruptRaiseTPL (
 
   //
   // Raise TPL and assert that we were called from within an interrupt
-  // handler (i.e. with TPL below TPL_HIGH_LEVEL but with interrupts
-  // disabled).
+  // handler (i.e. with interrupts already disabled before raising the
+  // TPL).
   //
   ASSERT (GetInterruptState () == FALSE);
   InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);
-  ASSERT (InterruptedTPL < TPL_HIGH_LEVEL);
+
+  //
+  // At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI
+  // specification) and so we should never encounter a situation in
+  // which InterruptedTPL==TPL_HIGH_LEVEL.  The specification also
+  // restricts usage of TPL_HIGH_LEVEL to the firmware itself.
+  //
+  // However, nothing prevents a rogue UEFI application from illegally
+  // calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately
+  // violating the invariant by enabling interrupts via the STI or
+  // equivalent instruction.  Some versions of the Microsoft Windows
+  // bootloader are known to do this.
+  //
+  if (InterruptedTPL >= TPL_HIGH_LEVEL) {
+    DEBUG ((DEBUG_ERROR, "Illegal interrupt at TPL_HIGH_LEVEL!\n"));
+  }
 
   return InterruptedTPL;
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-08 21:31         ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown
@ 2023-05-09  7:05           ` Gerd Hoffmann
  2023-05-09  8:43           ` Laszlo Ersek
  1 sibling, 0 replies; 18+ messages in thread
From: Gerd Hoffmann @ 2023-05-09  7:05 UTC (permalink / raw)
  To: Michael Brown
  Cc: devel, Laszlo Ersek, Oliver Steffen, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen

On Mon, May 08, 2023 at 09:31:23PM +0000, Michael Brown wrote:
> At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI
> specification) and so we should never encounter a situation in which
> an interrupt occurs at TPL_HIGH_LEVEL. The specification also
> restricts usage of TPL_HIGH_LEVEL to the firmware itself.
> 
> However, nothing prevents a rogue UEFI application from illegally
> calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating
> the invariant by enabling interrupts via the STI or equivalent
> instruction. Some versions of the Microsoft Windows bootloader are
> known to do this.
> 
> NestedInterruptTplLib maintains the invariant that interrupts are
> disabled at TPL_HIGH_LEVEL (even when performing the dark art of
> deliberately manipulating the stack so that IRET will return with
> interrupts still disabled), but does not itself rely on external code
> maintaining this invariant.
> 
> Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL
> to an error message, to allow rogue UEFI applications such as the
> Microsoft Windows bootloader to continue to function.
> 
> Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Oliver Steffen <osteffen@redhat.com>
> Cc: Pawel Polawski <ppolawsk@redhat.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> 
> Michael Brown (2):
>   OvmfPkg: Clarify invariants for NestedInterruptTplLib
>   OvmfPkg: Relax assertion that interrupts do not occur at
>     TPL_HIGH_LEVEL

Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>

take care,
  Gerd


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-08 21:31           ` [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown
@ 2023-05-09  8:35             ` Laszlo Ersek
  2023-05-09  9:42               ` Gerd Hoffmann
  2023-05-09 12:04               ` [edk2-devel] " Michael Brown
  0 siblings, 2 replies; 18+ messages in thread
From: Laszlo Ersek @ 2023-05-09  8:35 UTC (permalink / raw)
  To: Michael Brown, devel; +Cc: Gerd Hoffmann

On 5/8/23 23:31, Michael Brown wrote:
> At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI
> specification) and so we should never encounter a situation in which
> an interrupt occurs at TPL_HIGH_LEVEL.  The specification also
> restricts usage of TPL_HIGH_LEVEL to the firmware itself.

Great introduction!

Regarding the rest of the commit message, I'd like us to tone it down a
little bit.

Here's why: I'd been used to Microsoft *not* cooperating usefully in
Windows-on-QEMU/KVM situations. But this instance was totally different.
In fact I'm still a bit shocked, in the positive sense. We got a fast
and helpful, to-the-point response. It's a first, considering my own
experience, and it has strongly changed my impression of Microsoft's
Windows team. I'd like us to acknowledge that in the commit message, if
possible.

Mind you, I'm not a native English speaker, so I could be seeing things
(and proposing worse language than the original).

With all that said:

> However, nothing prevents a rogue UEFI application from illegally

I request s/rogue/non-conformant/.

I'd also request "invalidly" rather than "illegally"; the latter has
connotations with the law, and seeing such in a commit message makes me
fidget uncomfortably. I do apologize.

> calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating

You are not wrong about "deliberately", but I'd still like us to remove
that word. :)

> the invariant by enabling interrupts via the STI or equivalent
> instruction.  Some versions of the Microsoft Windows bootloader are
> known to do this.
> 
> NestedInterruptTplLib maintains the invariant that interrupts are
> disabled at TPL_HIGH_LEVEL (even when performing the dark art of
> deliberately manipulating the stack so that IRET will return with
> interrupts still disabled), but does not itself rely on external code
> maintaining this invariant.
> 
> Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL
> to an error message, to allow rogue UEFI applications such as the

s/rogue/non-conformant/

> Microsoft Windows bootloader to continue to function.

Can we say "particular version of the Microsoft Windows bootloader"?

> 
> Debugged-by: Gerd Hoffmann <kraxel@redhat.com>
> Debugged-by: Laszlo Ersek <lersek@redhat.com>
> Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136
> Signed-off-by: Michael Brown <mcb30@ipxe.org>
> ---
>  OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
> index e921a09c5599..a91f2d3cb8c7 100644
> --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
> +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c
> @@ -34,12 +34,27 @@ NestedInterruptRaiseTPL (
>  
>    //
>    // Raise TPL and assert that we were called from within an interrupt
> -  // handler (i.e. with TPL below TPL_HIGH_LEVEL but with interrupts
> -  // disabled).
> +  // handler (i.e. with interrupts already disabled before raising the
> +  // TPL).
>    //
>    ASSERT (GetInterruptState () == FALSE);
>    InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);
> -  ASSERT (InterruptedTPL < TPL_HIGH_LEVEL);
> +
> +  //
> +  // At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI
> +  // specification) and so we should never encounter a situation in
> +  // which InterruptedTPL==TPL_HIGH_LEVEL.  The specification also
> +  // restricts usage of TPL_HIGH_LEVEL to the firmware itself.
> +  //
> +  // However, nothing prevents a rogue UEFI application from illegally
> +  // calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately
> +  // violating the invariant by enabling interrupts via the STI or
> +  // equivalent instruction.  Some versions of the Microsoft Windows
> +  // bootloader are known to do this.
> +  //

Same three requests on the wording: rogue, illegally, deliberately.

(I am happy with "Some versions of"!)

> +  if (InterruptedTPL >= TPL_HIGH_LEVEL) {
> +    DEBUG ((DEBUG_ERROR, "Illegal interrupt at TPL_HIGH_LEVEL!\n"));

s/Illegal/Invalid/ please! :)

> +  }
>  
>    return InterruptedTPL;
>  }

Thank you for the patch; I do apologize about splitting hairs. The
debugging was difficult, and you *are* working around a bug here -- but
I'd really like our tone of voice to be positive here, simply because of
the stunningly positive attitude I've experienced from Microsoft.

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-08 21:31         ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown
  2023-05-09  7:05           ` Gerd Hoffmann
@ 2023-05-09  8:43           ` Laszlo Ersek
  2023-05-09 12:08             ` [edk2-devel] " Michael Brown
  1 sibling, 1 reply; 18+ messages in thread
From: Laszlo Ersek @ 2023-05-09  8:43 UTC (permalink / raw)
  To: Michael Brown, devel
  Cc: Gerd Hoffmann, Oliver Steffen, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen

On 5/8/23 23:31, Michael Brown wrote:
> At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI
> specification) and so we should never encounter a situation in which
> an interrupt occurs at TPL_HIGH_LEVEL. The specification also
> restricts usage of TPL_HIGH_LEVEL to the firmware itself.
> 
> However, nothing prevents a rogue UEFI application from illegally
> calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating
> the invariant by enabling interrupts via the STI or equivalent
> instruction. Some versions of the Microsoft Windows bootloader are
> known to do this.
> 
> NestedInterruptTplLib maintains the invariant that interrupts are
> disabled at TPL_HIGH_LEVEL (even when performing the dark art of
> deliberately manipulating the stack so that IRET will return with
> interrupts still disabled), but does not itself rely on external code
> maintaining this invariant.
> 
> Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL
> to an error message, to allow rogue UEFI applications such as the
> Microsoft Windows bootloader to continue to function.
> 
> Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Oliver Steffen <osteffen@redhat.com>
> Cc: Pawel Polawski <ppolawsk@redhat.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> 
> Michael Brown (2):
>   OvmfPkg: Clarify invariants for NestedInterruptTplLib
>   OvmfPkg: Relax assertion that interrupts do not occur at
>     TPL_HIGH_LEVEL
> 
>  OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 31 +++++++++++++++++----
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 

I'm not subscribed to the list, so I don't have a copy of patch#1. I've
checked patch#1 at this URL:

https://listman.redhat.com/archives/edk2-devel-archive/2023-May/063591.html

and I'll comment on it using the cover letter:

I really like that patch, with one stylistic exception: in edk2,
explicit FALSE and TRUE comparisons are not desired. So I suggest:

  ASSERT (!GetInterruptState ());

Twice.

In fact, I *think* that if you run uncrustify with the edk2 config on
the patch, then it will rewrite that code.

Uncrustify normally only complains in CI on github, but you can run it
manually too, after each commit:

(1) Clone and build uncrustify:

https://projectmu@dev.azure.com/projectmu/Uncrustify/_git/Uncrustify

(2) After each commit, preferably when your working tree *and* index are
clean, run uncrustify in the edk2 project root, with the following flags:

  -c .pytool/Plugin/UncrustifyCheck/uncrustify.cfg \
  --replace \
  --no-backup \
  --if-changed

The file list can be passed in via stdin ("-F -") or on the uncrustify
command line.

The "file list" is generally the list of *.c and *.h files modified by
the particular commit. Just dumping the entire file list modified by a
commit to uncrustify is not good: uncrustify will happily garble (for
example) *.inf files, and then it will also crash with SEGV.

Thank you!
Laszlo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
  2023-05-08  6:45         ` Laszlo Ersek
@ 2023-05-09  9:13           ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2023-05-09  9:13 UTC (permalink / raw)
  To: devel, lersek
  Cc: Michael Brown, kraxel, Oliver Steffen, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen

On Mon, 8 May 2023 at 08:46, Laszlo Ersek <lersek@redhat.com> wrote:
>
> On 5/6/23 01:57, Ard Biesheuvel wrote:
> > On Sat, 6 May 2023 at 01:27, Michael Brown <mcb30@ipxe.org> wrote:
> >>
> >> On 05/05/2023 19:56, Laszlo Ersek wrote:
> >>> I don't like the patch. For two reasons:
> >>>
> >>> (1) It papers over the actual issue. The problem should be fixed where
> >>> it is, if possible.
> >>
> >> Agreed, but (as you have shown in
> >> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
> >> Windows code rather than in EDK2 code.  If the goal is to allow these
> >> buggy Windows builds to still be used with OVMF, then the only option is
> >> to paper over the issue.  We should do this only if it can be proven
> >> safe to do so, of course.
> >>
> >>> (2) With the patch applied, NestedInterruptRaiseTPL() can return
> >>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
> >>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
> >>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
> >>> "InterruptedTPL".
> >>>
> >>> I believe that this in turn may invalidate at least one comment in
> >>> NestedInterruptRestoreTPL():
> >>>
> >>>      //
> >>>      // Call RestoreTPL() to allow event notifications to be
> >>>      // dispatched.  This will implicitly re-enable interrupts.
> >>>      //
> >>>      gBS->RestoreTPL (InterruptedTPL);
> >>>
> >>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.
> >>
> >> I agree that the comment is invalidated, but as far as I can tell the
> >> logic remains safe.
> >>
> >> I will put together a patch to update the comments in
> >> NestedInterruptTplLib to address the possibility of an interrupt
> >> occurring (illegally) at TPL_HIGH_LEVEL.
> >>
> >>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
> >>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
> >>> NestedInterruptTplLib is technically correct in all circumstances, but
> >>> in practice it happens to be too strict.)
> >>>
> >>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
> >>> variant that effectively has commits a086f4a63bc0 and a24fbd606125
> >>> reverted. (We should keep 9bf473da4c1d.) This returns us to
> >>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
> >>> does not try to be smart about nested interrupts, therefore one that is
> >>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
> >>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
> >>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
> >>> launched on Xen.)
> >>
> >> I'm less keen on this because it reduces the runtime exposure of a very
> >> complex piece of code, and will effectively cause that code to become
> >> unmaintained.
> >>
> >> It's also satisfying (to me) that NestedInterruptTplLib provides a
> >> provable upper bound on stack consumption due to interrupts, which can't
> >> be guaranteed by the simpler pre-239b50a86370 scheme.
> >>
> >> Could we defer judgement until after I've fully reasoned through (and
> >> documented) how NestedInterruptTplLib will work in the presence of
> >> interrupts occurring at TPL_HIGH_LEVEL?
> >>
> >
> > Would it be feasible for our firmware implementation to disable the
> > timer interrupt at the timer end as well?
> >
> > E.g.,
> >
> > RaiseTPL(HIGH)::
> >
> > CLI
> > disarm timer
> >
> >
> > RestoreTPL::
> >
> > <complain if HIGH and interrupts enabled at CPU side>
> > re-arm timer
> > STI
> >
>
> I can be entirely wrong here, but:
>
> - we looked for a solution (or workaround) to the original problem that
> stays within the boundaries of OvmfPkg, so sinking tweaks into the core
> TPL manipulation functions isn't ideal
>
> - regarding the TimerInterruptHandler() function(s) that do live in
> OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I
> understand as sort of equivalent to your suggestion, as unless/until you
> signal EOI, no more interrupts will be *generated*), but those had not
> helped. The EOI was either too early and so we got the unbounded
> nesting, or it was too late, and no interrupts were generated while (for
> example) TPL_CALLBACK code would depend on timers with CheckEvent. See
> bug 4162 -- that was what prompted Michael to revert the EOI placement
> tweak and to implement NestedInterruptLib.
>
> Apologies if there are further interpretations of disarming the timer
> that I'm missing!
>

No, I think you've summed it up. And in any case, it is not OVMF's job
to police what the loader is doing to that extent - we should flag it
as a violation but beyond that, I don't think we should try to
interfere with this non-compliant behavior.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-09  8:35             ` Laszlo Ersek
@ 2023-05-09  9:42               ` Gerd Hoffmann
  2023-05-09 12:04               ` [edk2-devel] " Michael Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Gerd Hoffmann @ 2023-05-09  9:42 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Michael Brown, devel

  Hi,

> Here's why: I'd been used to Microsoft *not* cooperating usefully in
> Windows-on-QEMU/KVM situations. But this instance was totally different.
> In fact I'm still a bit shocked, in the positive sense. We got a fast
> and helpful, to-the-point response. It's a first, considering my own
> experience, and it has strongly changed my impression of Microsoft's
> Windows team. I'd like us to acknowledge that in the commit message, if
> possible.

And Microsoft saying this is tsc calibration code also gives us a
workaround: When using 'qemu -cpu host,hv-frequencies=on' the windows
installer iso boots just fine, most likely because cdboot.efi queries
the hypervisor for the tsc frequency instead of running the calibration
loop.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-09  8:35             ` Laszlo Ersek
  2023-05-09  9:42               ` Gerd Hoffmann
@ 2023-05-09 12:04               ` Michael Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Michael Brown @ 2023-05-09 12:04 UTC (permalink / raw)
  To: devel, lersek; +Cc: Gerd Hoffmann

On 09/05/2023 09:35, Laszlo Ersek wrote:
> Thank you for the patch; I do apologize about splitting hairs. The
> debugging was difficult, and you *are* working around a bug here -- but
> I'd really like our tone of voice to be positive here, simply because of
> the stunningly positive attitude I've experienced from Microsoft.

No problem.  Patch v2 to follow.

Michael


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-09  8:43           ` Laszlo Ersek
@ 2023-05-09 12:08             ` Michael Brown
  2023-05-09 13:27               ` Laszlo Ersek
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Brown @ 2023-05-09 12:08 UTC (permalink / raw)
  To: devel, lersek
  Cc: Gerd Hoffmann, Oliver Steffen, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen

On 09/05/2023 09:43, Laszlo Ersek wrote:
> I'm not subscribed to the list, so I don't have a copy of patch#1. I've
> checked patch#1 at this URL:
> 
> https://listman.redhat.com/archives/edk2-devel-archive/2023-May/063591.html
> 
> and I'll comment on it using the cover letter:
> 
> I really like that patch, with one stylistic exception: in edk2,
> explicit FALSE and TRUE comparisons are not desired. So I suggest:
> 
>    ASSERT (!GetInterruptState ());
> 
> Twice.
> 
> In fact, I *think* that if you run uncrustify with the edk2 config on
> the patch, then it will rewrite that code.

I built and ran uncrustify with the edk2 config but it did not modify 
the code.  (I did check that it would fix other deliberate errors such 
as extra whitespace, so I don't think this was an error in my setup.)

I will send through v2 with the explicit "== FALSE" still present, for 
consistency with the rest of the code in that file.  (I think I vaguely 
remember someone asking me to add the explicit comparisons when I first 
submitted the code.)  I'm happy for there to be a follow up patch to 
change the coding style to remove all of the explicit boolean 
comparisons, if that is what is wanted.

Thanks,

Michael


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [edk2-devel] [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL
  2023-05-09 12:08             ` [edk2-devel] " Michael Brown
@ 2023-05-09 13:27               ` Laszlo Ersek
  0 siblings, 0 replies; 18+ messages in thread
From: Laszlo Ersek @ 2023-05-09 13:27 UTC (permalink / raw)
  To: Michael Brown, devel
  Cc: Gerd Hoffmann, Oliver Steffen, Pawel Polawski, Jiewen Yao,
	Ard Biesheuvel, Jordan Justen

On 5/9/23 14:08, Michael Brown wrote:
> On 09/05/2023 09:43, Laszlo Ersek wrote:
>> I'm not subscribed to the list, so I don't have a copy of patch#1. I've
>> checked patch#1 at this URL:
>>
>> https://listman.redhat.com/archives/edk2-devel-archive/2023-May/063591.html
>>
>> and I'll comment on it using the cover letter:
>>
>> I really like that patch, with one stylistic exception: in edk2,
>> explicit FALSE and TRUE comparisons are not desired. So I suggest:
>>
>>    ASSERT (!GetInterruptState ());
>>
>> Twice.
>>
>> In fact, I *think* that if you run uncrustify with the edk2 config on
>> the patch, then it will rewrite that code.
> 
> I built and ran uncrustify with the edk2 config but it did not modify
> the code.  (I did check that it would fix other deliberate errors such
> as extra whitespace, so I don't think this was an error in my setup.)
> 
> I will send through v2 with the explicit "== FALSE" still present, for
> consistency with the rest of the code in that file.  (I think I vaguely
> remember someone asking me to add the explicit comparisons when I first
> submitted the code.)

Wow, I totally missed the preexistent comparisons. You are right to stay
consistent.

(I do disagree with the preexistent comparisons as well, but that's
indeed a different discussion.)

Thanks!
Laszlo

> I'm happy for there to be a follow up patch to
> change the coding style to remove all of the explicit boolean
> comparisons, if that is what is wanted.
> 
> Thanks,
> 
> Michael
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-05-09 13:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-03  7:19 [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged Gerd Hoffmann
2023-05-05 14:10 ` [edk2-devel] " Michael Brown
2023-05-05 18:56   ` Laszlo Ersek
2023-05-05 23:27     ` Michael Brown
2023-05-05 23:57       ` Ard Biesheuvel
2023-05-08  6:45         ` Laszlo Ersek
2023-05-09  9:13           ` Ard Biesheuvel
2023-05-08  6:38       ` Laszlo Ersek
2023-05-08 21:31         ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown
2023-05-09  7:05           ` Gerd Hoffmann
2023-05-09  8:43           ` Laszlo Ersek
2023-05-09 12:08             ` [edk2-devel] " Michael Brown
2023-05-09 13:27               ` Laszlo Ersek
     [not found]         ` <20230508213100.3949708-1-mcb30@ipxe.org>
2023-05-08 21:31           ` [PATCH 1/2] OvmfPkg: Clarify invariants for NestedInterruptTplLib Michael Brown
2023-05-08 21:31           ` [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown
2023-05-09  8:35             ` Laszlo Ersek
2023-05-09  9:42               ` Gerd Hoffmann
2023-05-09 12:04               ` [edk2-devel] " Michael Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox