* [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. @ 2023-05-03 7:19 Gerd Hoffmann 2023-05-05 14:10 ` [edk2-devel] " Michael Brown 0 siblings, 1 reply; 18+ messages in thread From: Gerd Hoffmann @ 2023-05-03 7:19 UTC (permalink / raw) To: devel Cc: Oliver Steffen, Gerd Hoffmann, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen, Michael Brown, Laszlo Ersek OVMF can't guarantee that the ASSERT() doesn't happen. Misbehaving EFI applications can trigger this. So log a warning instead and try to continue. Reproducer: Fetch windows 11 22H2 iso image, boot it in qemu with OVMF. Traced to BootServices->Stall() being called with IPL=TPL_HIGH_LEVEL and Interrupts /enabled/ while windows is booting. Cc: Michael Brown <mcb30@ipxe.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> --- OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c index e19d98878eb7..fdd7d15c4ba8 100644 --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c @@ -39,7 +39,9 @@ NestedInterruptRaiseTPL ( // ASSERT (GetInterruptState () == FALSE); InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL); - ASSERT (InterruptedTPL < TPL_HIGH_LEVEL); + if (InterruptedTPL >= TPL_HIGH_LEVEL) { + DEBUG ((DEBUG_WARN, "%a: Called at IPL %d\n", __func__, InterruptedTPL)); + } return InterruptedTPL; } -- 2.40.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. 2023-05-03 7:19 [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged Gerd Hoffmann @ 2023-05-05 14:10 ` Michael Brown 2023-05-05 18:56 ` Laszlo Ersek 0 siblings, 1 reply; 18+ messages in thread From: Michael Brown @ 2023-05-05 14:10 UTC (permalink / raw) To: devel, kraxel Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen, Laszlo Ersek On 03/05/2023 08:19, Gerd Hoffmann wrote: > OVMF can't guarantee that the ASSERT() doesn't happen. Misbehaving > EFI applications can trigger this. So log a warning instead and try > to continue. > > Reproducer: Fetch windows 11 22H2 iso image, boot it in qemu with OVMF. > > Traced to BootServices->Stall() being called with IPL=TPL_HIGH_LEVEL > and Interrupts /enabled/ while windows is booting. > > Cc: Michael Brown <mcb30@ipxe.org> > Cc: Laszlo Ersek <lersek@redhat.com> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> > --- > OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > index e19d98878eb7..fdd7d15c4ba8 100644 > --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > @@ -39,7 +39,9 @@ NestedInterruptRaiseTPL ( > // > ASSERT (GetInterruptState () == FALSE); > InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL); > - ASSERT (InterruptedTPL < TPL_HIGH_LEVEL); > + if (InterruptedTPL >= TPL_HIGH_LEVEL) { > + DEBUG ((DEBUG_WARN, "%a: Called at IPL %d\n", __func__, InterruptedTPL)); > + } > > return InterruptedTPL; > } While https://bugzilla.redhat.com/show_bug.cgi?id=2189136 continues to track the underlying Windows bug that leads to this assertion being triggered: I suspect that this patch will allow people to boot these buggy versions of Windows in OVMF, and I don't think it will make things any worse. I would probably suggest changing DEBUG_WARN to DEBUG_ERROR since this represents a serious invariant violation being detected. With that change: Reviewed-by: Michael Brown <mcb30@ipxe.org> Thanks, Michael ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. 2023-05-05 14:10 ` [edk2-devel] " Michael Brown @ 2023-05-05 18:56 ` Laszlo Ersek 2023-05-05 23:27 ` Michael Brown 0 siblings, 1 reply; 18+ messages in thread From: Laszlo Ersek @ 2023-05-05 18:56 UTC (permalink / raw) To: Michael Brown, devel, kraxel Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On 5/5/23 16:10, Michael Brown wrote: > On 03/05/2023 08:19, Gerd Hoffmann wrote: >> OVMF can't guarantee that the ASSERT() doesn't happen. Misbehaving >> EFI applications can trigger this. So log a warning instead and try >> to continue. >> >> Reproducer: Fetch windows 11 22H2 iso image, boot it in qemu with OVMF. >> >> Traced to BootServices->Stall() being called with IPL=TPL_HIGH_LEVEL >> and Interrupts /enabled/ while windows is booting. >> >> Cc: Michael Brown <mcb30@ipxe.org> >> Cc: Laszlo Ersek <lersek@redhat.com> >> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> >> --- >> OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c >> b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c >> index e19d98878eb7..fdd7d15c4ba8 100644 >> --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c >> +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c >> @@ -39,7 +39,9 @@ NestedInterruptRaiseTPL ( >> // >> ASSERT (GetInterruptState () == FALSE); >> InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL); >> - ASSERT (InterruptedTPL < TPL_HIGH_LEVEL); >> + if (InterruptedTPL >= TPL_HIGH_LEVEL) { >> + DEBUG ((DEBUG_WARN, "%a: Called at IPL %d\n", __func__, >> InterruptedTPL)); >> + } >> return InterruptedTPL; >> } > > While https://bugzilla.redhat.com/show_bug.cgi?id=2189136 continues to > track the underlying Windows bug that leads to this assertion being > triggered: I suspect that this patch will allow people to boot these > buggy versions of Windows in OVMF, and I don't think it will make things > any worse. > > I would probably suggest changing DEBUG_WARN to DEBUG_ERROR since this > represents a serious invariant violation being detected. With that change: > > Reviewed-by: Michael Brown <mcb30@ipxe.org> I don't like the patch. For two reasons: (1) It papers over the actual issue. The problem should be fixed where it is, if possible. (2) With the patch applied, NestedInterruptRaiseTPL() can return TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as "InterruptedTPL". I believe that this in turn may invalidate at least one comment in NestedInterruptRestoreTPL(): // // Call RestoreTPL() to allow event notifications to be // dispatched. This will implicitly re-enable interrupts. // gBS->RestoreTPL (InterruptedTPL); Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways. I wouldn't like OVMF to stick with yet another workaround / yet more internal inconsistency. We should just wait until fixed Windows installer media gets released. Here's an alternative: (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: NestedInterruptTplLib is technically correct in all circumstances, but in practice it happens to be too strict.) (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe variant that effectively has commits a086f4a63bc0 and a24fbd606125 reverted. (We should keep 9bf473da4c1d.) This returns us to pre-239b50a86370 status -- that is, a timer interrupt handler that (a) does not try to be smart about nested interrupts, therefore one that is much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec violation, (c) is vulnerable to the timer interrupt storm seen on Xen, but will never run on Xen. (Only the OVMF Xen platform is supposed to be launched on Xen.) Laszlo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. 2023-05-05 18:56 ` Laszlo Ersek @ 2023-05-05 23:27 ` Michael Brown 2023-05-05 23:57 ` Ard Biesheuvel 2023-05-08 6:38 ` Laszlo Ersek 0 siblings, 2 replies; 18+ messages in thread From: Michael Brown @ 2023-05-05 23:27 UTC (permalink / raw) To: devel, lersek, kraxel Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On 05/05/2023 19:56, Laszlo Ersek wrote: > I don't like the patch. For two reasons: > > (1) It papers over the actual issue. The problem should be fixed where > it is, if possible. Agreed, but (as you have shown in https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in Windows code rather than in EDK2 code. If the goal is to allow these buggy Windows builds to still be used with OVMF, then the only option is to paper over the issue. We should do this only if it can be proven safe to do so, of course. > (2) With the patch applied, NestedInterruptRaiseTPL() can return > TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, > TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] > may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as > "InterruptedTPL". > > I believe that this in turn may invalidate at least one comment in > NestedInterruptRestoreTPL(): > > // > // Call RestoreTPL() to allow event notifications to be > // dispatched. This will implicitly re-enable interrupts. > // > gBS->RestoreTPL (InterruptedTPL); > > Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways. I agree that the comment is invalidated, but as far as I can tell the logic remains safe. I will put together a patch to update the comments in NestedInterruptTplLib to address the possibility of an interrupt occurring (illegally) at TPL_HIGH_LEVEL. > (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen > platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: > NestedInterruptTplLib is technically correct in all circumstances, but > in practice it happens to be too strict.) > > (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe > variant that effectively has commits a086f4a63bc0 and a24fbd606125 > reverted. (We should keep 9bf473da4c1d.) This returns us to > pre-239b50a86370 status -- that is, a timer interrupt handler that (a) > does not try to be smart about nested interrupts, therefore one that is > much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec > violation, (c) is vulnerable to the timer interrupt storm seen on Xen, > but will never run on Xen. (Only the OVMF Xen platform is supposed to be > launched on Xen.) I'm less keen on this because it reduces the runtime exposure of a very complex piece of code, and will effectively cause that code to become unmaintained. It's also satisfying (to me) that NestedInterruptTplLib provides a provable upper bound on stack consumption due to interrupts, which can't be guaranteed by the simpler pre-239b50a86370 scheme. Could we defer judgement until after I've fully reasoned through (and documented) how NestedInterruptTplLib will work in the presence of interrupts occurring at TPL_HIGH_LEVEL? Thanks, Michael ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. 2023-05-05 23:27 ` Michael Brown @ 2023-05-05 23:57 ` Ard Biesheuvel 2023-05-08 6:45 ` Laszlo Ersek 2023-05-08 6:38 ` Laszlo Ersek 1 sibling, 1 reply; 18+ messages in thread From: Ard Biesheuvel @ 2023-05-05 23:57 UTC (permalink / raw) To: Michael Brown Cc: devel, lersek, kraxel, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On Sat, 6 May 2023 at 01:27, Michael Brown <mcb30@ipxe.org> wrote: > > On 05/05/2023 19:56, Laszlo Ersek wrote: > > I don't like the patch. For two reasons: > > > > (1) It papers over the actual issue. The problem should be fixed where > > it is, if possible. > > Agreed, but (as you have shown in > https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in > Windows code rather than in EDK2 code. If the goal is to allow these > buggy Windows builds to still be used with OVMF, then the only option is > to paper over the issue. We should do this only if it can be proven > safe to do so, of course. > > > (2) With the patch applied, NestedInterruptRaiseTPL() can return > > TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, > > TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] > > may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as > > "InterruptedTPL". > > > > I believe that this in turn may invalidate at least one comment in > > NestedInterruptRestoreTPL(): > > > > // > > // Call RestoreTPL() to allow event notifications to be > > // dispatched. This will implicitly re-enable interrupts. > > // > > gBS->RestoreTPL (InterruptedTPL); > > > > Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways. > > I agree that the comment is invalidated, but as far as I can tell the > logic remains safe. > > I will put together a patch to update the comments in > NestedInterruptTplLib to address the possibility of an interrupt > occurring (illegally) at TPL_HIGH_LEVEL. > > > (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen > > platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: > > NestedInterruptTplLib is technically correct in all circumstances, but > > in practice it happens to be too strict.) > > > > (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe > > variant that effectively has commits a086f4a63bc0 and a24fbd606125 > > reverted. (We should keep 9bf473da4c1d.) This returns us to > > pre-239b50a86370 status -- that is, a timer interrupt handler that (a) > > does not try to be smart about nested interrupts, therefore one that is > > much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec > > violation, (c) is vulnerable to the timer interrupt storm seen on Xen, > > but will never run on Xen. (Only the OVMF Xen platform is supposed to be > > launched on Xen.) > > I'm less keen on this because it reduces the runtime exposure of a very > complex piece of code, and will effectively cause that code to become > unmaintained. > > It's also satisfying (to me) that NestedInterruptTplLib provides a > provable upper bound on stack consumption due to interrupts, which can't > be guaranteed by the simpler pre-239b50a86370 scheme. > > Could we defer judgement until after I've fully reasoned through (and > documented) how NestedInterruptTplLib will work in the presence of > interrupts occurring at TPL_HIGH_LEVEL? > Would it be feasible for our firmware implementation to disable the timer interrupt at the timer end as well? E.g., RaiseTPL(HIGH):: CLI disarm timer RestoreTPL:: <complain if HIGH and interrupts enabled at CPU side> re-arm timer STI ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. 2023-05-05 23:57 ` Ard Biesheuvel @ 2023-05-08 6:45 ` Laszlo Ersek 2023-05-09 9:13 ` Ard Biesheuvel 0 siblings, 1 reply; 18+ messages in thread From: Laszlo Ersek @ 2023-05-08 6:45 UTC (permalink / raw) To: Ard Biesheuvel, Michael Brown Cc: devel, kraxel, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On 5/6/23 01:57, Ard Biesheuvel wrote: > On Sat, 6 May 2023 at 01:27, Michael Brown <mcb30@ipxe.org> wrote: >> >> On 05/05/2023 19:56, Laszlo Ersek wrote: >>> I don't like the patch. For two reasons: >>> >>> (1) It papers over the actual issue. The problem should be fixed where >>> it is, if possible. >> >> Agreed, but (as you have shown in >> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in >> Windows code rather than in EDK2 code. If the goal is to allow these >> buggy Windows builds to still be used with OVMF, then the only option is >> to paper over the issue. We should do this only if it can be proven >> safe to do so, of course. >> >>> (2) With the patch applied, NestedInterruptRaiseTPL() can return >>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, >>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] >>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as >>> "InterruptedTPL". >>> >>> I believe that this in turn may invalidate at least one comment in >>> NestedInterruptRestoreTPL(): >>> >>> // >>> // Call RestoreTPL() to allow event notifications to be >>> // dispatched. This will implicitly re-enable interrupts. >>> // >>> gBS->RestoreTPL (InterruptedTPL); >>> >>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways. >> >> I agree that the comment is invalidated, but as far as I can tell the >> logic remains safe. >> >> I will put together a patch to update the comments in >> NestedInterruptTplLib to address the possibility of an interrupt >> occurring (illegally) at TPL_HIGH_LEVEL. >> >>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen >>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: >>> NestedInterruptTplLib is technically correct in all circumstances, but >>> in practice it happens to be too strict.) >>> >>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe >>> variant that effectively has commits a086f4a63bc0 and a24fbd606125 >>> reverted. (We should keep 9bf473da4c1d.) This returns us to >>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a) >>> does not try to be smart about nested interrupts, therefore one that is >>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec >>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen, >>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be >>> launched on Xen.) >> >> I'm less keen on this because it reduces the runtime exposure of a very >> complex piece of code, and will effectively cause that code to become >> unmaintained. >> >> It's also satisfying (to me) that NestedInterruptTplLib provides a >> provable upper bound on stack consumption due to interrupts, which can't >> be guaranteed by the simpler pre-239b50a86370 scheme. >> >> Could we defer judgement until after I've fully reasoned through (and >> documented) how NestedInterruptTplLib will work in the presence of >> interrupts occurring at TPL_HIGH_LEVEL? >> > > Would it be feasible for our firmware implementation to disable the > timer interrupt at the timer end as well? > > E.g., > > RaiseTPL(HIGH):: > > CLI > disarm timer > > > RestoreTPL:: > > <complain if HIGH and interrupts enabled at CPU side> > re-arm timer > STI > I can be entirely wrong here, but: - we looked for a solution (or workaround) to the original problem that stays within the boundaries of OvmfPkg, so sinking tweaks into the core TPL manipulation functions isn't ideal - regarding the TimerInterruptHandler() function(s) that do live in OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I understand as sort of equivalent to your suggestion, as unless/until you signal EOI, no more interrupts will be *generated*), but those had not helped. The EOI was either too early and so we got the unbounded nesting, or it was too late, and no interrupts were generated while (for example) TPL_CALLBACK code would depend on timers with CheckEvent. See bug 4162 -- that was what prompted Michael to revert the EOI placement tweak and to implement NestedInterruptLib. Apologies if there are further interpretations of disarming the timer that I'm missing! Laszlo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. 2023-05-08 6:45 ` Laszlo Ersek @ 2023-05-09 9:13 ` Ard Biesheuvel 0 siblings, 0 replies; 18+ messages in thread From: Ard Biesheuvel @ 2023-05-09 9:13 UTC (permalink / raw) To: devel, lersek Cc: Michael Brown, kraxel, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On Mon, 8 May 2023 at 08:46, Laszlo Ersek <lersek@redhat.com> wrote: > > On 5/6/23 01:57, Ard Biesheuvel wrote: > > On Sat, 6 May 2023 at 01:27, Michael Brown <mcb30@ipxe.org> wrote: > >> > >> On 05/05/2023 19:56, Laszlo Ersek wrote: > >>> I don't like the patch. For two reasons: > >>> > >>> (1) It papers over the actual issue. The problem should be fixed where > >>> it is, if possible. > >> > >> Agreed, but (as you have shown in > >> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in > >> Windows code rather than in EDK2 code. If the goal is to allow these > >> buggy Windows builds to still be used with OVMF, then the only option is > >> to paper over the issue. We should do this only if it can be proven > >> safe to do so, of course. > >> > >>> (2) With the patch applied, NestedInterruptRaiseTPL() can return > >>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, > >>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] > >>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as > >>> "InterruptedTPL". > >>> > >>> I believe that this in turn may invalidate at least one comment in > >>> NestedInterruptRestoreTPL(): > >>> > >>> // > >>> // Call RestoreTPL() to allow event notifications to be > >>> // dispatched. This will implicitly re-enable interrupts. > >>> // > >>> gBS->RestoreTPL (InterruptedTPL); > >>> > >>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways. > >> > >> I agree that the comment is invalidated, but as far as I can tell the > >> logic remains safe. > >> > >> I will put together a patch to update the comments in > >> NestedInterruptTplLib to address the possibility of an interrupt > >> occurring (illegally) at TPL_HIGH_LEVEL. > >> > >>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen > >>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: > >>> NestedInterruptTplLib is technically correct in all circumstances, but > >>> in practice it happens to be too strict.) > >>> > >>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe > >>> variant that effectively has commits a086f4a63bc0 and a24fbd606125 > >>> reverted. (We should keep 9bf473da4c1d.) This returns us to > >>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a) > >>> does not try to be smart about nested interrupts, therefore one that is > >>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec > >>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen, > >>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be > >>> launched on Xen.) > >> > >> I'm less keen on this because it reduces the runtime exposure of a very > >> complex piece of code, and will effectively cause that code to become > >> unmaintained. > >> > >> It's also satisfying (to me) that NestedInterruptTplLib provides a > >> provable upper bound on stack consumption due to interrupts, which can't > >> be guaranteed by the simpler pre-239b50a86370 scheme. > >> > >> Could we defer judgement until after I've fully reasoned through (and > >> documented) how NestedInterruptTplLib will work in the presence of > >> interrupts occurring at TPL_HIGH_LEVEL? > >> > > > > Would it be feasible for our firmware implementation to disable the > > timer interrupt at the timer end as well? > > > > E.g., > > > > RaiseTPL(HIGH):: > > > > CLI > > disarm timer > > > > > > RestoreTPL:: > > > > <complain if HIGH and interrupts enabled at CPU side> > > re-arm timer > > STI > > > > I can be entirely wrong here, but: > > - we looked for a solution (or workaround) to the original problem that > stays within the boundaries of OvmfPkg, so sinking tweaks into the core > TPL manipulation functions isn't ideal > > - regarding the TimerInterruptHandler() function(s) that do live in > OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I > understand as sort of equivalent to your suggestion, as unless/until you > signal EOI, no more interrupts will be *generated*), but those had not > helped. The EOI was either too early and so we got the unbounded > nesting, or it was too late, and no interrupts were generated while (for > example) TPL_CALLBACK code would depend on timers with CheckEvent. See > bug 4162 -- that was what prompted Michael to revert the EOI placement > tweak and to implement NestedInterruptLib. > > Apologies if there are further interpretations of disarming the timer > that I'm missing! > No, I think you've summed it up. And in any case, it is not OVMF's job to police what the loader is doing to that extent - we should flag it as a violation but beyond that, I don't think we should try to interfere with this non-compliant behavior. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. 2023-05-05 23:27 ` Michael Brown 2023-05-05 23:57 ` Ard Biesheuvel @ 2023-05-08 6:38 ` Laszlo Ersek 2023-05-08 21:31 ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown [not found] ` <20230508213100.3949708-1-mcb30@ipxe.org> 1 sibling, 2 replies; 18+ messages in thread From: Laszlo Ersek @ 2023-05-08 6:38 UTC (permalink / raw) To: Michael Brown, devel, kraxel Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On 5/6/23 01:27, Michael Brown wrote: > On 05/05/2023 19:56, Laszlo Ersek wrote: >> I don't like the patch. For two reasons: >> >> (1) It papers over the actual issue. The problem should be fixed where >> it is, if possible. > > Agreed, but (as you have shown in > https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in > Windows code rather than in EDK2 code. If the goal is to allow these > buggy Windows builds to still be used with OVMF, then the only option is > to paper over the issue. We should do this only if it can be proven > safe to do so, of course. > >> (2) With the patch applied, NestedInterruptRaiseTPL() can return >> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, >> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] >> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as >> "InterruptedTPL". >> >> I believe that this in turn may invalidate at least one comment in >> NestedInterruptRestoreTPL(): >> >> // >> // Call RestoreTPL() to allow event notifications to be >> // dispatched. This will implicitly re-enable interrupts. >> // >> gBS->RestoreTPL (InterruptedTPL); >> >> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally >> anyways. > > I agree that the comment is invalidated, but as far as I can tell the > logic remains safe. > > I will put together a patch to update the comments in > NestedInterruptTplLib to address the possibility of an interrupt > occurring (illegally) at TPL_HIGH_LEVEL. > >> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen >> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: >> NestedInterruptTplLib is technically correct in all circumstances, but >> in practice it happens to be too strict.) >> >> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe >> variant that effectively has commits a086f4a63bc0 and a24fbd606125 >> reverted. (We should keep 9bf473da4c1d.) This returns us to >> pre-239b50a86370 status -- that is, a timer interrupt handler that (a) >> does not try to be smart about nested interrupts, therefore one that is >> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec >> violation, (c) is vulnerable to the timer interrupt storm seen on Xen, >> but will never run on Xen. (Only the OVMF Xen platform is supposed to be >> launched on Xen.) > > I'm less keen on this because it reduces the runtime exposure of a very > complex piece of code, and will effectively cause that code to become > unmaintained. > > It's also satisfying (to me) that NestedInterruptTplLib provides a > provable upper bound on stack consumption due to interrupts, which can't > be guaranteed by the simpler pre-239b50a86370 scheme. > > Could we defer judgement until after I've fully reasoned through (and > documented) how NestedInterruptTplLib will work in the presence of > interrupts occurring at TPL_HIGH_LEVEL? Sure, absolutely! As I wrote elsewhere: if you can revalidate the code with a new, less strict set of invariants, and update the comments, I think that would be the perfect workaround. Thank you, Laszlo ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-08 6:38 ` Laszlo Ersek @ 2023-05-08 21:31 ` Michael Brown 2023-05-09 7:05 ` Gerd Hoffmann 2023-05-09 8:43 ` Laszlo Ersek [not found] ` <20230508213100.3949708-1-mcb30@ipxe.org> 1 sibling, 2 replies; 18+ messages in thread From: Michael Brown @ 2023-05-08 21:31 UTC (permalink / raw) To: devel Cc: Michael Brown, Laszlo Ersek, Gerd Hoffmann, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI specification) and so we should never encounter a situation in which an interrupt occurs at TPL_HIGH_LEVEL. The specification also restricts usage of TPL_HIGH_LEVEL to the firmware itself. However, nothing prevents a rogue UEFI application from illegally calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating the invariant by enabling interrupts via the STI or equivalent instruction. Some versions of the Microsoft Windows bootloader are known to do this. NestedInterruptTplLib maintains the invariant that interrupts are disabled at TPL_HIGH_LEVEL (even when performing the dark art of deliberately manipulating the stack so that IRET will return with interrupts still disabled), but does not itself rely on external code maintaining this invariant. Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL to an error message, to allow rogue UEFI applications such as the Microsoft Windows bootloader to continue to function. Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136 Cc: Laszlo Ersek <lersek@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Oliver Steffen <osteffen@redhat.com> Cc: Pawel Polawski <ppolawsk@redhat.com> Cc: Jiewen Yao <jiewen.yao@intel.com> Cc: Ard Biesheuvel <ardb+tianocore@kernel.org> Cc: Jordan Justen <jordan.l.justen@intel.com> Michael Brown (2): OvmfPkg: Clarify invariants for NestedInterruptTplLib OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 31 +++++++++++++++++---- 1 file changed, 26 insertions(+), 5 deletions(-) -- 2.39.0 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-08 21:31 ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown @ 2023-05-09 7:05 ` Gerd Hoffmann 2023-05-09 8:43 ` Laszlo Ersek 1 sibling, 0 replies; 18+ messages in thread From: Gerd Hoffmann @ 2023-05-09 7:05 UTC (permalink / raw) To: Michael Brown Cc: devel, Laszlo Ersek, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On Mon, May 08, 2023 at 09:31:23PM +0000, Michael Brown wrote: > At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI > specification) and so we should never encounter a situation in which > an interrupt occurs at TPL_HIGH_LEVEL. The specification also > restricts usage of TPL_HIGH_LEVEL to the firmware itself. > > However, nothing prevents a rogue UEFI application from illegally > calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating > the invariant by enabling interrupts via the STI or equivalent > instruction. Some versions of the Microsoft Windows bootloader are > known to do this. > > NestedInterruptTplLib maintains the invariant that interrupts are > disabled at TPL_HIGH_LEVEL (even when performing the dark art of > deliberately manipulating the stack so that IRET will return with > interrupts still disabled), but does not itself rely on external code > maintaining this invariant. > > Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL > to an error message, to allow rogue UEFI applications such as the > Microsoft Windows bootloader to continue to function. > > Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136 > Cc: Laszlo Ersek <lersek@redhat.com> > Cc: Gerd Hoffmann <kraxel@redhat.com> > Cc: Oliver Steffen <osteffen@redhat.com> > Cc: Pawel Polawski <ppolawsk@redhat.com> > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Ard Biesheuvel <ardb+tianocore@kernel.org> > Cc: Jordan Justen <jordan.l.justen@intel.com> > > Michael Brown (2): > OvmfPkg: Clarify invariants for NestedInterruptTplLib > OvmfPkg: Relax assertion that interrupts do not occur at > TPL_HIGH_LEVEL Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> take care, Gerd ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-08 21:31 ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown 2023-05-09 7:05 ` Gerd Hoffmann @ 2023-05-09 8:43 ` Laszlo Ersek 2023-05-09 12:08 ` [edk2-devel] " Michael Brown 1 sibling, 1 reply; 18+ messages in thread From: Laszlo Ersek @ 2023-05-09 8:43 UTC (permalink / raw) To: Michael Brown, devel Cc: Gerd Hoffmann, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On 5/8/23 23:31, Michael Brown wrote: > At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI > specification) and so we should never encounter a situation in which > an interrupt occurs at TPL_HIGH_LEVEL. The specification also > restricts usage of TPL_HIGH_LEVEL to the firmware itself. > > However, nothing prevents a rogue UEFI application from illegally > calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating > the invariant by enabling interrupts via the STI or equivalent > instruction. Some versions of the Microsoft Windows bootloader are > known to do this. > > NestedInterruptTplLib maintains the invariant that interrupts are > disabled at TPL_HIGH_LEVEL (even when performing the dark art of > deliberately manipulating the stack so that IRET will return with > interrupts still disabled), but does not itself rely on external code > maintaining this invariant. > > Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL > to an error message, to allow rogue UEFI applications such as the > Microsoft Windows bootloader to continue to function. > > Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136 > Cc: Laszlo Ersek <lersek@redhat.com> > Cc: Gerd Hoffmann <kraxel@redhat.com> > Cc: Oliver Steffen <osteffen@redhat.com> > Cc: Pawel Polawski <ppolawsk@redhat.com> > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Ard Biesheuvel <ardb+tianocore@kernel.org> > Cc: Jordan Justen <jordan.l.justen@intel.com> > > Michael Brown (2): > OvmfPkg: Clarify invariants for NestedInterruptTplLib > OvmfPkg: Relax assertion that interrupts do not occur at > TPL_HIGH_LEVEL > > OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 31 +++++++++++++++++---- > 1 file changed, 26 insertions(+), 5 deletions(-) > I'm not subscribed to the list, so I don't have a copy of patch#1. I've checked patch#1 at this URL: https://listman.redhat.com/archives/edk2-devel-archive/2023-May/063591.html and I'll comment on it using the cover letter: I really like that patch, with one stylistic exception: in edk2, explicit FALSE and TRUE comparisons are not desired. So I suggest: ASSERT (!GetInterruptState ()); Twice. In fact, I *think* that if you run uncrustify with the edk2 config on the patch, then it will rewrite that code. Uncrustify normally only complains in CI on github, but you can run it manually too, after each commit: (1) Clone and build uncrustify: https://projectmu@dev.azure.com/projectmu/Uncrustify/_git/Uncrustify (2) After each commit, preferably when your working tree *and* index are clean, run uncrustify in the edk2 project root, with the following flags: -c .pytool/Plugin/UncrustifyCheck/uncrustify.cfg \ --replace \ --no-backup \ --if-changed The file list can be passed in via stdin ("-F -") or on the uncrustify command line. The "file list" is generally the list of *.c and *.h files modified by the particular commit. Just dumping the entire file list modified by a commit to uncrustify is not good: uncrustify will happily garble (for example) *.inf files, and then it will also crash with SEGV. Thank you! Laszlo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-09 8:43 ` Laszlo Ersek @ 2023-05-09 12:08 ` Michael Brown 2023-05-09 13:27 ` Laszlo Ersek 0 siblings, 1 reply; 18+ messages in thread From: Michael Brown @ 2023-05-09 12:08 UTC (permalink / raw) To: devel, lersek Cc: Gerd Hoffmann, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On 09/05/2023 09:43, Laszlo Ersek wrote: > I'm not subscribed to the list, so I don't have a copy of patch#1. I've > checked patch#1 at this URL: > > https://listman.redhat.com/archives/edk2-devel-archive/2023-May/063591.html > > and I'll comment on it using the cover letter: > > I really like that patch, with one stylistic exception: in edk2, > explicit FALSE and TRUE comparisons are not desired. So I suggest: > > ASSERT (!GetInterruptState ()); > > Twice. > > In fact, I *think* that if you run uncrustify with the edk2 config on > the patch, then it will rewrite that code. I built and ran uncrustify with the edk2 config but it did not modify the code. (I did check that it would fix other deliberate errors such as extra whitespace, so I don't think this was an error in my setup.) I will send through v2 with the explicit "== FALSE" still present, for consistency with the rest of the code in that file. (I think I vaguely remember someone asking me to add the explicit comparisons when I first submitted the code.) I'm happy for there to be a follow up patch to change the coding style to remove all of the explicit boolean comparisons, if that is what is wanted. Thanks, Michael ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-09 12:08 ` [edk2-devel] " Michael Brown @ 2023-05-09 13:27 ` Laszlo Ersek 0 siblings, 0 replies; 18+ messages in thread From: Laszlo Ersek @ 2023-05-09 13:27 UTC (permalink / raw) To: Michael Brown, devel Cc: Gerd Hoffmann, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen On 5/9/23 14:08, Michael Brown wrote: > On 09/05/2023 09:43, Laszlo Ersek wrote: >> I'm not subscribed to the list, so I don't have a copy of patch#1. I've >> checked patch#1 at this URL: >> >> https://listman.redhat.com/archives/edk2-devel-archive/2023-May/063591.html >> >> and I'll comment on it using the cover letter: >> >> I really like that patch, with one stylistic exception: in edk2, >> explicit FALSE and TRUE comparisons are not desired. So I suggest: >> >> ASSERT (!GetInterruptState ()); >> >> Twice. >> >> In fact, I *think* that if you run uncrustify with the edk2 config on >> the patch, then it will rewrite that code. > > I built and ran uncrustify with the edk2 config but it did not modify > the code. (I did check that it would fix other deliberate errors such > as extra whitespace, so I don't think this was an error in my setup.) > > I will send through v2 with the explicit "== FALSE" still present, for > consistency with the rest of the code in that file. (I think I vaguely > remember someone asking me to add the explicit comparisons when I first > submitted the code.) Wow, I totally missed the preexistent comparisons. You are right to stay consistent. (I do disagree with the preexistent comparisons as well, but that's indeed a different discussion.) Thanks! Laszlo > I'm happy for there to be a follow up patch to > change the coding style to remove all of the explicit boolean > comparisons, if that is what is wanted. > > Thanks, > > Michael > ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20230508213100.3949708-1-mcb30@ipxe.org>]
* [PATCH 1/2] OvmfPkg: Clarify invariants for NestedInterruptTplLib [not found] ` <20230508213100.3949708-1-mcb30@ipxe.org> @ 2023-05-08 21:31 ` Michael Brown 2023-05-08 21:31 ` [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown 1 sibling, 0 replies; 18+ messages in thread From: Michael Brown @ 2023-05-08 21:31 UTC (permalink / raw) To: devel; +Cc: Michael Brown NestedInterruptTplLib relies on CPU interrupts being disabled to guarantee exclusive (and hence atomic) access to the shared state in IsrState. Nothing in the calling interrupt handler should have re-enabled interrupts before calling NestedInterruptRestoreTPL(), and the loop in NestedInterruptRestoreTPL() itself maintains the invariant that interrupts are disabled at the start of each iteration. Add assertions to clarify this invariant, and expand the comments around the calls to RestoreTPL() and DisableInterrupts() to clarify the expectations around enabling and disabling interrupts. Signed-off-by: Michael Brown <mcb30@ipxe.org> --- OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c index e19d98878eb7..e921a09c5599 100644 --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c @@ -104,6 +104,7 @@ NestedInterruptRestoreTPL ( // defer our call to RestoreTPL() to the in-progress outer instance // of the same interrupt handler. // + ASSERT (GetInterruptState () == FALSE); if (InterruptedTPL == IsrState->InProgressRestoreTPL) { // // Trigger outer instance of this interrupt handler to perform the @@ -153,6 +154,7 @@ NestedInterruptRestoreTPL ( // // Check shared state loop invariants. // + ASSERT (GetInterruptState () == FALSE); ASSERT (IsrState->InProgressRestoreTPL < InterruptedTPL); ASSERT (IsrState->DeferredRestoreTPL == FALSE); @@ -167,13 +169,17 @@ NestedInterruptRestoreTPL ( // // Call RestoreTPL() to allow event notifications to be - // dispatched. This will implicitly re-enable interrupts. + // dispatched. This will implicitly re-enable interrupts (if + // InterruptedTPL is below TPL_HIGH_LEVEL), even though we are + // still inside the interrupt handler. // gBS->RestoreTPL (InterruptedTPL); // // Re-disable interrupts after the call to RestoreTPL() to ensure - // that we have exclusive access to the shared state. + // that we have exclusive access to the shared state. Interrupts + // will be re-enabled by the IRET or equivalent instruction when + // we subsequently return from the interrupt handler. // DisableInterrupts (); -- 2.39.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL [not found] ` <20230508213100.3949708-1-mcb30@ipxe.org> 2023-05-08 21:31 ` [PATCH 1/2] OvmfPkg: Clarify invariants for NestedInterruptTplLib Michael Brown @ 2023-05-08 21:31 ` Michael Brown 2023-05-09 8:35 ` Laszlo Ersek 1 sibling, 1 reply; 18+ messages in thread From: Michael Brown @ 2023-05-08 21:31 UTC (permalink / raw) To: devel; +Cc: Michael Brown, Gerd Hoffmann, Laszlo Ersek At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI specification) and so we should never encounter a situation in which an interrupt occurs at TPL_HIGH_LEVEL. The specification also restricts usage of TPL_HIGH_LEVEL to the firmware itself. However, nothing prevents a rogue UEFI application from illegally calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating the invariant by enabling interrupts via the STI or equivalent instruction. Some versions of the Microsoft Windows bootloader are known to do this. NestedInterruptTplLib maintains the invariant that interrupts are disabled at TPL_HIGH_LEVEL (even when performing the dark art of deliberately manipulating the stack so that IRET will return with interrupts still disabled), but does not itself rely on external code maintaining this invariant. Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL to an error message, to allow rogue UEFI applications such as the Microsoft Windows bootloader to continue to function. Debugged-by: Gerd Hoffmann <kraxel@redhat.com> Debugged-by: Laszlo Ersek <lersek@redhat.com> Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136 Signed-off-by: Michael Brown <mcb30@ipxe.org> --- OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c index e921a09c5599..a91f2d3cb8c7 100644 --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c @@ -34,12 +34,27 @@ NestedInterruptRaiseTPL ( // // Raise TPL and assert that we were called from within an interrupt - // handler (i.e. with TPL below TPL_HIGH_LEVEL but with interrupts - // disabled). + // handler (i.e. with interrupts already disabled before raising the + // TPL). // ASSERT (GetInterruptState () == FALSE); InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL); - ASSERT (InterruptedTPL < TPL_HIGH_LEVEL); + + // + // At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI + // specification) and so we should never encounter a situation in + // which InterruptedTPL==TPL_HIGH_LEVEL. The specification also + // restricts usage of TPL_HIGH_LEVEL to the firmware itself. + // + // However, nothing prevents a rogue UEFI application from illegally + // calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately + // violating the invariant by enabling interrupts via the STI or + // equivalent instruction. Some versions of the Microsoft Windows + // bootloader are known to do this. + // + if (InterruptedTPL >= TPL_HIGH_LEVEL) { + DEBUG ((DEBUG_ERROR, "Illegal interrupt at TPL_HIGH_LEVEL!\n")); + } return InterruptedTPL; } -- 2.39.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-08 21:31 ` [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown @ 2023-05-09 8:35 ` Laszlo Ersek 2023-05-09 9:42 ` Gerd Hoffmann 2023-05-09 12:04 ` [edk2-devel] " Michael Brown 0 siblings, 2 replies; 18+ messages in thread From: Laszlo Ersek @ 2023-05-09 8:35 UTC (permalink / raw) To: Michael Brown, devel; +Cc: Gerd Hoffmann On 5/8/23 23:31, Michael Brown wrote: > At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI > specification) and so we should never encounter a situation in which > an interrupt occurs at TPL_HIGH_LEVEL. The specification also > restricts usage of TPL_HIGH_LEVEL to the firmware itself. Great introduction! Regarding the rest of the commit message, I'd like us to tone it down a little bit. Here's why: I'd been used to Microsoft *not* cooperating usefully in Windows-on-QEMU/KVM situations. But this instance was totally different. In fact I'm still a bit shocked, in the positive sense. We got a fast and helpful, to-the-point response. It's a first, considering my own experience, and it has strongly changed my impression of Microsoft's Windows team. I'd like us to acknowledge that in the commit message, if possible. Mind you, I'm not a native English speaker, so I could be seeing things (and proposing worse language than the original). With all that said: > However, nothing prevents a rogue UEFI application from illegally I request s/rogue/non-conformant/. I'd also request "invalidly" rather than "illegally"; the latter has connotations with the law, and seeing such in a commit message makes me fidget uncomfortably. I do apologize. > calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately violating You are not wrong about "deliberately", but I'd still like us to remove that word. :) > the invariant by enabling interrupts via the STI or equivalent > instruction. Some versions of the Microsoft Windows bootloader are > known to do this. > > NestedInterruptTplLib maintains the invariant that interrupts are > disabled at TPL_HIGH_LEVEL (even when performing the dark art of > deliberately manipulating the stack so that IRET will return with > interrupts still disabled), but does not itself rely on external code > maintaining this invariant. > > Relax the assertion that the interrupted TPL is below TPL_HIGH_LEVEL > to an error message, to allow rogue UEFI applications such as the s/rogue/non-conformant/ > Microsoft Windows bootloader to continue to function. Can we say "particular version of the Microsoft Windows bootloader"? > > Debugged-by: Gerd Hoffmann <kraxel@redhat.com> > Debugged-by: Laszlo Ersek <lersek@redhat.com> > Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2189136 > Signed-off-by: Michael Brown <mcb30@ipxe.org> > --- > OvmfPkg/Library/NestedInterruptTplLib/Tpl.c | 21 ++++++++++++++++++--- > 1 file changed, 18 insertions(+), 3 deletions(-) > > diff --git a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > index e921a09c5599..a91f2d3cb8c7 100644 > --- a/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > +++ b/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c > @@ -34,12 +34,27 @@ NestedInterruptRaiseTPL ( > > // > // Raise TPL and assert that we were called from within an interrupt > - // handler (i.e. with TPL below TPL_HIGH_LEVEL but with interrupts > - // disabled). > + // handler (i.e. with interrupts already disabled before raising the > + // TPL). > // > ASSERT (GetInterruptState () == FALSE); > InterruptedTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL); > - ASSERT (InterruptedTPL < TPL_HIGH_LEVEL); > + > + // > + // At TPL_HIGH_LEVEL, CPU interrupts are disabled (as per the UEFI > + // specification) and so we should never encounter a situation in > + // which InterruptedTPL==TPL_HIGH_LEVEL. The specification also > + // restricts usage of TPL_HIGH_LEVEL to the firmware itself. > + // > + // However, nothing prevents a rogue UEFI application from illegally > + // calling gBS->RaiseTPL(TPL_HIGH_LEVEL) and then deliberately > + // violating the invariant by enabling interrupts via the STI or > + // equivalent instruction. Some versions of the Microsoft Windows > + // bootloader are known to do this. > + // Same three requests on the wording: rogue, illegally, deliberately. (I am happy with "Some versions of"!) > + if (InterruptedTPL >= TPL_HIGH_LEVEL) { > + DEBUG ((DEBUG_ERROR, "Illegal interrupt at TPL_HIGH_LEVEL!\n")); s/Illegal/Invalid/ please! :) > + } > > return InterruptedTPL; > } Thank you for the patch; I do apologize about splitting hairs. The debugging was difficult, and you *are* working around a bug here -- but I'd really like our tone of voice to be positive here, simply because of the stunningly positive attitude I've experienced from Microsoft. Thanks! Laszlo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-09 8:35 ` Laszlo Ersek @ 2023-05-09 9:42 ` Gerd Hoffmann 2023-05-09 12:04 ` [edk2-devel] " Michael Brown 1 sibling, 0 replies; 18+ messages in thread From: Gerd Hoffmann @ 2023-05-09 9:42 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Michael Brown, devel Hi, > Here's why: I'd been used to Microsoft *not* cooperating usefully in > Windows-on-QEMU/KVM situations. But this instance was totally different. > In fact I'm still a bit shocked, in the positive sense. We got a fast > and helpful, to-the-point response. It's a first, considering my own > experience, and it has strongly changed my impression of Microsoft's > Windows team. I'd like us to acknowledge that in the commit message, if > possible. And Microsoft saying this is tsc calibration code also gives us a workaround: When using 'qemu -cpu host,hv-frequencies=on' the windows installer iso boots just fine, most likely because cdboot.efi queries the hypervisor for the tsc frequency instead of running the calibration loop. take care, Gerd ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [edk2-devel] [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL 2023-05-09 8:35 ` Laszlo Ersek 2023-05-09 9:42 ` Gerd Hoffmann @ 2023-05-09 12:04 ` Michael Brown 1 sibling, 0 replies; 18+ messages in thread From: Michael Brown @ 2023-05-09 12:04 UTC (permalink / raw) To: devel, lersek; +Cc: Gerd Hoffmann On 09/05/2023 09:35, Laszlo Ersek wrote: > Thank you for the patch; I do apologize about splitting hairs. The > debugging was difficult, and you *are* working around a bug here -- but > I'd really like our tone of voice to be positive here, simply because of > the stunningly positive attitude I've experienced from Microsoft. No problem. Patch v2 to follow. Michael ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-05-09 13:27 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-05-03 7:19 [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged Gerd Hoffmann 2023-05-05 14:10 ` [edk2-devel] " Michael Brown 2023-05-05 18:56 ` Laszlo Ersek 2023-05-05 23:27 ` Michael Brown 2023-05-05 23:57 ` Ard Biesheuvel 2023-05-08 6:45 ` Laszlo Ersek 2023-05-09 9:13 ` Ard Biesheuvel 2023-05-08 6:38 ` Laszlo Ersek 2023-05-08 21:31 ` [PATCH 0/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown 2023-05-09 7:05 ` Gerd Hoffmann 2023-05-09 8:43 ` Laszlo Ersek 2023-05-09 12:08 ` [edk2-devel] " Michael Brown 2023-05-09 13:27 ` Laszlo Ersek [not found] ` <20230508213100.3949708-1-mcb30@ipxe.org> 2023-05-08 21:31 ` [PATCH 1/2] OvmfPkg: Clarify invariants for NestedInterruptTplLib Michael Brown 2023-05-08 21:31 ` [PATCH 2/2] OvmfPkg: Relax assertion that interrupts do not occur at TPL_HIGH_LEVEL Michael Brown 2023-05-09 8:35 ` Laszlo Ersek 2023-05-09 9:42 ` Gerd Hoffmann 2023-05-09 12:04 ` [edk2-devel] " Michael Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox