* [edk2-devel] RFC: Another solution to the nested interrupt issue
@ 2024-01-25 7:57 Ni, Ray
2024-01-25 13:03 ` Michael Brown
0 siblings, 1 reply; 6+ messages in thread
From: Ni, Ray @ 2024-01-25 7:57 UTC (permalink / raw)
To: Michael Brown, Laszlo Ersek, Kinney, Michael D
Cc: Ni, Ray, devel@edk2.groups.io
[-- Attachment #1.1: Type: text/plain, Size: 1896 bytes --]
All,
This mail is to bring another approach to solve the stack-overflow due to nested interrupts. Michael solved this problem in OVMF through NestedInterruptTplLib.
I made a draft patch as attached "DxeCore.diff". The patch simply to avoid the interrupt in enable state when TPL is dropped to the interrupted TPL. The interrupt will be enabled later through "IRET".
So, a timer driver has two ways to implement its timer interrupt handler:
1. Do raise/restore TPL in the TimerInterruptHandler(). But call the APIs in NestedInterruptTplLib.
2. Do not raise/restore TPL in the TimerInterruptHandler(). So that only DxeCore restores the TPL. And when DxeCore restores the TPL, the interrupt is not enabled when TPL is dropped to the interrupted TPL (as it will be enabled later by "IRET").
Implementing the logic in DxeCore does not prevent the TimerInterruptHandler() from implementing the way #1.
Agree on the draft patch?
My 2nd question is can we set a rule that TimerInterruptHandler() should NOT restore TPL so that way #2 (changing DxeCore) is enough to solve the stack overflow issue due to nested interrupts.
I was aware of the discussion between Laszlo and Michael in end of 2022 but never dig deeply as today into this problem.
I really appreciate the long discussion in the bugzilla (https://bugzilla.tianocore.org/show_bug.cgi?id=4162) and comments in NestedInterruptTplLib. I learned a lot from them and they are quite interesting!
Thanks,
Ray
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114369): https://edk2.groups.io/g/devel/message/114369
Mute This Topic: https://groups.io/mt/103950154/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
[-- Attachment #1.2: Type: text/html, Size: 8700 bytes --]
[-- Attachment #2: DxeCore.diff --]
[-- Type: application/octet-stream, Size: 2457 bytes --]
diff --git a/MdeModulePkg/Core/Dxe/Event/Timer.c b/MdeModulePkg/Core/Dxe/Event/Timer.c
index 29e507c67c..8e8b9d6592 100644
--- a/MdeModulePkg/Core/Dxe/Event/Timer.c
+++ b/MdeModulePkg/Core/Dxe/Event/Timer.c
@@ -176,6 +176,11 @@ CoreInitializeTimer (
ASSERT_EFI_ERROR (Status);
}
+VOID
+CoreRestoreTplDeferEnableInterrupt (
+ IN EFI_TPL NewTpl
+ );
+
/**
Called by the platform code to process a tick.
@@ -190,11 +195,9 @@ CoreTimerTick (
)
{
IEVENT *Event;
+ EFI_TPL OriginalTpl;
- //
- // Check runtiem flag in case there are ticks while exiting boot services
- //
- CoreAcquireLock (&mEfiSystemTimeLock);
+ OriginalTpl = CoreRaiseTpl (TPL_HIGH_LEVEL);
//
// Update the system time
@@ -213,7 +216,7 @@ CoreTimerTick (
}
}
- CoreReleaseLock (&mEfiSystemTimeLock);
+ CoreRestoreTplDeferEnableInterrupt (OriginalTpl); //CoreRestoreTpl (OriginalTpl);
}
/**
diff --git a/MdeModulePkg/Core/Dxe/Event/Tpl.c b/MdeModulePkg/Core/Dxe/Event/Tpl.c
index b33f80573c..8d4b92ef4e 100644
--- a/MdeModulePkg/Core/Dxe/Event/Tpl.c
+++ b/MdeModulePkg/Core/Dxe/Event/Tpl.c
@@ -147,3 +147,55 @@ CoreRestoreTpl (
CoreSetInterruptState (TRUE);
}
}
+
+
+VOID
+CoreRestoreTplDeferEnableInterrupt (
+ IN EFI_TPL NewTpl
+ )
+{
+ EFI_TPL OldTpl;
+ EFI_TPL PendingTpl;
+
+ OldTpl = gEfiCurrentTpl;
+ if (NewTpl > OldTpl) {
+ DEBUG ((DEBUG_ERROR, "FATAL ERROR - RestoreTpl with NewTpl(0x%x) > OldTpl(0x%x)\n", NewTpl, OldTpl));
+ ASSERT (FALSE);
+ }
+
+ ASSERT (VALID_TPL (NewTpl));
+
+ //
+ // If lowering below HIGH_LEVEL, make sure
+ // interrupts are enabled
+ //
+
+ if ((OldTpl >= TPL_HIGH_LEVEL) && (NewTpl < TPL_HIGH_LEVEL)) {
+ gEfiCurrentTpl = TPL_HIGH_LEVEL;
+ }
+
+ //
+ // Dispatch any pending events
+ //
+ while (gEventPending != 0) {
+ PendingTpl = (UINTN)HighBitSet64 (gEventPending);
+ if (PendingTpl <= NewTpl) {
+ break;
+ }
+
+ gEfiCurrentTpl = PendingTpl;
+ if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
+ CoreSetInterruptState (TRUE);
+ }
+
+ CoreDispatchEventNotifies (gEfiCurrentTpl);
+ }
+
+ //
+ // Disable interrupt before restoring to the original TPL.
+ // This is to avoid the nest interrupt happens in the same TPL
+ // which will be endless.
+ //
+ CoreSetInterruptState (FALSE);
+ gEfiCurrentTpl = NewTpl;
+}
\ No newline at end of file
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [edk2-devel] RFC: Another solution to the nested interrupt issue
2024-01-25 7:57 [edk2-devel] RFC: Another solution to the nested interrupt issue Ni, Ray
@ 2024-01-25 13:03 ` Michael Brown
2024-01-25 13:54 ` Ni, Ray
0 siblings, 1 reply; 6+ messages in thread
From: Michael Brown @ 2024-01-25 13:03 UTC (permalink / raw)
To: devel, ray.ni, Laszlo Ersek, Kinney, Michael D
On 25/01/2024 07:57, Ni, Ray wrote:
> This mail is to bring another approach to solve the stack-overflow due
> to nested interrupts. Michael solved this problem in OVMF through
> NestedInterruptTplLib.
>
> I made a draft patch as attached “DxeCore.diff”. The patch simply to
> avoid the interrupt in enable state when TPL is dropped to the
> interrupted TPL. The interrupt will be enabled later through “IRET”.
I don't disagree with the approach, but it does break the API as per the
UEFI PI specification (version 1.8 section II-12.10), and so this is not
something that can just be dropped in as an EDK2 code change.
There are no version number fields or spare parameters (that I can see)
that could be used within EFI_TIMER_ARCH_PROTOCOL to allow for this
change of semantics, so this would have to be a breaking change.
I think you'd need to first define an EFI_TIMER_ARCH2_PROTOCOL with
different semantics for RegisterHandler(), but I'm not a UEFI Forum
member and I have no idea what the bureaucratic process is for pushing
through this kind of change. I suspect we'd end up having to support
both protocol variants (which means added maintenance burden).
It might be plausible instead to extend EFI_CPU_ARCH_PROTOCOL in a
backwards-compatible way. The InterruptType parameter to
RegisterInterruptHandler() is already handled in a very casual way: the
UEFI spec defines a limited range of possible types (0-19 for IA32/X64)
for EFI_EXCEPTION_TYPE, but the code in LocalApicTimerDxe.c already
treats it as meaning just an IRQ vector number by passing the value
LOCAL_APIC_TIMER_VECTOR=32.
(Incidentally, the code comments in MdePkg/Include/Protocol/Cpu.h are
completely wrong - the description of the InterruptType parameter seems
to have been copied from the description of the State parameter in
EFI_CPU_GET_INTERRUPT_STATE.)
The extension I'm thinking of could be:
- Clean up the definition of EFI_EXCEPTION_TYPE to clarify that it
represents a CPU interrupt vector number (if this is how all current
architectures actually use it).
- Extend the definition of EFI_EXCEPTION_TYPE to include a high bit flag
that can be ORed on to the exception type, e.g.
#define EFI_EXCEPTION_TYPE_TPL_HIGH 0x01000000
- The RegisterInterruptHandler() implementation could then treat this
flag as meaning that the handler should be called via a wrapper that
does RaiseTPL(TPL_HIGH_LEVEL) before calling the handler, and the
equivalent of your CoreRestoreTplDeferEnableInterrupt() after the
handler returns, i.e. that the handler is automatically called at
TPL_HIGH_LEVEL.
This would not make any breaking change to the definitions of
EFI_TIMER_ARCH_PROTOCOL or EFI_CPU_ARCH_PROTOCOL, and so would not
require an ARCH2_PROTOCOL variant to be created.
I haven't tried implementing this to see if it's viable, so I've
probably missed something crucial.
Personally I would go for moving NestedInterruptTplLib to MdeModulePkg
since this can be done today, whereas anything involving a spec change
will take a lot more time, but that's not my call.
Thanks,
Michael
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114401): https://edk2.groups.io/g/devel/message/114401
Mute This Topic: https://groups.io/mt/103950154/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] RFC: Another solution to the nested interrupt issue
2024-01-25 13:03 ` Michael Brown
@ 2024-01-25 13:54 ` Ni, Ray
2024-01-25 14:25 ` Michael Brown
0 siblings, 1 reply; 6+ messages in thread
From: Ni, Ray @ 2024-01-25 13:54 UTC (permalink / raw)
To: Michael Brown, devel@edk2.groups.io, Laszlo Ersek,
Kinney, Michael D
> On 25/01/2024 07:57, Ni, Ray wrote:
> > This mail is to bring another approach to solve the stack-overflow due
> > to nested interrupts. Michael solved this problem in OVMF through
> > NestedInterruptTplLib.
> >
> > I made a draft patch as attached “DxeCore.diff”. The patch simply to
> > avoid the interrupt in enable state when TPL is dropped to the
> > interrupted TPL. The interrupt will be enabled later through “IRET”.
>
> I don't disagree with the approach, but it does break the API as per the
> UEFI PI specification (version 1.8 section II-12.10), and so this is not
> something that can just be dropped in as an EDK2 code change.
You think that the TimerInterruptHandler() doesn't raise/restore TPL
which would violate the PI spec as PI spec says " NotifyFunction ... executes at EFI_TPL_HIGH_LEVEL."?
I do not think the PI spec requires TimerInterruptHandler() raises TPL
to HIGH before invoking NotifyFunction. It just means the NotifyFunction
will execute at TPL_HIGH.
If you review HpetTimer driver, it does not raise TPL to HIGH before
invoking NotifyFunction.
And I think implementing the DxeCore changes as attached does not
prevent the TimerInterruptHandler() from calling raise/restore TPL.
So, with the changes done in DxeCore, a timer driver could either
not raise/restore TPL in TimerInterruptHandler(), or it calls
NestedInterruptTplLib if it wants.
I like the idea behind NestedInterruptTplLib. The only concern is
the maintenance effort. I want to avoid the difficulty of debuggability
brought by NestedInterruptTplLib when some timer interrupt related
issues happen in future.
I am ok with OvmfPkg code using NestedInterruptTplLib.
Thanks,
Ray
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114405): https://edk2.groups.io/g/devel/message/114405
Mute This Topic: https://groups.io/mt/103950154/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] RFC: Another solution to the nested interrupt issue
2024-01-25 13:54 ` Ni, Ray
@ 2024-01-25 14:25 ` Michael Brown
2024-01-25 15:06 ` Ni, Ray
0 siblings, 1 reply; 6+ messages in thread
From: Michael Brown @ 2024-01-25 14:25 UTC (permalink / raw)
To: devel, ray.ni, Laszlo Ersek, Kinney, Michael D
On 25/01/2024 13:54, Ni, Ray wrote:
>> I don't disagree with the approach, but it does break the API as per the
>> UEFI PI specification (version 1.8 section II-12.10), and so this is not
>> something that can just be dropped in as an EDK2 code change.
>
> You think that the TimerInterruptHandler() doesn't raise/restore TPL
> which would violate the PI spec as PI spec says " NotifyFunction ... executes at EFI_TPL_HIGH_LEVEL."?
>
> I do not think the PI spec requires TimerInterruptHandler() raises TPL
> to HIGH before invoking NotifyFunction. It just means the NotifyFunction
> will execute at TPL_HIGH.
If the caller is not supposed to raise TPL to TPL_HIGH_LEVEL before
calling NotifyFunction, then the statement "This function executes at
EFI_TPL_HIGH_LEVEL" in the PI specification is meaningless. There is no
other possible interpretation besides "the caller must raise TPL to
TPL_HIGH_LEVEL before calling this function".
> If you review HpetTimer driver, it does not raise TPL to HIGH before
> invoking NotifyFunction.
That would then be a bug in HpetTimer, which ought to be fixed. If
HpetTimer were to be used on a platform where the NotifyFunction
correctly assumes that it is called at TPL_HIGH_LEVEL and does something
that would break at a lower level, then this could lead to undefined
behaviour.
> And I think implementing the DxeCore changes as attached does not
> prevent the TimerInterruptHandler() from calling raise/restore TPL.
No, but a spec-conforming timer interrupt handler could not take
advantage of the feature, because it would have to raise to
TPL_HIGH_LEVEL before calling the NotifyFunction. (Any raise/restore
within the NotifyFunction would then have no effect.)
> So, with the changes done in DxeCore, a timer driver could either
> not raise/restore TPL in TimerInterruptHandler(), or it calls
> NestedInterruptTplLib if it wants.
As a pure code change, I do agree that it solves the problem and it's a
much simpler approach. However, it is a breaking change to the
specification and I think it would need be handled as such.
The minimal specification change I can think of that would make this
possible would be to relax the wording on NotifyFunction in the next
version of the PI specification to say that
* the NotifyFunction can be called at any TPL level
* the NotifyFunction will raise TPL to TPL_HIGH_LEVEL, restore TPL back
to the original TPL before returning
* the NotifyFunction may re-enable interrupts during its execution, and
that the caller must be prepared to be re-entered before NotifyFunction
returns
* the timer interrupt must have been rearmed before calling NotifyFunction
* the NotifyFunction must guarantee that it never reaches a state in
which the TPL has been restored to the original level with CPU
interrupts enabled.
This would be backwards compatible with the existing behaviour. A
caller written to the current specification would call NotifyFunction at
TPL_HIGH_LEVEL and so any RaiseTPL/RestoreTPL done within a
NotifyFunction complying to the new specification would be a no-op anyway.
A caller written to the new specification would have to check the
supported version of the PI specification (which I assume is available
in some system configuration table somewhere) to know that it was safe
to call NotifyFunction without first raising to TPL_HIGH_LEVEL.
This approach would at least avoid the need for an ARCH2_PROTOCOL
variant, which is potentially lower impact.
Thanks,
Michael
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114407): https://edk2.groups.io/g/devel/message/114407
Mute This Topic: https://groups.io/mt/103950154/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] RFC: Another solution to the nested interrupt issue
2024-01-25 14:25 ` Michael Brown
@ 2024-01-25 15:06 ` Ni, Ray
2024-01-25 15:29 ` Michael Brown
0 siblings, 1 reply; 6+ messages in thread
From: Ni, Ray @ 2024-01-25 15:06 UTC (permalink / raw)
To: Michael Brown, devel@edk2.groups.io, Laszlo Ersek,
Kinney, Michael D, Zimmer, Vincent
Cc: Ni, Ray
> That would then be a bug in HpetTimer, which ought to be fixed. If
> HpetTimer were to be used on a platform where the NotifyFunction
> correctly assumes that it is called at TPL_HIGH_LEVEL and does something
> that would break at a lower level, then this could lead to undefined
> behaviour.
In theory, it could happen that the NotifyFunction may not raise TPL
to HIGH.
In real world, NotifyFunction (CoreTimerTick() in DxeCore) does raise
TPL to HIGH.
I agree binding the NotifyFunction to CoreTimerTick() is not the best.
But it could help to reduce the complexity of the problem.
> As a pure code change, I do agree that it solves the problem and it's a
> much simpler approach. However, it is a breaking change to the
> specification and I think it would need be handled as such.
>
> The minimal specification change I can think of that would make this
> possible would be to relax the wording on NotifyFunction in the next
> version of the PI specification to say that
>
> * the NotifyFunction can be called at any TPL level
>
> * the NotifyFunction will raise TPL to TPL_HIGH_LEVEL, restore TPL back
> to the original TPL before returning
>
> * the NotifyFunction may re-enable interrupts during its execution, and
> that the caller must be prepared to be re-entered before NotifyFunction
> returns
>
> * the timer interrupt must have been rearmed before calling NotifyFunction
>
> * the NotifyFunction must guarantee that it never reaches a state in
> which the TPL has been restored to the original level with CPU
> interrupts enabled.
I agree with you about the above PI spec clarifications.
Would you like to write a PI spec ECR?
But I do not think the PI spec version stored in the PI system table needs to
reflect whether a DxeCore implementation follows the clarification.
Since the DxeCore::CoreTimerTick() implementation raises TPL to HIGH in the very first version created
in more than 10 years ago, I think it's safe for TimerInterruptHandler() assumes CoreTimerTick() will
raise TPL to HIGH so that TimerInterruptHandler() does not need to raise TPL to HIGH.
(I agree changing the spec version is the most correct way if we review the problem in a very theorical way.)
I really want to keep the UEFI world simple with the bug fixed.
(The cost is assumption on existing DxeCore::CoreTimerTick().)
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114409): https://edk2.groups.io/g/devel/message/114409
Mute This Topic: https://groups.io/mt/103950154/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] RFC: Another solution to the nested interrupt issue
2024-01-25 15:06 ` Ni, Ray
@ 2024-01-25 15:29 ` Michael Brown
0 siblings, 0 replies; 6+ messages in thread
From: Michael Brown @ 2024-01-25 15:29 UTC (permalink / raw)
To: devel, ray.ni, Laszlo Ersek, Kinney, Michael D, Zimmer, Vincent
On 25/01/2024 15:06, Ni, Ray wrote:
> I agree with you about the above PI spec clarifications.
> Would you like to write a PI spec ECR?
I am not a UEFI Forum member, so I don't think I have standing to do
this. (I also don't have the spare time to do this for free, sorry.)
> But I do not think the PI spec version stored in the PI system table needs to
> reflect whether a DxeCore implementation follows the clarification.
>
> Since the DxeCore::CoreTimerTick() implementation raises TPL to HIGH in the very first version created
> in more than 10 years ago, I think it's safe for TimerInterruptHandler() assumes CoreTimerTick() will
> raise TPL to HIGH so that TimerInterruptHandler() does not need to raise TPL to HIGH.
> (I agree changing the spec version is the most correct way if we review the problem in a very theorical way.)
I don't see any call to RaiseTPL() in the current version of
CoreTimerTick(), unless this is implicit in the use of CoreAcquireLock()?
> I really want to keep the UEFI world simple with the bug fixed.
I definitely agree with this aim! :)
> (The cost is assumption on existing DxeCore::CoreTimerTick().)
For me, the problem is that we can't assume that it's the EDK2
implementation of CoreTimerTick() that is being used. It's perfectly
legitimate for an implementation to not use EDK2's DxeCore, and to
provide a NotifyFunction that *requires* being called with the TPL
already raised to TPL_HIGH_LEVEL.
If we're not worried about maintaining conformance to the published
specifications, then there are several very breaking changes I'd like to
make, starting with the USB I/O protocols. :)
Thanks,
Michael
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114427): https://edk2.groups.io/g/devel/message/114427
Mute This Topic: https://groups.io/mt/103950154/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-25 15:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-25 7:57 [edk2-devel] RFC: Another solution to the nested interrupt issue Ni, Ray
2024-01-25 13:03 ` Michael Brown
2024-01-25 13:54 ` Ni, Ray
2024-01-25 14:25 ` Michael Brown
2024-01-25 15:06 ` Ni, Ray
2024-01-25 15:29 ` Michael Brown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox