From: "Paolo Bonzini" <pbonzini@redhat.com>
To: "Kinney, Michael D" <michael.d.kinney@intel.com>,
"Ni, Ray" <ray.ni@intel.com>,
"devel@edk2.groups.io" <devel@edk2.groups.io>
Cc: Liming Gao <gaoliming@byosoft.com.cn>,
Laszlo Ersek <lersek@redhat.com>, Michael Brown <mcb30@ipxe.org>
Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts
Date: Thu, 29 Feb 2024 21:08:48 +0100
Message-ID: <77bbc006-7a5d-478a-9ba5-398c8db1699c@redhat.com>
In-Reply-To: <CO1PR11MB4929C4EAB5165AEFA4D6EDE2D25F2@CO1PR11MB4929.namprd11.prod.outlook.com>

On 2/29/24 20:16, Kinney, Michael D wrote:
>
>
>> -----Original Message-----
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Sent: Thursday, February 29, 2024 11:04 AM
>> To: Ni, Ray <ray.ni@intel.com>; devel@edk2.groups.io
>> Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Liming Gao
>> <gaoliming@byosoft.com.cn>; Laszlo Ersek <lersek@redhat.com>; Michael
>> Brown <mcb30@ipxe.org>
>> Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue
>> due to nested interrupts
>>
>> On 2/29/24 14:02, Ray Ni wrote:
>>> In the end, it will lower the TPL to TPL_APPLICATION with interrupt
>> enabled.
>>>
>>> However, it's possible that another timer interrupt happens just in
>> the end
>>> of RestoreTPL() function when TPL is TPL_APPLICATION.
>>
>> How do non-OVMF platforms solve the issue? Do they just have the same
>> bug as in https://bugzilla.tianocore.org/show_bug.cgi?id=4162 ?
>
> Yes. This same issue can be reproduced on non-OVMF platforms.
>
> This proposal here is an attempt to integrate a common fix into the DXE Core.
>
> I would agree conceptually that integrating the NestedInterruptTplLib work
> into the DXE Core is another option.
>
> I believe the root cause of all of these scenarios is enabling interrupts
> in RestoreTPL() when processing a timer interrupt between the last processed
> event and the return from the interrupt handler. There are some instances
> of the Timer Arch Protocol implementation that call Raise/Restore TPL, so
> we want a DXE Core change that is compatible with the DXE Core doing Raise/Restore
> when processing a timer interrupt and the Timer Arch Protocol implementation
> also doing the Raise/Restore TPL.

Ok, now I understand better.

The reason why the NestedInterruptTplLib was introduced (as opposed to
doing it in core DXE) was to enable returning with disabled interrupts
from the nested interrupt handler, but I think it can be done with a
function like the CoreRestoreTplInternal() I outlined in the previous
email, which is the same as current CoreRestoreTpl() but finishes with

    if (!DesiredInterruptState) {
      CoreSetInterruptState (FALSE);
    }
    gEfiCurrentTpl = NewTpl;
    if (DesiredInterruptState) {
      ASSERT (gEfiCurrentTpl < TPL_HIGH_LEVEL);
      CoreSetInterruptState (TRUE);
    }
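
As a rough illustration of why the order of these steps matters, the
ending sequence can be modeled in plain C outside of EDK2. Everything
in this sketch (restore_tpl_tail, reset_model, the trace array,
lowered_tpl_seen_with_interrupts) is hypothetical scaffolding, and
set_interrupt_state() only flips a flag instead of touching the CPU.
The model records every (TPL, interrupt flag) state it passes through,
so one can check that the lowered TPL is never visible with interrupts
enabled when the caller asked for them to stay off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* TPL values as defined by the UEFI specification. */
#define TPL_APPLICATION 4
#define TPL_NOTIFY      16
#define TPL_HIGH_LEVEL  31

#define TRACE_MAX 8

/* One observed machine state: current TPL plus the interrupt flag. */
typedef struct {
    unsigned tpl;
    bool     interrupts;
} state_t;

/* Hypothetical stand-ins for gEfiCurrentTpl and the CPU interrupt flag. */
unsigned g_current_tpl;
bool     g_interrupts;
state_t  g_trace[TRACE_MAX];
size_t   g_trace_len;

/* Append the current state to the trace (silently drops when full). */
void record_state(void) {
    if (g_trace_len < TRACE_MAX) {
        g_trace[g_trace_len].tpl = g_current_tpl;
        g_trace[g_trace_len].interrupts = g_interrupts;
        g_trace_len++;
    }
}

/* Put the model into a known starting state and clear the trace. */
void reset_model(unsigned tpl, bool interrupts) {
    g_current_tpl = tpl;
    g_interrupts = interrupts;
    g_trace_len = 0;
}

/* Model of CoreSetInterruptState(): only flips the flag. */
void set_interrupt_state(bool enable) {
    g_interrupts = enable;
    record_state();
}

/* Model of the ending sequence quoted above. */
void restore_tpl_tail(unsigned new_tpl, bool desired_interrupt_state) {
    if (!desired_interrupt_state) {
        set_interrupt_state(false);   /* disable before lowering the TPL */
    }
    g_current_tpl = new_tpl;
    record_state();
    if (desired_interrupt_state) {
        assert(g_current_tpl < TPL_HIGH_LEVEL);
        set_interrupt_state(true);    /* enable only after the TPL is set */
    }
}

/* True if the trace holds a state at `tpl` with interrupts enabled. */
bool lowered_tpl_seen_with_interrupts(unsigned tpl) {
    for (size_t i = 0; i < g_trace_len; i++) {
        if (g_trace[i].tpl == tpl && g_trace[i].interrupts) {
            return true;
        }
    }
    return false;
}
```

Calling reset_model(TPL_NOTIFY, true) followed by
restore_tpl_tail(TPL_APPLICATION, false) leaves the model at
TPL_APPLICATION with interrupts off, and the trace contains no state at
TPL_APPLICATION with interrupts on.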

The new CoreRaiseTpl would be the same as in Ray and your patch, while
the CoreRestoreTpl would be something like this:

    if (NewTpl == HighBitSet64 (mInterruptedTplMask)) {
      static NESTED_INTERRUPT_STATE NestedInterruptState;

      mInterruptedTplMask &= ~(UINTN)(1 << NewTpl);

      //
      // Use the deferred invocation logic that is currently
      // in NestedInterruptTplLib.
      //
      // But unlike current NestedInterruptRestoreTPL(), if the logic
      // is part of core DXE, the
      //
      //   gBS->RestoreTPL (InterruptedTPL);
      //   DisableInterrupts ();
      //
      // pair that requires "disable interrupts on IRET" logic can
      // be done without ever enabling interrupts, with
      // CoreRestoreTplInternal(InterruptedTPL, FALSE)
      //
      // As an aside, NestedInterruptState might as well become a
      // pair of globals.
      //
      NestedInterruptRestoreTPL (NewTpl, &NestedInterruptState);
    } else {
      CoreRestoreTplInternal(NewTpl, NewTpl < TPL_HIGH_LEVEL);
    }

Requiring matching raise/restore pairs is a bit scary.  It can be
avoided by changing the "if" to a

    while (NewTpl >= HighBitSet64 (mInterruptedTplMask))
      mInterruptedTplMask &=
        ~(UINTN)(1 << HighBitSet64 (mInterruptedTplMask));
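
To make it easier to trace what this loop does to the mask, here is a
small standalone model in plain C. It is only a sketch under stated
assumptions: high_bit_set64() is a portable stand-in written for this
example that mimics BaseLib's HighBitSet64() (index of the highest set
bit, -1 for zero), drop_interrupted_tpls() is a hypothetical name, the
mask is a uint64_t shifted with a 64-bit constant, and the empty-mask
case is tested explicitly instead of relying on the UINTN/INTN
comparison in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable stand-in for HighBitSet64(): returns the index of the
 * highest set bit of `operand`, or -1 if `operand` is zero.
 */
int high_bit_set64(uint64_t operand) {
    int index = -1;
    while (operand != 0) {
        operand >>= 1;
        index++;
    }
    return index;
}

/*
 * Model of the loop above: while the highest recorded TPL is at or
 * below `new_tpl`, clear that bit. Returns the updated mask.
 */
uint64_t drop_interrupted_tpls(uint64_t mask, unsigned new_tpl) {
    for (;;) {
        int high = high_bit_set64(mask);
        if (high < 0 || new_tpl < (unsigned)high) {
            break;   /* mask empty, or highest recorded TPL is above new_tpl */
        }
        mask &= ~(UINT64_C(1) << high);
    }
    return mask;
}
```

In this model, a mask with bits 4 and 8 set and new_tpl equal to 8 ends
up empty, while a mask with bits 4 and 16 set and new_tpl equal to 8 is
left untouched, because the highest recorded TPL is above new_tpl and
the loop stops immediately.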

Then, if inlining NestedInterruptRestoreTPL() allows simplifications,
they can be done on top after the merge of NestedInterruptTplLib.  In
particular, I suspect that the while loop above can be unified with the
loop in NestedInterruptRestoreTPL().  But again, that would be best
reviewed as a separate change.

All this, as Michael said, is however conditional on being able to deal
with the TPL_HIGH_LEVEL+STI shenanigans that Windows does.

Paolo

>>
>> The design of NestedInterruptTplLib is that each nested interrupt must
>> increase the TPL, but if I understand correctly there is a hole here:
>>
>> //
>> // Call RestoreTPL() to allow event notifications to be
>> // dispatched. This will implicitly re-enable interrupts.
>> //
>> gBS->RestoreTPL (InterruptedTPL);
>>
>> //
>> // Re-disable interrupts after the call to RestoreTPL() to ensure
>> // that we have exclusive access to the shared state.
>> //
>> DisableInterrupts ();
>>
>> because gBS->RestoreTPL will unconditionally enable interrupts if
>> InterruptedTPL < TPL_HIGH_LEVEL.
>>
>>
>> If possible, the easiest solution would be to merge
>> NestedInterruptTplLib into Core DXE. This way, instead of calling
>> gBS->RestoreTPL, NestedInterruptTplLib can call a custom version of
>> CoreRestoreTpl that exits with interrupts disabled. That is, something
>> like
>>
>> VOID EFIAPI CoreRestoreTplInternal(IN EFI_TPL NewTpl,
>> IN BOOLEAN InterruptState)
>> {
>> //
>> // The caller can request disabled interrupts to access shared
>> // state, but TPL_HIGH_LEVEL must *not* have them enabled.
>> //
>> ASSERT(!(NewTpl == TPL_HIGH_LEVEL && InterruptState));
>>
>> // ...
>>
>> gEfiCurrentTpl = NewTpl;
>> CoreSetInterruptState (InterruptState);
>> }
>>
>> Now, CoreRestoreTpl is just
>>
>> //
>> // If lowering below HIGH_LEVEL, make sure
>> // interrupts are enabled
>> //
>> CoreRestoreTplInternal(NewTpl, NewTpl < TPL_HIGH_LEVEL);
>>
>> whereas NestedInterruptRestoreTPL can do
>>
>> //
>> // Call RestoreTPL() to allow event notifications to be
>> // dispatched. This will implicitly re-enable interrupts,
>> // but only if events have to be dispatched.
>> //
>> CoreRestoreTplInternal(InterruptedTPL, FALSE);
>>
>> //
>> // Interrupts are now disabled, so we can access shared state.
>> //
>>
>> This avoids the unlimited nesting of interrupts because each stack
>> frame
>> will indeed have a higher TPL than the outer version.
>>
>> Paolo
>