public inbox for devel@edk2.groups.io
From: "Ni, Ray" <ray.ni@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "devel@edk2.groups.io" <devel@edk2.groups.io>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>,
	Liming Gao <gaoliming@byosoft.com.cn>,
	"Laszlo Ersek" <lersek@redhat.com>,
	Michael Brown <mcb30@ipxe.org>
Subject: Re: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts
Date: Fri, 1 Mar 2024 03:07:44 +0000
Message-ID: <MN6PR11MB82441CEB3AE528D56B4F26038C5E2@MN6PR11MB8244.namprd11.prod.outlook.com>
In-Reply-To: <CABgObfZPmg745qmUoWLcYcu2WCFTarCg2DcOdRTwy0j0MksdFA@mail.gmail.com>

I think we are all aligned on the purpose: to avoid enabling interrupts at the end of RestoreTPL (HIGH->non-HIGH) while still in the interrupt context.
The discussion is about how to implement it.
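
To make the goal concrete, here is a minimal, self-contained sketch in plain C. It is a toy model, not DxeCore code: every name in it (run, timer_handler, restore_tpl, defer_enable, pending_ticks) is hypothetical, and it assumes the worst case in which several timer ticks are already pending when the first handler runs. It only compares how deep the handler nests when RestoreTPL re-enables interrupts inside the handler versus when the re-enable is left to the return from interrupt.

```c
#include <stdbool.h>

#define TPL_APPLICATION 4
#define TPL_HIGH_LEVEL  31

/* Toy model state; all of these names are hypothetical. */
static int  current_tpl;
static bool interrupts_enabled;
static int  pending_ticks;  /* timer ticks waiting to be delivered */
static int  depth;          /* current handler nesting depth */
static int  max_depth;      /* deepest nesting seen in this run */
static bool defer_enable;   /* true: do not re-enable inside the handler */

static void timer_handler(void);

/* Deliver pending ticks for as long as interrupts are enabled. */
static void deliver_pending(void)
{
    while (interrupts_enabled && pending_ticks > 0) {
        pending_ticks--;
        timer_handler();
    }
}

/* Model of RestoreTPL as called at the end of the interrupt handler. */
static void restore_tpl(int new_tpl)
{
    current_tpl = new_tpl;
    if (new_tpl < TPL_HIGH_LEVEL && !defer_enable) {
        /* Old behavior: interrupts come back on while we are still
           inside the handler, so a pending tick nests right here. */
        interrupts_enabled = true;
        deliver_pending();
    }
}

static void timer_handler(void)
{
    int old_tpl = current_tpl;

    interrupts_enabled = false;        /* CPU masks interrupts on entry */
    depth++;
    if (depth > max_depth) {
        max_depth = depth;
    }
    current_tpl = TPL_HIGH_LEVEL;      /* RaiseTPL (TPL_HIGH_LEVEL) */
    restore_tpl(old_tpl);              /* RestoreTPL (HIGH -> non-HIGH) */
    depth--;
    interrupts_enabled = true;         /* models IRET restoring the flag */
}

/* Run the model with `ticks` pending interrupts; return the max nesting. */
static int run(int ticks, bool defer)
{
    current_tpl = TPL_APPLICATION;
    interrupts_enabled = true;
    pending_ticks = ticks;
    depth = 0;
    max_depth = 0;
    defer_enable = defer;
    deliver_pending();
    return max_depth;
}
```

In this model, run (n, false) nests once per pending tick, which is the unbounded stack growth under discussion, while run (n, true) never goes deeper than one handler because each tick is taken only after the previous handler has returned.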

Michael Brown's idea is to leave DxeCore unchanged, add a customized RaiseTpl/RestoreTpl implementation in a lib, and require the Timer driver to call it.
That lib is implemented very cleverly, but it uses a while-loop, is implicitly recursive, and implicitly requires NESTED_INTERRUPT_STATE to live in global storage rather than on the stack as a local variable.
I really do NOT like a future in which every timer driver has to call that lib to avoid the potential stack overflow. It's so complicated! And it's called every 10ms!!

Paolo,
I don't fully understand your patch, especially the following changes.
3 comments are embedded below.

@@ -161,5 +191,46 @@ CoreRestoreTpl (
   IN EFI_TPL NewTpl
   )
 {
+  BOOLEAN InInterruptHandler = FALSE;
+
+  //
+  // Unwind the nested interrupt handlers up to the required
+  // TPL, paying attention not to overflow the stack.  While
+  // not strictly necessary according to the specification,
+  // accept the possibility that multiple RaiseTPL calls are
+  // undone by a single RestoreTPL
+  //
+  while ((INTN)NewTpl <= HighBitSet64 (mInterruptedTplMask)) {
1. Why "<="? I thought that when RestoreTPL() is called there are only two cases:
   a. NewTpl == HighBitSet64 (...)
   b. NewTpl > HighBitSet64 (...)
  1.a is the case when TimerInterruptHandler() or CoreTimerTick() restores
  the TPL from HIGH to non-HIGH.
  1.b is the case when the pending event callbacks call RaiseTPL/RestoreTPL().
  Because only pending events whose TPL > "Interrupted TPL" can run, a
  RestoreTPL() call from an event callback cannot change the TPL to a value
  less than or equal to "Interrupted TPL".
  So, I think "<=" can be "==".

2. Can you explain a bit more about the reason for the "while" loop?



+    UINTN InterruptedTpl = HighBitSet64 (mInterruptedTplMask);
+    mInterruptedTplMask &= ~(UINTN)(1 << InterruptedTpl);
+
+    ASSERT (GetInterruptState () == FALSE);
+    InInterruptHandler = TRUE;
+
+    //
+    // Take the TPL down a notch to allow event notifications to be
+    // dispatched.  This will implicitly re-enable interrupts (if
+    // InterruptedTPL is below TPL_HIGH_LEVEL), even though we are
+    // still inside the interrupt handler, but the new TPL will
+    // be set while they are disabled.
+    //
+    // DesiredInterruptState must be FALSE to ensure that the
+    // stack does not blow up.  If we used, as in the final call
+    // below, "InterruptedTpl < TPL_HIGH_LEVEL", the timer interrupt
+    // handler could be invoked repeatedly in the small window between
+    // CoreSetInterruptState (TRUE) and the IRET instruction.
+    //
+    CoreRestoreTplInternal (InterruptedTpl, FALSE);
+
+    if (InterruptedTpl == NewTpl) {
+      break;
3. "break" or "return"? I think we should exit from this function.


+    }
+  }
+
+  //
+  // If we get here with InInterruptHandler == TRUE, an interrupt
+  // handler forgot to restore the TPL.
+  //
+  ASSERT (!InInterruptHandler);
   CoreRestoreTplInternal (NewTpl, NewTpl < TPL_HIGH_LEVEL);
 }
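
For comment 1, the three possible relations between NewTpl and the highest interrupted TPL can be modeled in a few lines of plain C. This is only a sketch: interrupted_mask is a hypothetical stand-in for mInterruptedTplMask, high_bit_set64() is a portable substitute for BaseLib's HighBitSet64() (which returns -1 for a zero operand), and note_interrupted(), unwind_one() and classify_restore() are illustrative helpers that do not exist in the patch.

```c
#include <stdint.h>

/* Portable stand-in for HighBitSet64(): index of the highest set bit,
   or -1 when the operand is zero. */
static int high_bit_set64(uint64_t value)
{
    int index = -1;

    while (value != 0) {
        value >>= 1;
        index++;
    }
    return index;
}

/* Hypothetical mirror of mInterruptedTplMask: bit N is set while an
   interrupt handler that interrupted code running at TPL N is active. */
static uint64_t interrupted_mask;

/* Record that an interrupt arrived while running at `tpl`. */
static void note_interrupted(unsigned tpl)
{
    interrupted_mask |= (uint64_t)1 << tpl;
}

/* Drop the highest interrupted TPL, as one loop iteration would. */
static void unwind_one(void)
{
    int top = high_bit_set64(interrupted_mask);

    if (top >= 0) {
        interrupted_mask &= ~((uint64_t)1 << top);
    }
}

/* Classify a RestoreTPL (new_tpl) call against the mask:
     1  new_tpl >  highest interrupted TPL  (case 1.b)
     0  new_tpl == highest interrupted TPL  (case 1.a)
    -1  new_tpl <  highest interrupted TPL  (only "<=" catches this) */
static int classify_restore(unsigned new_tpl)
{
    int top = high_bit_set64(interrupted_mask);

    if ((int)new_tpl > top) {
        return 1;
    }
    if ((int)new_tpl == top) {
        return 0;
    }
    return -1;
}
```

With an empty mask every restore falls under 1.b. After note_interrupted (4), a restore to 4 is case 1.a and a restore to 8 is case 1.b. The third outcome only appears if bits 4 and 8 are both set and the caller restores directly to 4, that is, a single RestoreTPL undoing more than one interrupted level, which is the situation the patch comment says it tolerates and the one where "<=" and "==" behave differently.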

Thanks,
Ray
> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Friday, March 1, 2024 8:14 AM
> To: Ni, Ray <ray.ni@intel.com>
> Cc: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>;
> Liming Gao <gaoliming@byosoft.com.cn>; Laszlo Ersek <lersek@redhat.com>;
> Michael Brown <mcb30@ipxe.org>
> Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue
> due to nested interrupts
> 
> On Thu, Feb 29, 2024 at 2:04 PM Ray Ni <ray.ni@intel.com> wrote:
> > @@ -134,9 +262,9 @@ CoreRestoreTpl (
> >    }
> >
> >    //
> > -  // Set the new value
> > +  // Set the new TPL with interrupt disabled.
> >    //
> > -
> > +  CoreSetInterruptState (FALSE);
> >    gEfiCurrentTpl = NewTpl;
> >
> >    //
> > @@ -144,7 +272,22 @@ CoreRestoreTpl (
> >    // interrupts are enabled
> >    //
> >    if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
> > -    CoreSetInterruptState (TRUE);
> > +    if ((INTN)gEfiCurrentTpl > HighBitSet64 (mInterruptedTplMask)) {
> > +      //
> > +      // Only enable interrupts if restoring to a level above the highest
> > +      // interrupted TPL level.  This allows interrupt nesting, but only for
> > +      // events at higher TPL level than the current TPL level.
> > +      //
> > +      CoreSetInterruptState (TRUE);
> > +    } else {
> > +      //
> > +      // Clear interrupted TPL level mask, but do not re-enable interrupts here
> > +      // This will return to CoreTimerTick() and interrupts will be re-enabled
> > +      // when the timer interrupt handlers returns from interrupt context.
> > +      //
> > +      ASSERT ((INTN)gEfiCurrentTpl == HighBitSet64 (mInterruptedTplMask));
> > +      mInterruptedTplMask &= ~(UINTN)(1 << gEfiCurrentTpl);
> > +    }
> >    }
> 
> Ok, now I understand what's going on and it's indeed the same logic as
> NestedInterruptTplLib, with DisableInterruptsOnIret() replaced by
> skipping CoreSetInterruptState(TRUE). It's similar to what I proposed
> elsewhere in the thread, just written differently.
> 
> I agree with Michael Brown that the spec is unclear on the state of
> the interrupt flag on exit from gBS->RestoreTPL(), but perhaps this
> change is feasible if the interrupt handlers just raise the TPL first
> and restore it last.
> 
> Just as an exercise for me to understand the code better, I tried
> rewriting the code in terms of the CoreRestoreTplInternal() function
> that I proposed. I find it easier to read, but I guess that's a bit in
> the eye of the beholder, and it is a little more defensively coded. I
> attach it (untested beyond compilation) for reference.
> 
> Paolo


View/Reply Online (#116207): https://edk2.groups.io/g/devel/message/116207


