public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Ni, Ray" <ray.ni@intel.com>
To: Laszlo Ersek <lersek@redhat.com>,
	"devel@edk2.groups.io" <devel@edk2.groups.io>,
	"mcb30@ipxe.org" <mcb30@ipxe.org>,
	"kraxel@redhat.com" <kraxel@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>,
	"Desimone, Nathaniel L" <nathaniel.l.desimone@intel.com>,
	"Kumar, Rahul R" <rahul.r.kumar@intel.com>
Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver
Date: Wed, 17 Jan 2024 07:11:15 +0000	[thread overview]
Message-ID: <MN6PR11MB8244FFE620B349552A7BDDD68C722@MN6PR11MB8244.namprd11.prod.outlook.com> (raw)
In-Reply-To: <eebfe175-ff54-3b33-6945-c8542f31dc70@redhat.com>

Laszlo, Michael,

When timer interrupt happens, the calling flow is:
[Timer Interrupt #1] CPU IDT handler calls into LocalApicTimerDxe::TimerInterruptHandler(), which
   [Timer Interrupt #1]1. RaiseTPL (HIGH) from APPLICATION causing CPU interrupt be disabled.
   [Timer Interrupt #1]2. Send APIC EOI (ACK the interrupt received so APIC can continue generate interrupts)
   [Timer Interrupt #1]3. Call DxeCore::CoreTimerTick()
   [Timer Interrupt #1]4. RestoreTPL (APPLICATION) from HIGH. (All callbacks registered at NOTIFY and CALLBACK will run.)
      [Timer Interrupt #1]4.1. When there are Callbacks registered at NOTIFY, current TPL is set to NOTIFY and interrupt is enabled. CoreDispatchEventNotifies() is called to run the NOTIFY callbacks.
         [Timer Interrupt #2] Immediately after interrupt is enabled, CPU runs to LocalApicTimerDxe::TimerInterruptHandler(). But stack is not fully popped to the initial state.
            [Timer Interrupt #2]1. RaiseTPL (HIGH) from NOTIFY causing CPU interrupt be disabled.
            [Timer Interrupt #2]2. Send APIC EOI (ACK the interrupt received so APIC can continue generate interrupts)
            [Timer Interrupt #2]3. Call DxeCore::CoreTimerTick()
            [Timer Interrupt #2]4. RestoreTPL (NOTIFY) from HIGH. No callback runs as no callback can be registered at TPL > NOTIFY. In the end of RestoreTPL(), CPU interrupt is enabled.
               [Timer Interrupt #3] Immediately after interrupt is enabled, CPU runs to LocalApicTimerDxe::TimerInterruptHandler(). But stack is not fully popped to the initial state.
                  [Timer Interrupt #3]1. RaiseTPL (HIGH) from NOTIFY causing CPU interrupt be disabled.
                  [Timer Interrupt #3]2. Send APIC EOI (ACK the interrupt received so APIC can continue generate interrupts)
                  [Timer Interrupt #3]3. Call DxeCore::CoreTimerTick()
                  [Timer Interrupt #3]4. RestoreTPL (NOTIFY) from HIGH. No callback runs as no callback can be registered at TPL > NOTIFY. In the end of RestoreTPL(), CPU interrupt is enabled.
                     [Timer Interrupt #4] Immediately after interrupt is enabled, CPU runs to LocalApicTimerDxe::TimerInterruptHandler(). But stack is not fully popped to the initial state.
                        [Timer Interrupt #4]...


The above flow shows endless re-entrance of timer interrupt handler.

But, my question is: above flow only can happen in real platform when the below 4 steps occupies more time than the timer period (usually 10ms).
            [Timer Interrupt #2]1. RaiseTPL (HIGH) from NOTIFY causing CPU interrupt be disabled.
            [Timer Interrupt #2]2. Send APIC EOI (ACK the interrupt received so APIC can continue generate interrupts)
            [Timer Interrupt #2]3. Call DxeCore::CoreTimerTick()
            [Timer Interrupt #2]4. RestoreTPL (NOTIFY) from HIGH. No callback runs as no callback can be registered at TPL > NOTIFY. In the end of RestoreTPL(), CPU interrupt is enabled.

But, in my opinion, it's impossible.


Thanks,
Ray
> -----Original Message-----
> From: Laszlo Ersek <lersek@redhat.com>
> Sent: Tuesday, January 16, 2024 11:37 PM
> To: devel@edk2.groups.io; mcb30@ipxe.org; kraxel@redhat.com
> Cc: Pedro Falcato <pedro.falcato@gmail.com>; Ni, Ray <ray.ni@intel.com>;
> Kinney, Michael D <michael.d.kinney@intel.com>; Desimone, Nathaniel L
> <nathaniel.l.desimone@intel.com>; Kumar, Rahul R
> <rahul.r.kumar@intel.com>
> Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe:
> Duplicate OvmfPkg/LocalApicTimerDxe driver
> 
> On 1/16/24 16:16, Michael Brown wrote:
> > On 16/01/2024 14:34, Laszlo Ersek wrote:
> >> On 1/16/24 10:48, Michael Brown wrote:
> >> IOW, my impression is that NestedInterruptTplLib can certainly handle
> >> all scenarios thrown at it, but where it really matters is in the face
> >> of an interrupt storm (not just "normal nesting"), and a storm is
> >> unlikely (or even impossible?) on physical hardware.
> >>
> >> ... Oh, scratch that. "Interrupt storm" simply means that interrupts are
> >> being delivered at a rate higher than the handler routine can service
> >> them. IOW, the "storm" is not that interrupts are delivered *very
> >> rapidly* in an absoulte sense. If interrupts are delivered at normal
> >> frequency, but the handler is too slow to service *even that rate*, then
> >> that also qualifies as "storm", because the nesting depth will *keep
> >> growing*. It's not really the growth rate that matters; what matter is
> >> the *trend*, i.e., the fact that there *is* growth (the stack gets
> >> deeper and deeper). The stack might not overflow immediately, and if the
> >> handler speeds up (for whatever reason), the stack might recover, but
> >> there is nothing to prevent an overflow.
> >>
> >> So, in the end, I think you've convinced me.
> >
> > :)
> >
> >>> I'm happy to send a patch to migrate NestedInterruptTplLib to
> >>> MdeModulePkg, so that it can be consumed outside of OvmfPkg.  Shall I
> do
> >>> this?
> >>
> >> Sounds like a valid idea to me.
> >>
> >> Could be greatly supported by a test case (to be run on the bare metal)
> >> installing a slow handler that *eventually* exhausted the stack, when
> >> not using NestedInterruptTplLib.
> >>
> >> (FWIW, IIRC, the UEFI spec warns about this -- it says something like,
> >> "return from TPL_HIGH as soon as you can, otherwise the system will
> >> become unstable".)
> >>
> >> Sorry for the wall of text, I find this very difficult to reason about.
> >
> > I also find it very difficult to reason about, which is why
> > NestedInterruptRestoreTpl() has 126 lines of comments providing a
> > semi-formal proof of correctness for a mere 15 statements of C code!
> >
> > In particular, I find it difficult to reason about when it would be safe
> > for a platform to *not* use NestedInterruptTplLib.  It's clearly
> > empirically difficult to trigger stack underflow via an interrupt
> > "storm" on physical hardware, but I'm not convinced it's impossible.
> >
> > I find it mentally easier to rely on the hard guarantee that
> > NestedInterruptTplLib provides: that nested interrupts will continue to
> > be delivered but that the number of interrupt-induced stack frames is
> > bounded by the (small, finite) number of distinct TPL levels in existence.
> >
> >
> >
> > While developing NestedInterruptTplLib, I did hack together a test case
> > for a slow handler that would deliberately induce an interrupt storm,
> > since I needed this to test that my code was working.  When triggered,
> > this test would cause the machine to effectively hang due to servicing
> > an endless storm of timer interrupts.  Before NestedInterruptTplLib, the
> > stack would soon underflow and would typically cause a reboot (or other
> > crash).  With NestedInterruptTplLib the machine would continue to
> > service interrupts indefinitely.
> >
> > How might such a test case be included in upstream EDK2?  I'm
> > peripherally aware of EDK2 test infrastructure such as UEFI SCT, but
> > I've never interacted with it yet.
> 
> I'm vaguely aware of a unit test framework inside edk2, but the best I
> can give you is just this link:
> 
> https://github.com/tianocore/edk2/tree/master/UnitTestFrameworkPkg#unit
> -test-framework-package
> 
> There are some files under the directory "MdeModulePkg/Test" too;
> git-log on that subdir, and perhaps the MdeModulePkg maintainers, might
> provide more pointers.
> 
> The end of the readme linked above says to ask Bret, Mike and Sean, as well.
> 
> Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#113932): https://edk2.groups.io/g/devel/message/113932
Mute This Topic: https://groups.io/mt/103734961/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



  reply	other threads:[~2024-01-17  7:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-15  8:03 [edk2-devel] Add LocalApicTimerDxe driver in UefiCpuPkg Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver Ni, Ray
2024-01-15  8:59   ` Pedro Falcato
2024-01-15 18:10     ` Michael D Kinney
2024-01-16 10:02       ` Laszlo Ersek
2024-01-16 17:17         ` Michael D Kinney
2024-01-15 19:30     ` Laszlo Ersek
2024-01-16  8:47       ` Gerd Hoffmann
2024-01-16  9:48         ` Michael Brown
2024-01-16 14:34           ` Laszlo Ersek
2024-01-16 15:16             ` Michael Brown
2024-01-16 15:37               ` Laszlo Ersek
2024-01-17  7:11                 ` Ni, Ray [this message]
2024-01-17 10:46                   ` Michael Brown
2024-01-19 13:14                     ` Ni, Ray
2024-01-19 17:42                       ` Michael Brown
2024-01-19 23:44                         ` Ni, Ray
2024-01-20  0:49                           ` Michael Brown
2024-01-22 10:15                         ` Gerd Hoffmann
2024-01-15  8:03 ` [edk2-devel] [PATCH 2/6] UefiCpuPkg/LocalApicTimerDxe: Remove NestedInterruptTplLib call Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 3/6] UefiCpuPkg: Add LocalApicTimerDxe driver in DSC file Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 4/6] UefiCpuPkg/LocalApicTimerDxe: Enhance Timer Frequency calculation logic Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 5/6] UefiCpuPkg/LocalApicTimerDxe: warn if APIC Timer is used by other code Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 6/6] UefiCpuPkg/LocalApicTimerDxe: Passing fixed timer period is not a bug Ni, Ray

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN6PR11MB8244FFE620B349552A7BDDD68C722@MN6PR11MB8244.namprd11.prod.outlook.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox