public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Ni, Ray" <ray.ni@intel.com>
To: Michael Brown <mcb30@ipxe.org>,
	"devel@edk2.groups.io" <devel@edk2.groups.io>,
	Laszlo Ersek <lersek@redhat.com>,
	"kraxel@redhat.com" <kraxel@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>,
	"Desimone, Nathaniel L" <nathaniel.l.desimone@intel.com>,
	"Kumar, Rahul R" <rahul.r.kumar@intel.com>,
	"Liu, Zhiguang" <zhiguang.liu@intel.com>
Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver
Date: Fri, 19 Jan 2024 13:14:16 +0000	[thread overview]
Message-ID: <MN6PR11MB8244BB617C68D27D65B8F5A78C702@MN6PR11MB8244.namprd11.prod.outlook.com> (raw)
In-Reply-To: <0102018d17080a28-370d97eb-b73c-4811-a9ea-05cfb459ba37-000000@eu-west-1.amazonses.com>

[-- Attachment #1: Type: text/plain, Size: 6559 bytes --]

Michael,

Thanks for the explanation.
I tried to expand the code flow in below and help the discussion..

TimerInterruptHandler()
    gBS->RaiseTPL (HIGH)
    gBS->RestoreTPL (APPLICATION) // expand in below.
        For Tpl = {NOTIFY, CALLBACK}:          ---- [for-loop]
            if PendingBit is not set:
                continue
            gCurrentTpl = Tpl
            EnableInterrupt()                  ---- [1]
            CoreDispatchEventNotifies(Tpl) // expand in below.
                gBS->RaiseTPL (HIGH)
                gBS->RestoreTPL (Tpl)
                NotifyFunction()               ---- [2]
                gBS->RaiseTPL (HIGH)
                gBS->RestoreTPL (Tpl)
        End-For
        gCurrentTpl = APPLICATION              ---- [3]
        EnableInterrupt()                      ---- [4]
IRET


  1.  Agree that the stack overflow could happen in real platform.
The interrupt-enabled env when CPU runs gBS->RestoreTPL(APPLICATION)  could be 3 cases: When gCurrentTpl is NOTIFY, CALLBACK or APPLICATION.
Let's name them as "env:NOTIFY", "env:CALLBACK" and "env:APPLICATION".
CPU enters "env:NOTIFY" and "env:CALLBACK" in [1].
CPU enters "env:APPLICATION" in [4], or [3] when the interrupt is already enabled in the [for-loop].

When interrupt happens in "env:NOTIFY", the inner interrupt handler calls gBS->RestoreTPL(NOTIFY). The interrupt-enabled env in the inner RestoreTPL(NOTIFY) is "env:NOTIFY" only.
When interrupt happens in "env:CALLBACK", the inner interrupt handler calls gBS->RestoreTPL(CALLBACK). The interrupt-enabled env in the inner RestoreTPL(CALLBACK) can be: "env:NOTIFY" and "env:CALLBACK".

So, the interrupt re-entrance we want to avoid is "env:NOTIFY"  -> "env:NOTIFY", or "env:CALLBACK" -> "env:CALLBACK", or "env:APPLICATION" -> "env:APPLICATION". Because it's endless.
NestedTplInterruptLib was written to avoid it.

However, it's ok for: "env:APPLICATION" -> "env:CALLBACK" -> "env:NOTIFY", or "env:CALLBACK" -> "env:NOTIFY".

In real platform, it's possible that interrupt happens just after [4], and in worst case, the RestoreTpl() call in the inner timer interrupt handler is interrupted after [4] again, and again.


  1.  Some questions on NestedInterruptTplLib.

  1.  Can we remove DisableInterruptsOnIret()? That means the inner interrupt handler would returns to the outer world with interrupt enabled and TPL==HIGH. But I don't see any issue with that.
  2.  If DxeCore can be changed, do you have an easier-to-understand solution? It really took me 2 days to understand why NestedInterruptTplLib is written in today's way.


thanks,
ray
________________________________
From: Michael Brown <mcb30@ipxe.org>
Sent: Wednesday, January 17, 2024 6:46:59 PM
To: devel@edk2.groups.io <devel@edk2.groups.io>; Ni, Ray <ray.ni@intel.com>; Laszlo Ersek <lersek@redhat.com>; kraxel@redhat.com <kraxel@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>; Kinney, Michael D <michael.d.kinney@intel.com>; Desimone, Nathaniel L <nathaniel.l.desimone@intel.com>; Kumar, Rahul R <rahul.r.kumar@intel.com>
Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver

On 17/01/2024 07:11, Ni, Ray wrote:
> The above flow shows endless re-entrance of timer interrupt handler.
>
> But, my question is: above flow only can happen in real platform when the below 4 steps occupies more time than the timer period (usually 10ms).
>              [Timer Interrupt #2]1. RaiseTPL (HIGH) from NOTIFY causing CPU interrupt be disabled.
>              [Timer Interrupt #2]2. Send APIC EOI (ACK the interrupt received so APIC can continue generate interrupts)
>              [Timer Interrupt #2]3. Call DxeCore::CoreTimerTick()
>              [Timer Interrupt #2]4. RestoreTPL (NOTIFY) from HIGH. No callback runs as no callback can be registered at TPL > NOTIFY. In the end of RestoreTPL(), CPU interrupt is enabled.
>
> But, in my opinion, it's impossible.

As is thoroughly documented in NestedInterruptRestoreTpl(), the
potential for unbounded stack consumption arises when an interrupt
occurs after the point that RestoreTPL() completes dispatching all
notifications but before the IRET (or equivalent) instruction pops the
original stack frame.

Since dispatching notifications can take an unbounded amount of time,
there is absolutely no guarantee that this will be less than 10ms after
the previous interrupt.  It could easily be 30 seconds later.

The problematic flow is a subtle variation on what you described:

   [IRQ#1] timer interrupt at TPL_APPLICATION
     [ISR#1] RaiseTPL from TPL_APPLICATION -> TPL_HIGH_LEVEL
     [ISR#1] Send APIC EOI
     [ISR#1] Call CoreTimerTick()
     [ISR#1] RestoreTPL from TPL_HIGH_LEVEL -> TPL_APPLICATION
       [ISR#1] Callbacks for TPL_NOTIFY are run
       [ISR#1] Callbacks for TPL_CALLBACK are run
       ... these may take several *seconds* to complete, during
           which further interrupts are raised, the details of
           which are not shown here...
       [ISR#1] TPL is now restored to TPL_APPLICATION
       [IRQ#N] timer interrupt at TPL_APPLICATION
         [ISR#N] RaiseTPL from TPL_APPLICATION -> TPL_HIGH_LEVEL
         ... continues ...

The root cause is that the ISR reaches a state in which:

   a) an arbitrary amount of time has elapsed since the triggering
      interrupt (due to unknown callbacks being invoked, which may
      themselves wait for further timer interrupts), and

   b) the TPL has been fully restored back to the TPL at the point
      the triggering interrupt occurred (i.e. TPL_APPLICATION in
      this example), and

   c) the timer interrupt source is enabled, and

   d) CPU interrupts are enabled

At this point, there is nothing preventing another interrupt from
occurring.  It will occur at TPL_APPLICATION and it will be one stack
frame deeper than the previous interrupt at TPL_APPLICATION.

Rinse and repeat, and you have unbounded stack consumption.

Hence the requirement for NestedInterruptTplLib, even on physical hardware.

Michael


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114045): https://edk2.groups.io/g/devel/message/114045
Mute This Topic: https://groups.io/mt/103734961/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



[-- Attachment #2: Type: text/html, Size: 21900 bytes --]

  reply	other threads:[~2024-01-19 13:14 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-15  8:03 [edk2-devel] Add LocalApicTimerDxe driver in UefiCpuPkg Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver Ni, Ray
2024-01-15  8:59   ` Pedro Falcato
2024-01-15 18:10     ` Michael D Kinney
2024-01-16 10:02       ` Laszlo Ersek
2024-01-16 17:17         ` Michael D Kinney
2024-01-15 19:30     ` Laszlo Ersek
2024-01-16  8:47       ` Gerd Hoffmann
2024-01-16  9:48         ` Michael Brown
2024-01-16 14:34           ` Laszlo Ersek
2024-01-16 15:16             ` Michael Brown
2024-01-16 15:37               ` Laszlo Ersek
2024-01-17  7:11                 ` Ni, Ray
2024-01-17 10:46                   ` Michael Brown
2024-01-19 13:14                     ` Ni, Ray [this message]
2024-01-19 17:42                       ` Michael Brown
2024-01-19 23:44                         ` Ni, Ray
2024-01-20  0:49                           ` Michael Brown
2024-01-22 10:15                         ` Gerd Hoffmann
2024-01-15  8:03 ` [edk2-devel] [PATCH 2/6] UefiCpuPkg/LocalApicTimerDxe: Remove NestedInterruptTplLib call Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 3/6] UefiCpuPkg: Add LocalApicTimerDxe driver in DSC file Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 4/6] UefiCpuPkg/LocalApicTimerDxe: Enhance Timer Frequency calculation logic Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 5/6] UefiCpuPkg/LocalApicTimerDxe: warn if APIC Timer is used by other code Ni, Ray
2024-01-15  8:03 ` [edk2-devel] [PATCH 6/6] UefiCpuPkg/LocalApicTimerDxe: Passing fixed timer period is not a bug Ni, Ray

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN6PR11MB8244BB617C68D27D65B8F5A78C702@MN6PR11MB8244.namprd11.prod.outlook.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox