From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.groups.io (mail02.groups.io [66.175.222.108]) by spool.mail.gandi.net (Postfix) with ESMTPS id 272B97803E7 for ; Thu, 29 Feb 2024 13:03:10 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=AcW0EYTSlZxh5i6ocpShkDrMqnkCphev+4jk/vBBJG4=; c=relaxed/simple; d=groups.io; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:MIME-Version:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Transfer-Encoding; s=20140610; t=1709211788; v=1; b=Mx0SU/4H/ahOXKwYvx67Ba6cLWBNiGqa0igcL3sB3M7PwuNeYG2pBhZIC+lyI95qGOICKwKR kK8kTitDPpXjlF4+kbFLiNdMgsTv/Fayi5aSNLL/COI0uYOr/SV7wdSwDmBgSzGkmkWI/w2M+aF XTkPh+CfWzhzpivUUw2jagug= X-Received: by 127.0.0.2 with SMTP id a6dyYY7687511xYsOmV7HL5n; Thu, 29 Feb 2024 05:03:08 -0800 X-Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16]) by mx.groups.io with SMTP id smtpd.web11.24094.1709211785992876904 for ; Thu, 29 Feb 2024 05:03:06 -0800 X-IronPort-AV: E=McAfee;i="6600,9927,10998"; a="3794427" X-IronPort-AV: E=Sophos;i="6.06,194,1705392000"; d="scan'208";a="3794427" X-Received: from orviesa006.jf.intel.com ([10.64.159.146]) by orvoesa108.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Feb 2024 05:02:55 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.06,194,1705392000"; d="scan'208";a="8241620" X-Received: from shwdeopenlab706.ccr.corp.intel.com ([10.239.55.95]) by orviesa006.jf.intel.com with ESMTP; 29 Feb 2024 05:02:53 -0800 From: "Ni, Ray" To: devel@edk2.groups.io Cc: Michael D Kinney , Liming Gao , Laszlo Ersek , Michael Brown , Paolo Bonzini Subject: [edk2-devel] [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue due to nested interrupts Date: Thu, 29 Feb 2024 21:02:46 +0800 Message-Id: <20240229130246.3-3-ray.ni@intel.com> In-Reply-To: <20240229130246.3-1-ray.ni@intel.com> References: <20240229130246.3-1-ray.ni@intel.com> MIME-Version: 1.0 Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io Reply-To: devel@edk2.groups.io,ray.ni@intel.com List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: X-Gm-Message-State: 8E5pa6rxmaKX6B082WdtAV51x7686176AA= Content-Transfer-Encoding: quoted-printable X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20140610 header.b="Mx0SU/4H"; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=intel.com (policy=none); spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io Assuming the current TPL is TPL_APPLICATION, RaiseTPL (TPL_APPLICATION -> HIGH) and RestoreTPL (TPL_HIGH -> TPL_APPLICATION) are called in the timer interrupt context. RestoreTPL() will lower the TPL from TPL_HIGH to TPL_NOTIFY first, enable the interrupts and dispatch pending events associated with TPL_NOTIFY. Then it will lower TPL to TPL_CALLBACK, enable the interrupts and dispatch pending events associated with TPL_CALLBACK. In the end, it will lower the TPL to TPL_APPLICATION with interrupt enabled. However, it's possible that another timer interrupt happens just in the end of RestoreTPL() function when TPL is TPL_APPLICATION. CPU runs into the interrupt context but the contents pushed to the stack in the outer interrupt context haven't been fully popped. In the nested interrupt context, if 3rd interrupt happens just in the end of RestoreTPL() function, CPU runs to interrupt context again but the contents pushed to the stack in the 2nd interrupt context haven't been fully popped. The situation can repeat infinitely that could result in stack overflow. A ideal solution is to not keep the interrupt disabled when RestoreTPL(TPL_HIGH -> not TPL_HIGH) is executed in the timer interrupt context because the interrupt handler will re-enable the interrupt with arch specific instructions (e.g.: IRET for x86). The patch introduces mInterruptedTplMask which tells RestoreTPL() if it's called in the interrupt context and whether it should defer enabling the interrupt. Signed-off-by: Michael D Kinney Signed-off-by: Ray Ni Cc: Liming Gao Cc: Laszlo Ersek Cc: Michael Brown Cc: Paolo Bonzini --- MdeModulePkg/Core/Dxe/Event/Tpl.c | 151 +++++++++++++++++++++++++++++- 1 file changed, 147 insertions(+), 4 deletions(-) diff --git a/MdeModulePkg/Core/Dxe/Event/Tpl.c b/MdeModulePkg/Core/Dxe/Even= t/Tpl.c index 8ce456f968..f2206bc788 100644 --- a/MdeModulePkg/Core/Dxe/Event/Tpl.c +++ b/MdeModulePkg/Core/Dxe/Event/Tpl.c @@ -9,6 +9,100 @@ SPDX-License-Identifier: BSD-2-Clause-Patent #include "DxeMain.h"=0D #include "Event.h"=0D =0D +//=0D +// It's used to support nested interrupts.=0D +// The bit position is the TPL that was interrupted.=0D +// The bit is set when the TPL is interrupted.=0D +//=0D +// Example 1):=0D +// Assume system runs at TPL_APPLICATION (4) and there is no pending eve= nt.=0D +//=0D +// Timer interrupt happens. CPU runs to the interrupt context.=0D +// 1. Interrupt context (Interrupted TPL =3D TPL_APPLICATION):=0D +// CoreRaiseTpl(TPL_APPLICATION -> TPL_HIGH) is called where=0D +// mInterruptedTplMask is changed from 0 to 0x10.=0D +//=0D +// Note: CoreRaiseTpl(TPL_HIGH) could be called from Timer driver, or = CoreTimerTick().=0D +// If it's called from both, the 2nd call in CoreTimerTick() will no= t change mInterruptedTplMask=0D +// as it's a TPL raise from TPL_HIGH to TPL_HIGH.=0D +//=0D +// When CoreRestoreTpl(TPL_HIGH -> TPL_APPLICATION) is called, because= there is no pending event it will=0D +// lower TPL to TPL_APPLICATION, but with interrupts disabled as the T= PL_APPLICATION bit is set in=0D +// mInterruptedTplMask.=0D +// mInterruptedTplMask is changed from 0x10 to 0.=0D +//=0D +// Example 2):=0D +// Assume system runs at TPL_APPLICATION (4) and there is only one event= at TPL_CALLBACK (8) which will=0D +// register another event at TPL_NOTIFY (16).=0D +//=0D +// Timer interrupt happens. CPU runs to the (outer) interrupt context.=0D +// 1. Outer interrupt context (Interrupted TPL =3D TPL_APPLICATION):=0D +// CoreRaiseTpl(TPL_APPLICATION -> TPL_HIGH) is called where=0D +// mInterruptedTplMask is changed from 0 to 0x10.=0D +//=0D +// When CoreRestoreTpl(TPL_HIGH -> TPL_APPLICATION) is called, because= there is one event at=0D +// TPL_CALLBACK (8) it will lower TPL to TPL_CALLBACK, enable the inte= rrupts and dispatch it.=0D +//=0D +// The event registers another event associating with TPL_NOTIFY.=0D +//=0D +// 2nd timer interrupt happens. CPU runs to the inner-1/nested-1 inter= rupt context.=0D +//=0D +// 2. Inner-1/nested-1 interrupt context (Interrupted TPL =3D TPL_CALL= BACK):=0D +// CoreRaiseTpl(TPL_CALLBACK -> TPL_HIGH) is called where=0D +// mInterruptedTplMask is changed from 0x10 to 0x110.=0D +//=0D +// When CoreRestoreTpl(TPL_HIGH -> TPL_CALLBACK) is called, because = there is one event at=0D +// TPL_NOTIFY (16) it will lower TPL to TPL_NOTIFY, enable the inter= rupts and dispatch it.=0D +//=0D +// 3rd timer interrupt happens. CPU runs to the inner-2/nested-2 int= errupt context.=0D +//=0D +// 3. Inner-2/nested-2 interrupt context (Interrupt TPL =3D TPL_NOTI= FY):=0D +// CoreRaiseTpl(TPL_NOTIFY -> TPL_HIGH) is called where=0D +// mInterruptedTplMask is changed from 0x110 to 0x10110.=0D +//=0D +// CoreTimerTick() signals mEfiCheckTimerEvent which queues a even= t at TPL_HIGH - 1.=0D +//=0D +// When CoreRestoreTpl(TPL_HIGH -> TPL_NOTIFY) is called, because = there is one event at=0D +// TPL_HIGH - 1 (30) it will lower TPL to TPL_HIGH - 1, enable the= interrupts and dispatch it.=0D +//=0D +// 4th timer interrupt happens. CPU runs to the inner-3/nested-3 i= nterrupt context.=0D +//=0D +// 4. Inner-3/nested-3 interrupt context (Interrupt TPL =3D TPL_= HIGH - 1):=0D +// CoreRaiseTpl(TPL_HIGH - 1 -> TPL_HIGH) is called where=0D +// mInterruptedTplMask is changed from 0x10110 to 0x40010110.=0D +//=0D +// When CoreRestoreTpl(TPL_HIGH -> TPL_HIGH - 1) is called, beca= use there is no pending event it will=0D +// lower TPL to TPL_HIGH - 1, but with interrupts disabled as th= e (TPL_HIGH - 1) bit is set in=0D +// mInterruptedTplMask.=0D +// mInterruptedTplMask is changed from 0x40010110 to 0x10110.=0D +//=0D +// Arch specific instruction (e.g.: IRET for X86) in the interru= pt handler will re-enable the interrupts=0D +// and return to the inner-2/nested-2 interrupt context.=0D +//=0D +// 3. Inner-2/nested-2 interrupt context continues (Interrupt TPL = =3D TPL_NOTIFY):=0D +// CoreRestoreTpl (TPL_HIGH -> TPL_NOTIFY) lowers TPL to TPL_NOTIF= Y with interrupts disabled=0D +// as the (TPL_NOTIFY) bit is set in mInterruptedTplMask.=0D +// mInterruptedTplMask is changed from 0x10110 to 0x110.=0D +//=0D +// Arch specific instruction (e.g.: IRET for X86) in the interrupt= handler will re-enable the interrupts=0D +// and return to the inner-1/nested-1 interrupt context.=0D +//=0D +// 2. Inner-1/nested-1 interrupt context continues (Interrupt TPL =3D = TPL_CALLBACK):=0D +// CoreRestoreTpl (TPL_HIGH -> TPL_CALLBACK) lowers TPL to TPL_CALLB= ACK with interrupts disabled=0D +// as the (TPL_CALLBACK) bit is set in mInterruptedTplMask.=0D +// mInterruptedTplMask is changed from 0x110 to 0x10.=0D +// Arch specific instruction (e.g.: IRET for X86) in the interrupt h= andler will re-enable the interrupts=0D +// and return to the outer interrupt context.=0D +//=0D +// 1. Outer interrupt context continues (Interrupted TPL =3D TPL_APPLICA= TION):=0D +// CoreRestoreTpl (TPL_HIGH -> TPL_APPLICATION) lowers TPL to TPL_APPL= ICATION with interrupts disabled=0D +// as the (TPL_APPLICATION) bit is set in mInterruptedTplMask.=0D +// mInterruptedTplMask is changed from 0x10 to 0.=0D +// Arch specific instruction (e.g.: IRET for X86) in the interrupt han= dler will re-enable the interrupts=0D +// and return to main flow.=0D +//=0D +volatile static UINTN mInterruptedTplMask =3D 0;=0D +=0D /**=0D Set Interrupt State.=0D =0D @@ -59,6 +153,7 @@ CoreRaiseTpl ( )=0D {=0D EFI_TPL OldTpl;=0D + BOOLEAN InterruptState;=0D =0D OldTpl =3D gEfiCurrentTpl;=0D if (OldTpl > NewTpl) {=0D @@ -72,7 +167,40 @@ CoreRaiseTpl ( // If raising to high level, disable interrupts=0D //=0D if ((NewTpl >=3D TPL_HIGH_LEVEL) && (OldTpl < TPL_HIGH_LEVEL)) {=0D - CoreSetInterruptState (FALSE);=0D + //=0D + // When gCpu is NULL, State is TRUE.=0D + // Calling CoreSetInterruptState() with TRUE is safe as CoreSetInterru= ptState() will directly return=0D + // when gCpu is NULL.=0D + //=0D + InterruptState =3D TRUE;=0D + if (gCpu !=3D NULL) {=0D + gCpu->GetInterruptState (gCpu, &InterruptState);=0D + }=0D +=0D + if (InterruptState) {=0D + //=0D + // Interrupts are currently enabled.=0D + // Disable them for going to HIGH level.=0D + //=0D + CoreSetInterruptState (FALSE);=0D + } else {=0D + //=0D + // Interrupts are already disabled.=0D + // gEfiCurrentTpl is the "Interrupted TPL".=0D + // It's a non-nested interrupt if mInterruptedTplMask is 0.=0D + // It's a nested interrupt otherwise.=0D + // Nested interrupts are ONLY allowed at TPL > "Interrupted TPL", ot= herwise=0D + // stack overflow might occur.=0D + // If it's a nested interrupt, the "Interrupted TPL" should be highe= r than=0D + // the outer's "Interrupted TPL". ASSERT() is to check that.=0D + //=0D + ASSERT ((INTN)gEfiCurrentTpl > HighBitSet64 (mInterruptedTplMask));= =0D +=0D + //=0D + // Save the "Interrupted TPL" (TPL that was interrupted).=0D + //=0D + mInterruptedTplMask |=3D (UINTN)(1 << gEfiCurrentTpl);=0D + }=0D }=0D =0D //=0D @@ -134,9 +262,9 @@ CoreRestoreTpl ( }=0D =0D //=0D - // Set the new value=0D + // Set the new TPL with interrupt disabled.=0D //=0D -=0D + CoreSetInterruptState (FALSE);=0D gEfiCurrentTpl =3D NewTpl;=0D =0D //=0D @@ -144,7 +272,22 @@ CoreRestoreTpl ( // interrupts are enabled=0D //=0D if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {=0D - CoreSetInterruptState (TRUE);=0D + if ((INTN)gEfiCurrentTpl > HighBitSet64 (mInterruptedTplMask)) {=0D + //=0D + // Only enable interrupts if restoring to a level above the highest= =0D + // interrupted TPL level. This allows interrupt nesting, but only f= or=0D + // events at higher TPL level than the current TPL level.=0D + //=0D + CoreSetInterruptState (TRUE);=0D + } else {=0D + //=0D + // Clear interrupted TPL level mask, but do not re-enable interrupts= here=0D + // This will return to CoreTimerTick() and interrupts will be re-ena= bled=0D + // when the timer interrupt handlers returns from interrupt context.= =0D + //=0D + ASSERT ((INTN)gEfiCurrentTpl =3D=3D HighBitSet64 (mInterruptedTplMas= k));=0D + mInterruptedTplMask &=3D ~(UINTN)(1 << gEfiCurrentTpl);=0D + }=0D }=0D =0D DEBUG_CODE (=0D --=20 2.39.1.windows.1 -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#116167): https://edk2.groups.io/g/devel/message/116167 Mute This Topic: https://groups.io/mt/104642317/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=-