From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.groups.io (mail02.groups.io [66.175.222.108]) by spool.mail.gandi.net (Postfix) with ESMTPS id CE77B7803E8 for ; Fri, 19 Jan 2024 13:14:29 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=x+nam+4hQ1JGdZFaCY0nujDLb0DT64dvcq5y9lzZUuE=; c=relaxed/simple; d=groups.io; h=ARC-Seal:ARC-Message-Signature:ARC-Authentication-Results:From:To:CC:Subject:Thread-Topic:Thread-Index:Date:Message-ID:References:In-Reply-To:Accept-Language:MIME-Version:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Language:Content-Type; s=20140610; t=1705670068; v=1; b=BAj4jdGvRYpHvYz9uvGcVpY2eUa7YnxQyBxQei5Ln7tqcSkNtmCctqOY0IKi1HZk9SVrg+/G YyJyw1psUbrRGTESNjx8OnUp9NKUfD4qayIcS6t6rCwbxsyln7jhBiBgeZgPJwUkt9TsXFPHgbc 4LaMjERU7yNe4rq9LwlBeaj0= X-Received: by 127.0.0.2 with SMTP id 2xmZYY7687511xr1Vg5iuy43; Fri, 19 Jan 2024 05:14:28 -0800 X-Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) by mx.groups.io with SMTP id smtpd.web11.21670.1705670067626558833 for ; Fri, 19 Jan 2024 05:14:27 -0800 X-IronPort-AV: E=McAfee;i="6600,9927,10956"; a="14263730" X-IronPort-AV: E=Sophos;i="6.05,204,1701158400"; d="scan'208,217";a="14263730" X-Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jan 2024 05:14:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10956"; a="1031941852" X-IronPort-AV: E=Sophos;i="6.05,204,1701158400"; d="scan'208,217";a="1031941852" X-Received: from orsmsx603.amr.corp.intel.com ([10.22.229.16]) by fmsmga006.fm.intel.com with ESMTP/TLS/AES256-GCM-SHA384; 19 Jan 2024 05:14:26 -0800 X-Received: from orsmsx612.amr.corp.intel.com (10.22.229.25) by ORSMSX603.amr.corp.intel.com (10.22.229.16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Fri, 19 Jan 2024 05:14:25 -0800 X-Received: from orsmsx610.amr.corp.intel.com (10.22.229.23) by ORSMSX612.amr.corp.intel.com (10.22.229.25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Fri, 19 Jan 2024 05:14:25 -0800 X-Received: from ORSEDG602.ED.cps.intel.com (10.7.248.7) by orsmsx610.amr.corp.intel.com (10.22.229.23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35 via Frontend Transport; Fri, 19 Jan 2024 05:14:25 -0800 X-Received: from NAM11-BN8-obe.outbound.protection.outlook.com (104.47.58.168) by edgegateway.intel.com (134.134.137.103) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.35; Fri, 19 Jan 2024 05:14:24 -0800 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ljXpeBDhA7OcXsmq3sLfOJ/1eVEOHILhABgqO1/+VBg9DPCFS5bEu1Siere/WO0s9wBCE37JVKTQGouDE051ZN5dAif6AJzdQ2UAR12xv2c5SMQwBPpFNZCBwOYZFvlua+cqEVYUCcpLmfRpmTrl0aZAe4U5QR3OKvrdiI5XvxvN+pwB3rqMQCGOiFgenmoFtYA3ycT5P3D2n80ObfdZpBW0FkStPz3HkWIa3bmxEK/uphTSR/AZjDeerxjp6bD7+ThmdmRLXS/cb1RoRqu7DxMOdyq45WLrQzJAhoRYTH3C4aXNEReiem3LbD131HMcg2qv8Qbf0jINj/Tp3MHgTg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=auZl/nolg3jQ4msmrN29XrDiSwiVDNFuQGbVjho/c8M=; b=Hk7ItawvSKTeoT7sf/KQ29T05pas3iE24RJe2FsBp4KVFevdOZ7A41MCys3ucN00DczuQXsa8xSPI0AESMa2XzvbCg8BHkWDkyIn3xDddlkti610KFBA5uNiao7UMNxT4HMBys5wolcSsshFdHgawGYZHeGHQnAtnahWpXwZkd7qyI94WyX5ibt5wsWX7ct1Yq8jr2Tpi3FpT1+wF546U6KmSpW5dQFEnSSmKiDgMD+EE4Xk8l4abLyctTqsj04V+OjgA2SoJx5LRdyWdztdSmwX8QTFP2PQAuZxsZCJrgdJLNT2Rn8vS0dra2sEHpiD8CKX+S19FTohZWO67DDOVg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none X-Received: from MN6PR11MB8244.namprd11.prod.outlook.com (2603:10b6:208:470::14) by SJ0PR11MB5599.namprd11.prod.outlook.com (2603:10b6:a03:3af::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7202.24; Fri, 19 Jan 2024 13:14:16 +0000 X-Received: from MN6PR11MB8244.namprd11.prod.outlook.com ([fe80::3fea:ca2b:2ef7:e3d4]) by MN6PR11MB8244.namprd11.prod.outlook.com ([fe80::3fea:ca2b:2ef7:e3d4%4]) with mapi id 15.20.7202.024; Fri, 19 Jan 2024 13:14:16 +0000 From: "Ni, Ray" To: Michael Brown , "devel@edk2.groups.io" , Laszlo Ersek , "kraxel@redhat.com" CC: Pedro Falcato , "Kinney, Michael D" , "Desimone, Nathaniel L" , "Kumar, Rahul R" , "Liu, Zhiguang" Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver Thread-Topic: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver Thread-Index: AQHaSGFHyevSvtTsF0K3gPtn4XXPkrDcgWkAgAAL4gCAAAWlAIAAs3KAgACN2YCAAFpPlg== Date: Fri, 19 Jan 2024 13:14:16 +0000 Message-ID: References: <20240115080325.147-1-ray.ni@intel.com> <20240115080325.147-2-ray.ni@intel.com> <0102018d11ac8fe8-1ce1e102-af57-426f-bafc-7297bec4799a-000000@eu-west-1.amazonses.com> <0102018d12d8bd9f-d209332f-f501-498e-b43c-3b0cc4f7ef7b-000000@eu-west-1.amazonses.com> <0102018d17080a28-370d97eb-b73c-4811-a9ea-05cfb459ba37-000000@eu-west-1.amazonses.com> In-Reply-To: <0102018d17080a28-370d97eb-b73c-4811-a9ea-05cfb459ba37-000000@eu-west-1.amazonses.com> Accept-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-publictraffictype: Email x-ms-traffictypediagnostic: MN6PR11MB8244:EE_|SJ0PR11MB5599:EE_ x-ms-office365-filtering-correlation-id: 787b1c65-81e9-4ff6-0c2c-08dc18f089ef x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam-message-info: AxCuGU0txphsoJDMDQ65vclsa4T8KZlYZ+COOBJPJVbR4Ru3hRaJidJOASIzNFbQeEFGDoR94SMMufTPX1d/3qgFJbxxqzwAjRKox3bO9YFSkqtP9AhX3deJOWvhYITkWEt0+MCK3dtOQgR5LxZI2k8xMtTOUfraFUcZ0/HIHvLUrOJ/inLsO6vNQjeKt6cY4rNUj4MttMsqhy4GFC45Ivd9djk6g2qkL3YkNr7s4eQ6HHeUwnD3gvEIqdRId2Olk4w7V+74ojp24S6rS/JDTxL4RJ7NtX1RsrUgfFJS/jXiFfhg9uJnZtSUUja0P+ygVDD7YvRIu4n1Kig3mDKgRJMCdKKskTbPzbT4G9oEv0U+ksb/nfR3BVctUlBrDqAngH4aKVKtLnjy3VEkidh/OX/xmmg8aVed6INkMse5yiWUBdiZzj2rhCiHaCnke4FCdfSvF0n+zWeDrvqXTB7NZnkH/vSAS6ozqJF4aW6isjHq/zZyjelMK4sa1q3Dhm27t4BDogC7ILzi15wazjh0GP8Xui+2Q2tpUrmzqQDouPWBefA+MShMGDpFkSbRESbZJGlSsjQN354A9YOY/i2oiiKUViy1cJ41J39nkO+403hzy9+ICxeLxq6Yl8LbqOkG x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?QxB/pZMeyVqXzRTVrjN2Eop7C8jAQ2IB785QcUHiQzmSRQax43LsG0tdGhI/?= =?us-ascii?Q?HEyVGf/732cl3E01aQBVOcYI+oB35nkuTu2RAVQDnhAwRoOnpDM3XcxwX+5h?= =?us-ascii?Q?ajIq9ssGuYqF0BIhbYuRFgypC2r6ye+uj183jOyKv1y/pzpCc1HAC2etBG4X?= =?us-ascii?Q?08CS2yCK0qCyazrZty+WU8jmy/I91DRCJNwe1x64HX0fXIvWbDCHf+BLmwkm?= =?us-ascii?Q?qS/7bFo6EAZN/uLmJTtB4zCuCaQB8B+9/J5BdAqyv3KJOgbodVtTdP9e1uki?= =?us-ascii?Q?+fypdkOFYHqLFy3YH2P7kMoDrqEYYmVm0JNtYXC8KVwax+4CZwrwmfEuyea6?= =?us-ascii?Q?gkIga4e9RvyTecI+X0Jxn8eFZjJhSkjxIStaPgoKe0KHLpeAL+s3irmJIh9b?= =?us-ascii?Q?1r5oROjiWChPjdcJ2d4DB2z7aQHJ1AoLx/ZiE7znrf1yk4eiI0kh9w3+Ga14?= =?us-ascii?Q?z8LZdns6Jkn/LfNuGkJFmUhJwVx65aMtk4kFKi9CQ38ua+9XIbaXcXDLEnoZ?= =?us-ascii?Q?KNF23gylfNxdIB1+C4BM8BPEncP5FJ8mC89dZqBf6HhckdzjZ6hpjSWdtFyo?= =?us-ascii?Q?rki9Xy5WjiLO3BHQ+LsLBIhf3to4XJE8Pl75JWS4vnQ614diXMEpO/ZUrNjM?= =?us-ascii?Q?W+nbUMiJAhF3tGKTSzv9Ws1WNiC5IcT/dTrfIyafglsQ8i6Po7oOVaAN21y2?= =?us-ascii?Q?hRI9fQjLxs3sZRqFMHdaYCca+sHLaDEZhqerR7XNyumIt/d4V7y4EgL9evgq?= =?us-ascii?Q?Cr1OAuEZjjd3tJGDkn3vJtrRtlxKQaJcFRscz9AaSFOJLWGh5MS9CI0R+5GV?= =?us-ascii?Q?BKn0f5n0T+ipMbNrohX+6zubBtrAOxxbL21eAG4zWvdX1ItFxokAcSU6hVXY?= =?us-ascii?Q?Cna/Qf65qBAqEudsIU6rfxyitwNDDkBeRu+cb7Zj2pWVp2f7/sBI5ZJ5ZIq6?= =?us-ascii?Q?zCh7rJHouFTWt45bYr404EEAAmtfeHy9OhRkTLYwmCyRulSc18X7orZSn3UU?= =?us-ascii?Q?LjYpr4WuvRL+XH4V6MrAF0SMWV9GPJEQXsc3lfxs6aWHs4e7hyw83+fUkxu6?= =?us-ascii?Q?lRYK/yTxV3y6G+YgkXFsJ06rpAOUqtXezSG2/NBhQz4OExMqDWGBW2nZhGC+?= =?us-ascii?Q?oGGy4bTaOkMX+Gfbk+ZnYPkz5JoWr3kkEfBHnVyT38pC66eFqFb1ljdIX+LY?= =?us-ascii?Q?mvk8jhT2/njBQlhl5wt4T4wATUSJGm9yUEbvjE67ZC1qT1hoFITBNS4wt0lc?= =?us-ascii?Q?d+DQpPE6Mq3nR8Xtn0Ymqkq3gTJvb4RpWJDHWSBK032MIeXGRH55VN+BFmCp?= =?us-ascii?Q?0rqDAjKIh5UukYLOYTGyInfxCF4ydPG5rd2iZbkGjJwu4jC5duV3oDHgPJH9?= =?us-ascii?Q?IIdD05CqDkjVLd0afm5DvJDq0dAaeB+15WO2amQwtF5mh8Fqz0phxr+Ux4rV?= =?us-ascii?Q?xuClT1TFjtmt3tOwqL8qaFP2SkBWxD1+yT68gFZVC1B0fRDXhitEYITjOqjl?= =?us-ascii?Q?HjieVdj48smDRFbaikKyNqOABzc3yGAxVucXGPF+Fey1FWFgV9jQHCi2mUgb?= =?us-ascii?Q?a1Si/gZ95a11NIXKC54=3D?= MIME-Version: 1.0 X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: MN6PR11MB8244.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 787b1c65-81e9-4ff6-0c2c-08dc18f089ef X-MS-Exchange-CrossTenant-originalarrivaltime: 19 Jan 2024 13:14:16.5106 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: RzQSghWKZeerY92QNMw+0y/wHZfTeSDON2QDQcjcKhL2EhyJQlTD+zVxNLDV+lCTj9cpHmpznecH8v/Y3Zsu2A== X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ0PR11MB5599 X-OriginatorOrg: intel.com Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io Reply-To: devel@edk2.groups.io,ray.ni@intel.com List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: X-Gm-Message-State: uGDy70pBbI86TO9GCZ9wczFlx7686176AA= Content-Language: en-US Content-Type: multipart/alternative; boundary="_000_MN6PR11MB8244BB617C68D27D65B8F5A78C702MN6PR11MB8244namp_" X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20140610 header.b=BAj4jdGv; arc=reject ("signature check failed: fail, {[1] = sig:microsoft.com:reject}"); dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=intel.com (policy=none); spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io --_000_MN6PR11MB8244BB617C68D27D65B8F5A78C702MN6PR11MB8244namp_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Michael, Thanks for the explanation. I tried to expand the code flow in below and help the discussion.. TimerInterruptHandler() gBS->RaiseTPL (HIGH) gBS->RestoreTPL (APPLICATION) // expand in below. For Tpl =3D {NOTIFY, CALLBACK}: ---- [for-loop] if PendingBit is not set: continue gCurrentTpl =3D Tpl EnableInterrupt() ---- [1] CoreDispatchEventNotifies(Tpl) // expand in below. gBS->RaiseTPL (HIGH) gBS->RestoreTPL (Tpl) NotifyFunction() ---- [2] gBS->RaiseTPL (HIGH) gBS->RestoreTPL (Tpl) End-For gCurrentTpl =3D APPLICATION ---- [3] EnableInterrupt() ---- [4] IRET 1. Agree that the stack overflow could happen in real platform. The interrupt-enabled env when CPU runs gBS->RestoreTPL(APPLICATION) could= be 3 cases: When gCurrentTpl is NOTIFY, CALLBACK or APPLICATION. Let's name them as "env:NOTIFY", "env:CALLBACK" and "env:APPLICATION". CPU enters "env:NOTIFY" and "env:CALLBACK" in [1]. CPU enters "env:APPLICATION" in [4], or [3] when the interrupt is already e= nabled in the [for-loop]. When interrupt happens in "env:NOTIFY", the inner interrupt handler calls g= BS->RestoreTPL(NOTIFY). The interrupt-enabled env in the inner RestoreTPL(N= OTIFY) is "env:NOTIFY" only. When interrupt happens in "env:CALLBACK", the inner interrupt handler calls= gBS->RestoreTPL(CALLBACK). The interrupt-enabled env in the inner RestoreT= PL(CALLBACK) can be: "env:NOTIFY" and "env:CALLBACK". So, the interrupt re-entrance we want to avoid is "env:NOTIFY" -> "env:NOT= IFY", or "env:CALLBACK" -> "env:CALLBACK", or "env:APPLICATION" -> "env:APP= LICATION". Because it's endless. NestedTplInterruptLib was written to avoid it. However, it's ok for: "env:APPLICATION" -> "env:CALLBACK" -> "env:NOTIFY", = or "env:CALLBACK" -> "env:NOTIFY". In real platform, it's possible that interrupt happens just after [4], and = in worst case, the RestoreTpl() call in the inner timer interrupt handler i= s interrupted after [4] again, and again. 1. Some questions on NestedInterruptTplLib. 1. Can we remove DisableInterruptsOnIret()? That means the inner interru= pt handler would returns to the outer world with interrupt enabled and TPL= =3D=3DHIGH. But I don't see any issue with that. 2. If DxeCore can be changed, do you have an easier-to-understand soluti= on? It really took me 2 days to understand why NestedInterruptTplLib is wri= tten in today's way. thanks, ray ________________________________ From: Michael Brown Sent: Wednesday, January 17, 2024 6:46:59 PM To: devel@edk2.groups.io ; Ni, Ray = ; Laszlo Ersek ; kraxel@redhat.com Cc: Pedro Falcato ; Kinney, Michael D ; Desimone, Nathaniel L ; K= umar, Rahul R Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplica= te OvmfPkg/LocalApicTimerDxe driver On 17/01/2024 07:11, Ni, Ray wrote: > The above flow shows endless re-entrance of timer interrupt handler. > > But, my question is: above flow only can happen in real platform when the= below 4 steps occupies more time than the timer period (usually 10ms). > [Timer Interrupt #2]1. RaiseTPL (HIGH) from NOTIFY causing C= PU interrupt be disabled. > [Timer Interrupt #2]2. Send APIC EOI (ACK the interrupt rece= ived so APIC can continue generate interrupts) > [Timer Interrupt #2]3. Call DxeCore::CoreTimerTick() > [Timer Interrupt #2]4. RestoreTPL (NOTIFY) from HIGH. No cal= lback runs as no callback can be registered at TPL > NOTIFY. In the end of = RestoreTPL(), CPU interrupt is enabled. > > But, in my opinion, it's impossible. As is thoroughly documented in NestedInterruptRestoreTpl(), the potential for unbounded stack consumption arises when an interrupt occurs after the point that RestoreTPL() completes dispatching all notifications but before the IRET (or equivalent) instruction pops the original stack frame. Since dispatching notifications can take an unbounded amount of time, there is absolutely no guarantee that this will be less than 10ms after the previous interrupt. It could easily be 30 seconds later. The problematic flow is a subtle variation on what you described: [IRQ#1] timer interrupt at TPL_APPLICATION [ISR#1] RaiseTPL from TPL_APPLICATION -> TPL_HIGH_LEVEL [ISR#1] Send APIC EOI [ISR#1] Call CoreTimerTick() [ISR#1] RestoreTPL from TPL_HIGH_LEVEL -> TPL_APPLICATION [ISR#1] Callbacks for TPL_NOTIFY are run [ISR#1] Callbacks for TPL_CALLBACK are run ... these may take several *seconds* to complete, during which further interrupts are raised, the details of which are not shown here... [ISR#1] TPL is now restored to TPL_APPLICATION [IRQ#N] timer interrupt at TPL_APPLICATION [ISR#N] RaiseTPL from TPL_APPLICATION -> TPL_HIGH_LEVEL ... continues ... The root cause is that the ISR reaches a state in which: a) an arbitrary amount of time has elapsed since the triggering interrupt (due to unknown callbacks being invoked, which may themselves wait for further timer interrupts), and b) the TPL has been fully restored back to the TPL at the point the triggering interrupt occurred (i.e. TPL_APPLICATION in this example), and c) the timer interrupt source is enabled, and d) CPU interrupts are enabled At this point, there is nothing preventing another interrupt from occurring. It will occur at TPL_APPLICATION and it will be one stack frame deeper than the previous interrupt at TPL_APPLICATION. Rinse and repeat, and you have unbounded stack consumption. Hence the requirement for NestedInterruptTplLib, even on physical hardware. Michael -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114045): https://edk2.groups.io/g/devel/message/114045 Mute This Topic: https://groups.io/mt/103734961/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/19134562= 12/xyzzy [rebecca@openfw.io] -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- --_000_MN6PR11MB8244BB617C68D27D65B8F5A78C702MN6PR11MB8244namp_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Michael,

 

Thanks for the explanation.

I tried to expand the code flow in below and help th= e discussion..

 

TimerInterruptH= andler()

  &nb= sp; gBS->RaiseTPL (HIGH)

  &nb= sp; gBS->RestoreTPL (APPLICATION) // expand in below.<= /p>

   &n= bsp;    For Tpl =3D {NOTIFY, CALLBACK}:  &nbs= p;       ---- [for-loop]

  &nb= sp;         if PendingBit is not se= t:

  &nb= sp;             = ;continue

  &nb= sp;         gCurrentTpl =3D Tpl

  &nb= sp;         EnableInterrupt() =             &nb= sp;    ---- [1]

  &nb= sp;         CoreDispatchEventNotifi= es(Tpl) // expand in below.

  &nb= sp;             = ;gBS->RaiseTPL (HIGH)

  &nb= sp;             = ;gBS->RestoreTPL (Tpl)

  &nb= sp;             = ;NotifyFunction()         &nbs= p;     ---- [2]

  &nb= sp;             = ;gBS->RaiseTPL (HIGH)

  &nb= sp;             = ;gBS->RestoreTPL (Tpl)

   &n= bsp;    End-For

   &n= bsp;    gCurrentTpl =3D APPLICATION   &n= bsp;          ---- [3]

   &n= bsp;    EnableInterrupt()     =              &n= bsp;   ---- [4]

IRET=

 

  1. Agree that the stack overflow could happen in real platform.

The interrupt-enabled env when CPU runs gBS->Rest= oreTPL(APPLICATION)  could be 3 cases: When gCurrentTpl is NOTIFY, CAL= LBACK or APPLICATION.

Let’s name them as “env:NOTIFY”, &= #8220;env:CALLBACK” and “env:APPLICATION”.

CPU enters “env:NOTIFY” and “env:C= ALLBACK” in [1].

CPU enters “env:APPLICATION” in [4], or = [3] when the interrupt is already enabled in the [for-loop].

 

When interrupt happens in “env:NOTIFY”, = the inner interrupt handler calls gBS->RestoreTPL(NOTIFY). The interrupt= -enabled env in the inner RestoreTPL(NOTIFY) is “env:NOTIFY” on= ly.

When interrupt happens in “env:CALLBACK”= , the inner interrupt handler calls gBS->RestoreTPL(CALLBACK). The inter= rupt-enabled env in the inner RestoreTPL(CALLBACK) can be: “env:NOTIF= Y” and “env:CALLBACK”.

 

So, the interrupt re-entrance we want to avoid is &#= 8220;env:NOTIFY”  -> “env:NOTIFY”, or “env:= CALLBACK” -> “env:CALLBACK”, or “env:APPLICATION= ” -> “env:APPLICATION”. Because it’s endless.

NestedTplInterruptLib was written to avoid it.<= /o:p>

 

However, it’s ok for: “env:APPLICATION&#= 8221; -> “env:CALLBACK” -> “env:NOTIFY”, or &= #8220;env:CALLBACK” -> “env:NOTIFY”.

 

In real platform, it’s possible that interrupt= happens just after [4], and in worst case, the RestoreTpl() call in the in= ner timer interrupt handler is interrupted after [4] again, and again.=

 

  1. Some questions on NestedInterruptTplLib.
  1. Can we remove DisableInterruptsOnIret()? That means the inner inter= rupt handler would returns to the outer world with interrupt enabled and TP= L=3D=3DHIGH. But I don’t see any issue with that.
  2. If DxeCore can be changed, do you have an eas= ier-to-understand solution? It really took me 2 days to understand why Nest= edInterruptTplLib is written in today’s way.

 

 

thanks,

ray


From: Michael Brown <mcb30@ipxe.org>
Sent: Wednesday, January 17, 2024 6:46:59 PM
To: devel@edk2.groups.io <devel@edk2.groups.io>; Ni, Ray <r= ay.ni@intel.com>; Laszlo Ersek <lersek@redhat.com>; kraxel@redhat.= com <kraxel@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>; Kinney, Michael D= <michael.d.kinney@intel.com>; Desimone, Nathaniel L <nathaniel.l.= desimone@intel.com>; Kumar, Rahul R <rahul.r.kumar@intel.com>
Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: = Duplicate OvmfPkg/LocalApicTimerDxe driver

 

On 17/01/2024 07:11, = Ni, Ray wrote:
> The above flow shows endless re-entrance of timer interrupt handler. >
> But, my question is: above flow only can happen in real platform when = the below 4 steps occupies more time than the timer period (usually 10ms).<= br> >            = ;  [Timer Interrupt #2]1. RaiseTPL (HIGH) from NOTIFY causing CPU inte= rrupt be disabled.
>            = ;  [Timer Interrupt #2]2. Send APIC EOI (ACK the interrupt received so= APIC can continue generate interrupts)
>            = ;  [Timer Interrupt #2]3. Call DxeCore::CoreTimerTick()
>            = ;  [Timer Interrupt #2]4. RestoreTPL (NOTIFY) from HIGH. No callback r= uns as no callback can be registered at TPL > NOTIFY. In the end of Rest= oreTPL(), CPU interrupt is enabled.
>
> But, in my opinion, it's impossible.

As is thoroughly documented in NestedInterruptRestoreTpl(), the
potential for unbounded stack consumption arises when an interrupt
occurs after the point that RestoreTPL() completes dispatching all
notifications but before the IRET (or equivalent) instruction pops the
original stack frame.

Since dispatching notifications can take an unbounded amount of time,
there is absolutely no guarantee that this will be less than 10ms after the previous interrupt.  It could easily be 30 seconds later.

The problematic flow is a subtle variation on what you described:

   [IRQ#1] timer interrupt at TPL_APPLICATION
     [ISR#1] RaiseTPL from TPL_APPLICATION -> TPL_HI= GH_LEVEL
     [ISR#1] Send APIC EOI
     [ISR#1] Call CoreTimerTick()
     [ISR#1] RestoreTPL from TPL_HIGH_LEVEL -> TPL_A= PPLICATION
       [ISR#1] Callbacks for TPL_NOTIFY are r= un
       [ISR#1] Callbacks for TPL_CALLBACK are= run
       ... these may take several *seconds* t= o complete, during
           which further = interrupts are raised, the details of
           which are not = shown here...
       [ISR#1] TPL is now restored to TPL_APP= LICATION
       [IRQ#N] timer interrupt at TPL_APPLICA= TION
         [ISR#N] RaiseTPL from TPL_= APPLICATION -> TPL_HIGH_LEVEL
         ... continues ...

The root cause is that the ISR reaches a state in which:

   a) an arbitrary amount of time has elapsed since the triggerin= g
      interrupt (due to unknown callbacks being in= voked, which may
      themselves wait for further timer interrupts= ), and

   b) the TPL has been fully restored back to the TPL at the poin= t
      the triggering interrupt occurred (i.e. TPL_= APPLICATION in
      this example), and

   c) the timer interrupt source is enabled, and

   d) CPU interrupts are enabled

At this point, there is nothing preventing another interrupt from
occurring.  It will occur at TPL_APPLICATION and it will be one stack =
frame deeper than the previous interrupt at TPL_APPLICATION.

Rinse and repeat, and you have unbounded stack consumption.

Hence the requirement for NestedInterruptTplLib, even on physical hardware.=

Michael

_._,_._,_

Groups.io Links:

=20 You receive all messages sent to this group. =20 =20

View/Reply Online (#114045) | =20 | Mute= This Topic | New Topic
Your Subscriptio= n | Contact Group Owner | Unsubscribe [rebecca@openfw.io]

_._,_._,_
--_000_MN6PR11MB8244BB617C68D27D65B8F5A78C702MN6PR11MB8244namp_--