From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.groups.io (mail02.groups.io [66.175.222.108]) by spool.mail.gandi.net (Postfix) with ESMTPS id 6D30774003C for ; Tue, 16 Jan 2024 14:34:29 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=BvYtWjI1JtQ4UpgSsT5YtKYt9yevlDmlVn7EdG2tPPU=; c=relaxed/simple; d=groups.io; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:In-Reply-To:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Language:Content-Type:Content-Transfer-Encoding; s=20140610; t=1705415668; v=1; b=eu4M4krYlsXkgsBRXg6m7jQeCiEKdKtcw0OT6SPvL3fNrCSljWVsT0LiPqmL8v1aXRWS4MvT MRBy2cVauqWfNCyDFxhiUqbhyPEtEND0WBs4nzvoGbzMwLCPWEHsK3tn1BDS70Dx+ww5o7vU32j 81iIYatVMbRZMUtYEhnJtt8s= X-Received: by 127.0.0.2 with SMTP id nTOUYY7687511x8h1FutGkmf; Tue, 16 Jan 2024 06:34:28 -0800 X-Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mx.groups.io with SMTP id smtpd.web11.15908.1705415667460080600 for ; Tue, 16 Jan 2024 06:34:27 -0800 X-Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-141-JQpZ6eDuPwK2FZ6770OyBw-1; Tue, 16 Jan 2024 09:34:21 -0500 X-MC-Unique: JQpZ6eDuPwK2FZ6770OyBw-1 X-Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1B2B51064CC5; Tue, 16 Jan 2024 14:34:21 +0000 (UTC) X-Received: from [10.39.194.248] (unknown [10.39.194.248]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 73F671BDB0; Tue, 16 Jan 2024 14:34:19 +0000 (UTC) Message-ID: Date: Tue, 16 Jan 2024 15:34:18 +0100 MIME-Version: 1.0 Subject: Re: [edk2-devel] [PATCH 1/6] UefiCpuPkg/LocalApicTimerDxe: Duplicate OvmfPkg/LocalApicTimerDxe driver To: Michael Brown , devel@edk2.groups.io, kraxel@redhat.com Cc: Pedro Falcato , ray.ni@intel.com, Michael D Kinney , Nate DeSimone , Rahul Kumar References: <20240115080325.147-1-ray.ni@intel.com> <20240115080325.147-2-ray.ni@intel.com> <0102018d11ac8fe8-1ce1e102-af57-426f-bafc-7297bec4799a-000000@eu-west-1.amazonses.com> From: "Laszlo Ersek" In-Reply-To: <0102018d11ac8fe8-1ce1e102-af57-426f-bafc-7297bec4799a-000000@eu-west-1.amazonses.com> X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.5 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io Reply-To: devel@edk2.groups.io,lersek@redhat.com List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: X-Gm-Message-State: Lta8NtrshEKANMZ2dwUrkJEzx7686176AA= Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20140610 header.b=eu4M4krY; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=redhat.com (policy=none); spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io On 1/16/24 10:48, Michael Brown wrote: > On 16/01/2024 08:47, Gerd Hoffmann wrote: >> I think the reason is that the next timer interrupt arriving while the >> handler is running still is *much* more likely in virtual machines >> because the vCPU does not get 100% of the of the physical CPU time >> slice. > > The interrupt handler can run for an arbitrary length of time, because > the call to RestoreTPL() can end up calling an application callback > which in turn waits on further timer interrupts. > > This is a legitimate use case within UEFI, so all timer interrupt > handlers (not just in virtual machines) need to allow for the > possibility that nested interrupts will occur. The more naive, original interrupt handler implementation (i.e., the one without NestedInterruptTplLib) still "allowed" for nesting, simply by virtue of letting itself be interrupted, if I remember correctly. That in itself didn't cause a problem; the problem arose when this reentrant interruption got limitlessly deep, due to interrupts having been queued on the host side, and then being injected as a "storm" in the guest. The naive handler impl. effectively translated the host-side "interrupt queue" to a "guest side stack". "queue length" became "stack depth", leading to stack overflow. Thus, even the original (more naive) handler could work fine (for nesting too) as long as there was no storm (put differently, as long as queue length, aka stack depth, were small); is that correct? (I admit that I can't really recall the details at this point.) The sophistication of NestedInterruptTplLib is that it supports nesting while *resisting* a storm (= preventing infinite nesting by careful masking of interrupt delivery, IIRC). It does not eagerly slurp all pending (queued) interrupts into the handler stack. IOW, my impression is that NestedInterruptTplLib can certainly handle all scenarios thrown at it, but where it really matters is in the face of an interrupt storm (not just "normal nesting"), and a storm is unlikely (or even impossible?) on physical hardware. ... Oh, scratch that. "Interrupt storm" simply means that interrupts are being delivered at a rate higher than the handler routine can service them. IOW, the "storm" is not that interrupts are delivered *very rapidly* in an absoulte sense. If interrupts are delivered at normal frequency, but the handler is too slow to service *even that rate*, then that also qualifies as "storm", because the nesting depth will *keep growing*. It's not really the growth rate that matters; what matter is the *trend*, i.e., the fact that there *is* growth (the stack gets deeper and deeper). The stack might not overflow immediately, and if the handler speeds up (for whatever reason), the stack might recover, but there is nothing to prevent an overflow. So, in the end, I think you've convinced me. > >> So on OVMF we will continue to need NestedInterruptTplLib.  I like the >> idea to have a Null version of the library, so OVMF does not need its >> own version of the driver just because of NestedInterruptTplLib. > > I agree that there should not need to be two versions of LocalApicTimerDxe. > > I would suggest moving NestedInterruptTplLib from OvmfPkg to MdeModulePkg. > > The code is complex, but for the caller the complexity is completely > hidden behind the calls to NestedInterruptRaiseTPL() and > NestedInterruptRestoreTPL(), which straightforwardly replace what would > otherwise be (unsafe) calls to RaiseTPL() and RestoreTPL(), as shown in > > https://github.com/tianocore/edk2/commit/a086f4a#diff-1418ec21e24e4ce2f3d621113585e3bea11fbcfa3b77d54046e61be7928e0c92 > > For a new CPU architecture, the only requirement is to provide a short > fragment (~5 lines) of code that can clear the interrupts-enabled flag > in the EFI_SYSTEM_CONTEXT, as shown in > OvmfPkg/Library/NestedInterruptTplLib/Iret.c. > > I'm happy to send a patch to migrate NestedInterruptTplLib to > MdeModulePkg, so that it can be consumed outside of OvmfPkg.  Shall I do > this? Sounds like a valid idea to me. Could be greatly supported by a test case (to be run on the bare metal) installing a slow handler that *eventually* exhausted the stack, when not using NestedInterruptTplLib. (FWIW, IIRC, the UEFI spec warns about this -- it says something like, "return from TPL_HIGH as soon as you can, otherwise the system will become unstable".) Sorry for the wall of text, I find this very difficult to reason about. Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113902): https://edk2.groups.io/g/devel/message/113902 Mute This Topic: https://groups.io/mt/103734961/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=-