From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by mx.groups.io with SMTP id smtpd.web10.27432.1683623614620003619 for ; Tue, 09 May 2023 02:13:34 -0700 Authentication-Results: mx.groups.io; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=KuFPUj93; spf=pass (domain: kernel.org, ip: 139.178.84.217, mailfrom: ardb@kernel.org) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 09A7D63245 for ; Tue, 9 May 2023 09:13:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2D8EFC433EF for ; Tue, 9 May 2023 09:13:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1683623613; bh=RYb9L5lZ+HeHiev9hfrHAV4rjDyJR2NcvTJBcV7xX2Q=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=KuFPUj93gWYwSyDHzSQD+uDpJ8HucbxbNd2x0kEKjy5FL0KafSSdyUEyLL4rsElQV ujnWjeGKqU3J6wx53bxygGoMzEK4qzoEqZ/wpICJ2fuDeGiLzH+PnEWrtJkxPTcDOL g2nVXy6MtWwhKgAqeIVPBeyN4esrDGc7Ic/c4pNWAAS4PUJG6Wxq2A08V6MJRCOFjt 6NiYFvrhG4pzxVRFbY0fLmlUzWNeuGUQu5ZZxMs8EQPmPo6YLfjKtLVw2Gv75CD1GW 6oRYbVQK+WEjGs6ijD+ZOhYXPxRH+rREjxx3U4gOG5+OiGQyHvG0tp6/pWMQJvX8lh Ga6blBdYcBJeA== Received: by mail-lf1-f47.google.com with SMTP id 2adb3069b0e04-4f139de8cefso32590851e87.0 for ; Tue, 09 May 2023 02:13:33 -0700 (PDT) X-Gm-Message-State: AC+VfDybYzLWvp6hoh3+nXSQcCsFci89N5Dk8JeYKDRDVq5p9ipUlTXp 6O9esX1xxJU2TTPT+Ju9+AheQSh1u4E4iZV2tmU= X-Google-Smtp-Source: ACHHUZ504DjKy+CCWyeLmP/hHl/km8TxL8+CfTVcKo72Gz3wJvwUft+C1aihL/LW/XGHBimpkTxmx6VpBJzNeHnExT0= X-Received: by 2002:ac2:4d03:0:b0:4ef:ebbb:2cf5 with SMTP id r3-20020ac24d03000000b004efebbb2cf5mr666985lfi.17.1683623611145; Tue, 09 May 2023 02:13:31 -0700 (PDT) MIME-Version: 1.0 References: <20230503071954.266637-1-kraxel@redhat.com> <01020187ec402266-6d4dee99-5a0d-4105-abaf-419c2a5607cc-000000@eu-west-1.amazonses.com> <01020187ee3d92cc-eb212c44-2e49-4ca2-992c-a2d7d3b03f6f-000000@eu-west-1.amazonses.com> <0b5118c3-35b6-28f6-87e1-bcba6d445c82@redhat.com> In-Reply-To: <0b5118c3-35b6-28f6-87e1-bcba6d445c82@redhat.com> From: "Ard Biesheuvel" Date: Tue, 9 May 2023 11:13:19 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. To: devel@edk2.groups.io, lersek@redhat.com Cc: Michael Brown , kraxel@redhat.com, Oliver Steffen , Pawel Polawski , Jiewen Yao , Ard Biesheuvel , Jordan Justen Content-Type: text/plain; charset="UTF-8" On Mon, 8 May 2023 at 08:46, Laszlo Ersek wrote: > > On 5/6/23 01:57, Ard Biesheuvel wrote: > > On Sat, 6 May 2023 at 01:27, Michael Brown wrote: > >> > >> On 05/05/2023 19:56, Laszlo Ersek wrote: > >>> I don't like the patch. For two reasons: > >>> > >>> (1) It papers over the actual issue. The problem should be fixed where > >>> it is, if possible. > >> > >> Agreed, but (as you have shown in > >> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in > >> Windows code rather than in EDK2 code. If the goal is to allow these > >> buggy Windows builds to still be used with OVMF, then the only option is > >> to paper over the issue. We should do this only if it can be proven > >> safe to do so, of course. > >> > >>> (2) With the patch applied, NestedInterruptRaiseTPL() can return > >>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently, > >>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c] > >>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as > >>> "InterruptedTPL". > >>> > >>> I believe that this in turn may invalidate at least one comment in > >>> NestedInterruptRestoreTPL(): > >>> > >>> // > >>> // Call RestoreTPL() to allow event notifications to be > >>> // dispatched. This will implicitly re-enable interrupts. > >>> // > >>> gBS->RestoreTPL (InterruptedTPL); > >>> > >>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways. > >> > >> I agree that the comment is invalidated, but as far as I can tell the > >> logic remains safe. > >> > >> I will put together a patch to update the comments in > >> NestedInterruptTplLib to address the possibility of an interrupt > >> occurring (illegally) at TPL_HIGH_LEVEL. > >> > >>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen > >>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong: > >>> NestedInterruptTplLib is technically correct in all circumstances, but > >>> in practice it happens to be too strict.) > >>> > >>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe > >>> variant that effectively has commits a086f4a63bc0 and a24fbd606125 > >>> reverted. (We should keep 9bf473da4c1d.) This returns us to > >>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a) > >>> does not try to be smart about nested interrupts, therefore one that is > >>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec > >>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen, > >>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be > >>> launched on Xen.) > >> > >> I'm less keen on this because it reduces the runtime exposure of a very > >> complex piece of code, and will effectively cause that code to become > >> unmaintained. > >> > >> It's also satisfying (to me) that NestedInterruptTplLib provides a > >> provable upper bound on stack consumption due to interrupts, which can't > >> be guaranteed by the simpler pre-239b50a86370 scheme. > >> > >> Could we defer judgement until after I've fully reasoned through (and > >> documented) how NestedInterruptTplLib will work in the presence of > >> interrupts occurring at TPL_HIGH_LEVEL? > >> > > > > Would it be feasible for our firmware implementation to disable the > > timer interrupt at the timer end as well? > > > > E.g., > > > > RaiseTPL(HIGH):: > > > > CLI > > disarm timer > > > > > > RestoreTPL:: > > > > > > re-arm timer > > STI > > > > I can be entirely wrong here, but: > > - we looked for a solution (or workaround) to the original problem that > stays within the boundaries of OvmfPkg, so sinking tweaks into the core > TPL manipulation functions isn't ideal > > - regarding the TimerInterruptHandler() function(s) that do live in > OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I > understand as sort of equivalent to your suggestion, as unless/until you > signal EOI, no more interrupts will be *generated*), but those had not > helped. The EOI was either too early and so we got the unbounded > nesting, or it was too late, and no interrupts were generated while (for > example) TPL_CALLBACK code would depend on timers with CheckEvent. See > bug 4162 -- that was what prompted Michael to revert the EOI placement > tweak and to implement NestedInterruptLib. > > Apologies if there are further interpretations of disarming the timer > that I'm missing! > No, I think you've summed it up. And in any case, it is not OVMF's job to police what the loader is doing to that extent - we should flag it as a violation but beyond that, I don't think we should try to interfere with this non-compliant behavior.