Message-ID: <0b5118c3-35b6-28f6-87e1-bcba6d445c82@redhat.com>
Date: Mon, 8 May 2023 08:45:52 +0200
Subject: Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
To: Ard Biesheuvel, Michael Brown
Cc: devel@edk2.groups.io, kraxel@redhat.com, Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen
References: <20230503071954.266637-1-kraxel@redhat.com>
 <01020187ec402266-6d4dee99-5a0d-4105-abaf-419c2a5607cc-000000@eu-west-1.amazonses.com>
 <01020187ee3d92cc-eb212c44-2e49-4ca2-992c-a2d7d3b03f6f-000000@eu-west-1.amazonses.com>
From: "Laszlo Ersek"

On 5/6/23 01:57, Ard Biesheuvel wrote:
> On Sat, 6 May 2023 at 01:27, Michael Brown wrote:
>>
>> On 05/05/2023 19:56, Laszlo Ersek wrote:
>>> I don't like the patch. For two reasons:
>>>
>>> (1) It papers over the actual issue. The problem should be fixed where
>>> it is, if possible.
>>
>> Agreed, but (as you have shown in
>> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
>> Windows code rather than in EDK2 code. If the goal is to allow these
>> buggy Windows builds to still be used with OVMF, then the only option is
>> to paper over the issue. We should do this only if it can be proven
>> safe to do so, of course.
>>
>>> (2) With the patch applied, NestedInterruptRaiseTPL() can return
>>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
>>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
>>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
>>> "InterruptedTPL".
>>>
>>> I believe that this in turn may invalidate at least one comment in
>>> NestedInterruptRestoreTPL():
>>>
>>>   //
>>>   // Call RestoreTPL() to allow event notifications to be
>>>   // dispatched. This will implicitly re-enable interrupts.
>>>   //
>>>   gBS->RestoreTPL (InterruptedTPL);
>>>
>>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.
>>
>> I agree that the comment is invalidated, but as far as I can tell the
>> logic remains safe.
>>
>> I will put together a patch to update the comments in
>> NestedInterruptTplLib to address the possibility of an interrupt
>> occurring (illegally) at TPL_HIGH_LEVEL.
>>
>>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
>>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
>>> NestedInterruptTplLib is technically correct in all circumstances, but
>>> in practice it happens to be too strict.)
>>>
>>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
>>> variant that effectively has commits a086f4a63bc0 and a24fbd606125
>>> reverted. (We should keep 9bf473da4c1d.) This returns us to
>>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
>>> does not try to be smart about nested interrupts, therefore one that is
>>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
>>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
>>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
>>> launched on Xen.)
>>
>> I'm less keen on this because it reduces the runtime exposure of a very
>> complex piece of code, and will effectively cause that code to become
>> unmaintained.
>>
>> It's also satisfying (to me) that NestedInterruptTplLib provides a
>> provable upper bound on stack consumption due to interrupts, which can't
>> be guaranteed by the simpler pre-239b50a86370 scheme.
>>
>> Could we defer judgement until after I've fully reasoned through (and
>> documented) how NestedInterruptTplLib will work in the presence of
>> interrupts occurring at TPL_HIGH_LEVEL?
>>
>
> Would it be feasible for our firmware implementation to disable the
> timer interrupt at the timer end as well?
>
> E.g.,
>
> RaiseTPL(HIGH)::
>
>   CLI
>   disarm timer
>
>
> RestoreTPL::
>
>
>   re-arm timer
>   STI
>

I can be entirely wrong here, but:

- we looked for a solution (or workaround) to the original problem that
  stays within the boundaries of OvmfPkg, so sinking tweaks into the core
  TPL manipulation functions isn't ideal

- regarding the TimerInterruptHandler() function(s) that do live in
  OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I
  understand as sort of equivalent to your suggestion, as unless/until you
  signal EOI, no more interrupts will be *generated*), but those had not
  helped. The EOI was either too early and so we got the unbounded
  nesting, or it was too late, and no interrupts were generated while (for
  example) TPL_CALLBACK code would depend on timers with CheckEvent. See
  bug 4162 -- that was what prompted Michael to revert the EOI placement
  tweak and to implement NestedInterruptTplLib.

Apologies if there are further interpretations of disarming the timer
that I'm missing!

Laszlo
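
For anyone following the thread, the RaiseTPL / RestoreTPL pseudocode
quoted above, together with the earlier remark that restoring
TPL_HIGH_LEVEL does not re-enable interrupts, can be sketched as a toy
state model in C. This is not EDK2 code. ToyCpu, toy_raise_tpl(),
toy_restore_tpl() and the timer_armed flag are hypothetical names made up
for illustration; only the TPL_* numeric values follow the UEFI
specification. The model assumes that entering TPL_HIGH_LEVEL masks
interrupts and disarms the timer, and that only a restore to a lower
level undoes both.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: the TPL values match the UEFI spec, nothing else is EDK2. */
enum {
  TPL_APPLICATION = 4,
  TPL_CALLBACK    = 8,
  TPL_NOTIFY      = 16,
  TPL_HIGH_LEVEL  = 31
};

typedef struct {
  int  tpl;                /* current task priority level */
  bool interrupts_enabled; /* models the CPU interrupt flag (CLI / STI) */
  bool timer_armed;        /* hypothetical timer-side mask from the proposal */
} ToyCpu;

/* Raise the TPL and return the previous level.
 * Entering TPL_HIGH_LEVEL masks interrupts and disarms the timer. */
static int toy_raise_tpl(ToyCpu *cpu, int new_tpl) {
  int old_tpl = cpu->tpl;
  assert(new_tpl >= old_tpl);        /* raising must never lower the TPL */
  if (new_tpl >= TPL_HIGH_LEVEL) {
    cpu->interrupts_enabled = false; /* CLI */
    cpu->timer_armed = false;        /* disarm timer */
  }
  cpu->tpl = new_tpl;
  return old_tpl;
}

/* Restore a previously saved TPL.
 * Only a restore to below TPL_HIGH_LEVEL re-arms the timer and unmasks. */
static void toy_restore_tpl(ToyCpu *cpu, int old_tpl) {
  assert(old_tpl <= cpu->tpl);       /* restoring must never raise the TPL */
  cpu->tpl = old_tpl;
  if (old_tpl < TPL_HIGH_LEVEL) {
    cpu->timer_armed = true;         /* re-arm timer */
    cpu->interrupts_enabled = true;  /* STI */
  }
}
```

In this model, a handler that (illegally) interrupts code running at
TPL_HIGH_LEVEL gets TPL_HIGH_LEVEL back from toy_raise_tpl(), and passing
that value to toy_restore_tpl() leaves interrupts_enabled false. The
model says nothing about EOI placement or the nesting problem from bug
4162; it only illustrates the TPL / interrupt-flag relationship.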