Message-ID: <476bbc17-6484-9afd-9be9-08de14d1d72e@redhat.com>
Date: Mon, 8 May 2023 08:38:51 +0200
Subject: Re: [edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
To: Michael Brown, devel@edk2.groups.io, kraxel@redhat.com
Cc: Oliver Steffen, Pawel Polawski, Jiewen Yao, Ard Biesheuvel, Jordan Justen
References: <20230503071954.266637-1-kraxel@redhat.com> <01020187ec402266-6d4dee99-5a0d-4105-abaf-419c2a5607cc-000000@eu-west-1.amazonses.com> <01020187ee3d92cc-eb212c44-2e49-4ca2-992c-a2d7d3b03f6f-000000@eu-west-1.amazonses.com>
From: "Laszlo Ersek"
In-Reply-To: <01020187ee3d92cc-eb212c44-2e49-4ca2-992c-a2d7d3b03f6f-000000@eu-west-1.amazonses.com>

On 5/6/23 01:27, Michael Brown wrote:
> On 05/05/2023 19:56, Laszlo Ersek wrote:
>> I don't like the patch. For two reasons:
>>
>> (1) It papers over the actual issue. The problem should be fixed where
>> it is, if possible.
>
> Agreed, but (as you have shown in
> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
> Windows code rather than in EDK2 code.  If the goal is to allow these
> buggy Windows builds to still be used with OVMF, then the only option is
> to paper over the issue.  We should do this only if it can be proven
> safe to do so, of course.
>
>> (2) With the patch applied, NestedInterruptRaiseTPL() can return
>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
>> "InterruptedTPL".
>>
>> I believe that this in turn may invalidate at least one comment in
>> NestedInterruptRestoreTPL():
>>
>>      //
>>      // Call RestoreTPL() to allow event notifications to be
>>      // dispatched.  This will implicitly re-enable interrupts.
>>      //
>>      gBS->RestoreTPL (InterruptedTPL);
>>
>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally
>> anyways.
>
> I agree that the comment is invalidated, but as far as I can tell the
> logic remains safe.
>
> I will put together a patch to update the comments in
> NestedInterruptTplLib to address the possibility of an interrupt
> occurring (illegally) at TPL_HIGH_LEVEL.
>
>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
>> NestedInterruptTplLib is technically correct in all circumstances, but
>> in practice it happens to be too strict.)
>>
>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
>> variant that effectively has commits a086f4a63bc0 and a24fbd606125
>> reverted. (We should keep 9bf473da4c1d.) This returns us to
>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
>> does not try to be smart about nested interrupts, therefore one that is
>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
>> launched on Xen.)
>
> I'm less keen on this because it reduces the runtime exposure of a very
> complex piece of code, and will effectively cause that code to become
> unmaintained.
>
> It's also satisfying (to me) that NestedInterruptTplLib provides a
> provable upper bound on stack consumption due to interrupts, which can't
> be guaranteed by the simpler pre-239b50a86370 scheme.
>
> Could we defer judgement until after I've fully reasoned through (and
> documented) how NestedInterruptTplLib will work in the presence of
> interrupts occurring at TPL_HIGH_LEVEL?

Sure, absolutely! As I wrote elsewhere: if you can revalidate the code
with a new, less strict set of invariants, and update the comments, I
think that would be the perfect workaround.

Thank you,
Laszlo
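
To illustrate point (2) above, here is a minimal sketch of why the quoted
comment stops holding once "InterruptedTPL" can be TPL_HIGH_LEVEL. It is a
simplified stand-alone model, not the edk2 implementation: MockRaiseTPL(),
MockRestoreTPL(), RestoreReenablesInterrupts() and the two globals are
hypothetical names, and the assumption that restoring to a TPL below
TPL_HIGH_LEVEL re-enables interrupts is taken from the comment quoted in the
thread. Only the TPL_* numeric values come from the UEFI specification.

```c
#include <assert.h>
#include <stdbool.h>

/* TPL values as defined by the UEFI specification. */
typedef unsigned int TPL;
#define TPL_APPLICATION 4
#define TPL_CALLBACK    8
#define TPL_NOTIFY      16
#define TPL_HIGH_LEVEL  31

/* Hypothetical model state: current TPL and CPU interrupt flag. */
static TPL  gCurrentTpl        = TPL_APPLICATION;
static bool gInterruptsEnabled = true;

/* Simplified model of RaiseTPL(): raising to TPL_HIGH_LEVEL masks interrupts. */
static TPL MockRaiseTPL(TPL NewTpl)
{
  TPL OldTpl = gCurrentTpl;

  assert(NewTpl >= gCurrentTpl);  /* RaiseTPL() must never lower the TPL */
  gCurrentTpl = NewTpl;
  if (NewTpl >= TPL_HIGH_LEVEL) {
    gInterruptsEnabled = false;
  }
  return OldTpl;
}

/*
 * Simplified model of RestoreTPL(): interrupts are re-enabled only when the
 * restored TPL is below TPL_HIGH_LEVEL (assumption, see lead-in).
 */
static void MockRestoreTPL(TPL OldTpl)
{
  gCurrentTpl = OldTpl;
  if (OldTpl < TPL_HIGH_LEVEL) {
    gInterruptsEnabled = true;
  }
}

/*
 * Model a timer interrupt that arrives while the interrupted code runs at
 * InterruptedTpl. Returns whether interrupts are enabled after the handler
 * restores the interrupted TPL.
 */
static bool RestoreReenablesInterrupts(TPL InterruptedTpl)
{
  TPL Saved;

  gCurrentTpl        = InterruptedTpl;
  gInterruptsEnabled = false;            /* CPU masks interrupts on entry */
  Saved = MockRaiseTPL(TPL_HIGH_LEVEL);  /* handler raises to TPL_HIGH_LEVEL */
  MockRestoreTPL(Saved);                 /* handler restores interrupted TPL */
  return gInterruptsEnabled;
}
```

In this model, RestoreReenablesInterrupts (TPL_APPLICATION) reports interrupts
enabled, while RestoreReenablesInterrupts (TPL_HIGH_LEVEL) reports them still
disabled, which is the case the "implicitly re-enable interrupts" comment
does not cover.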