From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from a7-20.smtp-out.eu-west-1.amazonses.com (a7-20.smtp-out.eu-west-1.amazonses.com [54.240.7.20]) by mx.groups.io with SMTP id smtpd.web11.21864.1682693406303014032 for ; Fri, 28 Apr 2023 07:50:06 -0700 Authentication-Results: mx.groups.io; dkim=pass header.i=@ipxe.org header.s=cphpx6z2rfcgehlykjjh3gknqe3hsoe2 header.b=UFUeKaMf; spf=pass (domain: eu-west-1.amazonses.com, ip: 54.240.7.20, mailfrom: 01020187c857b5fe-3dbe7e11-e052-4b37-a477-5b12ccc253fc-000000@eu-west-1.amazonses.com) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=cphpx6z2rfcgehlykjjh3gknqe3hsoe2; d=ipxe.org; t=1682693404; h=Message-ID:Date:MIME-Version:To:Cc:References:From:Subject:In-Reply-To:Content-Type:Content-Transfer-Encoding; bh=jSd3xkFAI8dgSN6xIuTKQzPDRbxH/Vs6Dk5rhSe/wTA=; b=UFUeKaMfFelBMf5z5NIj2uGNUOyZBy3jynBRHOgrmhhM/CjwmqMsEXlHVbfJxw/k T1SFpC5GUdnaD1PIYdfzpq1DTz+mCYAiHqQa7g6PsO8whvd1Ir9p3gjDRXYfgCWylLK ZYhTv+PC9GjWbuy+JV/zzUKoWzTnNwoY8f0IdYrOsyzUyYFxtnfejTqKHo65XdW+Q9/ iqkEbyPB5XgkgucxDAEBTofqysT7WZ1eG5HAtG+FzeY3EtoFEzcHDoCW1ukOwAHblUK MZqiysIMF6zMItaMxRtVc9CtWZwXcofqzLYDy+XvATInio0T7BmZ77x/+0ql9bRuD5Q WkJ+9A6piQ== DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=shh3fegwg5fppqsuzphvschd53n6ihuv; d=amazonses.com; t=1682693404; h=Message-ID:Date:MIME-Version:To:Cc:References:From:Subject:In-Reply-To:Content-Type:Content-Transfer-Encoding:Feedback-ID; bh=jSd3xkFAI8dgSN6xIuTKQzPDRbxH/Vs6Dk5rhSe/wTA=; b=n6RcPcbpuvwCAtkt1alkRyGSYttH1dDxPimVcIw6VgQbRimKiMUdH/gKVWpUHCja OsihFFs0J0wXHYuxLGfspKf4di1y8ohSQeFzpkkqW8vii4Yz3awT20tEfGN4knmKCPp 1AxiRhT9Pn/VqTGGDDB/nyIaHj3mWUxjgNdbrMrw= Message-ID: <01020187c857b5fe-3dbe7e11-e052-4b37-a477-5b12ccc253fc-000000@eu-west-1.amazonses.com> Date: Fri, 28 Apr 2023 14:50:04 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 To: devel@edk2.groups.io, kraxel@redhat.com Cc: Oliver Steffen , Jiewen Yao , Ard Biesheuvel , Pawel Polawski , Jordan Justen , Laszlo Ersek References: <20230428091019.1506923-1-kraxel@redhat.com> <01020187c79d5eef-190d7d28-3c30-442a-913b-4dae66b71839-000000@eu-west-1.amazonses.com> From: "Michael Brown" Subject: Re: [edk2-devel] [PATCH 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged. In-Reply-To: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on blyat.fensystems.co.uk Feedback-ID: 1.eu-west-1.fspj4M/5bzJ9NLRzJP0PaxRwxrpZqiDQJ1IF94CF2TA=:AmazonSES X-SES-Outgoing: 2023.04.28-54.240.7.20 Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 28/04/2023 14:38, Gerd Hoffmann wrote: > I suspect the windows boot loader does something fishy here, but I can't > proof it, I have not yet pinned down the exact location where interrupts > get enabled while running at IPL=TPL_HIGH_LEVEL (which is what I suspect > is happening, but of course this is not the only possible theory how > that ASSERT got triggered). > > Not fully sure how to best continue debugging this, I don't think gdb > can set an watchpoint on eflags.if ... In the absence of any better ideas, I'd be tempted to run QEMU via TCG instead of KVM, and hack the STI instruction definition to check the location in guest memory where gEfiCurrentTpl gets stored in your test build. (It's brute force debugging, but it should find the culprit.) >> This workaround looks wrong to me: it will cause the subsequent call to >> NestedInterruptRestoreTPL() to drop the TPL back down to TPL_HIGH_LEVEL-1, >> which is lower than it would have been when the interrupt occurred. The >> completed hardware interrupt would then result in an overall change of TPL, >> which cannot be correct. > > The idea was to set InterruptedTPL to the highest level which is allowed > to run with interrupts enabled and continue running with interrupts > enabled, hoping that things get back into normal once the next timer > interrupt arrives. > > Which -- now that I'm thinking about it again -- is clearly bogus, we > are in the timer interrupt so IRQs have already been disabled at that > point, so the fixup idea doesn't make much sense. > > So just leave InterruptedTPL alone and hope for the best? I think so. CoreRestoreTpl(TPL_HIGH_LEVEL) should be a no-op, so it would be reasonable to expect that NestedInterruptRestoreTPL(TPL_HIGH_LEVEL, ...) should also be a no-op. From a quick check of the implementation, I'm pretty sure this will be the case. However, the logic is already (necessarily) pretty complex and so I am not 100% sure of this. The reasoning behind the logic in NestedInterruptTplLib relies on certain axioms and invariants (e.g. that there are a finite number of distinct TPLs, and that there are certain places within the code that further interrupts provably cannot occur), and so it gets very difficult to reason about when one of those invariants is violated, as it seems to be in this situation. If it seems to work, then I think it would be plausible to replace the ASSERT (InterruptedTPL < TPL_HIGH_LEVEL); with a warning message, and to return with InterruptedTPL unmodified. But I'd prefer to understand how this invariant violation arises in the first place. Good luck! Michael