Date: Wed, 3 May 2023 08:27:30 +0200
From: "Gerd Hoffmann"
To: Michael Brown
Cc: devel@edk2.groups.io, Oliver Steffen, Jiewen Yao, Ard Biesheuvel, Pawel Polawski, Jordan Justen
    , Laszlo Ersek
Subject: Re: [edk2-devel] [PATCH 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.
References: <20230428091019.1506923-1-kraxel@redhat.com>
    <01020187c79d5eef-190d7d28-3c30-442a-913b-4dae66b71839-000000@eu-west-1.amazonses.com>
    <01020187c857b5fe-3dbe7e11-e052-4b37-a477-5b12ccc253fc-000000@eu-west-1.amazonses.com>
In-Reply-To: <01020187c857b5fe-3dbe7e11-e052-4b37-a477-5b12ccc253fc-000000@eu-west-1.amazonses.com>

On Fri, Apr 28, 2023 at 02:50:04PM +0000, Michael Brown wrote:
> On 28/04/2023 14:38, Gerd Hoffmann wrote:
> > I suspect the windows boot loader does something fishy here, but I can't
> > prove it, I have not yet pinned down the exact location where interrupts
> > get enabled while running at IPL=TPL_HIGH_LEVEL (which is what I suspect
> > is happening, but of course this is not the only possible theory how
> > that ASSERT got triggered).
> >
> > Not fully sure how to best continue debugging this, I don't think gdb
> > can set a watchpoint on eflags.if ...
>
> In the absence of any better ideas, I'd be tempted to run QEMU via TCG
> instead of KVM, and hack the STI instruction definition to check the
> location in guest memory where gEfiCurrentTpl gets stored in your test
> build.  (It's brute force debugging, but it should find the culprit.)

Not so easy: TCG generates code for cli+sti, so there isn't a function
I could add logging hacks to.

> From a quick check of the implementation, I'm pretty sure this will be the
> case.  However, the logic is already (necessarily) pretty complex and so I
> am not 100% sure of this.  The reasoning behind the logic in
> NestedInterruptTplLib relies on certain axioms and invariants (e.g. that
> there are a finite number of distinct TPLs, and that there are certain
> places within the code that further interrupts provably cannot occur), and
> so it gets very difficult to reason about when one of those invariants is
> violated, as it seems to be in this situation.
>
> If it seems to work,

It seems to work, and the warning is printed exactly once during boot.

> with a warning message, and to return with InterruptedTPL unmodified.  But
> I'd prefer to understand how this invariant violation arises in the first
> place.

Root cause not found yet, but none of the debugging so far has found any
problems in edk2/OVMF, so my theory is still that Windows does something
fishy here.

take care,
  Gerd
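
P.S. For readers of the archive, the behaviour discussed above (log a
warning instead of hitting ASSERT(), and return with InterruptedTPL
unmodified) can be sketched roughly as follows. This is a standalone
model under stated assumptions, not the edk2 code: model_raise_tpl(),
model_interrupt_entry(), current_tpl and warning_count are hypothetical
names invented for illustration, and the exact condition checked by the
real library may differ. Only the TPL constants are taken from the UEFI
specification.

```c
#include <stdio.h>

/* Task priority levels as defined by the UEFI specification. */
#define TPL_APPLICATION 4
#define TPL_CALLBACK    8
#define TPL_NOTIFY      16
#define TPL_HIGH_LEVEL  31

/* Hypothetical stand-in for the firmware's current TPL (not gEfiCurrentTpl). */
unsigned current_tpl = TPL_APPLICATION;

/* Hypothetical counter so a caller can see how often the warning fired. */
unsigned warning_count = 0;

/* Hypothetical model of raising the TPL: sets the new level, returns the old one. */
unsigned model_raise_tpl(unsigned new_tpl)
{
    unsigned old_tpl = current_tpl;
    current_tpl = new_tpl;
    return old_tpl;
}

/*
 * Hypothetical model of an interrupt entry helper.
 * Assumption: the invariant is "an interrupt never arrives while already at
 * TPL_HIGH_LEVEL". Instead of asserting, log a warning and hand back the
 * interrupted TPL unmodified.
 */
unsigned model_interrupt_entry(void)
{
    unsigned interrupted_tpl = model_raise_tpl(TPL_HIGH_LEVEL);

    if (interrupted_tpl >= TPL_HIGH_LEVEL) {
        warning_count++;
        fprintf(stderr, "warning: interrupt taken at TPL %u\n", interrupted_tpl);
    }

    return interrupted_tpl;
}
```

Calling model_interrupt_entry() with current_tpl below TPL_HIGH_LEVEL
returns the old level silently; calling it with current_tpl already at
TPL_HIGH_LEVEL returns TPL_HIGH_LEVEL and bumps warning_count, which is
the "warning printed once" case in this model.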