Subject: Re: [edk2-devel] [PATCH v2 2/6] OvmfPkg/AmdSev: add Grub Firmware Volume Package
To: jejb@linux.ibm.com, devel@edk2.groups.io, Bret Barkelew,
 "Liming Gao (Byosoft address)"
Cc: dovmurik@linux.vnet.ibm.com, Dov.Murik1@il.ibm.com, ashish.kalra@amd.com,
 brijesh.singh@amd.com, tobin@ibm.com, david.kaplan@amd.com,
 jon.grimm@amd.com, thomas.lendacky@amd.com, frankeh@us.ibm.com,
 "Dr . David Alan Gilbert", "Ard Biesheuvel (ARM address)"
References: <20201120184521.19437-1-jejb@linux.ibm.com>
 <20201120184521.19437-3-jejb@linux.ibm.com>
 <28e99174-79b3-e805-b977-5fed0071a702@redhat.com>
 <06b9425507ab8c1b35d377cf9bba155b0cc44147.camel@linux.ibm.com>
 <3b7899fa-fa52-7652-2d2a-d4ec67ece34d@redhat.com>
 <1c871b56-f459-5ac4-3b8d-a55d978eac06@redhat.com>
 <93fdaca88b53d400670b338a06fd1410c1445a39.camel@linux.ibm.com>
 <082a97c2-9a49-acf6-fd7c-70ee6b61c000@redhat.com>
 <5b9b21c3eb37ba7024c1cb85ead267867b323c7d.camel@linux.ibm.com>
 <1064db1d53315987bf8bb478894a07bda8d90a96.camel@linux.ibm.com>
From: "Laszlo Ersek"
Message-ID: <53957e99-3f81-0121-b8fd-db28fdb01a73@redhat.com>
Date: Wed, 25 Nov 2020 19:35:57 +0100
In-Reply-To: <1064db1d53315987bf8bb478894a07bda8d90a96.camel@linux.ibm.com>

On 11/25/20 18:09, James Bottomley wrote:
> On Wed, 2020-11-25 at 08:02 -0800, James Bottomley wrote:
>> On Wed, 2020-11-25 at 15:01 +0100, Laszlo Ersek wrote:
>>> This upgrade gave me kernel 5.8.18-100.fc31.x86_64 in the guest --
>>> and this one does *not* crash. From your boot log below, I see your
>>> guest kernel is 5.5.0; I suggest upgrading it.
>>
>> Heh, that's easier said than done ... I always make my encrypted
>> images too small to upgrade a kernel easily. Anyway, after doing the
>> remove and add stuff dance, I finally got it upgraded to the latest
>> debian testing linux-image-5.8.0-3 it's still crashing although with
>> a slightly different traceback.
>> It looks like there might be something additional in the fedora 5.8
>> kernel that fixes this. I'm going to try out upstream kernels next.
>
> I've got the upstream kernel booting through OVMF with a qemu -kernel
> command line. I also have a fix: it's not to delete the dummy variable
> which was part of the ancient x86 anti bricking code (which is also why
> arm64 doesn't have the problem).
>
> If you remove the set variable in arch/x86/platform/efi/quirks.c:
>
> /*
>  * Deleting the dummy variable which kicks off garbage collection
>  */
> void efi_delete_dummy_variable(void)
> {
>         efi.set_variable_nonblocking((efi_char16_t *)efi_dummy_name,
>                                      &EFI_DUMMY_GUID,
>                                      EFI_VARIABLE_NON_VOLATILE |
>                                      EFI_VARIABLE_BOOTSERVICE_ACCESS |
>                                      EFI_VARIABLE_RUNTIME_ACCESS, 0, NULL);
> }
>
> The kernel will boot. I'm not sure why we have this deletion
> unconditionally in efi_enter_virtual_mode, but removing the call with
> the patch below allows the kernel to boot.

I think commit 2ecb7402cfc7 ("efi/x86: Do not clean dummy variable in
kexec path", 2019-10-07) is related (part of v5.4), but it's not
sufficient to prevent the boot crash. (That removal only covered the
kexec path, and not the normal boot path.)

>
> However, once the kernel has booted, any attempt to write to an EFI
> variable results in this:
>
> [ 975.440240] [Firmware Bug]: Page fault caused by firmware at PA: 0x7e450020
>
> And then the efi runtime gets disabled.

Blech, that doesn't look good. We still get a page fault somewhere in
the firmware, it just doesn't kill the kernel outright. That kind of
suggests the crash on the boot path *is* firmware-originated, it's just
that the kernel is unable to mask the problem that early.

OK, I'll try to look into this more closely...

In such cases, I generally reproduce the guest kernel crash, and while
the guest is in that crashed state, I use

$ virsh dump ovmf.fedora vmcore --memory-only --format kdump-lzo

Then, I force off the VM.
Next, install the "kernel-debuginfo" and "kernel-debuginfo-common"
packages matching the crashed guest kernel. Finally, run the "crash"
utility on the vmcore, to poke around in the vmcore. "crash" is very
powerful, I hope it turns up something...

BTW, the Fedora 5.8.18-100.fc31 kernel does carry like 71 broken-out
extra patches in the SRPM, as far as I can tell...

Thanks
Laszlo

>
> James
>
> ---
>
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 8a26e705cb06..dfae61f07196 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -844,7 +844,7 @@ static void __init __efi_enter_virtual_mode(void)
>          efi_runtime_update_mappings();
>
>          /* clean DUMMY object */
> -        efi_delete_dummy_variable();
> +        //efi_delete_dummy_variable();
>          return;
>
>  err:
>
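
P.S. The post-mortem steps described above (dump the guest memory, force
off the VM, install matching debuginfo, run "crash") could be strung
together roughly as follows. This is only a sketch under assumptions that
are not in the mail itself: the variable names and the step/postmortem
helpers are made up for illustration, "virsh destroy" is assumed as the
force-off step, "dnf debuginfo-install" is assumed as the way to pull in
the debuginfo packages, and the vmlinux path is the usual Fedora
kernel-debuginfo location. Only the "virsh dump" line is taken verbatim
from the text. By default the script prints the commands and runs nothing.

```shell
#!/bin/sh
# Dry-run sketch of the crash-dump workflow. Nothing is executed unless
# RUN_FOR_REAL=1 is set. All names below are illustrative assumptions,
# except the "virsh dump" invocation, which is quoted from the mail.

DOMAIN=${DOMAIN:-ovmf.fedora}           # libvirt domain name from the mail
KVER=${KVER:-5.8.18-100.fc31.x86_64}    # must match the crashed guest kernel
VMCORE=${VMCORE:-vmcore}                # output file for the memory dump

step() {
    # Print the command; execute it only when explicitly requested.
    echo "+ $*"
    if [ "${RUN_FOR_REAL:-0}" = 1 ]; then
        "$@"
    fi
}

postmortem() {
    # 1. Dump guest memory while the guest sits in the crashed state.
    step virsh dump "$DOMAIN" "$VMCORE" --memory-only --format kdump-lzo
    # 2. Force off the VM (assumption: "virsh destroy" is the force-off).
    step virsh destroy "$DOMAIN"
    # 3. Install debuginfo matching the crashed guest kernel
    #    (assumption: the dnf debuginfo-install plugin is available).
    step dnf debuginfo-install "kernel-$KVER"
    # 4. Open the dump in "crash" (assumption: Fedora debuginfo path).
    step crash "/usr/lib/debug/lib/modules/$KVER/vmlinux" "$VMCORE"
}

postmortem
```

Run as-is, it just lists the four commands; with RUN_FOR_REAL=1 (and
sufficient privileges for dnf) it would actually execute them. Override
DOMAIN, KVER and VMCORE to match the guest being debugged.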