From: Ard Biesheuvel
Date: Tue, 1 Aug 2017 23:57:25 +0100
To: Laszlo Ersek
Cc: "edk2-devel@lists.01.org", "Richard W.M. Jones"
Subject: Re: issue about booting centos fail with edk2

On 1 August 2017 at 23:29, Laszlo Ersek wrote:
> On 08/01/17 19:23, Ard Biesheuvel wrote:
>> On 1 August 2017 at 16:42, Laszlo Ersek wrote:
>>> On 08/01/17 10:34, Zhu Yijun wrote:
>>>> Thanks for your reply!
>>>>
>>>> On 2017/8/1 3:02, Laszlo Ersek wrote:
>>>>> On 07/31/17 02:27, Zhu Yijun wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I installed a CentOS-7-aarch64 guest image from a QEMU CD-ROM, but it intermittently hangs in UEFI.
>>>>>>
>>>>>> Basic info:
>>>>>> libvirt 1.3.5
>>>>>> QEMU 2.6.2
>>>>>> UEFI: master branch with commit "688c7d2 BaseTools: Fix the bug that warn() function with only 1 argument"
>>>>>>
>>>>>> pflash and two disks configured in the domain XML:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> hvm
>>>>>> /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> I found that it failed in the "Match (Translated, TranslatedSize, ActiveOption[Idx].BootOption->FilePath)" call in "SetBootOrderFromQemu"; the UEFI debug log is as follows:
>>>>> No, that's not where the problem is.
>>>>> See below:
>>>>>
>>>>>> start-console-fail.log
>>>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success
>>>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success
>>>>>>
>>>>>>
>>>>>> Synchronous Exception at 0x00000002384B1104
>>>>>> PC 0x0002384B1104
>>>>>> PC 0x0002384A916C
>>>>>> PC 0x0002384CA2D0
>>>>>> PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll
>>>>>> PC 0x00023BD1568C (0x00023BD02000+0x0001368C) [ 2] BdsDxe.dll
>>>>>> PC 0x00023BD03F98 (0x00023BD02000+0x00001F98) [ 2] BdsDxe.dll
>>>>>> PC 0x00023BD05640 (0x00023BD02000+0x00003640) [ 2] BdsDxe.dll
>>>>>> PC 0x00023EEB3704 (0x00023EEB1000+0x00002704) [ 3] DxeCore.dll
>>>>>> PC 0x00023EEB27C8 (0x00023EEB1000+0x000017C8) [ 3] DxeCore.dll
>>>>>> PC 0x00023EEB2024 (0x00023EEB1000+0x00001024) [ 3] DxeCore.dll
>>>>>> [ 1] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
>>>>>> [ 2] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll
>>>>>> [ 3] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
>>>>>>
>>>>>> X0 0x00000002384A9000 X1 0x00000002384B2990 X2 0x000000023AAFDF98 X3 0x000000023BFF0018
>>>>>> X4 0x0000000000000000 X5 0x0000000000000007 X6 0x0000000238533300 X7 0x0000000000000000
>>>>>> X8 0x000000023C01F548 X9 0x0000000200000000 X10 0x00000002384A8000 X11 0x00000002384C5FFF
>>>>>> X12 0x0000000000000000 X13 0x0000000000000008 X14 0x259511BDAEB1F36C X15 0x1378CC1DF3F5DDBB
>>>>>> X16 0x000000023EEB0BE0 X17 0x0000000000000000 X18 0x0000000000000000 X19 0x0000000000000013
>>>>>> X20 0x0000000000000000 X21 0x0000000000000000 X22 0x0000000000000000 X23 0x0000000000000000
>>>>>> X24 0x0000000000000000 X25 0x0000000000000000 X26 0x0000000000000000 X27 0x0000000000000000
>>>>>> X28 0x0000000000000000 FP 0x000000023EEB0A40 LR 0x00000002384A916C
>>>>>>
>>>>>> V0 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF V1 0x63702F6666666666 6666666666666666
>>>>>> V2 0x40697363732F3340 6567646972622D69 V3 0x0000000000000000 0000000000000000
>>>>>> V4 0x0000000000000000 0000000000000000 V5 0x4010040140100401 4010040140100401
>>>>>> V6 0x0000000000000000 0000000000000000 V7 0x0000000000000000 0000000000000000
>>>>>> V8 0x0000000000000000 0000000000000000 V9 0x0000000000000000 0000000000000000
>>>>>> V10 0x0000000000000000 0000000000000000 V11 0x0000000000000000 0000000000000000
>>>>>> V12 0x0000000000000000 0000000000000000 V13 0x0000000000000000 0000000000000000
>>>>>> V14 0x0000000000000000 0000000000000000 V15 0x0000000000000000 0000000000000000
>>>>>> V16 0x0000000000000000 0000000000000000 V17 0x0000000000000000 0000000000000000
>>>>>> V18 0x0000000000000000 0000000000000000 V19 0x0000000000000000 0000000000000000
>>>>>> V20 0x0000000000000000 0000000000000000 V21 0x0000000000000000 0000000000000000
>>>>>> V22 0x0000000000000000 0000000000000000 V23 0x0000000000000000 0000000000000000
>>>>>> V24 0x0000000000000000 0000000000000000 V25 0x0000000000000000 0000000000000000
>>>>>> V26 0x0000000000000000 0000000000000000 V27 0x0000000000000000 0000000000000000
>>>>>> V28 0x0000000000000000 0000000000000000 V29 0x0000000000000000 0000000000000000
>>>>>> V30 0x0000000000000000 0000000000000000 V31 0x0000000000000000 0000000000000000
>>>>>>
>>>>>> SP 0x000000023EEB0A40 ELR 0x00000002384B1104 SPSR 0x60000205 FPSR 0x00000000
>>>>>> ESR 0x02000000 FAR 0x1DE7EC7EDBADC0DE
>>>>>>
>>>>>> ESR : EC 0x00 IL 0x1 ISS 0x00000000
>>>>>>
>>>>>> Stack dump:
>>>>>> 000023EEB0940: 0000C0E000000148 00000002384A9000 00000002384CA254 0000000000000000
>>>>>> 000023EEB0960: 000000023EEB0BC0 000000023AC006C0 0000F2503EEB0BC0 00000002384B6018
>>>>>> 000023EEB0980: 000000023EEB0BC0 0000000000000000 000000000000C0E0 0000000000000148
>>>>>> 000023EEB09A0: 0000000000000148 0000100000020A8C 00000002384B6110 00000002384B6108
>>>>>> 000023EEB09C0: 00000002384B6100 0000000000000006 00000002384B6058 00000002384B50DF
>>>>>> 000023EEB09E0: 00000002384A9148 0000000000000000 00000002384A9000 00000002384A9000
>>>>>> 000023EEB0A00: 0000000000000000 00000002398DA518 00000002385375B2 00000002385629A0
>>>>>> 000023EEB0A20: 000000023854C1C0 00000002398DA518 000000023EEB0BC0 0000000000000000
>>>>>>> 000023EEB0A40: 000000023EEB0BC0 00000002384CA2D0 000000023AAFDF98 000000023BFF0018
>>>>>> 000023EEB0A60: 00000002384CA360 000000023EEC8348 00000002385375B0 000000023AAFDF98
>>>>>> 000023EEB0A80: 000000023EEB0AC0 0000F25038533338 00000002384B6018 0000000000000000
>>>>>> 000023EEB0AA0: 0000000000000000 0000000238B63D18 0000000000001000 0000000000000000
>>>>>> 000023EEB0AC0: 000000023BFF0018 00000002398DA518 00000002398CE598 0000000000000000
>>>>>> 000023EEB0AE0: 0000000000000000 0000000000000000 00000002384C6000 00000000000C99C0
>>>>>> 000023EEB0B00: 0000000200000001 0000000000000000 000000023AC006C0 11D295625B1B31A1
>>>>>> 000023EEB0B20: 3B7269C9A0003F8E 0000000000000000 0000000238B63F98 000000163EEB0B68
>>>>>> ASSERT [ArmCpuDxe] /root/rpmbuild/BUILD/edk2-2.6.0/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(271): ((BOOLEAN)(0==1))
>>>>> This is a guest that you didn't install from installer media. I think
>>>>> you may have gotten the preinstalled disk image from some image provider
>>>>> service. The UEFI boot variable(s) are not set up to boot the CentOS
>>>>> installation, in your nvram / pflash file.
>>>> Yes, the boot variable must be stored in the domain's nvram file ("/var/lib/libvirt/qemu/nvram/centos_VARS.fd"). After installation, a new boot menu entry is generated,
>>>> called "CentOS Linux AltArch", whose device path is "HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi".
>>>>
>>>> For example:
>>>> Boot Manager Menu
>>>> CentOS Linux AltArch -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)/HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi
>>>> UEFI Misc Device
>>>> UEFI Misc Device 2
>>>> EFI Internal Shell
>>>> UEFI QEMU QEMU CD-ROM -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x1)
>>>> UEFI QEMU QEMU HARDDISK -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>>> UEFI PXEv4 (MAC:5254002D2EB6)
>>>>
>>>> But when I shut down and undefine this domain, and then use "virsh create" to start a new domain with the disk centos.qcow2 that I had just installed, the UEFI boot manager
>>>> menu is:
>>>> Boot Manager Menu
>>>> UEFI QEMU QEMU HARDDISK -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>>> UEFI Misc Device
>>>> UEFI Misc Device 2
>>>> EFI Internal Shell
>>>> UEFI PXEv4 (MAC:5254002D2EB6)
>>>
>>> Right. In this case you have lost your original nvram contents, and you
>>> only have the boot options that are auto-generated by the
>>> EfiBootManagerRefreshAllBootOption() function. This function lives in
>>> UefiBootManagerLib, and is called from OVMF's PlatformBootManagerLib
>>> instance.
>>>
>>> The filtering and reordering still occurs in OVMF, but now the first
>>> boot option that matches QEMU's fw_cfg bootorder specification is not
>>> the "CentOS Linux AltArch" boot option that you originally had. Instead,
>>> now QemuBootOrderLib encounters the "UEFI QEMU QEMU HARDDISK"
>>> auto-generated boot option first as a match.
>>>
>>> This boot option in turn means "fallback.efi", according to the blog
>>> post I linked earlier.
>>>
>>> When "fallback.efi" executes successfully, your original "CentOS Linux
>>> AltArch" boot option is restored / recreated (at the top of the boot
>>> option list). But when "fallback.efi" crashes, you get a crash instead.
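An auto-generated option such as "UEFI QEMU QEMU HARDDISK" carries a device path that ends at the disk and names no file. When such an option is booted, the UEFI removable-media rule supplies the file name: \EFI\BOOT\BOOT<arch>.EFI, which is \EFI\BOOT\BOOTAA64.EFI on AArch64; the FSOpen lines in the log then show \EFI\BOOT\fallback.efi being opened from that same directory. A minimal C sketch of the name lookup, assuming only the IA32, x64, Itanium, 32-bit ARM and AArch64 entries from the UEFI specification's removable-media section (the helper names are hypothetical, not edk2 APIs):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: default removable-media boot file name per
 * architecture, as listed in the UEFI specification. Newer architectures
 * are intentionally omitted from this sketch. */
static const char *default_boot_file(const char *arch) {
  static const struct { const char *arch; const char *file; } table[] = {
    { "IA32",    "BOOTIA32.EFI" },
    { "X64",     "BOOTX64.EFI"  },
    { "IA64",    "BOOTIA64.EFI" },
    { "ARM",     "BOOTARM.EFI"  },
    { "AARCH64", "BOOTAA64.EFI" },
  };
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
    if (strcmp(table[i].arch, arch) == 0) {
      return table[i].file;
    }
  }
  return NULL; /* unknown architecture */
}

/* Hypothetical helper: build "\EFI\BOOT\<file>" into buf.
 * Returns 0 on success, -1 for an unknown arch or a too-small buffer. */
static int default_boot_path(const char *arch, char *buf, size_t len) {
  assert(buf != NULL && len > 0);
  const char *file = default_boot_file(arch);
  if (file == NULL) {
    return -1;
  }
  int n = snprintf(buf, len, "\\EFI\\BOOT\\%s", file);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With "AARCH64" as input, default_boot_path() yields \EFI\BOOT\BOOTAA64.EFI, the file that the whole-disk boot option resolves to in this thread.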
>>>
>>>> I am confused about two points:
>>>> 1) The new domain still has a chance to load "EFI\centos\shim.efi" and boot the kernel successfully, which means that sometimes the system firmware launches
>>>> BOOTAA64.EFI and sometimes it launches shim.efi. It is probabilistic.
>>>
>>> "EFI\centos\shim.efi" is never automatically loaded. It needs a
>>> dedicated UEFI boot option. Thus, it can be loaded in your "new" domain
>>> *only* if "fallback.efi" runs first, successfully.
>>>
>>> So what you are seeing is that "fallback.efi" sometimes works, and
>>> sometimes crashes. That's the nature of memory corruption bugs.
>>>
>>>>
>>>> 2) Is there a way to make the "CentOS Linux AltArch" boot menu entry persistent?
>>>
>>> There isn't. If you lose your nvram, you lose the non-auto-generated
>>> boot options with it.
>>>
>>> Remedying such situations is what "fallback.efi" exists for.
>>>
>>>>>
>>>>> In such cases, the "fallback.efi" utility is invoked (called
>>>>> "\EFI\BOOT\BOOTAA64.EFI"). Please refer to:
>>>>>
>>>>> https://blog.uncooperative.org/blog/2014/02/06/the-efi-system-partition/
>>>>>
>>>>> Unfortunately, "fallback.efi" (from the shim package) used to have a few
>>>>> bugs over time and sometimes it would crash. See for example:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1196114
>>>>>
>>>>> I'm unsure what version of shim / fallback.efi is in the installed
>>>>> CentOS image, but it looks like the same (or another similar)
>>>>> fallback.efi issue to me.
>>>>
>>>> The shim version on my side is shim-0.9-2.el7.aarch64.
>>>
>>> This confirms that you are not seeing the exact bug described in
>>> RHBZ#1196114, because that bug was fixed in shim-0.9 (see
>>> ).
>>>
>>> It remains a fact that your original log contains a crash register dump
>>> after fallback.efi is launched. The V0 register contains
>>> 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF; the pattern 0xAF is used to fill
>>> released (freed) pages in debug builds. So this seems to be a
>>> use-after-free issue. I suggest adding debug instrumentation to
>>> fallback.efi, and seeing where exactly it blows up.
>>>
>>
>> The presence of the 0xAF pattern in register V0 by itself does not
>> suggest anything at all: V0 is a SIMD register, which is used by the
>> SetMem() routine to poison the memory. There is very little other code
>> (if any) that actually uses the SIMD registers otherwise.
>
> Thanks for pointing this out.
>
> Can you perhaps deduce more info from the stack / register dump? The
> topmost three stack frames don't have edk2 module names associated with
> them -- does that confirm that the synchronous exception is raised in a
> non-edk2 module?
>

The stack trace is consistent with BDS calling LoadImage() to launch
fallback.efi (which is GNU-EFI based, so it does not set the NB10
CodeView debug entry containing the path on the build host).

The FAR (fault address) register contains the well-known bogus value
KVM puts in there by default. Also, the exception class field in the
ESR (bits 31:26) is 0x0, which translates as an unknown exception.

Are there any KVM-related messages in the host kernel log? This looks
like the result of kvm_inject_undefined(), which prints some kind of
diagnostic in many cases.

> (I still think the only way forward is to instrument fallback.efi, and I
> won't be doing that.)
>

Well, if you have access to the ELF file that fallback.efi was built
from, you can correlate the stack trace addresses with locations in the
code. Lacking that, it would at least be *very* helpful to know which
opcode is being executed when the exception is taken.