From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by ml01.01.org (Postfix) with ESMTPS id 17FEC2095D9DD
    for ; Tue, 1 Aug 2017 08:40:18 -0700 (PDT)
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mx1.redhat.com (Postfix) with ESMTPS id E51627ACA4;
    Tue, 1 Aug 2017 15:42:26 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com E51627ACA4
Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lersek@redhat.com
Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-153.phx2.redhat.com [10.3.116.153])
    by smtp.corp.redhat.com (Postfix) with ESMTP id 897476759F;
    Tue, 1 Aug 2017 15:42:21 +0000 (UTC)
To: Zhu Yijun
Cc: edk2-devel@lists.01.org, "Richard W.M. Jones" , Gerd Hoffmann
References: <597E798B.1020806@huawei.com> <443e01eb-28ec-6e4d-43ac-6f6f16f7f3d4@redhat.com> <59803D0A.6020305@huawei.com>
From: Laszlo Ersek
Message-ID: <77e8ddad-9039-efe4-f6f7-1dbc66d4eb6c@redhat.com>
Date: Tue, 1 Aug 2017 17:42:20 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <59803D0A.6020305@huawei.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Tue, 01 Aug 2017 15:42:27 +0000 (UTC)
Subject: Re: issue about booting centos fail with edk2
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: EDK II Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 01 Aug 2017 15:40:18 -0000
Content-Type: text/plain; charset=windows-1252
Content-Language: en-US
Content-Transfer-Encoding: 7bit

On 08/01/17 10:34, Zhu Yijun wrote:
> Thanks for your reply!
>
> On 2017/8/1 3:02, Laszlo Ersek wrote:
>> On 07/31/17 02:27, Zhu Yijun wrote:
>>> Hi all,
>>>
>>> I install a CentOS-7-aarch64 guest img by qemu cdrom, but it hung at UEFI probability.
>>>
>>> Basic info:
>>> libvirt 1.3.5
>>> QEMU 2.6.2
>>> UEFI: master branch with commit "688c7d2 BaseTools: Fix the bug that warn() function with only 1 argument"
>>>
>>> Config pflash and two disks in xml:
>>>
>>> ...
>>>
>>> hvm
>>> /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw
>>>
>>>
>>> ...
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ...
>>>
>>> I found it failed at "Match (Translated, TranslatedSize, ActiveOption[Idx].BootOption->FilePath)" function in "SetBootOrderFromQemu", the UEFI debug info as follow:
>> No, that's not where the problem is.
See below: >> >>> start-console-fail.log >>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success >>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success >>> >>> >>> Synchronous Exception at 0x00000002384B1104 >>> PC 0x0002384B1104 >>> PC 0x0002384A916C >>> PC 0x0002384CA2D0 >>> PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll >>> PC 0x00023BD1568C (0x00023BD02000+0x0001368C) [ 2] BdsDxe.dll >>> PC 0x00023BD03F98 (0x00023BD02000+0x00001F98) [ 2] BdsDxe.dll >>> PC 0x00023BD05640 (0x00023BD02000+0x00003640) [ 2] BdsDxe.dll >>> PC 0x00023EEB3704 (0x00023EEB1000+0x00002704) [ 3] DxeCore.dll >>> PC 0x00023EEB27C8 (0x00023EEB1000+0x000017C8) [ 3] DxeCore.dll >>> PC 0x00023EEB2024 (0x00023EEB1000+0x00001024) [ 3] DxeCore.dll >>> [ 1] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll >>> [ 2] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll >>> [ 3] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll >>> >>> X0 0x00000002384A9000 X1 0x00000002384B2990 X2 0x000000023AAFDF98 X3 0x000000023BFF0018 >>> X4 0x0000000000000000 X5 0x0000000000000007 X6 0x0000000238533300 X7 0x0000000000000000 >>> X8 0x000000023C01F548 X9 0x0000000200000000 X10 0x00000002384A8000 X11 0x00000002384C5FFF >>> X12 0x0000000000000000 X13 0x0000000000000008 X14 0x259511BDAEB1F36C X15 0x1378CC1DF3F5DDBB >>> X16 0x000000023EEB0BE0 X17 0x0000000000000000 X18 0x0000000000000000 X19 0x0000000000000013 >>> X20 0x0000000000000000 X21 0x0000000000000000 X22 0x0000000000000000 X23 0x0000000000000000 >>> X24 0x0000000000000000 X25 0x0000000000000000 X26 0x0000000000000000 X27 0x0000000000000000 >>> X28 0x0000000000000000 FP 0x000000023EEB0A40 LR 0x00000002384A916C >>> >>> V0 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF V1 0x63702F6666666666 6666666666666666 >>> V2 0x40697363732F3340 
6567646972622D69 V3 0x0000000000000000 0000000000000000 >>> V4 0x0000000000000000 0000000000000000 V5 0x4010040140100401 4010040140100401 >>> V6 0x0000000000000000 0000000000000000 V7 0x0000000000000000 0000000000000000 >>> V8 0x0000000000000000 0000000000000000 V9 0x0000000000000000 0000000000000000 >>> V10 0x0000000000000000 0000000000000000 V11 0x0000000000000000 0000000000000000 >>> V12 0x0000000000000000 0000000000000000 V13 0x0000000000000000 0000000000000000 >>> V14 0x0000000000000000 0000000000000000 V15 0x0000000000000000 0000000000000000 >>> V16 0x0000000000000000 0000000000000000 V17 0x0000000000000000 0000000000000000 >>> V18 0x0000000000000000 0000000000000000 V19 0x0000000000000000 0000000000000000 >>> V20 0x0000000000000000 0000000000000000 V21 0x0000000000000000 0000000000000000 >>> V22 0x0000000000000000 0000000000000000 V23 0x0000000000000000 0000000000000000 >>> V24 0x0000000000000000 0000000000000000 V25 0x0000000000000000 0000000000000000 >>> V26 0x0000000000000000 0000000000000000 V27 0x0000000000000000 0000000000000000 >>> V28 0x0000000000000000 0000000000000000 V29 0x0000000000000000 0000000000000000 >>> V30 0x0000000000000000 0000000000000000 V31 0x0000000000000000 0000000000000000 >>> >>> SP 0x000000023EEB0A40 ELR 0x00000002384B1104 SPSR 0x60000205 FPSR 0x00000000 >>> ESR 0x02000000 FAR 0x1DE7EC7EDBADC0DE >>> >>> ESR : EC 0x00 IL 0x1 ISS 0x00000000 >>> >>> Stack dump: >>> 000023EEB0940: 0000C0E000000148 00000002384A9000 00000002384CA254 0000000000000000 >>> 000023EEB0960: 000000023EEB0BC0 000000023AC006C0 0000F2503EEB0BC0 00000002384B6018 >>> 000023EEB0980: 000000023EEB0BC0 0000000000000000 000000000000C0E0 0000000000000148 >>> 000023EEB09A0: 0000000000000148 0000100000020A8C 00000002384B6110 00000002384B6108 >>> 000023EEB09C0: 00000002384B6100 0000000000000006 00000002384B6058 00000002384B50DF >>> 000023EEB09E0: 00000002384A9148 0000000000000000 00000002384A9000 00000002384A9000 >>> 000023EEB0A00: 0000000000000000 00000002398DA518 
00000002385375B2 00000002385629A0 >>> 000023EEB0A20: 000000023854C1C0 00000002398DA518 000000023EEB0BC0 0000000000000000 >>>> 000023EEB0A40: 000000023EEB0BC0 00000002384CA2D0 000000023AAFDF98 000000023BFF0018 >>> 000023EEB0A60: 00000002384CA360 000000023EEC8348 00000002385375B0 000000023AAFDF98 >>> 000023EEB0A80: 000000023EEB0AC0 0000F25038533338 00000002384B6018 0000000000000000 >>> 000023EEB0AA0: 0000000000000000 0000000238B63D18 0000000000001000 0000000000000000 >>> 000023EEB0AC0: 000000023BFF0018 00000002398DA518 00000002398CE598 0000000000000000 >>> 000023EEB0AE0: 0000000000000000 0000000000000000 00000002384C6000 00000000000C99C0 >>> 000023EEB0B00: 0000000200000001 0000000000000000 000000023AC006C0 11D295625B1B31A1 >>> 000023EEB0B20: 3B7269C9A0003F8E 0000000000000000 0000000238B63F98 000000163EEB0B68 >>> ASSERT [ArmCpuDxe] /root/rpmbuild/BUILD/edk2-2.6.0/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(271): ((BOOLEAN)(0==1)) >> This is a guest that you didn't install from installer media. I think >> you may have gotten the preinstalled disk image from some image provider >> service. The UEFI boot variable(s) are not set up to boot the CentOS >> installation, in your nvram / pflash file. > Yes, the boot variable must store in domain's nvram file("/var/lib/libvirt/qemu/nvram/centos_VARS.fd"). After installed, it generates an new boot menu > called "CentOS Linux AltArch " which device path is "HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi". 
>
> such like:
> Boot Manager Menu
> CentOS Linux AltArch -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0) /HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi
> UEFI Misc Device
> UEFI Misc Device 2
> EFI Internal Shell
> UEFI QEMU QEMU CD-ROM -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x1)
> UEFI QEMU QEMU HARDDISK -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
> UEFI PXEv4 (MAC:5254002D2EB6)
>
> But when I shutdown &undefine this domain, and virsh create an new domain with the disk centos.qcow2 which installed just before, the UEFI boot manager
> menu is:
> Boot Manager Menu
> UEFI QEMU QEMU HARDDISK -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
> UEFI Misc Device
> UEFI Misc Device 2
> EFI Internal Shell
> UEFI PXEv4 (MAC:5254002D2EB6)

Right. In this case you have lost your original nvram contents, and you
only have the boot options that are auto-generated by the
EfiBootManagerRefreshAllBootOption() function. This function lives in
UefiBootManagerLib, and is called from OVMF's PlatformBootManagerLib
instance.

The filtering and reordering still occurs in OVMF, but now the first
boot option that matches QEMU's fw_cfg bootorder specification is not
the "CentOS Linux AltArch" boot option that you originally had. Instead,
now QemuBootOrderLib encounters the "UEFI QEMU QEMU HARDDISK"
auto-generated boot option first as a match. This boot option in turn
means "fallback.efi", according to the blog post I linked earlier.

When "fallback.efi" executes successfully, your original "CentOS Linux
AltArch" boot option is restored / recreated (at the top of the boot
option list). But, when "fallback.efi" crashes, you get a crash instead.
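
To make the "first match wins" point concrete, here is a rough C sketch
of the selection step. It is only an illustration under simplifying
assumptions, not the QemuBootOrderLib code: real device paths are binary
structures, while this sketch compares their textual forms; the names
BOOT_OPTION_SKETCH and FirstMatchingOption are made up; and I am assuming
that the fw_cfg bootorder entry translates to the SCSI disk prefix seen
in the menus above (the stray space in the quoted HD() path is dropped).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, text-only stand-in for a UEFI boot option. */
typedef struct {
  const char *Description;
  const char *DevicePath;
} BOOT_OPTION_SKETCH;

/*
 * Return the first option whose device path starts with Prefix, or NULL
 * if none does. Prefix plays the role of one translated fw_cfg bootorder
 * entry (an assumption made for this sketch).
 */
const BOOT_OPTION_SKETCH *
FirstMatchingOption (
  const BOOT_OPTION_SKETCH *Options,
  size_t                   Count,
  const char               *Prefix
  )
{
  size_t Idx;
  size_t PrefixLen = strlen (Prefix);

  for (Idx = 0; Idx < Count; Idx++) {
    if (strncmp (Options[Idx].DevicePath, Prefix, PrefixLen) == 0) {
      return &Options[Idx];
    }
  }
  return NULL;
}

#define SCSI_DISK "PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)"

/* Original nvram present: the installer-created entry is listed first. */
const BOOT_OPTION_SKETCH WithNvram[] = {
  { "CentOS Linux AltArch",
    SCSI_DISK "/HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)"
    "/\\EFI\\centos\\shim.efi" },
  { "UEFI QEMU QEMU HARDDISK", SCSI_DISK },
};

/* Nvram lost: only the auto-generated entry for the disk remains. */
const BOOT_OPTION_SKETCH WithoutNvram[] = {
  { "UEFI QEMU QEMU HARDDISK", SCSI_DISK },
};
```

With the first list the lookup for SCSI_DISK lands on "CentOS Linux
AltArch" (which loads shim.efi directly); with the second list it can
only land on "UEFI QEMU QEMU HARDDISK", which is the path that ends up
in fallback.efi.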
> I am confused about two points:
> 1) The new domain still have chance to load the "EFI\centos\shim.efi" and boot kernel successful, it means that sometimes the system firmware launches
> the BOOTAA64.EFI, sometimes lauches shim.efi. It is probabilistic.

"EFI\centos\shim.efi" is never automatically loaded. It needs a
dedicated UEFI boot option. Thus, it can be loaded in your "new" domain
*only* if "fallback.efi" runs first, successfully.

So what you are seeing is that "fallback.efi" sometimes works, and
sometimes crashes. That's the nature of memory corruption bugs.

>
> 2) Is there a way to make the "CentOS Linux AltArch " boot menu persistent?

There isn't. If you lose your nvram, you lose the non-auto-generated
boot options with it. Remedying such situations is what "fallback.efi"
exists for.

>>
>> In such cases, the "fallback.efi" utility is invoked (called
>> "\EFI\BOOT\BOOTAA64.EFI). Please refer to:
>>
>> https://blog.uncooperative.org/blog/2014/02/06/the-efi-system-partition/
>>
>> Unfortunately, "fallback.efi" (from the shim package) used to have a few
>> bugs over time and sometimes it would crash. See for example:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1196114
>>
>> I'm unsure what version of shim / fallback.efi is in the installed
>> CentOS image, but it looks like the same (or another similar)
>> fallback.efi issue to me.
>
> shim version in my side is shim-0.9-2.el7.aarch64.

This confirms that you are not seeing the exact bug described in
RHBZ#1196114, because that bug was fixed in shim-0.9 (see ).

It remains a fact that your original log contains a crash register dump
after fallback.efi is launched. The V0 register contains
0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF; the pattern 0xAF is used to fill
released (freed) pages in debug builds. So this seems to be a
use-after-free issue.

I suggest adding debug instrumentation to fallback.efi, and seeing where
exactly it blows up.

Laszlo
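
P.S. A footnote on the 0xAF observation: when reading such register
dumps, the check I am doing by eye can be written down as a tiny C
helper. This is just a sketch; the name LooksLikeFreedMemory is made up
for illustration, and the fill byte is hard-coded to the 0xAF value
mentioned above rather than read from any firmware setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fill byte mentioned above for freed pages in debug builds. */
#define FREED_FILL_BYTE 0xAFu

/*
 * Hypothetical helper: return true if every byte of Value equals
 * FREED_FILL_BYTE, i.e. the value was most likely read from memory
 * that had already been freed and overwritten with the fill pattern.
 */
bool
LooksLikeFreedMemory (
  uint64_t Value
  )
{
  /* Replicate the fill byte into all eight byte lanes. */
  uint64_t Pattern = (uint64_t)FREED_FILL_BYTE * 0x0101010101010101ULL;

  return Value == Pattern;
}
```

Both 64-bit halves of V0 in the quoted dump pass this check, while V1
(0x63702F6666666666 ...) does not. A match is a hint rather than proof,
since ordinary data can contain the same bytes by coincidence.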