public inbox for devel@edk2.groups.io
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Laszlo Ersek <lersek@redhat.com>
Cc: Zhu Yijun <zhuyijun@huawei.com>,
	 "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	"Richard W.M. Jones" <rjones@redhat.com>
Subject: Re: issue about booting centos fail with edk2
Date: Tue, 1 Aug 2017 18:23:55 +0100
Message-ID: <CAKv+Gu_keKHw237Yu+PwqJmK3MA=FPkhtxzxBSWh0Q6vJyKm7g@mail.gmail.com>
In-Reply-To: <77e8ddad-9039-efe4-f6f7-1dbc66d4eb6c@redhat.com>

On 1 August 2017 at 16:42, Laszlo Ersek <lersek@redhat.com> wrote:
> On 08/01/17 10:34, Zhu Yijun wrote:
>> Thanks for your reply!
>>
>> On 2017/8/1 3:02, Laszlo Ersek wrote:
>>> On 07/31/17 02:27, Zhu Yijun wrote:
>>>> Hi all,
>>>>
>>>>     I installed a CentOS-7-aarch64 guest image from a QEMU CD-ROM, but it intermittently hangs in UEFI.
>>>>
>>>>     Basic info:
>>>>     libvirt 1.3.5
>>>>     QEMU 2.6.2
>>>>     UEFI: master branch with commit "688c7d2 BaseTools: Fix the bug that warn() function with only 1 argument"
>>>>
>>>>     Config pflash and two disks in xml:
>>>>
>>>>     ...
>>>>     <os>
>>>>     <type arch='aarch64' machine='virt-2.6'>hvm</type>
>>>>     <loader readonly='yes' type='pflash'>/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw</loader>
>>>>     <boot dev='hd'/>
>>>>   </os>
>>>>   ...
>>>>   <disk type='file' device='disk'>
>>>>       <driver name='qemu' type='qcow2' cache='none' io='native'/>
>>>>       <source file='/CentOS-7-aarch64/centos.qcow2'/>
>>>>       <backingStore/>
>>>>       <target dev='sda' bus='scsi'/>
>>>>     </disk>
>>>>     <disk type='file' device='cdrom'>
>>>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>>>       <source file='/CentOS-7-aarch64/CentOS-7-aarch64-Everything.iso'/>
>>>>       <backingStore/>
>>>>       <target dev='sdb' bus='scsi'/>
>>>>     </disk>
>>>>     ...
>>>>
>>>>     I found that it fails at the "Match (Translated, TranslatedSize, ActiveOption[Idx].BootOption->FilePath)" call in "SetBootOrderFromQemu"; the UEFI debug info is as follows:
>>> No, that's not where the problem is. See below:
>>>
>>>> start-console-fail.log
>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success
>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success
>>>>
>>>>
>>>> Synchronous Exception at 0x00000002384B1104
>>>> PC 0x0002384B1104
>>>> PC 0x0002384A916C
>>>> PC 0x0002384CA2D0
>>>> PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll
>>>> PC 0x00023BD1568C (0x00023BD02000+0x0001368C) [ 2] BdsDxe.dll
>>>> PC 0x00023BD03F98 (0x00023BD02000+0x00001F98) [ 2] BdsDxe.dll
>>>> PC 0x00023BD05640 (0x00023BD02000+0x00003640) [ 2] BdsDxe.dll
>>>> PC 0x00023EEB3704 (0x00023EEB1000+0x00002704) [ 3] DxeCore.dll
>>>> PC 0x00023EEB27C8 (0x00023EEB1000+0x000017C8) [ 3] DxeCore.dll
>>>> PC 0x00023EEB2024 (0x00023EEB1000+0x00001024) [ 3] DxeCore.dll
>>>> [ 1] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
>>>> [ 2] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll
>>>> [ 3] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
>>>>
>>>>   X0 0x00000002384A9000   X1 0x00000002384B2990   X2 0x000000023AAFDF98   X3 0x000000023BFF0018
>>>>   X4 0x0000000000000000   X5 0x0000000000000007   X6 0x0000000238533300   X7 0x0000000000000000
>>>>   X8 0x000000023C01F548   X9 0x0000000200000000  X10 0x00000002384A8000  X11 0x00000002384C5FFF
>>>>  X12 0x0000000000000000  X13 0x0000000000000008  X14 0x259511BDAEB1F36C  X15 0x1378CC1DF3F5DDBB
>>>>  X16 0x000000023EEB0BE0  X17 0x0000000000000000  X18 0x0000000000000000  X19 0x0000000000000013
>>>>  X20 0x0000000000000000  X21 0x0000000000000000  X22 0x0000000000000000  X23 0x0000000000000000
>>>>  X24 0x0000000000000000  X25 0x0000000000000000  X26 0x0000000000000000  X27 0x0000000000000000
>>>>  X28 0x0000000000000000   FP 0x000000023EEB0A40   LR 0x00000002384A916C
>>>>
>>>>   V0 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF   V1 0x63702F6666666666 6666666666666666
>>>>   V2 0x40697363732F3340 6567646972622D69   V3 0x0000000000000000 0000000000000000
>>>>   V4 0x0000000000000000 0000000000000000   V5 0x4010040140100401 4010040140100401
>>>>   V6 0x0000000000000000 0000000000000000   V7 0x0000000000000000 0000000000000000
>>>>   V8 0x0000000000000000 0000000000000000   V9 0x0000000000000000 0000000000000000
>>>>  V10 0x0000000000000000 0000000000000000  V11 0x0000000000000000 0000000000000000
>>>>  V12 0x0000000000000000 0000000000000000  V13 0x0000000000000000 0000000000000000
>>>>  V14 0x0000000000000000 0000000000000000  V15 0x0000000000000000 0000000000000000
>>>>  V16 0x0000000000000000 0000000000000000  V17 0x0000000000000000 0000000000000000
>>>>  V18 0x0000000000000000 0000000000000000  V19 0x0000000000000000 0000000000000000
>>>>  V20 0x0000000000000000 0000000000000000  V21 0x0000000000000000 0000000000000000
>>>>  V22 0x0000000000000000 0000000000000000  V23 0x0000000000000000 0000000000000000
>>>>  V24 0x0000000000000000 0000000000000000  V25 0x0000000000000000 0000000000000000
>>>>  V26 0x0000000000000000 0000000000000000  V27 0x0000000000000000 0000000000000000
>>>>  V28 0x0000000000000000 0000000000000000  V29 0x0000000000000000 0000000000000000
>>>>  V30 0x0000000000000000 0000000000000000  V31 0x0000000000000000 0000000000000000
>>>>
>>>>   SP 0x000000023EEB0A40  ELR 0x00000002384B1104  SPSR 0x60000205  FPSR 0x00000000
>>>>  ESR 0x02000000          FAR 0x1DE7EC7EDBADC0DE
>>>>
>>>>  ESR : EC 0x00  IL 0x1  ISS 0x00000000
>>>>
>>>> Stack dump:
>>>>   000023EEB0940: 0000C0E000000148 00000002384A9000 00000002384CA254 0000000000000000
>>>>   000023EEB0960: 000000023EEB0BC0 000000023AC006C0 0000F2503EEB0BC0 00000002384B6018
>>>>   000023EEB0980: 000000023EEB0BC0 0000000000000000 000000000000C0E0 0000000000000148
>>>>   000023EEB09A0: 0000000000000148 0000100000020A8C 00000002384B6110 00000002384B6108
>>>>   000023EEB09C0: 00000002384B6100 0000000000000006 00000002384B6058 00000002384B50DF
>>>>   000023EEB09E0: 00000002384A9148 0000000000000000 00000002384A9000 00000002384A9000
>>>>   000023EEB0A00: 0000000000000000 00000002398DA518 00000002385375B2 00000002385629A0
>>>>   000023EEB0A20: 000000023854C1C0 00000002398DA518 000000023EEB0BC0 0000000000000000
>>>> > 000023EEB0A40: 000000023EEB0BC0 00000002384CA2D0 000000023AAFDF98 000000023BFF0018
>>>>   000023EEB0A60: 00000002384CA360 000000023EEC8348 00000002385375B0 000000023AAFDF98
>>>>   000023EEB0A80: 000000023EEB0AC0 0000F25038533338 00000002384B6018 0000000000000000
>>>>   000023EEB0AA0: 0000000000000000 0000000238B63D18 0000000000001000 0000000000000000
>>>>   000023EEB0AC0: 000000023BFF0018 00000002398DA518 00000002398CE598 0000000000000000
>>>>   000023EEB0AE0: 0000000000000000 0000000000000000 00000002384C6000 00000000000C99C0
>>>>   000023EEB0B00: 0000000200000001 0000000000000000 000000023AC006C0 11D295625B1B31A1
>>>>   000023EEB0B20: 3B7269C9A0003F8E 0000000000000000 0000000238B63F98 000000163EEB0B68
>>>> ASSERT [ArmCpuDxe] /root/rpmbuild/BUILD/edk2-2.6.0/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(271): ((BOOLEAN)(0==1))
>>> This is a guest that you didn't install from installer media. I think
>>> you may have gotten the preinstalled disk image from some image provider
>>> service. The UEFI boot variable(s) are not set up to boot the CentOS
>>> installation, in your nvram / pflash file.
>> Yes, the boot variable must be stored in the domain's nvram file ("/var/lib/libvirt/qemu/nvram/centos_VARS.fd"). After installation, a new boot menu entry is generated,
>> called "CentOS Linux AltArch", whose device path is "HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi".
>>
>> For example:
>> Boot Manager Menu
>>    CentOS Linux AltArch                          -> device path:  PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0) /HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi
>>    UEFI Misc Device
>>    UEFI Misc Device 2
>>    EFI Internal Shell
>>    UEFI QEMU QEMU CD-ROM               -> device path:  PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x1)
>>    UEFI QEMU QEMU HARDDISK            -> device path:  PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>    UEFI PXEv4 (MAC:5254002D2EB6)
>>
>> But when I shut down and undefine this domain, and then "virsh create" a new domain with the centos.qcow2 disk that was just installed, the UEFI boot manager
>> menu is:
>> Boot Manager Menu
>>    UEFI QEMU QEMU HARDDISK               -> device path:  PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>    UEFI Misc Device
>>    UEFI Misc Device 2
>>    EFI Internal Shell
>>    UEFI PXEv4 (MAC:5254002D2EB6)
>
> Right. In this case you have lost your original nvram contents, and you
> only have the boot options that are auto-generated by the
> EfiBootManagerRefreshAllBootOption() function. This function lives in
> UefiBootManagerLib, and is called from OVMF's PlatformBootManagerLib
> instance.
>
> The filtering and reordering still occurs in OVMF, but now the first
> boot option that matches QEMU's fw_cfg bootorder specification is not
> the "CentOS Linux AltArch" boot option that you originally had. Instead,
> now QemuBootOrderLib encounters the "UEFI QEMU QEMU HARDDISK"
> auto-generated boot option first as a match.
>
> This boot option in turn means "fallback.efi", according to the blog
> post I linked earlier.
>
> When "fallback.efi" executes successfully, your original "CentOS Linux
> AltArch" boot option is restored / recreated (at the top of the boot
> option list). But, when "fallback.efi" crashes, you get a crash instead.
>
>> I am confused about two points:
>> 1) The new domain still has a chance to load "EFI\centos\shim.efi" and boot the kernel successfully, which means that the system firmware sometimes launches
>> BOOTAA64.EFI and sometimes launches shim.efi. It is probabilistic.
>
> "EFI\centos\shim.efi" is never automatically loaded. It needs a
> dedicated UEFI boot option. Thus, it can be loaded in your "new" domain
> *only* if "fallback.efi" runs first, successfully.
>
> So what you are seeing is that "fallback.efi" sometimes works, and
> sometimes crashes. That's the nature of memory corruption bugs.
>
>>
>> 2) Is there a way to make the "CentOS Linux AltArch " boot menu persistent?
>
> There isn't. If you lose your nvram, you lose the non-auto-generated
> boot options with it.
>
> Remedying such situations is what "fallback.efi" exists for.
>
>>>
>>> In such cases, the "fallback.efi" utility is invoked (called
>>> "\EFI\BOOT\BOOTAA64.EFI"). Please refer to:
>>>
>>> https://blog.uncooperative.org/blog/2014/02/06/the-efi-system-partition/
>>>
>>> Unfortunately, "fallback.efi" (from the shim package) used to have a few
>>> bugs over time and sometimes it would crash. See for example:
>>>
>>>   https://bugzilla.redhat.com/show_bug.cgi?id=1196114
>>>
>>> I'm unsure what version of shim / fallback.efi is in the installed
>>> CentOS image, but it looks like the same (or another similar)
>>> fallback.efi issue to me.
>>
>> The shim version on my side is shim-0.9-2.el7.aarch64.
>
> This confirms that you are not seeing the exact bug described in
> RHBZ#1196114, because that bug was fixed in shim-0.9 (see
> <https://bugzilla.redhat.com/show_bug.cgi?id=1196114#c16>).
>
> It remains a fact that your original log contains a crash register dump
> after fallback.efi is launched. The V0 register contains
> 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF; the pattern 0xAF is used to fill
> released (freed) pages in debug builds. So this seems to be a
> use-after-free issue. I suggest adding debug instrumentation to
> fallback.efi, and seeing where exactly it blows up.
>

The presence of the 0xAF pattern in register V0 does not by itself
suggest anything at all: V0 is a SIMD register, which the SetMem()
routine uses to poison the memory. Very little other code (if any)
uses the SIMD registers.
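
To illustrate that the V registers merely hold whatever last passed
through them, here is a minimal Python sketch that decodes V0..V2 from
the quoted dump into bytes in memory order. It assumes the dump prints
the high 64 bits of each register first; the helper names are made up
for this example.

```python
# Values copied from the quoted register dump. Assumption: the dump
# prints the high 64 bits of each 128-bit register first, then the low.
V_REGS = {
    "V0": (0xAFAFAFAFAFAFAFAF, 0xAFAFAFAFAFAFAFAF),
    "V1": (0x63702F6666666666, 0x6666666666666666),
    "V2": (0x40697363732F3340, 0x6567646972622D69),
}


def vreg_bytes(high, low):
    """Return the 16 bytes of a 128-bit register in little-endian memory order."""
    return low.to_bytes(8, "little") + high.to_bytes(8, "little")


def printable(raw):
    """Render bytes as ASCII, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


for name, (high, low) in V_REGS.items():
    raw = vreg_bytes(high, low)
    print(name, raw.hex(), printable(raw))
```

Under that assumption, V0 is sixteen 0xAF bytes, while V1 and V2 decode
to "fffffffffffff/pc" and "i-bridge@3/scsi@", which look like fragments
of a device path string. In other words, these registers carry residue
of ordinary bulk memory operations, so their contents say little about
what the faulting code was doing.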

