From: Laszlo Ersek <lersek@redhat.com>
To: Zhu Yijun <zhuyijun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
"Richard W.M. Jones" <rjones@redhat.com>
Subject: Re: issue about booting centos fail with edk2
Date: Wed, 2 Aug 2017 10:29:29 +0200
Message-ID: <b562e4b7-0268-7c70-de68-85cec2759b3e@redhat.com>
In-Reply-To: <CAKv+Gu86iVtxOfWX=LwOkjzLkxf+GuFiK0gxt+zyyAiaE+XNhA@mail.gmail.com>
On 08/02/17 00:57, Ard Biesheuvel wrote:
> On 1 August 2017 at 23:29, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 08/01/17 19:23, Ard Biesheuvel wrote:
>>> On 1 August 2017 at 16:42, Laszlo Ersek <lersek@redhat.com> wrote:
>>>> On 08/01/17 10:34, Zhu Yijun wrote:
>>>>> Thanks for your reply!
>>>>>
>>>>> On 2017/8/1 3:02, Laszlo Ersek wrote:
>>>>>> On 07/31/17 02:27, Zhu Yijun wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I installed a CentOS-7-aarch64 guest image from a QEMU CD-ROM, but it intermittently hangs in UEFI.
>>>>>>>
>>>>>>> Basic info:
>>>>>>> libvirt 1.3.5
>>>>>>> QEMU 2.6.2
>>>>>>> UEFI: master branch with commit "688c7d2 BaseTools: Fix the bug that warn() function with only 1 argument"
>>>>>>>
>>>>>>> Config pflash and two disks in xml:
>>>>>>>
>>>>>>> ...
>>>>>>> <os>
>>>>>>> <type arch='aarch64' machine='virt-2.6'>hvm</type>
>>>>>>> <loader readonly='yes' type='pflash'>/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw</loader>
>>>>>>> <boot dev='hd'/>
>>>>>>> </os>
>>>>>>> ...
>>>>>>> <disk type='file' device='disk'>
>>>>>>> <driver name='qemu' type='qcow2' cache='none' io='native'/>
>>>>>>> <source file='/CentOS-7-aarch64/centos.qcow2'/>
>>>>>>> <backingStore/>
>>>>>>> <target dev='sda' bus='scsi'/>
>>>>>>> </disk>
>>>>>>> <disk type='file' device='cdrom'>
>>>>>>> <driver name='qemu' type='raw' cache='none' io='native'/>
>>>>>>> <source file='/CentOS-7-aarch64/CentOS-7-aarch64-Everything.iso'/>
>>>>>>> <backingStore/>
>>>>>>> <target dev='sdb' bus='scsi'/>
>>>>>>> </disk>
>>>>>>> ...
>>>>>>>
>>>>>>> I found that it fails at the "Match (Translated, TranslatedSize, ActiveOption[Idx].BootOption->FilePath)" call in "SetBootOrderFromQemu". The UEFI debug output follows:
>>>>>> No, that's not where the problem is. See below:
>>>>>>
>>>>>>> start-console-fail.log
>>>>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success
>>>>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success
>>>>>>>
>>>>>>>
>>>>>>> Synchronous Exception at 0x00000002384B1104
>>>>>>> PC 0x0002384B1104
>>>>>>> PC 0x0002384A916C
>>>>>>> PC 0x0002384CA2D0
>>>>>>> PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll
>>>>>>> PC 0x00023BD1568C (0x00023BD02000+0x0001368C) [ 2] BdsDxe.dll
>>>>>>> PC 0x00023BD03F98 (0x00023BD02000+0x00001F98) [ 2] BdsDxe.dll
>>>>>>> PC 0x00023BD05640 (0x00023BD02000+0x00003640) [ 2] BdsDxe.dll
>>>>>>> PC 0x00023EEB3704 (0x00023EEB1000+0x00002704) [ 3] DxeCore.dll
>>>>>>> PC 0x00023EEB27C8 (0x00023EEB1000+0x000017C8) [ 3] DxeCore.dll
>>>>>>> PC 0x00023EEB2024 (0x00023EEB1000+0x00001024) [ 3] DxeCore.dll
>>>>>>> [ 1] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
>>>>>>> [ 2] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll
>>>>>>> [ 3] /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
>>>>>>>
>>>>>>> X0 0x00000002384A9000 X1 0x00000002384B2990 X2 0x000000023AAFDF98 X3 0x000000023BFF0018
>>>>>>> X4 0x0000000000000000 X5 0x0000000000000007 X6 0x0000000238533300 X7 0x0000000000000000
>>>>>>> X8 0x000000023C01F548 X9 0x0000000200000000 X10 0x00000002384A8000 X11 0x00000002384C5FFF
>>>>>>> X12 0x0000000000000000 X13 0x0000000000000008 X14 0x259511BDAEB1F36C X15 0x1378CC1DF3F5DDBB
>>>>>>> X16 0x000000023EEB0BE0 X17 0x0000000000000000 X18 0x0000000000000000 X19 0x0000000000000013
>>>>>>> X20 0x0000000000000000 X21 0x0000000000000000 X22 0x0000000000000000 X23 0x0000000000000000
>>>>>>> X24 0x0000000000000000 X25 0x0000000000000000 X26 0x0000000000000000 X27 0x0000000000000000
>>>>>>> X28 0x0000000000000000 FP 0x000000023EEB0A40 LR 0x00000002384A916C
>>>>>>>
>>>>>>> V0 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF V1 0x63702F6666666666 6666666666666666
>>>>>>> V2 0x40697363732F3340 6567646972622D69 V3 0x0000000000000000 0000000000000000
>>>>>>> V4 0x0000000000000000 0000000000000000 V5 0x4010040140100401 4010040140100401
>>>>>>> V6 0x0000000000000000 0000000000000000 V7 0x0000000000000000 0000000000000000
>>>>>>> V8 0x0000000000000000 0000000000000000 V9 0x0000000000000000 0000000000000000
>>>>>>> V10 0x0000000000000000 0000000000000000 V11 0x0000000000000000 0000000000000000
>>>>>>> V12 0x0000000000000000 0000000000000000 V13 0x0000000000000000 0000000000000000
>>>>>>> V14 0x0000000000000000 0000000000000000 V15 0x0000000000000000 0000000000000000
>>>>>>> V16 0x0000000000000000 0000000000000000 V17 0x0000000000000000 0000000000000000
>>>>>>> V18 0x0000000000000000 0000000000000000 V19 0x0000000000000000 0000000000000000
>>>>>>> V20 0x0000000000000000 0000000000000000 V21 0x0000000000000000 0000000000000000
>>>>>>> V22 0x0000000000000000 0000000000000000 V23 0x0000000000000000 0000000000000000
>>>>>>> V24 0x0000000000000000 0000000000000000 V25 0x0000000000000000 0000000000000000
>>>>>>> V26 0x0000000000000000 0000000000000000 V27 0x0000000000000000 0000000000000000
>>>>>>> V28 0x0000000000000000 0000000000000000 V29 0x0000000000000000 0000000000000000
>>>>>>> V30 0x0000000000000000 0000000000000000 V31 0x0000000000000000 0000000000000000
>>>>>>>
>>>>>>> SP 0x000000023EEB0A40 ELR 0x00000002384B1104 SPSR 0x60000205 FPSR 0x00000000
>>>>>>> ESR 0x02000000 FAR 0x1DE7EC7EDBADC0DE
>>>>>>>
>>>>>>> ESR : EC 0x00 IL 0x1 ISS 0x00000000
>>>>>>>
>>>>>>> Stack dump:
>>>>>>> 000023EEB0940: 0000C0E000000148 00000002384A9000 00000002384CA254 0000000000000000
>>>>>>> 000023EEB0960: 000000023EEB0BC0 000000023AC006C0 0000F2503EEB0BC0 00000002384B6018
>>>>>>> 000023EEB0980: 000000023EEB0BC0 0000000000000000 000000000000C0E0 0000000000000148
>>>>>>> 000023EEB09A0: 0000000000000148 0000100000020A8C 00000002384B6110 00000002384B6108
>>>>>>> 000023EEB09C0: 00000002384B6100 0000000000000006 00000002384B6058 00000002384B50DF
>>>>>>> 000023EEB09E0: 00000002384A9148 0000000000000000 00000002384A9000 00000002384A9000
>>>>>>> 000023EEB0A00: 0000000000000000 00000002398DA518 00000002385375B2 00000002385629A0
>>>>>>> 000023EEB0A20: 000000023854C1C0 00000002398DA518 000000023EEB0BC0 0000000000000000
>>>>>>>> 000023EEB0A40: 000000023EEB0BC0 00000002384CA2D0 000000023AAFDF98 000000023BFF0018
>>>>>>> 000023EEB0A60: 00000002384CA360 000000023EEC8348 00000002385375B0 000000023AAFDF98
>>>>>>> 000023EEB0A80: 000000023EEB0AC0 0000F25038533338 00000002384B6018 0000000000000000
>>>>>>> 000023EEB0AA0: 0000000000000000 0000000238B63D18 0000000000001000 0000000000000000
>>>>>>> 000023EEB0AC0: 000000023BFF0018 00000002398DA518 00000002398CE598 0000000000000000
>>>>>>> 000023EEB0AE0: 0000000000000000 0000000000000000 00000002384C6000 00000000000C99C0
>>>>>>> 000023EEB0B00: 0000000200000001 0000000000000000 000000023AC006C0 11D295625B1B31A1
>>>>>>> 000023EEB0B20: 3B7269C9A0003F8E 0000000000000000 0000000238B63F98 000000163EEB0B68
>>>>>>> ASSERT [ArmCpuDxe] /root/rpmbuild/BUILD/edk2-2.6.0/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(271): ((BOOLEAN)(0==1))
>>>>>> This is a guest that you didn't install from installer media. I think
>>>>>> you may have gotten the preinstalled disk image from some image provider
>>>>>> service. The UEFI boot variable(s) are not set up to boot the CentOS
>>>>>> installation, in your nvram / pflash file.
>>>>> Yes, the boot variable must be stored in the domain's nvram file ("/var/lib/libvirt/qemu/nvram/centos_VARS.fd"). After installation, a new boot menu entry
>>>>> called "CentOS Linux AltArch" is generated, whose device path is "HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi".
>>>>>
>>>>> For example:
>>>>> Boot Manager Menu
>>>>> CentOS Linux AltArch -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0) /HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi
>>>>> UEFI Misc Device
>>>>> UEFI Misc Device 2
>>>>> EFI Internal Shell
>>>>> UEFI QEMU QEMU CD-ROM -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x1)
>>>>> UEFI QEMU QEMU HARDDISK -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>>>> UEFI PXEv4 (MAC:5254002D2EB6)
>>>>>
>>>>> But when I shut down and undefine this domain, and then "virsh create" a new domain with the centos.qcow2 disk that was just installed, the UEFI boot manager
>>>>> menu is:
>>>>> Boot Manager Menu
>>>>> UEFI QEMU QEMU HARDDISK -> device path: PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>>>> UEFI Misc Device
>>>>> UEFI Misc Device 2
>>>>> EFI Internal Shell
>>>>> UEFI PXEv4 (MAC:5254002D2EB6)
>>>>
>>>> Right. In this case you have lost your original nvram contents, and you
>>>> only have the boot options that are auto-generated by the
>>>> EfiBootManagerRefreshAllBootOption() function. This function lives in
>>>> UefiBootManagerLib, and is called from OVMF's PlatformBootManagerLib
>>>> instance.
>>>>
>>>> The filtering and reordering still occurs in OVMF, but now the first
>>>> boot option that matches QEMU's fw_cfg bootorder specification is not
>>>> the "CentOS Linux AltArch" boot option that you originally had. Instead,
>>>> now QemuBootOrderLib encounters the "UEFI QEMU QEMU HARDDISK"
>>>> auto-generated boot option first as a match.
>>>>
>>>> This boot option in turn means "fallback.efi", according to the blog
>>>> post I linked earlier.
>>>>
>>>> When "fallback.efi" executes successfully, your original "CentOS Linux
>>>> AltArch" boot option is restored / recreated (at the top of the boot
>>>> option list). But, when "fallback.efi" crashes, you get a crash instead.
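A simplified sketch of the matching described above, assuming plain
string-prefix comparison on the textual device paths quoted in this
thread. The function name and the single bootorder prefix are
hypothetical; this is not the QemuBootOrderLib implementation, which
operates on binary device paths.

```python
# Hypothetical prefix that the <boot dev='hd'/> disk would translate to.
DISK_PREFIX = "PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)"


def first_bootorder_match(boot_options, bootorder_prefixes):
    """Return the description of the first boot option whose device path
    starts with a bootorder prefix, or None if nothing matches."""
    for prefix in bootorder_prefixes:
        for description, device_path in boot_options:
            if device_path.startswith(prefix):
                return description
    return None


# Boot options with the original nvram (only entries whose paths were posted).
original_nvram = [
    ("CentOS Linux AltArch",
     DISK_PREFIX + r"/HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,"
                   r"0x800,0x64000)/\EFI\centos\shim.efi"),
    ("UEFI QEMU QEMU CD-ROM",
     "PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x1)"),
    ("UEFI QEMU QEMU HARDDISK", DISK_PREFIX),
]

# After the nvram is lost, only the auto-generated entry remains for the disk.
fresh_nvram = [("UEFI QEMU QEMU HARDDISK", DISK_PREFIX)]

print(first_bootorder_match(original_nvram, [DISK_PREFIX]))
print(first_bootorder_match(fresh_nvram, [DISK_PREFIX]))
```

With the original nvram the persistent "CentOS Linux AltArch" entry is
the first match; once the nvram is lost, only the auto-generated
"UEFI QEMU QEMU HARDDISK" entry is left to match.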
>>>>
>>>>> I am confused about two points:
>>>>> 1) The new domain still has a chance to load "EFI\centos\shim.efi" and boot the kernel successfully. That means the system firmware sometimes launches
>>>>> BOOTAA64.EFI and sometimes launches shim.efi; it is probabilistic.
>>>>
>>>> "EFI\centos\shim.efi" is never automatically loaded. It needs a
>>>> dedicated UEFI boot option. Thus, it can be loaded in your "new" domain
>>>> *only* if "fallback.efi" runs first, successfully.
>>>>
>>>> So what you are seeing is that "fallback.efi" sometimes works, and
>>>> sometimes crashes. That's the nature of memory corruption bugs.
>>>>
>>>>>
>>>>> 2) Is there a way to make the "CentOS Linux AltArch " boot menu persistent?
>>>>
>>>> There isn't. If you lose your nvram, you lose the non-auto-generated
>>>> boot options with it.
>>>>
>>>> Remedying such situations is what "fallback.efi" exists for.
>>>>
>>>>>>
>>>>>> In such cases, the "fallback.efi" utility is invoked (called
>>>>>> "\EFI\BOOT\BOOTAA64.EFI"). Please refer to:
>>>>>>
>>>>>> https://blog.uncooperative.org/blog/2014/02/06/the-efi-system-partition/
>>>>>>
>>>>>> Unfortunately, "fallback.efi" (from the shim package) used to have a few
>>>>>> bugs over time and sometimes it would crash. See for example:
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1196114
>>>>>>
>>>>>> I'm unsure what version of shim / fallback.efi is in the installed
>>>>>> CentOS image, but it looks like the same (or another similar)
>>>>>> fallback.efi issue to me.
>>>>>
>>>>> The shim version on my side is shim-0.9-2.el7.aarch64.
>>>>
>>>> This confirms that you are not seeing the exact bug described in
>>>> RHBZ#1196114, because that bug was fixed in shim-0.9 (see
>>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1196114#c16>).
>>>>
>>>> It remains a fact that your original log contains a crash register dump
>>>> after fallback.efi is launched. The V0 register contains
>>>> 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF; the pattern 0xAF is used to fill
>>>> released (freed) pages in debug builds. So this seems to be a
>>>> use-after-free issue. I suggest adding debug instrumentation to
>>>> fallback.efi, and seeing where exactly it blows up.
>>>>
>>>
>>> The presence of the 0xAF pattern in register v0 by itself does not
>>> suggest anything at all: V0 is a SIMD register, which is used by the
>>> SetMem() routine to poison the memory. There is very little other code
>>> (if any) that actually uses the SIMD registers otherwise.
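A minimal sketch of checking a dumped 128-bit SIMD register against a
single-byte fill pattern. The helper name is hypothetical, and, in line
with the point above, a match only shows that the register last held the
fill value; it does not by itself indicate a use-after-free.

```python
def is_fill_pattern(value: int, fill: int = 0xAF, width_bytes: int = 16) -> bool:
    """Return True if every byte of the register value equals the fill byte."""
    return value.to_bytes(width_bytes, "little") == bytes([fill]) * width_bytes


def join_halves(high: int, low: int) -> int:
    """Combine the two 64-bit halves printed in the register dump."""
    return (high << 64) | low


# V0 and V1 as printed in the register dump earlier in this thread.
v0 = join_halves(0xAFAFAFAFAFAFAFAF, 0xAFAFAFAFAFAFAFAF)
v1 = join_halves(0x63702F6666666666, 0x6666666666666666)

print(is_fill_pattern(v0))
print(is_fill_pattern(v1))
```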
>>
>> Thanks for pointing this out.
>>
>> Can you perhaps deduce more info from the stack / register dump? The
>> topmost three stack frames don't have edk2 module names associated with
>> them -- does that confirm that the synchronous exception is raised in a
>> non-edk2 module?
>>
>
> The stack trace is consistent with BDS calling LoadImage() to launch
> fallback.efi (which is GNU-EFI based so it does not set the NB10
> Codeview debug entry containing the path on the build host).
>
> The FAR (faulting address) register contains the well-known bogus
> value KVM puts in there by default. Also, the exception class field in
> the ESR (bits 31:26) is 0x0 as well, which translates as an unknown
> exception.
>
> Are there any kvm related messages in the host kernel log? This looks
> like the result of kvm_inject_undefined(), which prints some kind of
> diagnostic in many cases.
Thanks, Ard!
Zhu Yijun -- can you check this?
Thanks
Laszlo
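
The ESR decoding mentioned in the quoted text can be reproduced with a
short sketch. The function name is hypothetical; the field layout used
here is EC in bits 31:26 (as stated above), IL in bit 25 and ISS in
bits 24:0, which agrees with the "ESR : EC 0x00 IL 0x1 ISS 0x00000000"
line in the original log.

```python
def decode_esr(esr: int) -> dict:
    """Split a 32-bit AArch64 ESR value into its EC, IL and ISS fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits 31:26
        "il": (esr >> 25) & 0x1,    # instruction length bit, bit 25
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome, bits 24:0
    }


# ESR value from the crash dump in this thread.
fields = decode_esr(0x02000000)
print("EC 0x%02X IL 0x%X ISS 0x%08X" % (fields["ec"], fields["il"], fields["iss"]))
```

An EC of 0x00 is the "unknown reason" class, so the dump itself gives no
further detail about the cause of the exception.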
>
>> (I still think the only way forward is to instrument fallback.efi, and I
>> won't be doing that.)
>>
>
> Well, if you have access to the ELF file that fallback.efi was built
> from, you can correlate the stack trace address with locations in the
> code. Lacking that, it would at least be *very* helpful to know which
> opcode is being executed when the exception is taken.
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>
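
As a starting point for correlating the stack trace with code
locations, the "PC" lines of the dump can be parsed as sketched below.
The helper name and regular expression are assumptions based only on the
log format shown in this thread. Frames without a module annotation (the
top three here) yield no offset, because the log does not print a load
base for them.

```python
import re

# Matches e.g. "PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll"
# as well as bare frames such as "PC 0x0002384B1104".
FRAME_RE = re.compile(
    r"PC\s+0x([0-9A-Fa-f]+)"
    r"(?:\s+\(0x([0-9A-Fa-f]+)\+0x([0-9A-Fa-f]+)\)\s+\[\s*(\d+)\]\s+(\S+))?"
)


def parse_frame(line: str):
    """Parse one stack-trace line; return None if it is not a PC line."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    pc = int(m.group(1), 16)
    if m.group(2) is None:
        # No module information was printed for this frame.
        return {"pc": pc, "module": None, "offset": None}
    base, offset = int(m.group(2), 16), int(m.group(3), 16)
    if base + offset != pc:
        raise ValueError("inconsistent frame: base + offset != pc")
    return {"pc": pc, "module": m.group(5), "offset": offset}


print(parse_frame("PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll"))
print(parse_frame("PC 0x0002384B1104"))
```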
Thread overview: 9+ messages
2017-07-31 0:27 issue about booting centos fail with edk2 Zhu Yijun
2017-07-31 19:02 ` Laszlo Ersek
2017-08-01 8:34 ` Zhu Yijun
2017-08-01 15:42 ` Laszlo Ersek
2017-08-01 17:23 ` Ard Biesheuvel
2017-08-01 22:29 ` Laszlo Ersek
2017-08-01 22:57 ` Ard Biesheuvel
2017-08-02 8:29 ` Laszlo Ersek [this message]
2017-08-03 0:40 ` Zhu Yijun