On 19. Apr 2023, at 23:55, Ard Biesheuvel <ardb@kernel.org> wrote:
On Wed, 19 Apr 2023 at 22:10, Marvin Häuser <mhaeuser@posteo.de> wrote:
On 19. Apr 2023, at 21:48, Ard Biesheuvel <ardb@kernel.org> wrote:
The issue is likely caused by
-Wl,--defsym=PECOFF_HEADER_SIZE=0
Why are you setting that? It breaks the ELF to PE conversion.
Where?
It would, but you only appear to be setting that for ASLD_DLINK_FLAGS, right? So that seems unrelated.
The only thing I am observing is that the store to memory in
ArmMmuBaseLibConstructor()

  Hob = GetFirstGuidHob (&gArmMmuReplaceLiveTranslationEntryFuncGuid);
  if (Hob != NULL) {
    mReplaceLiveEntryFunc = *(VOID **)GET_GUID_HOB_DATA (Hob);

is writing to the emulated NOR flash, and this switches it into NOR
programming mode, causing the firmware to crash immediately as it can
no longer fetch instructions.
That makes so much more sense now! I expected one of three things to happen:
1) The write actually succeeds (after all, this is a VM, this might actually be the case for x86 OVMF)
2) The write is silently discarded
3) The write causes an exception
I certainly did not expect *this*. When we initially tried to debug this, we attempted to use watchpoints to no avail, expecting it to be regular memory corruption. As those didn’t fire, we messed with function alignment and discovered the reported bug (which we didn’t really even trigger to begin with, it appears!). I suppose fixing its alignment meant some code that’s important down the line is fetched earlier as part of some flash unit and that’s why it started to work after fixing it. Whew.
FYI I am using GDB to step through the code, i.e.,

- run gdb (or 'gdb-multiarch' if you are cross-compiling)
- start qemu with -s -S
- connect using 'target remote :1234'
- paste the 'add-symbol-file' line, e.g.,
  add-symbol-file /home/ard/build/edk2-workspace/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/DxeIplPeim/DxeIpl/DEBUG/DxeIpl.dll 0x30000
- set breakpoint
  "hb _ModuleEntryPoint"
- start executing
  "c"
- use 'ni' to advance to the 'str' instruction that sets mReplaceLiveEntryFunc
  0x3553c <_ModuleEntryPoint+96> str x1, [x0, #224]
Now, as soon as I step over that instruction (using 's'), the entire view of memory changes into

│ > 0x35540 <_ModuleEntryPoint+100> .inst 0x00800080 ; undefined
│   0x35544 <_ModuleEntryPoint+104> .inst 0x00800080 ; undefined

etc, and the next step generates an exception, but this cannot be handled either. This is all related to the NOR flash emulation code in QEMU, which stops working as a ROM and switches into programming mode.

I cannot explain why this only happens in this case, and why some writes seem to be ignored. But it does explain why this particular firmware build is misbehaving.

Now, if you apply the following patches:

ArmPkg/Mmu: Remove handling of NONSECURE memory regions
ArmPkg/ArmMmuLib: Introduce region types for RO/XP WB cached memory
ArmVirtPkg/ArmVirtQemu: Use read-only memory region type for code flash

(from the edk2-devel list), your build still crashes, but it prints one additional line

Synchronous Exception at 0x3553C

which is the exception caused by the write to NOR flash, which is now mapped read-only in the page tables, and so it is caught by the firmware itself.
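
For readers less familiar with NOR flash, the behaviour described above can be sketched with a toy C model. This is only an illustration under stated assumptions, not QEMU's pflash code and not anything from edk2: all names (ToyFlash, toy_flash_read, toy_flash_write, fetch_after_store) are hypothetical, the value 0x00800080 is simply copied from the GDB listing and assumed to be what the device returns once it has left ROM mode, and the model treats any store as a mode switch even though, as noted above, some writes seem to be ignored in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_WORDS 4u
/* Placeholder contents: the AArch64 NOP encoding stands in for real code. */
#define CODE_WORD   0xD503201Fu
/* Value seen in the GDB listing; assumed here to be a non-ROM readout. */
#define STATUS_WORD 0x00800080u

typedef enum { MODE_READ_ARRAY, MODE_COMMAND } FlashMode;
typedef enum { ACCESS_OK, ACCESS_FAULT } AccessResult;

/* Hypothetical toy device: a few words of "code" plus a mode flag. */
typedef struct {
    uint32_t  words[FLASH_WORDS];
    FlashMode mode;
    bool      mapped_read_only; /* models a read-only page table mapping */
} ToyFlash;

void toy_flash_init(ToyFlash *f, bool mapped_read_only)
{
    for (size_t i = 0; i < FLASH_WORDS; i++) {
        f->words[i] = CODE_WORD;
    }
    f->mode = MODE_READ_ARRAY;
    f->mapped_read_only = mapped_read_only;
}

/* In read-array mode the device behaves like a ROM; otherwise every
 * read, including an instruction fetch, sees STATUS_WORD instead. */
uint32_t toy_flash_read(const ToyFlash *f, size_t index)
{
    assert(index < FLASH_WORDS);
    return (f->mode == MODE_READ_ARRAY) ? f->words[index] : STATUS_WORD;
}

/* A store never changes the contents in this model. If the mapping is
 * read-only, the store faults before reaching the device. Otherwise it
 * is treated as a command and the device leaves read-array mode. */
AccessResult toy_flash_write(ToyFlash *f, size_t index, uint32_t value)
{
    assert(index < FLASH_WORDS);
    (void)value;
    if (f->mapped_read_only) {
        return ACCESS_FAULT;
    }
    f->mode = MODE_COMMAND;
    return ACCESS_OK;
}

/* Store to word 0 (think: a global variable living in flash), then
 * "fetch" the next word, as the CPU would after the 'str'. */
uint32_t fetch_after_store(bool mapped_read_only)
{
    ToyFlash flash;
    toy_flash_init(&flash, mapped_read_only);
    (void)toy_flash_write(&flash, 0, 0x1234u);
    return toy_flash_read(&flash, 1);
}
```

In this model, fetch_after_store(false) yields STATUS_WORD, which corresponds to the listing turning into undefined instructions, while fetch_after_store(true) still yields the original code word because the store is rejected up front, which corresponds to the firmware catching the fault itself once the flash is mapped read-only.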
That’s actually something I proposed to debug the issue early on, but we’re all so-so with ARM experience, so we never got to that with the limited time we could spare. Praise to you!
If you subsequently apply
ArmVirtPkg/ArmVirtQemu: Use PEI flavor of ArmMmuLib for all PEIMs
things work as expected.
https://github.com/ardbiesheuvel/edk2/tree/arm_corruption-latest-ardb
I’d love to confirm all this, but I can’t spare the time. I blindly trust you and will try to submit V3 within this week.
Best regards,
Marvin