On 19. Apr 2023, at 23:55, Ard Biesheuvel <ardb@kernel.org> wrote:

On Wed, 19 Apr 2023 at 22:10, Marvin Häuser <mhaeuser@posteo.de> wrote:


On 19. Apr 2023, at 21:48, Ard Biesheuvel <ardb@kernel.org> wrote:

The issue is likely caused by

-Wl,--defsym=PECOFF_HEADER_SIZE=0

Why are you setting that? It breaks the ELF to PE conversion.

Where?

It would, but you only appear to be setting that for ASLD_DLINK_FLAGS,
right? So that seems unrelated.




The only thing I am observing is that the store to memory in
ArmMmuBaseLibConstructor()

 Hob = GetFirstGuidHob (&gArmMmuReplaceLiveTranslationEntryFuncGuid);
 if (Hob != NULL) {
   mReplaceLiveEntryFunc = *(VOID **)GET_GUID_HOB_DATA (Hob);

is writing to the emulated NOR flash, and this switches it into NOR
programming mode, causing the firmware to crash immediately as it can
no longer fetch instructions.

That makes so much more sense now! I expected one of three things to happen:
1) The write actually succeeds (after all, this is a VM, this might actually be the case for x86 OVMF)
2) The write is silently discarded
3) The write causes an exception

I certainly did not expect *this*. When we initially tried to debug this, we attempted to use watchpoints to no avail, expecting it to be regular memory corruption. As those didn’t fire, we messed with function alignment and discovered the reported bug (which we didn’t really even trigger to begin with, it appears!). I suppose fixing its alignment meant some code that’s important down the line is fetched earlier as part of some flash unit and that’s why it started to work after fixing it. Whew.


FYI I am using GDB to step through the code, i.e.,

- run gdb (or 'gdb-multiarch' if you are cross-compiling)
- start qemu with -s -S
- connect using 'target remote :1234'
- paste the 'add-symbol-file' line, e.g.,
add-symbol-file /home/ard/build/edk2-workspace/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/DxeIplPeim/DxeIpl/DEBUG/DxeIpl.dll 0x30000
- set breakpoint
"hb _ModuleEntryPoint"
- start executing
"c"
- use 'ni' to advance to the 'str' instruction that sets mReplaceLiveEntryFunc

0x3553c <_ModuleEntryPoint+96>  str     x1, [x0, #224]

Now, as soon as I step over that instruction (using 's'), the entire
view of memory changes into

│  > 0x35540 <_ModuleEntryPoint+100> .inst   0x00800080 ; undefined
│    0x35544 <_ModuleEntryPoint+104> .inst   0x00800080 ; undefined

etc., and the next step generates an exception, which cannot be
handled either. This is all related to the NOR flash emulation code in
QEMU, which stops behaving as a ROM and switches into programming mode.
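
To make the mode switch concrete, here is a toy C model of the behaviour described above. It is only a sketch and is not taken from QEMU's pflash code: the names ToyFlash, ToyFlashRead and ToyFlashWrite are hypothetical, treating 0x00800080 as a status word is an assumption based on the disassembly above, and the model switches modes on every store even though some writes seem to be ignored in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_WORDS  4u

/* Value that every fetch returned after the stray store in the GDB output.
   Interpreting it as a status word is an assumption of this sketch. */
#define STATUS_WORD  0x00800080u

typedef enum {
  MODE_READ_ARRAY,  /* device behaves like a ROM */
  MODE_COMMAND      /* device is waiting for or executing a command */
} ToyFlashMode;

typedef struct {
  ToyFlashMode  Mode;
  uint32_t      Array[FLASH_WORDS];  /* stands in for the firmware image */
} ToyFlash;

/* Reads return the array contents only while in read-array mode. */
static uint32_t
ToyFlashRead (const ToyFlash *Flash, size_t Index)
{
  assert (Index < FLASH_WORDS);
  return (Flash->Mode == MODE_READ_ARRAY) ? Flash->Array[Index] : STATUS_WORD;
}

/* A store is taken as a command, not as data: the array is left untouched,
   but the device leaves read-array mode (simplification: for every store). */
static void
ToyFlashWrite (ToyFlash *Flash, size_t Index, uint32_t Value)
{
  assert (Index < FLASH_WORDS);
  (void)Value;
  Flash->Mode = MODE_COMMAND;
}
```

Calling ToyFlashWrite() once and then ToyFlashRead() on any index returns STATUS_WORD instead of the stored instruction words, which mirrors why the CPU can no longer fetch code after the 'str'.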

I cannot explain why this only happens in this case, and why some
writes seem to be ignored. But it does explain why this particular
firmware build is misbehaving.

Now, if you apply the following patches:

ArmPkg/Mmu: Remove handling of NONSECURE memory regions
ArmPkg/ArmMmuLib: Introduce region types for RO/XP WB cached memory
ArmVirtPkg/ArmVirtQemu: Use read-only memory region type for code flash

(from the edk2-devel list), your build still crashes, but it prints
one additional line

Synchronous Exception at 0x3553C

which is the exception caused by the write to NOR flash, which is now
mapped read-only in the page tables, and so it is caught by the
firmware itself.
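
As a rough illustration of why the read-only mapping changes the failure mode, the sketch below checks a store against a small region table before it would reach the device. This is not ArmMmuLib code: ToyRegion, ToyCheckStore and the addresses used in the usage note are hypothetical, and a real MMU performs this check in hardware via the page table attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
  TOY_ATTR_RW,  /* ordinary writable memory */
  TOY_ATTR_RO   /* read-only mapping, e.g. code flash */
} ToyAttr;

typedef struct {
  uint64_t  Base;
  uint64_t  Length;
  ToyAttr   Attr;
} ToyRegion;

typedef enum {
  TOY_STORE_OK,        /* store is allowed to proceed */
  TOY_STORE_FAULT,     /* store hits a read-only mapping: exception */
  TOY_STORE_UNMAPPED   /* address is not covered by any region */
} ToyStoreResult;

/* Decide what happens to a store to Address, given the region table. */
static ToyStoreResult
ToyCheckStore (const ToyRegion *Map, size_t Count, uint64_t Address)
{
  size_t  Index;

  assert (Map != NULL);
  for (Index = 0; Index < Count; Index++) {
    /* Subtract first so that Base + Length cannot overflow. */
    if ((Address >= Map[Index].Base) &&
        (Address - Map[Index].Base < Map[Index].Length)) {
      return (Map[Index].Attr == TOY_ATTR_RO) ? TOY_STORE_FAULT : TOY_STORE_OK;
    }
  }

  return TOY_STORE_UNMAPPED;
}
```

With a table that marks a (made-up) flash range as TOY_ATTR_RO, a store into that range yields TOY_STORE_FAULT before the flash device ever sees it, so the firmware can report the exception instead of losing its instruction stream.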

That’s actually something I proposed early on to debug the issue, but none of us has much ARM experience, so we never got to it in the limited time we could spare. Praise to you!


If you subsequently apply

ArmVirtPkg/ArmVirtQemu: Use PEI flavor of ArmMmuLib for all PEIMs

things work as expected.

https://github.com/ardbiesheuvel/edk2/tree/arm_corruption-latest-ardb

I’d love to confirm all this, but I can’t spare the time. I blindly trust you and will try to submit V3 within this week.

Best regards,
Marvin