On 19. Apr 2023, at 19:40, Ard Biesheuvel <ardb@kernel.org> wrote:

On Wed, 19 Apr 2023 at 19:14, Marvin Häuser <mhaeuser@posteo.de> wrote:

Hi all,

While testing Ard's suggestion for V3, I noticed I got a broken FD where ArmReplaceLiveTranslationEntry() is misaligned, but does not cross a 4 KB boundary.

Which platform are you building?

ArmVirtPkg / AARCH64 / DEBUG / GCC5 (GCC 12.2.0).


To not just hide the issue via this patch, can someone please try to explain the exact requirements this function has (the comments read like 0x200 was just the lowest value to guarantee staying within a page)? Why would it be broken if misaligned, but not crossing a page?


0x200 is a log2 upper bound for the size of the function, so it's just
the smallest value that fits that requirement, determined manually
iirc

And the only reason we have this is that we can cheaply decide whether
or not unmapping a page will unmap this function or not, but we could
actually just use the address and size to decide this.

In any case, if the FD is constructed in a way that violates the
alignment, there is something wrong with the build tools you are
using.

The tools are stock edk2, the only changes made are those in the latest commit of the linked branch.


Is there any chance the FD is somehow misaligned in memory, thus shifting the function across a page in the process? Or is the FD mapped to a fixed address like with x86? Is code after ArmReplaceLiveTranslationEntry() crossing page boundaries the actual issue (and is implicitly fixed by aligning it)?


If you are building ArmVirtQemu.dsc, the FD is mapped at address 0x0
and the FV is mapped at 0x1000

Then the function simply is not crossing a page boundary... which means the patch did fix a valid bug, but it wasn't what actually caused the corruption. Any help is appreciated. :)

Best regards,
Marvin



For reference, I attached the broken FD. The problematic ArmReplaceLiveTranslationEntry() is located at MemoryInitPei 0x25C10 - 0x25D54.
This is from my arm_corruption-latest branch: https://github.com/mhaeuser/edk2/tree/arm_corruption-latest

Best regards,
Marvin


On 18. Apr 2023, at 10:18, Marvin Häuser <mhaeuser@posteo.de> wrote:



On 18. Apr 2023, at 10:10, Ard Biesheuvel <ardb@kernel.org> wrote:

On Tue, 18 Apr 2023 at 08:40, Marvin Häuser <mhaeuser@posteo.de> wrote:



On 17. Apr 2023, at 23:18, Ard Biesheuvel <ardb@kernel.org> wrote:

Agree with all of this.

And thanks for tracking this down - must not have been fun :-)


No worries - it wasn’t. :) It was mere luck Vitaly discovered early it was an issue that triggered based on the binary layout rather than a bug in the mapping code itself…

Speaking of not fun to track down, I initially wanted to add ASSERTs (yes, runtime :( ) to check the alignment guarantee is actually met. Leif basically declined any form of regression-testing at runtime. Do you happen to have a simple(!) idea for how it could be checked at build-time? (It’s less about “which commands do I run?†and more about integration with the build process / BaseTools, cross-OS compatibility, etc.)



I think we should just add another align to the code:

.align xx
func:

< code >

.align xx
.org func + xx

That way, if the code straddles a xx-aligned boundary, the .org will
move backwards and force an error.


Wow, that's pretty fucking smart... I didn't even know that directive was a thing. I will try to toy with it soon. Do you want it as a separate series, separate commit in the current series, or squashed into the fix?


Attachments:

QEMU_EFI.fd