I am still looking for a better solution to the problem and I am open to working with you or anyone else from ARM on that. Please let me know if you want me to try something else.
Hello Ard,
When we had a discussion on this topic earlier, maybe a few weeks back, we thought device memory is being accessed in a speculative manner. Our latest debug where
we focussed on MMU page tables at the time of error tells that the issue is actually DRAM mapping in page tables getting corrupt (as per DS-5) where descriptor for a page seems to be something garbage. What this causes is that a valid input address in DRAM
may get translated to an address in a different region of DRAM or some address in device memory.
When I invalidate the instruction cache after enabling MMUs, this issue seems to be getting resolved. Again, I am not saying this is a fix but this is something that
solves the issue while we are looking for suggestions from you for a proper fix.
I have tried to summarize the problem based on the latest findings a few emails down the trail if you want to have a look at that again.
Thanks
Ashish
From: Ard Biesheuvel <ardb@kernel.org>
Sent: Wednesday, February 23, 2022 3:54 PM
To: Ashish Singhal <ashishsingha@nvidia.com>
Cc: edk2-devel-groups-io <devel@edk2.groups.io>; Marc Zyngier <maz@kernel.org>; Sami Mujawar <sami.mujawar@arm.com>; Ard Biesheuvel <ardb+tianocore@kernel.org>; Leif Lindholm <quic_llindhol@quicinc.com>
Subject: Re: [edk2-devel] [PATCH] ArmPkg: Invalidate Instruction Cache On MMU Enable
External email: Use caution opening links or attachments
On Wed, 23 Feb 2022 at 19:14, Ashish Singhal <ashishsingha@nvidia.com> wrote:
>
> Ard,
>
> During PrePi, I setup the initial memory map by calling into ArmConfigureMmu function with my memory table where device memory regions have attribute of ARM_MEMORY_REGION_ATTRIBUTE_DEVICE and DRAM regions have attribute of ARM_MEMORY_REGION_ATTRIBUTE_WRITE_BACK.
>
> For device memory, XN bit is set by ArmMemoryAttributeToPageAttribute function. After PrePi, when I add a region of memory to device memory from a DXE driver, I call gDS->AddMemorySpace with EfiGcdMemoryTypeMemoryMappedIo and EFI_MEMORY_UC | EFI_MEMORY_RUNTIME
followed by gDS->SetMemorySpaceAttributes with EFI_MEMORY_UC.
>
> Please let me know in case I have still not understood your question.
>
This all looks ok. But the real question is whether the address that
the speculative access targets is mapped using the XN attribute or
not, so you will need to find a way to check that.
So there are a couple of options:
- The XN attribute is set correctly, but the CPU is speculatively
fetching instructions anyway. This would imply a severe hardware bug,
and flushing the I-cache is unlikely to make a difference.
- The speculative access is not the result of an instruction fetch, in
which case I-cache maintenance is unlikely to help either.
- The XN bit is not being set correctly, and so the MMU code needs to be fixed.
Papering over this by adding I-cache maintenance doesn't seem the best
course of action tbh.
--
Ard.