Ard,

During PrePi, I setup the initial memory map by calling into ArmConfigureMmu function with my memory table where device memory regions have attribute of ARM_MEMORY_REGION_ATTRIBUTE_DEVICE and DRAM regions have attribute of ARM_MEMORY_REGION_ATTRIBUTE_WRITE_BACK.

For device memory, XN bit is set by ArmMemoryAttributeToPageAttribute function. After PrePi, when I add a region of memory to device memory from a DXE driver, I call gDS->AddMemorySpace with EfiGcdMemoryTypeMemoryMappedIo and EFI_MEMORY_UC | EFI_MEMORY_RUNTIME followed by gDS->SetMemorySpaceAttributes with EFI_MEMORY_UC.

Please let me know in case I have still not understood your question.

In our case, UEFI is running in NS-EL2. I understand what I am proposing may not be a proper fix. I am looking for help on what could be a fix for this as we are using all upstream code for memory management.

Thanks
Ashish

From: Ard Biesheuvel <ardb@kernel.org>
Sent: Wednesday, February 23, 2022 10:40 AM
To: edk2-devel-groups-io <devel@edk2.groups.io>; Ashish Singhal <ashishsingha@nvidia.com>
Cc: Marc Zyngier <maz@kernel.org>; Sami Mujawar <sami.mujawar@arm.com>; Ard Biesheuvel <ardb+tianocore@kernel.org>; Leif Lindholm <quic_llindhol@quicinc.com>
Subject: Re: [edk2-devel] [PATCH] ArmPkg: Invalidate Instruction Cache On MMU Enable
 
External email: Use caution opening links or attachments


On Wed, 23 Feb 2022 at 18:36, Ashish Singhal via groups.io
<ashishsingha=nvidia.com@groups.io> wrote:
>
> Hello Ard and Marc,
>
> I apologize for not providing the background on this in the commit message and I understand the commit message is not very clear as well. Let me try to summarize the problem.
>
> In our UEFI implementation, we are doing the following as part of the initial MMU setup:
>
> Set the applicable device memory as nGnRnE memory.
> Set the whole DRAM as normal memory that translates to RW and executable memory.
> Enable caches and MMU.
> At this time, the memory map looks correct when I check that from DS-5.
>

I have asked you a number of times about the XN attribute. How and
where are you setting it for the device regions?

> When we start dispatching drivers, DxeCore dispatches a driver and marks its code area as RO and executable and its data region as RW and non-executable. What we are seeing randomly is that some of the page tables (using DS-5) have invalid output address that leads to the correct input address from UEFI being translated to an unavailable memory location causing a crash sometime in EL2 or sometimes as a RAS error in EL3.
>

At which EL does UEFI run? If at EL1, is it running under a stage2
mapping? If so, does the stage 2 mapping set the XN attribute
appropriately for the device mappings?


> When I reached out to the CPU team here, they said Arm® Cortex®-A78AE is a highly speculative core and we need to have appropriate barriers in place so that there is consistency in the way an address is accessed especially if it is done right after there is a change in translation tables. Based on this, I started some experimentation wrt caches whenever MMUs are enabled and I found that invalidating the instruction cache after enabling MMUs solves this problem.
>

It may hide it, but I don't think it is a proper fix.

> Please note that I could be wrong with my hypothesis here and I may just be masking the issue. If that is the case, please let me know what I should be trying as I am out of ideas at this point. Also, the same UEFI works on NVIDIA's Xavier Silicon that has Carmel cores but shows this issue on Orin Silicon that has Arm® Cortex®-A78AE v8.2 64-bit CPU.
>

Thanks,
Ard.