Hello Ard and Marc, I apologize for not providing the background on this in the commit message and I understand the commit message is not very clear as well. Let me try to summarize the problem. In our UEFI implementation, we are doing the following as part of the initial MMU setup: 1. Set the applicable device memory as nGnRnE memory. 2. Set the whole DRAM as normal memory that translates to RW and executable memory. 3. Enable caches and MMU. 4. At this time, the memory map looks correct when I check that from DS-5. When we start dispatching drivers, DxeCore dispatches a driver and marks its code area as RO and executable and its data region as RW and non-executable. What we are seeing randomly is that some of the page tables (using DS-5) have invalid output address that leads to the correct input address from UEFI being translated to an unavailable memory location causing a crash sometime in EL2 or sometimes as a RAS error in EL3. When I reached out to the CPU team here, they said Arm® Cortex®-A78AE is a highly speculative core and we need to have appropriate barriers in place so that there is consistency in the way an address is accessed especially if it is done right after there is a change in translation tables. Based on this, I started some experimentation wrt caches whenever MMUs are enabled and I found that invalidating the instruction cache after enabling MMUs solves this problem. Please note that I could be wrong with my hypothesis here and I may just be masking the issue. If that is the case, please let me know what I should be trying as I am out of ideas at this point. Also, the same UEFI works on NVIDIA's Xavier Silicon that has Carmel cores but shows this issue on Orin Silicon that has Arm® Cortex®-A78AE v8.2 64-bit CPU. @Marc Zyngier, I agree with your concern about using "dsb sy" and we should replace that with "dsb nsh" as we are running this only on a single core during boot. Thanks Ashish ________________________________ From: Marc Zyngier Sent: Wednesday, February 23, 2022 1:58 AM To: Ard Biesheuvel ; Ashish Singhal Cc: edk2-devel-groups-io ; Sami Mujawar ; Ard Biesheuvel ; Leif Lindholm Subject: Re: [PATCH] ArmPkg: Invalidate Instruction Cache On MMU Enable External email: Use caution opening links or attachments On Wed, 23 Feb 2022 07:02:28 +0000, Ard Biesheuvel wrote: > > (+ Marc) > > On Tue, 22 Feb 2022 at 03:42, Ashish Singhal wrote: > > > > Even with MMU turned off, instruction cache can speculate > > and fetch instructions. This can cause a crash if region > > being executed has been modified recently. With this patch, > > we ensure that instruction cache is invalidated right after > > MMU has been enabled and any potentially stale instruction > > fetched earlier has been discarded. Are you doing code patching while the MMU is off? > > > > I don't follow this reasoning. Every time the UEFI image loader loads > an image into memory, it does so with MMU and caches enabled, and the > code regions are cleaned to the PoU before being invalidated in the > I-cache. So how could these regions be stale? The only code that needs > special care here is the little trampoline that executes with the MMU > off, but this is being taken care of explicitly, by cleaning the code > to the PoC before invoking it. > > > This is specially helpful when the memory attributes of a > > region in MMU are being changed and some instructions > > operating on the region are prefetched in the instruction > > cache. > > Huh, this is way too vague. How do you expect anyone to understand your problem based on this? > > This sounds like a TLB problem not a cache problem. For the sake of > posterity, could you include a more detailed description of the issue > in the commit log? IIRC, this is about speculative instruction fetches > hitting device memory locations? > > As I mentioned before, the architecture does not permit speculative > instruction fetches for regions mapped with the XN attribute. > > Is the issue occurring under virtualization? Is there a stage 2 > mapping that lacks the XN attribute for the device memory region in > question? > > > Signed-off-by: Ashish Singhal > > --- > > ArmPkg/Library/ArmLib/AArch64/AArch64Support.S | 4 +++- > > ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S | 2 ++ > > 2 files changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/ArmPkg/Library/ArmLib/AArch64/AArch64Support.S b/ArmPkg/Library/ArmLib/AArch64/AArch64Support.S > > index d3cc1e8671..9648245182 100644 > > --- a/ArmPkg/Library/ArmLib/AArch64/AArch64Support.S > > +++ b/ArmPkg/Library/ArmLib/AArch64/AArch64Support.S > > @@ -89,7 +89,9 @@ ASM_FUNC(ArmEnableMmu) > > dsb nsh > > isb > > msr sctlr_el3, x0 // Write back > > -4: isb > > +4: ic iallu > > + dsb sy Why DSB SY? Given that you are only invalidating on a single CPU, DSB NSH should be enough. > > + isb > > ret > > > > > > diff --git a/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S b/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S > > index 66ebca571e..56cc2dd73f 100644 > > --- a/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S > > +++ b/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S > > @@ -37,6 +37,8 @@ > > > > // re-enable the MMU > > msr sctlr_el\el, x8 > > + ic iallu > > + dsb sy Same thing here. M. -- Without deviation from the norm, progress is not possible.