From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7aa61f2f-f17c-438a-b676-7d80b32895ae@linux.microsoft.com>
Date: Wed, 4 Jun 2025 10:54:52 -0700
To: Ard Biesheuvel, Leif Lindholm
Cc: "devel@edk2.groups.io"
From: "Oliver Smith-Denny via groups.io"
Reply-To: devel@edk2.groups.io,osde@linux.microsoft.com
Subject: [edk2-devel] AARCH64 Cacheability Attributes

Hi Ard and Leif,

We have been debugging an issue on an AARCH64 platform that has led us to believe a UEFI spec update with a new caching type may be needed, but we wanted to get your input before proposing that.

Diving into the specific issue that led us here first: we have a platform with an XHCI controller that connects to the PCI hierarchy through NonDiscoverablePciDeviceDxe, registering as a non-coherent MMIO device, which prompts that driver to set up its PciIo structure with the noncoherent routines:
https://github.com/tianocore/edk2/blob/8c04bcc7ed0efbb2e3fde23f787ff0249e62f874/MdeModulePkg/Bus/Pci/NonDiscoverablePciDeviceDxe/NonDiscoverablePciDeviceIo.c#L1098

When attempting to PXE boot over a USB NIC, SnpDxe hits many different alignment faults when accessing the host DMA buffers it has set up:
https://github.com/tianocore/edk2/blob/8c04bcc7ed0efbb2e3fde23f787ff0249e62f874/NetworkPkg/SnpDxe/Snp.c#L364
(SnpDxe is poorly architected and we are putting up a PR to resolve some of the glaring issues, like allocating its internal driver structure in DMA-able memory).

Tracking this down, the cause is that the non-coherent PciIo->AllocateBuffer routine exposed by NonDiscoverablePciDeviceDxe sets the attributes of the buffers to what the caller has provided or, if the caller has not provided attributes (as SnpDxe does not, and can't: it doesn't know what the underlying bus is or what the cacheability state must be), it sets UC. ArmMmuLib translates UC to device memory:
https://github.com/tianocore/edk2/blob/8c04bcc7ed0efbb2e3fde23f787ff0249e62f874/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c#L451

This causes the AP to have stricter alignment requirements (among other things), which in turn causes faults with various code patterns in SnpDxe and the closed source vendor-provided UNDI driver, including some patterns that GCC generates when it optimizes writing 0 to a structure and so emits unaligned writes.

We explored various workarounds: setting -mstrict-align (which is a big hammer, though I don't have exact numbers), using the coherent APIs (I suspect the platform XHCI driver was copied from somewhere and that it is actually coherent on this platform), and rearchitecting parts of SnpDxe.

However, while we have a solution there, this led us to the greater problem: edk2 (and the UEFI spec) were built for x86 and the cacheability attributes are x86 ones that we are attempting to shoehorn into aarch64 ones. I am guessing (though certainly the two of you would have much better knowledge here) that when ArmPkg was created, there were many such decisions made to avoid changing the rest of edk2 and just bolting Arm onto the side. Because some UC memory must be device memory, all of UC is made into device so that things "just work", until they don't.

I think that having all UC being set to device memory is a recipe for more alignment faults because we have no guarantees in higher level drivers (or from the compiler) that accesses are aligned.
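
To make the failure chain concrete, here is a minimal standalone sketch of the behavior described above. It is not the edk2 code: every Sketch* name is made up for illustration, the fallback logic is simplified to what this message describes, and the fault check assumes alignment checking is otherwise disabled for normal memory. Only the UC and WC bit values are taken from the UEFI spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror EFI_MEMORY_UC and EFI_MEMORY_WC in the UEFI spec. */
#define SKETCH_MEMORY_UC 0x1ULL
#define SKETCH_MEMORY_WC 0x2ULL

/* Hypothetical stand-ins for the AArch64 memory types discussed here. */
typedef enum {
  SketchDevice,
  SketchNormalNonCacheable,
  SketchOther
} SKETCH_ARM_TYPE;

/* Simplified model: use the caller's attribute, else fall back to UC. */
uint64_t SketchPickBufferAttribute(uint64_t CallerAttribute)
{
  return (CallerAttribute != 0) ? CallerAttribute : SKETCH_MEMORY_UC;
}

/* Model of today's translation: UC becomes device memory. */
SKETCH_ARM_TYPE SketchCurrentMapping(uint64_t EfiAttribute)
{
  if (EfiAttribute == SKETCH_MEMORY_UC) {
    return SketchDevice;
  }
  if (EfiAttribute == SKETCH_MEMORY_WC) {
    return SketchNormalNonCacheable;
  }
  return SketchOther; /* cacheable types are out of scope for this sketch */
}

/* An unaligned access faults on device memory but not on normal memory. */
bool SketchAccessFaults(SKETCH_ARM_TYPE Type, uint64_t Address, uint32_t Size)
{
  bool Aligned = (Address % Size) == 0;
  return (Type == SketchDevice) && !Aligned;
}
```

With no caller attribute, SketchPickBufferAttribute(0) yields UC, SketchCurrentMapping turns that into device memory, and a 4-byte store to an address such as 0x1002 is then reported as faulting, which is the pattern we hit in SnpDxe.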
OSes do of course make the distinction between normal non-cacheable and device memory.

My proposal is to follow this model (and the ARM ARM) and update the UEFI spec to include a new cacheability attribute, EFI_MEMORY_UC_IO (or some such name, I don't really care), which for x86 will just map to the same thing as EFI_MEMORY_UC. On aarch64, EFI_MEMORY_UC_IO maps to device memory and EFI_MEMORY_UC maps to normal non-cacheable (as an aside, EFI_MEMORY_WC probably continues to also map to normal non-cacheable). Then, drivers and the cores are updated to reflect whether they are setting attributes on true MMIO or host side DMA buffers (or whatever else). I believe this alleviates the issues with widespread device memory and the alignment faults while bringing the UEFI spec and edk2 into better alignment with the aarch64 architecture.

We have explored a few other possibilities that are more of the bolted-on-the-side variety, such as having the GCD know what is mapped MMIO, so that when ArmCpuDxe sees EFI_MEMORY_UC being passed in it can query the GCD, determine whether this is actually MMIO or normal memory, and map accordingly. Besides being additional overhead, this feels like another band-aid instead of the holistic solution of actually integrating aarch64 into the UEFI spec/edk2. Also, there is concern that because the UEFI spec defines these attributes (and the PI spec references them) we would be unable to reflect the real state of memory when returning the results of a query, etc. Furthermore, adding a heuristic for the core to follow is always a little troubling; in the end, the calling driver knows what this memory region is being used for and what the cacheability should be.

Please let us know your thoughts. We may well be missing some important nuances here, but if you agree with the general direction, we'll take this to the USWG.

Thanks,
Oliver
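
For reference, the attribute split proposed in this message could be sketched roughly as below. This is only an illustration under stated assumptions: EFI_MEMORY_UC_IO does not exist in the UEFI spec today, no bit value is proposed for it (hence an enum rather than bit masks), and all Sketch* names, including the x86 memory type labels, are hypothetical.

```c
/* Hypothetical target architectures for the sketch. */
typedef enum { SketchArchX64, SketchArchAArch64 } SKETCH_ARCH;

/* UC and WC exist today; UcIo stands for the proposed EFI_MEMORY_UC_IO. */
typedef enum { SketchAttrUc, SketchAttrUcIo, SketchAttrWc } SKETCH_ATTR;

/* Hypothetical labels for the resulting hardware memory types. */
typedef enum {
  SketchX86Uncacheable,
  SketchX86WriteCombining,
  SketchArmDevice,
  SketchArmNormalNonCacheable
} SKETCH_HW_TYPE;

/* Proposed translation: only UC_IO becomes device memory on aarch64. */
SKETCH_HW_TYPE SketchProposedMapping(SKETCH_ARCH Arch, SKETCH_ATTR Attr)
{
  if (Arch == SketchArchX64) {
    /* On x86, UC_IO maps to the same thing as UC. */
    return (Attr == SketchAttrWc) ? SketchX86WriteCombining
                                  : SketchX86Uncacheable;
  }

  switch (Attr) {
    case SketchAttrUcIo:
      return SketchArmDevice;             /* true MMIO */
    case SketchAttrUc:                    /* e.g. host side DMA buffers */
    case SketchAttrWc:
    default:
      return SketchArmNormalNonCacheable;
  }
}
```

Under this split a driver mapping real MMIO would ask for the UC_IO attribute, while a bus driver allocating non-coherent host DMA buffers would keep asking for UC and get normal non-cacheable memory on aarch64, where unaligned accesses do not fault.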