From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.groups.io (mail02.groups.io [66.175.222.108]) by spool.mail.gandi.net (Postfix) with ESMTPS id 2CAD77803DE for ; Wed, 1 Nov 2023 09:57:19 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=sG9gBwJKZHDH/6cFY5Zvuoj7ak1bKXonPS8xvCeMxfQ=; c=relaxed/simple; d=groups.io; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject:To:Cc:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Type:Content-Transfer-Encoding; s=20140610; t=1698832637; v=1; b=mjae6fOYX3yM2UUMjnLWTRZGnnxM2P3tylnZES+muZYAWV1N/Me9XeRgvAiJl4ERODAMOjq2 N50Yi7Fbi67RRW07PEdNuHGUD2tIQMtHjfkyrCYxVZDjZPCaRcVw5P8ginl6yKSEHJtb58Rz22p KvunPJR1A3kUrsAXCHIQqQC0= X-Received: by 127.0.0.2 with SMTP id FyPAYY7687511xmvHsu148og; Wed, 01 Nov 2023 02:57:17 -0700 X-Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by mx.groups.io with SMTP id smtpd.web10.3577.1698832636753491216 for ; Wed, 01 Nov 2023 02:57:17 -0700 X-Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sin.source.kernel.org (Postfix) with ESMTP id C37C8CE0E60 for ; Wed, 1 Nov 2023 09:57:13 +0000 (UTC) X-Received: by smtp.kernel.org (Postfix) with ESMTPSA id F1E6EC433CA for ; Wed, 1 Nov 2023 09:57:12 +0000 (UTC) X-Received: by mail-lf1-f45.google.com with SMTP id 2adb3069b0e04-507b9408c61so9103334e87.0 for ; Wed, 01 Nov 2023 02:57:12 -0700 (PDT) X-Gm-Message-State: 7WZnFCeMROvOPRPLUbVHMQgCx7686176AA= X-Google-Smtp-Source: AGHT+IHNYFiFVw4QExoVnDQsvCei0lep8BMDGAeXV15I8Rq4iggoQEgc8Uyx9K4aAApzpfRz5Pa3Lu8y7Fd+CKIeDBo= X-Received: by 2002:a05:6512:3e0f:b0:507:b8c5:6542 with SMTP id i15-20020a0565123e0f00b00507b8c56542mr15763899lfv.65.1698832631137; Wed, 01 Nov 2023 02:57:11 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: "Ard Biesheuvel" Date: Wed, 1 Nov 2023 10:56:59 +0100 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [edk2-devel] [RFC] Ordering of Arm PCI ECAM and MMIO operations To: devel@edk2.groups.io, pedro.falcato@gmail.com Cc: jlotwo@gmail.com, Leif Lindholm , Sami Mujawar , Ray Ni Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io Reply-To: devel@edk2.groups.io,ardb@kernel.org List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20140610 header.b=mjae6fOY; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=kernel.org (policy=none); spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io (cc Ray) On Wed, 1 Nov 2023 at 03:09, Pedro Falcato wrote: > > +CC ARM maintainers > > On Wed, Nov 1, 2023 at 12:40=E2=80=AFAM Joe L wrote: > > > > Hello, > > > > During experimentation on an AARCH64 platform with a PCIe endpoint that= supports option ROM, it was found that the ordering of transactions betwee= n ECAM (Cfg-Write) and MMIO (Mem-Read) is not preserved by the CHI HN-I nod= e. The observed sequence is as follows: > > 1. EFI issues a Cfg-Write to endpoint Memory Space Enable bit to enable= memory access > > 2. EFI issues a Mem-Read to the endpoint Option ROM memory space and re= ceives back 0xFFFF (unsupported request) > > If the memory ordering between the two accesses is explicitly preserved= via memory barrier (DMB instruction), 2. will return back valid data inste= ad of UR. Analyzing the transactions with a protocol analyzer, we found tha= t the Mem-Read was being issued before the completion for the Cfg-Write is = received. > > > > On this system, the HN-I node is configured to separate the ECAM and MM= IO into different SAM regions. Both regions are assigned Decice-nGnRnE attr= ibutes. According to this snippet from Arm "DEN0114 PCIe AMBA Integration G= uide", the ordering of even Device-nGnRnE memory regions cannot be guarante= ed if they belong to separate PCIe address spaces > > > > 4.8.2 > > > > Multiple PCIe address spaces mapped as Device-nGnRnE or Device-nGnRE Ar= m memory model does not give any ordering guarantees between accesses to di= fferent Device-nGnRnE or Device-nGnRE peripherals. Additionally, there is n= o restriction on mapping various PCIe address spaces of the same PCIe funct= ion as different Device-nGnRnE or Device-nGnRE peripherals. Consequently, s= oftware cannot assume that program order will be maintained between accesse= s to two different PCIe address spaces, even though both spaces are mapped = as Device-nGnRnE or Device-nGnRE. Therefore, for maximum software portabili= ty, ordering requirements between accesses to different PCIe address spaces= must be handled explicitly in software using appropriate ordering instruct= ions." > > > > We requested a comment from an Arm representative and received a simila= r response, confirming that a memory barrier is needed to ensure ordering b= etween accesses to ECAM and MMIO regions (or between any two ranges that ar= e assigned to a separate SAM address region) > > > > When they are to two different order regions, the read will not wait fo= r the write to complete, and can return data before the write does anything= . The HN-I only preserves ordering between reads and writes to the same Ord= er Region (which implies the same Address Region). Likewise, the HN-I will = only preserve ordering between multiple reads and between multiple writes w= ithin the same Order Region, and it accomplishes this by issuing the reads = with the same ARID and the writes with the same AWID (i.e. it relies on the= downstream device to follow AXI ordering rules). Issuing a CHI request wit= h REQ.Order=3DEndpointOrder only guarantees ordering to the same =E2=80=9Ce= ndpoint address range,=E2=80=9D which the HN-I defines as an Order Region (= within an Address Region). > > > > Our CMN TRM showcases an example where ECAM and MMIO are two different = regions in the HN-I SAM. The implication is that we would expect a DSB betw= een the ECAM write and MMIO read. I'm asking our Open Source Software group= to confirm that standard PCIe software is generally expected to be aware o= f the need for a DSB--but my impression from talking to some of our hardwar= e engineers is that that is indeed the expectation. > > > > > > I am requesting that EDK2 consumes or produces a change to the current = PciExpressLib that will ensure ordering on Arm architectures after Cfg-Writ= es which may or may not have side effects. For example, in MdePkg/Library/B= asePciExpressLib/PciExpressLib.c, > > > > UINT8 > > EFIAPI > > PciExpressWrite8 ( > > IN UINTN Address, > > IN UINT8 Value > > ) > > { > > ASSERT_INVALID_PCI_ADDRESS (Address); > > if (Address >=3D PcdPciExpressBaseSize ()) { > > return (UINT8)-1; > > } > > > > return MmioWrite8 ((UINTN)GetPciExpressBaseAddress () + Address, Valu= e); > > } > > > > should become > > > > > > > > UINT8 > > EFIAPI > > PciExpressWrite8 ( > > IN UINTN Address, > > IN UINT8 Value > > ) > > { > > ASSERT_INVALID_PCI_ADDRESS (Address); > > if (Address >=3D PcdPciExpressBaseSize ()) { > > return (UINT8)-1; > > } > > > > UINT8 ReturnValue =3D MmioWrite8 ((UINTN)GetPciExpressBaseAddress () = + Address, Value); > > #if defined (MDE_CPU_AARCH64) > > MemoryFence (); // DMB sy or DSB > > #endif > > > > return ReturnValue; > > } > > > > Please let me know your thoughts and if this is the correct implementat= ion change needed to enforce memory ordering between separate address regio= ns. I would also like to know the preferred memory barrier instruction in t= his case (DMB or DSB). > > Hrmmm, I think the current implementation of MmioRead/Write only > offers TSO instead of "full serialization". For instance: > > // > // Reads an 8-bit MMIO register. > // > // Reads the 8-bit MMIO register specified by Address. The 8-bit read va= lue is > // returned. This function must guarantee that all MMIO read and write > // operations are serialized. > // > // @param Address The MMIO register to read. > // > // @return The value read. > // > ASM_PFX(MmioRead8Internal): > AARCH64_BTI(c) > ldrb w0, [x0] > dmb ld > ret > > // > // Writes an 8-bit MMIO register. > // > // Writes the 8-bit MMIO register specified by Address with the value sp= ecified > // by Value and returns Value. This function must guarantee that all MMI= O read > // and write operations are serialized. > // > // @param Address The MMIO register to write. > // @param Value The value to write to the MMIO register. > // > ASM_PFX(MmioWrite8Internal): > AARCH64_BTI(c) > dmb st > strb w1, [x0] > ret > > > So prior reads are ordered against reads/stores (through dmb ld after > the load). Prior writes are ordered against writes only (through dmb > st before the store). So one can easily devise a problem here where > > dmb st /* orders against prior writes */ > strb w1, [x0] /* write */ > [...] > > ldrb w0, [x0] /* load */ > dmb ld /* orders this load against later reads/stores, but not > against prior writes (!!) */ > > where the CPU can (AFAIK) freely reorder the write after the load in > the ARM64 case. > Not if the memory type is device/strongly ordered, which is the assumption for regions these accessors operate on. And even with cached normal memory, only /other/ observers might see these take effect in a different order. The barriers here are supposed to manage concurrent accesses, which are mostly from coherent masters other than secondary CPUs in the context that EDK2 typically operates in. For instance, the DMB ST in MmioWrite8Internal() is supposed to ensure that any prior writes to a DMA buffer are not reordered with the MMIO write that kicks of the DMA transaction. > Do we want dmb st after the store instead? > I think we should separate the two cases here: 1) as per the architecture (as interpreted by the ARM architects), a DSB is required to ensure that the side effects of enabling a MMIO BAR in the PCI config space are sufficiently observable to memory accesses to that BAR that appear after the PCI config space access in the program. 2) the MMIO accessors are completely undocumented, and it would make sense to put some of the (alleged) rationale in the source code for posterity. As for 1), I don't think dropping a MemoryFence() (which issues a DBM not a DSB on arm64) in a shared source file somewhere is the right approach here. I'd much prefer handling this in RootBridgeIoPciAccess(), and using/introducing a generic helper other than MemoryFence() to convey the existence of a potential side effect that may turn entire regions of memory on or off (including ones that the program/compiler may consider 'ordinary' memory) On ARM/AARCH64, we already have DataSynchronizationBarrier () for this purpose. Alternatively, we might introduce a PciExpressLib instance in ArmPkg that uses this. -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#110473): https://edk2.groups.io/g/devel/message/110473 Mute This Topic: https://groups.io/mt/102310377/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-