From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.groups.io (mail02.groups.io [66.175.222.108]) by spool.mail.gandi.net (Postfix) with ESMTPS id 848CED8018D for ; Wed, 1 Nov 2023 02:09:52 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=8cGV0bMMUorv2O3bZQ2kqBB0fKqiTlfOT7NzK4AmvDM=; c=relaxed/simple; d=groups.io; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject:To:Cc:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Type:Content-Transfer-Encoding; s=20140610; t=1698804591; v=1; b=wgYlaE5ljHAFnUGKjt7m53qUnG2fp+IYG42c3SlxU298kLiN4IrgBe0a20pNaoCKM8oWPGkb OG0aV99ks5vDceqSXbgpkAziz8Ld7u1gIjMPi/oq+oneJlLCv1h85RNMZ/GGUX4pW4heoyR0V5v lYbjVY+OHyFqC/vf4kDS6RjU= X-Received: by 127.0.0.2 with SMTP id KJkPYY7687511xuYICsZtXRw; Tue, 31 Oct 2023 19:09:51 -0700 X-Received: from mail-vk1-f177.google.com (mail-vk1-f177.google.com [209.85.221.177]) by mx.groups.io with SMTP id smtpd.web10.48287.1698804590712601417 for ; Tue, 31 Oct 2023 19:09:50 -0700 X-Received: by mail-vk1-f177.google.com with SMTP id 71dfb90a1353d-4a13374a1e8so2575482e0c.1 for ; Tue, 31 Oct 2023 19:09:50 -0700 (PDT) X-Gm-Message-State: RehHyBATaguoA1opXZjoWmxNx7686176AA= X-Google-Smtp-Source: AGHT+IFAiCn28BRrN+IGkTzuYyLVZuA7WB92zUuOEEHIf2E/xWRSLbJMCCh7LbcX55MRDHbPojFpVs4+sEQhWMsBzUE= X-Received: by 2002:a1f:298e:0:b0:49a:7a5b:dab2 with SMTP id p136-20020a1f298e000000b0049a7a5bdab2mr12486595vkp.16.1698804589103; Tue, 31 Oct 2023 19:09:49 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: "Pedro Falcato" Date: Wed, 1 Nov 2023 02:09:37 +0000 Message-ID: Subject: Re: [edk2-devel] [RFC] Ordering of Arm PCI ECAM and MMIO operations To: devel@edk2.groups.io, jlotwo@gmail.com Cc: Leif Lindholm , Ard Biesheuvel , Sami Mujawar Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io Reply-To: devel@edk2.groups.io,pedro.falcato@gmail.com List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20140610 header.b=wgYlaE5l; spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=gmail.com (policy=none) +CC ARM maintainers On Wed, Nov 1, 2023 at 12:40=E2=80=AFAM Joe L wrote: > > Hello, > > During experimentation on an AARCH64 platform with a PCIe endpoint that s= upports option ROM, it was found that the ordering of transactions between = ECAM (Cfg-Write) and MMIO (Mem-Read) is not preserved by the CHI HN-I node.= The observed sequence is as follows: > 1. EFI issues a Cfg-Write to endpoint Memory Space Enable bit to enable m= emory access > 2. EFI issues a Mem-Read to the endpoint Option ROM memory space and rece= ives back 0xFFFF (unsupported request) > If the memory ordering between the two accesses is explicitly preserved v= ia memory barrier (DMB instruction), 2. will return back valid data instead= of UR. Analyzing the transactions with a protocol analyzer, we found that = the Mem-Read was being issued before the completion for the Cfg-Write is re= ceived. > > On this system, the HN-I node is configured to separate the ECAM and MMIO= into different SAM regions. Both regions are assigned Decice-nGnRnE attrib= utes. According to this snippet from Arm "DEN0114 PCIe AMBA Integration Gui= de", the ordering of even Device-nGnRnE memory regions cannot be guaranteed= if they belong to separate PCIe address spaces > > 4.8.2 > > Multiple PCIe address spaces mapped as Device-nGnRnE or Device-nGnRE Arm = memory model does not give any ordering guarantees between accesses to diff= erent Device-nGnRnE or Device-nGnRE peripherals. Additionally, there is no = restriction on mapping various PCIe address spaces of the same PCIe functio= n as different Device-nGnRnE or Device-nGnRE peripherals. Consequently, sof= tware cannot assume that program order will be maintained between accesses = to two different PCIe address spaces, even though both spaces are mapped as= Device-nGnRnE or Device-nGnRE. Therefore, for maximum software portability= , ordering requirements between accesses to different PCIe address spaces m= ust be handled explicitly in software using appropriate ordering instructio= ns." > > We requested a comment from an Arm representative and received a similar = response, confirming that a memory barrier is needed to ensure ordering bet= ween accesses to ECAM and MMIO regions (or between any two ranges that are = assigned to a separate SAM address region) > > When they are to two different order regions, the read will not wait for = the write to complete, and can return data before the write does anything. = The HN-I only preserves ordering between reads and writes to the same Order= Region (which implies the same Address Region). Likewise, the HN-I will on= ly preserve ordering between multiple reads and between multiple writes wit= hin the same Order Region, and it accomplishes this by issuing the reads wi= th the same ARID and the writes with the same AWID (i.e. it relies on the d= ownstream device to follow AXI ordering rules). Issuing a CHI request with = REQ.Order=3DEndpointOrder only guarantees ordering to the same =E2=80=9Cend= point address range,=E2=80=9D which the HN-I defines as an Order Region (wi= thin an Address Region). > > Our CMN TRM showcases an example where ECAM and MMIO are two different re= gions in the HN-I SAM. The implication is that we would expect a DSB betwee= n the ECAM write and MMIO read. I'm asking our Open Source Software group t= o confirm that standard PCIe software is generally expected to be aware of = the need for a DSB--but my impression from talking to some of our hardware = engineers is that that is indeed the expectation. > > > I am requesting that EDK2 consumes or produces a change to the current Pc= iExpressLib that will ensure ordering on Arm architectures after Cfg-Writes= which may or may not have side effects. For example, in MdePkg/Library/Bas= ePciExpressLib/PciExpressLib.c, > > UINT8 > EFIAPI > PciExpressWrite8 ( > IN UINTN Address, > IN UINT8 Value > ) > { > ASSERT_INVALID_PCI_ADDRESS (Address); > if (Address >=3D PcdPciExpressBaseSize ()) { > return (UINT8)-1; > } > > return MmioWrite8 ((UINTN)GetPciExpressBaseAddress () + Address, Value)= ; > } > > should become > > > > UINT8 > EFIAPI > PciExpressWrite8 ( > IN UINTN Address, > IN UINT8 Value > ) > { > ASSERT_INVALID_PCI_ADDRESS (Address); > if (Address >=3D PcdPciExpressBaseSize ()) { > return (UINT8)-1; > } > > UINT8 ReturnValue =3D MmioWrite8 ((UINTN)GetPciExpressBaseAddress () + = Address, Value); > #if defined (MDE_CPU_AARCH64) > MemoryFence (); // DMB sy or DSB > #endif > > return ReturnValue; > } > > Please let me know your thoughts and if this is the correct implementatio= n change needed to enforce memory ordering between separate address regions= . I would also like to know the preferred memory barrier instruction in thi= s case (DMB or DSB). Hrmmm, I think the current implementation of MmioRead/Write only offers TSO instead of "full serialization". For instance: // // Reads an 8-bit MMIO register. // // Reads the 8-bit MMIO register specified by Address. The 8-bit read valu= e is // returned. This function must guarantee that all MMIO read and write // operations are serialized. // // @param Address The MMIO register to read. // // @return The value read. // ASM_PFX(MmioRead8Internal): AARCH64_BTI(c) ldrb w0, [x0] dmb ld ret // // Writes an 8-bit MMIO register. // // Writes the 8-bit MMIO register specified by Address with the value spec= ified // by Value and returns Value. This function must guarantee that all MMIO = read // and write operations are serialized. // // @param Address The MMIO register to write. // @param Value The value to write to the MMIO register. // ASM_PFX(MmioWrite8Internal): AARCH64_BTI(c) dmb st strb w1, [x0] ret So prior reads are ordered against reads/stores (through dmb ld after the load). Prior writes are ordered against writes only (through dmb st before the store). So one can easily devise a problem here where dmb st /* orders against prior writes */ strb w1, [x0] /* write */ [...] ldrb w0, [x0] /* load */ dmb ld /* orders this load against later reads/stores, but not against prior writes (!!) */ where the CPU can (AFAIK) freely reorder the write after the load in the ARM64 case. Do we want dmb st after the store instead? --=20 Pedro -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#110462): https://edk2.groups.io/g/devel/message/110462 Mute This Topic: https://groups.io/mt/102310377/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-