From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mx.groups.io with SMTP id smtpd.web11.882.1589216042370093932 for ; Mon, 11 May 2020 09:54:02 -0700 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: arm.com, ip: 217.140.110.172, mailfrom: ard.biesheuvel@arm.com) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E726630E; Mon, 11 May 2020 09:54:01 -0700 (PDT) Received: from [192.168.1.81] (unknown [10.37.8.255]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E18C03F305; Mon, 11 May 2020 09:53:59 -0700 (PDT) Subject: Re: [PATCH edk2-platforms v4 4/9] Silicon/Broadcom/BcmGenetDxe: avoid uncached memory for streaming DMA To: Andrei Warkentin , "devel@edk2.groups.io" Cc: Pete Batard , Jared McNeill , Samer El-Haj-Mahmoud , Jeremy Linton References: <20200511145527.23453-1-ard.biesheuvel@arm.com> <20200511145527.23453-5-ard.biesheuvel@arm.com> From: "Ard Biesheuvel" Message-ID: Date: Mon, 11 May 2020 18:53:55 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 5/11/20 6:42 PM, Andrei Warkentin wrote: > Is 1500 the max MTU that UEFI uses? Wasn't clear from the spec. >=20 > Otherwise looks good - esp optimizing the allocations instead of wastin= g=20 > half page on every alloc =F0=9F=99=82. >=20 > LGTM >=20 > Reviewed-by: Andrei Warkentin >=20 Thanks. I couldn't disentangle that mess either, but I *think* we should=20 probably be using 1500 for SnpMode.MaxPacketSize, since MNP adds the=20 VLAN tag lan and Ethernet FCS size to that. Currently, we have Genet->SnpMode.MaxPacketSize =3D GENET_MAX_PACKET_SIZE; which is definitely wrong. (1536 is a multiple of 512 so it is guaranteed to be cacheline aligned,=20 which helps the non-coherent DMA lib map the buffers without having to=20 do cache maintenance) > -----------------------------------------------------------------------= - > *From:* Ard Biesheuvel > *Sent:* Monday, May 11, 2020 9:55 AM > *To:* devel@edk2.groups.io > *Cc:* Ard Biesheuvel ; Pete Batard=20 > ; Jared McNeill ; Andrei Warkentin= =20 > ; Samer El-Haj-Mahmoud=20 > ; Jeremy Linton > *Subject:* [PATCH edk2-platforms v4 4/9] Silicon/Broadcom/BcmGenetDxe:=20 > avoid uncached memory for streaming DMA > The non-coherent version of DmaAllocateBuffer () returns uncached > memory, to ensure that the CPU and the device see the same data, > even we they are accessing the buffer at the same time. >=20 > This is not really necessary for our RX ring: the CPU never accesses > the buffer while it is mapped for writing by the device, and so we > can simply use the streaming DMA model, which uses ordinary cached > buffers, but issues a cache invalidate at DMA unmap time. >=20 > While at it, reduce the max packet size to 1536 bytes, and allocate > the entire buffer array using a single call. >=20 > Signed-off-by: Ard Biesheuvel > --- > =C2=A0Silicon/Broadcom/Drivers/Net/BcmGenetDxe/BcmGenetDxe.inf |=C2=A0= 4 +++ > =C2=A0Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.h=C2=A0=C2=A0= =C2=A0=C2=A0 |=C2=A0 6 ++-- > =C2=A0Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.c=C2=A0=C2=A0= =C2=A0=C2=A0 | 36=20 > +++++++++----------- > =C2=A0Silicon/Broadcom/Drivers/Net/BcmGenetDxe/SimpleNetwork.c |=C2=A0= 2 +- > =C2=A04 files changed, 26 insertions(+), 22 deletions(-) >=20 > diff --git a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/BcmGenetDxe.inf=20 > b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/BcmGenetDxe.inf > index 1f1aeca7dd6b..248164249c6e 100644 > --- a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/BcmGenetDxe.inf > +++ b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/BcmGenetDxe.inf > @@ -51,3 +51,7 @@ [Protocols] > =C2=A0=C2=A0 gBcmGenetPlatformDeviceProtocolGuid=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 ## TO_START > =C2=A0=C2=A0 gEfiDevicePathProtocolGuid=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ## = BY_START > =C2=A0=C2=A0 gEfiSimpleNetworkProtocolGuid=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ## BY_START > + > +[FixedPcd] > +=C2=A0 gEmbeddedTokenSpaceGuid.PcdDmaDeviceOffset > +=C2=A0 gEmbeddedTokenSpaceGuid.PcdDmaDeviceLimit > diff --git a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.h=20 > b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.h > index b491ea4665b0..ddfbc0806c07 100644 > --- a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.h > +++ b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.h > @@ -50,7 +50,7 @@ > =C2=A0#define BRGPHY_SHADOW_1C_GTXCLK_EN=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 0x0200 >=20 > =C2=A0#define GENET_VERSION=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0a > -#define GENET_MAX_PACKET_SIZE=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 2048 > +#define GENET_MAX_PACKET_SIZE=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 1536 >=20 > =C2=A0#define GENET_SYS_REV_CTRL=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 0x000 > =C2=A0#define=C2=A0 SYS_REV_MAJOR=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (BIT27|BIT26|BIT25|BIT24) > @@ -217,7 +217,7 @@ typedef struct { > =C2=A0=C2=A0 UINT16=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 TxConsIndex; > =C2=A0=C2=A0 UINT16=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 TxProdIndex; >=20 > -=C2=A0 UINT8=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 *RxBuffer[GENET_DMA_DESC_CO= UNT]; > +=C2=A0 EFI_PHYSICAL_ADDRESS=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 RxBuffer; > =C2=A0=C2=A0 GENET_MAP_INFO=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 RxBufferMap[GENET_DMA_DESC_COUNT]; > =C2=A0=C2=A0 UINT16=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 RxConsIndex; > =C2=A0=C2=A0 UINT16=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 RxProdIndex; > @@ -235,6 +235,8 @@ extern CONST EFI_SIMPLE_NETWORK_PROTOCOL =20 > gGenetSimpleNetworkTemplate; > =C2=A0#define GENET_DRIVER_SIGNATURE=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 SIGNATURE_32('G', = 'N',=20 > 'E', 'T') > =C2=A0#define GENET_PRIVATE_DATA_FROM_SNP_THIS(a)=C2=A0=C2=A0 CR(a,=20 > GENET_PRIVATE_DATA, Snp, GENET_DRIVER_SIGNATURE) >=20 > +#define GENET_RX_BUFFER(g, idx)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ((UINT8=20 > *)(UINTN)(g)->RxBuffer + GENET_MAX_PACKET_SIZE * (idx)) > + > =C2=A0EFI_STATUS > =C2=A0EFIAPI > =C2=A0GenetPhyRead ( > diff --git a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.c=20 > b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.c > index 71c659e7f882..94b578a10aa1 100644 > --- a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.c > +++ b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/GenetUtil.c > @@ -19,6 +19,10 @@ >=20 > =C2=A0#define GENET_PHY_RETRY=C2=A0=C2=A0=C2=A0=C2=A0 1000 >=20 > +STATIC CONST > +EFI_PHYSICAL_ADDRESS=C2=A0=C2=A0 mDmaAddressLimit =3D FixedPcdGet64=20 > (PcdDmaDeviceLimit) - > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 FixedPcdGet64=20 > (PcdDmaDeviceOffset); > + > =C2=A0/** > =C2=A0=C2=A0 Read a memory-mapped device CSR. >=20 > @@ -605,19 +609,17 @@ GenetDmaAlloc ( > =C2=A0=C2=A0 IN GENET_PRIVATE_DATA *Genet > =C2=A0=C2=A0 ) > =C2=A0{ > -=C2=A0 EFI_STATUS=C2=A0 Status; > -=C2=A0 UINTN=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Idx; > +=C2=A0 EFI_STATUS=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 Status; >=20 > -=C2=A0 for (Idx =3D 0; Idx < GENET_DMA_DESC_COUNT; Idx++) { > -=C2=A0=C2=A0=C2=A0 Status =3D DmaAllocateBuffer (EfiBootServicesData, = EFI_SIZE_TO_PAGES=20 > (GENET_MAX_PACKET_SIZE), (VOID **)&Genet->RxBuffer[Idx]); > -=C2=A0=C2=A0=C2=A0 if (EFI_ERROR (Status)) { > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 DEBUG ((DEBUG_ERROR, "%a: Failed to all= ocate RX buffer: %r\n",=20 > __FUNCTION__, Status)); > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GenetDmaFree (Genet); > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return Status; > -=C2=A0=C2=A0=C2=A0 } > +=C2=A0 Genet->RxBuffer =3D mDmaAddressLimit; > +=C2=A0 Status =3D gBS->AllocatePages (AllocateMaxAddress, EfiBootServi= cesData, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 EFI_SIZE_TO_PAGES (GENET_MAX_PACKET_SIZE *= =20 > GENET_DMA_DESC_COUNT), > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 &Genet->RxBuffer); > +=C2=A0 if (EFI_ERROR (Status)) { > +=C2=A0=C2=A0=C2=A0 DEBUG ((DEBUG_ERROR, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 "%a: Failed to allocate RX buffer: %r\n= ", __FUNCTION__, Status)); > =C2=A0=C2=A0 } > - > -=C2=A0 return EFI_SUCCESS; > +=C2=A0 return Status; > =C2=A0} >=20 > =C2=A0/** > @@ -640,11 +642,11 @@ GenetDmaMapRxDescriptor ( > =C2=A0=C2=A0 UINTN=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Dma= NumberOfBytes; >=20 > =C2=A0=C2=A0 ASSERT (Genet->RxBufferMap[DescIndex].Mapping =3D=3D NULL= ); > -=C2=A0 ASSERT (Genet->RxBuffer[DescIndex] !=3D NULL); > +=C2=A0 ASSERT (Genet->RxBuffer !=3D 0); >=20 > =C2=A0=C2=A0 DmaNumberOfBytes =3D GENET_MAX_PACKET_SIZE; > =C2=A0=C2=A0 Status =3D DmaMap (MapOperationBusMasterWrite, > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= (VOID *)Genet->RxBuffer[DescIndex], > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= GENET_RX_BUFFER (Genet, DescIndex), > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 &DmaNumberOfBytes, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 &Genet->RxBufferMap[DescIndex].PhysAddress, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 &Genet->RxBufferMap[DescIndex].Mapping); > @@ -697,13 +699,9 @@ GenetDmaFree ( >=20 > =C2=A0=C2=A0 for (Idx =3D 0; Idx < GENET_DMA_DESC_COUNT; Idx++) { > =C2=A0=C2=A0=C2=A0=C2=A0 GenetDmaUnmapRxDescriptor (Genet, Idx); > - > -=C2=A0=C2=A0=C2=A0 if (Genet->RxBuffer[Idx] !=3D NULL) { > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 DmaFreeBuffer (EFI_SIZE_TO_PAGES (GENET= _MAX_PACKET_SIZE), > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Genet->RxBuffer[Idx]); > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Genet->RxBuffer[Idx] =3D NULL; > -=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0 } > +=C2=A0 gBS->FreePages (Genet->RxBuffer, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 EFI_SIZE_TO_PAGES (GE= NET_MAX_PACKET_SIZE * GENET_DMA_DESC_COUNT)); > =C2=A0} >=20 > =C2=A0/** > diff --git a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/SimpleNetwork.c=20 > b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/SimpleNetwork.c > index 72ab55619b0e..9746f210d1f2 100644 > --- a/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/SimpleNetwork.c > +++ b/Silicon/Broadcom/Drivers/Net/BcmGenetDxe/SimpleNetwork.c > @@ -730,7 +730,7 @@ GenetSimpleNetworkReceive ( >=20 > =C2=A0=C2=A0 GenetDmaUnmapRxDescriptor (Genet, DescIndex); >=20 > -=C2=A0 Frame =3D Genet->RxBuffer[DescIndex]; > +=C2=A0 Frame =3D GENET_RX_BUFFER (Genet, DescIndex); >=20 > =C2=A0=C2=A0 if (FrameLength > 2 + Genet->SnpMode.MediaHeaderSize) { > =C2=A0=C2=A0=C2=A0=C2=A0 // Received frame has 2 bytes of padding at t= he start > --=20 > 2.17.1 >=20