From: Ard Biesheuvel
Date: Wed, 7 Nov 2018 16:14:29 +0100
To: "Zeng, Star"
Cc: edk2-devel-01
Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer

On 7 November 2018 at 16:00, Zeng, Star wrote:
> On 2018/11/6 22:37, Zeng, Star wrote:
>>
>> On 2018/11/6 17:49, Ard Biesheuvel wrote:
>>>
>>> On 31 October 2018 at 05:38, Zeng, Star wrote:
>>>>
>>>> Good feedback.
>>>>
>>>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>>>
>>>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>>>
>>>>>> (add back the list)
>>>>>
>>>>> Oi! Go back on holiday!
>>>>>
>>>>>> On 30 October 2018 at 09:07, Cohen, Eugene wrote:
>>>>>>>
>>>>>>> Has this patch been tested on a system that does not have coherent
>>>>>>> DMA?
>>>>>>>
>>>>>>> It's not clear that this change would actually be faster on a system
>>>>>>> of that type since using common buffers imply access to uncached
>>>>>>> memory. Depending on the access patterns the uncached memory access
>>>>>>> could be more time consuming than cache maintenance operations.
>>>>
>>>> The change/idea was based on the statement below.
>>>> ///
>>>> /// Provides both read and write access to system memory by both the processor and a
>>>> /// bus master. The buffer is coherent from both the processor's and the bus master's point of view.
>>>> ///
>>>> EfiPciIoOperationBusMasterCommonBuffer,
>>>>
>>>> Thanks for raising case about uncached memory access. But after checking
>>>> the code, for Intel VTd case
>>>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>>>> (or no IOMMU case
>>>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>>>> the common buffer is just normal memory buffer.
>>>> If someone can help do some test/collect some data on a system using
>>>> common buffers imply access to uncached memory, that will be great.
>>>>
>>>
>>> OK, so first of all, can anyone explain to me under which
>>> circumstances interrupt transfers are a bottleneck? I'd assume that
>>> anything throughput bound would use bulk endpoints.
>>>
>>> Also, since the Map/Unmap calls are only costly when using an IOMMU,
>>> could we simply revert to the old behavior if mIoMmu == NULL?
>>>
>>>>>>
>>>>>> I haven't had time to look at these patches yet.
>>>>>>
>>>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>>>> more performant on implementations with non-coherent DMA, and so
>>>>>> common buffers should be avoided unless we are dealing with data
>>>>>> structures that are truly shared between the CPU and the device.
>>>>>>
>>>>>> Since this is obviously not the case here, could we please have some
>>>>>> numbers about the performance improvement we are talking about here?
>>>>>> Would it be possible to improve the IOMMU handling code instead?
>>>>
>>>> We collected the data below on a platform with release image and Intel
>>>> VTd enabled.
>>>>
>>>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>>>
>>>> EHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)  Cumulative     Average    Shortest     Longest
>>>> Name            Count   Duration    Duration    Duration    Duration
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D1DF0     446       2150           4           2         963
>>>>
>>>> EHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)  Cumulative     Average    Shortest     Longest
>>>> Name            Count   Duration    Duration    Duration    Duration
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D1DF0     270        742           2           2          41
>>>>
>>>> XHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)  Cumulative     Average    Shortest     Longest
>>>> Name            Count   Duration    Duration    Duration    Duration
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D14F0     215        603           2           2          52
>>>>
>>>> XHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)  Cumulative     Average    Shortest     Longest
>>>> Name            Count   Duration    Duration    Duration    Duration
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D14F0      95        294           3           2          52
>>>>
>>>> I believe the performance data really depends on
>>>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>>>>    and/or USB bluetooth keyboard?)
>>>> 2. Data size (for flushing data from PCI controller specific address to
>>>>    mapped system memory address *in original code*)
>>>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>>>    operation on Intel VTd engine caused by the unmap and map for flushing
>>>>    data *in original code*, the SetAttribute operation on IntelVTd engine
>>>>    will involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>>>
>>>
>>> OK, so there is room for improvement here: there is no reason the
>>> IOMMU driver couldn't cache mappings, or do some other optimizations
>>> that would make mapping the same memory repeatedly less costly.
>>
>> The unmap/map with IOMMU will direct to SetAttribute that will
>> disallow/allow DMA memory access. The IOMMU driver is hard to predict the
>> sequence of unmap/map operations. Do you have more detail about the
>> optimizations?
>>
>> Could you take a try with the patch on the platform for the case you and
>> Eugene mentioned?
>>
>> Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4
>> have no functionality impact) since the time point is a little sensitive as
>> it is near edk2-stable201811.
>
> I have reverted the patch 3/4 and 4/4 at
> https://github.com/tianocore/edk2/compare/1ed6498...d98fc9a, and we can
> continue the discussion.
>

Thanks Star
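
For reference, the four measurement tables quoted in this thread can be compared with a few lines of arithmetic. The sketch below uses only the Count and Cumulative Duration values as posted; the dictionary layout, function name and variable names are illustrative assumptions and not part of any EDK2 tool. It also computes truncated per-call averages, which happen to match the Average column in the tables.

```python
# Count and cumulative duration (microseconds), copied from the tables
# quoted in the thread. The data layout and names are illustrative only.
measurements = {
    "EHCI": {"without": (446, 2150), "with": (270, 742)},
    "XHCI": {"without": (215, 603), "with": (95, 294)},
}


def reduction_percent(before_us, after_us):
    """Relative drop in cumulative duration, in percent."""
    return (before_us - after_us) / before_us * 100.0


summary = {}
for controller, runs in measurements.items():
    count_before, total_before = runs["without"]
    count_after, total_after = runs["with"]
    summary[controller] = {
        "reduction_pct": round(reduction_percent(total_before, total_after), 1),
        # Truncated per-call averages, for comparison with the Average column.
        "avg_before": total_before // count_before,
        "avg_after": total_after // count_after,
    }

for controller, row in summary.items():
    print(controller, row)
```

Note that the call counts differ between the runs with and without the patch, so the cumulative comparison is not normalized per call. The data also comes from a single platform with Intel VTd enabled, so it does not address the non-coherent DMA case raised earlier in the thread.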