From mboxrd@z Thu Jan 1 00:00:00 1970
To: Ard Biesheuvel
Cc: edk2-devel-01, star.zeng@intel.com
References: <1540561286-112684-1-git-send-email-star.zeng@intel.com>
 <1540561286-112684-5-git-send-email-star.zeng@intel.com>
 <20181030125006.4deveknlhrwehllb@bivouac.eciton.net>
 <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com>
 <42216d80-d5c0-d071-aa54-932138a05078@intel.com>
From: "Zeng, Star"
Message-ID: <57b70aa4-f2c7-02e4-4eb5-43b0a65ba24c@intel.com>
Date: Wed, 7 Nov 2018 23:00:53 +0800
In-Reply-To: <42216d80-d5c0-d071-aa54-932138a05078@intel.com>
Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
List-Id: EDK II Development

On 2018/11/6 22:37, Zeng, Star wrote:
> On 2018/11/6 17:49, Ard Biesheuvel wrote:
>> On 31 October 2018 at 05:38, Zeng, Star wrote:
>>> Good feedback.
>>>
>>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>>
>>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>>
>>>>> (add back the list)
>>>>
>>>> Oi! Go back on holiday!
>>>>
>>>>> On 30 October 2018 at 09:07, Cohen, Eugene wrote:
>>>>>>
>>>>>> Has this patch been tested on a system that does not have coherent
>>>>>> DMA?
>>>>>>
>>>>>> It's not clear that this change would actually be faster on a system
>>>>>> of that type, since using common buffers implies access to uncached
>>>>>> memory. Depending on the access patterns, the uncached memory access
>>>>>> could be more time consuming than cache maintenance operations.
>>>
>>> The change/idea was based on the statement below.
>>>   ///
>>>   /// Provides both read and write access to system memory by both the
>>>   /// processor and a bus master. The buffer is coherent from both the
>>>   /// processor's and the bus master's point of view.
>>>   ///
>>>   EfiPciIoOperationBusMasterCommonBuffer,
>>>
>>> Thanks for raising the case about uncached memory access.
>>> But after checking the code, for the Intel VTd case
>>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>>> (or the no-IOMMU case
>>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>>> the common buffer is just a normal memory buffer.
>>> If someone can help run some tests and collect some data on a system
>>> where using common buffers implies access to uncached memory, that
>>> would be great.
>>>
>>
>> OK, so first of all, can anyone explain to me under which
>> circumstances interrupt transfers are a bottleneck? I'd assume that
>> anything throughput bound would use bulk endpoints.
>>
>> Also, since the Map/Unmap calls are only costly when using an IOMMU,
>> could we simply revert to the old behavior if mIoMmu == NULL?
>>
>>>>>
>>>>> I haven't had time to look at these patches yet.
>>>>>
>>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>>> more performant on implementations with non-coherent DMA, and so
>>>>> common buffers should be avoided unless we are dealing with data
>>>>> structures that are truly shared between the CPU and the device.
>>>>>
>>>>> Since this is obviously not the case here, could we please have some
>>>>> numbers about the performance improvement we are talking about here?
>>>>> Would it be possible to improve the IOMMU handling code instead?
>>>
>>> We collected the data below on a platform with a release image and
>>> Intel VTd enabled.
>>>
>>> The image size of EhciDxe or XhciDxe is reduced by about 120+ bytes.
>>>
>>> EHCI without the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>    Name         Count     Duration    Duration    Duration    Duration
>>> -------------------------------------------------------------------------------
>>> S0000B00D1DF0        446        2150           4           2         963
>>>
>>> EHCI with the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>    Name         Count     Duration    Duration    Duration    Duration
>>> -------------------------------------------------------------------------------
>>> S0000B00D1DF0        270         742           2           2          41
>>>
>>> XHCI without the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>    Name         Count     Duration    Duration    Duration    Duration
>>> -------------------------------------------------------------------------------
>>> S0000B00D14F0        215         603           2           2          52
>>>
>>> XHCI with the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>    Name         Count     Duration    Duration    Duration    Duration
>>> -------------------------------------------------------------------------------
>>> S0000B00D14F0         95         294           3           2          52
>>>
>>> I believe the performance data really depends on
>>> 1. How many AsyncInterruptTransfer handlers there are (the number of
>>>    USB keyboards and/or USB Bluetooth keyboards?)
>>> 2. Data size (for flushing data from the PCI controller specific
>>>    address to the mapped system memory address *in the original code*)
>>> 3. The performance of IoMmu->SetAttribute (for example, the
>>>    SetAttribute operation on the Intel VTd engine caused by the unmap
>>>    and map for flushing data *in the original code*; the SetAttribute
>>>    operation on the Intel VTd engine will involve FlushPageTableMemory,
>>>    InvalidatePageEntry, etc.)
>>>
>>
>> OK, so there is room for improvement here: there is no reason the
>> IOMMU driver couldn't cache mappings, or do some other optimizations
>> that would make mapping the same memory repeatedly less costly.
>
> The unmap/map with an IOMMU goes through SetAttribute, which
> disallows/allows DMA memory access. It is hard for the IOMMU driver to
> predict the sequence of unmap/map operations. Do you have more detail
> about the optimizations?
>
> Could you give the patch a try on the platform for the case you and
> Eugene mentioned?
>
> Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4
> have no functionality impact) since the timing is a little sensitive
> as it is near edk2-stable201811.

I have reverted patches 3/4 and 4/4 at
https://github.com/tianocore/edk2/compare/1ed6498...d98fc9a, and we can
continue the discussion.

>
> Thanks,
> Star
>
>>
>>>>
>>>> On an unrelated note to the concerns above:
>>>> Why has a fundamental change to the behaviour of one of the industry
>>>> standard drivers been pushed at the very end of the stable cycle?
>>>
>>> We thought it was a simple improvement rather than a fundamental change
>>> before Eugene and Ard raised the concern.
>>>
>>> Thanks,
>>> Star
>>>
>>>> Regards,
>>>>
>>>> Leif
>>>>
>>>
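For readers following the thread, the two DMA strategies being compared map
onto the standard EFI_PCI_IO_PROTOCOL calls roughly as sketched below. This
is a minimal illustration only, not the actual EhciDxe code; the function and
variable names (PollWithMapUnmap, AllocateCommonBuffer, DataBuffer,
DataLength) are hypothetical, and error handling is reduced to early returns.

  #include <Uefi.h>
  #include <Protocol/PciIo.h>

  //
  // Original style: the transfer buffer is mapped for BusMasterWrite, and on
  // every poll of the async interrupt transfer the driver does Unmap + Map
  // again so the CPU sees what the controller wrote.  With an IOMMU, each
  // Unmap/Map pair turns into SetAttribute work (page-table updates and
  // invalidations), which is the cost discussed above.
  //
  EFI_STATUS
  PollWithMapUnmap (
    IN     EFI_PCI_IO_PROTOCOL   *PciIo,
    IN     VOID                  *DataBuffer,
    IN     UINTN                 DataLength,
    IN OUT VOID                  **Mapping,
    OUT    EFI_PHYSICAL_ADDRESS  *DeviceAddress
    )
  {
    EFI_STATUS  Status;
    UINTN       Bytes;

    Status = PciIo->Unmap (PciIo, *Mapping);
    if (EFI_ERROR (Status)) {
      return Status;
    }

    Bytes  = DataLength;
    Status = PciIo->Map (
                      PciIo,
                      EfiPciIoOperationBusMasterWrite,
                      DataBuffer,
                      &Bytes,
                      DeviceAddress,
                      Mapping
                      );
    return Status;
  }

  //
  // Patched style: allocate a BusMasterCommonBuffer once.  Both the CPU and
  // the controller access it coherently, so polling needs no further
  // Map/Unmap.  On non-coherent platforms this buffer may be uncached, which
  // is the concern Eugene and Ard raise.
  //
  EFI_STATUS
  AllocateCommonBuffer (
    IN  EFI_PCI_IO_PROTOCOL   *PciIo,
    IN  UINTN                 DataLength,
    OUT VOID                  **HostAddress,
    OUT EFI_PHYSICAL_ADDRESS  *DeviceAddress,
    OUT VOID                  **Mapping
    )
  {
    EFI_STATUS  Status;
    UINTN       Bytes;

    Status = PciIo->AllocateBuffer (
                      PciIo,
                      AllocateAnyPages,
                      EfiBootServicesData,
                      EFI_SIZE_TO_PAGES (DataLength),
                      HostAddress,
                      0
                      );
    if (EFI_ERROR (Status)) {
      return Status;
    }

    Bytes  = DataLength;
    Status = PciIo->Map (
                      PciIo,
                      EfiPciIoOperationBusMasterCommonBuffer,
                      *HostAddress,
                      &Bytes,
                      DeviceAddress,
                      Mapping
                      );
    return Status;
  }

The first pattern pays a per-poll mapping cost whenever an IOMMU is active;
the second pays nothing per poll but may hand back uncached memory on
non-coherent platforms, which is exactly the trade-off debated in this
thread.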