From: "Zeng, Star" <star.zeng@intel.com>
To: Leif Lindholm
Cc: edk2-devel-01, star.zeng@intel.com
Date: Thu, 1 Nov 2018 09:12:04 +0800
Message-ID: <90e7a9d5-0ece-a12c-0730-e67b6dbf6505@intel.com>
In-Reply-To: <20181031120816.jmzeo67l2ij7da23@bivouac.eciton.net>
References: <1540561286-112684-1-git-send-email-star.zeng@intel.com>
 <1540561286-112684-5-git-send-email-star.zeng@intel.com>
 <20181030125006.4deveknlhrwehllb@bivouac.eciton.net>
 <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com>
 <20181031120816.jmzeo67l2ij7da23@bivouac.eciton.net>
Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
List-Id: EDK II Development <edk2-devel@lists.01.org>

On 2018/10/31 20:08, Leif Lindholm wrote:
> On Wed, Oct 31, 2018 at 12:38:43PM +0800, Zeng, Star wrote:
>> Good feedback.
>>
>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>> (add back the list)
>>>
>>> Oi! Go back on holiday!
>>>
>>>> On 30 October 2018 at 09:07, Cohen, Eugene wrote:
>>>>> Has this patch been tested on a system that does not have coherent
>>>>> DMA?
>>>>>
>>>>> It's not clear that this change would actually be faster on a system
>>>>> of that type, since using common buffers implies access to uncached
>>>>> memory. Depending on the access patterns, the uncached memory accesses
>>>>> could be more time consuming than the cache maintenance operations.
>>
>> The change/idea was based on the statement below:
>>
>>   ///
>>   /// Provides both read and write access to system memory by both the
>>   /// processor and a bus master. The buffer is coherent from both the
>>   /// processor's and the bus master's point of view.
>>   ///
>>   EfiPciIoOperationBusMasterCommonBuffer,
>>
>> Thanks for raising the concern about uncached memory access. After
>> checking the code, though, for the Intel VTd case
>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>> (and the no-IOMMU case
>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>> the common buffer is just a normal memory buffer.
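
For reference, the common-buffer pattern the patch switches to looks
roughly like this (a minimal sketch against EFI_PCI_IO_PROTOCOL; PciIo
and DataLength are illustrative names, and error handling is omitted):

  VOID                  *HostAddress;   // CPU-visible address
  EFI_PHYSICAL_ADDRESS  DeviceAddress;  // bus-master-visible address
  UINTN                 NumberOfBytes;
  VOID                  *Mapping;
  EFI_STATUS            Status;

  //
  // Allocate memory suitable for common-buffer DMA, then map it once.
  // The mapping stays live for the lifetime of the async transfer, so
  // no per-poll unmap/remap is needed to see the controller's writes.
  //
  Status = PciIo->AllocateBuffer (
                    PciIo,
                    AllocateAnyPages,
                    EfiBootServicesData,
                    EFI_SIZE_TO_PAGES (DataLength),
                    &HostAddress,
                    0
                    );
  NumberOfBytes = DataLength;
  Status = PciIo->Map (
                    PciIo,
                    EfiPciIoOperationBusMasterCommonBuffer,
                    HostAddress,
                    &NumberOfBytes,
                    &DeviceAddress,
                    &Mapping
                    );

The controller then DMAs through DeviceAddress while the CPU reads
HostAddress directly. On a cache-coherent platform both views stay
consistent for free; on a non-coherent platform the buffer has to be
mapped uncached, which is exactly the cost being discussed above.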
>> If someone could help run some tests and collect some data on a system
>> where common buffers do imply access to uncached memory, that would be
>> great.
>>
>>>> I haven't had time to look at these patches yet.
>>>>
>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>> more performant on implementations with non-coherent DMA, and so
>>>> common buffers should be avoided unless we are dealing with data
>>>> structures that are truly shared between the CPU and the device.
>>>>
>>>> Since this is obviously not the case here, could we please have some
>>>> numbers about the performance improvement we are talking about here?
>>>> Would it be possible to improve the IOMMU handling code instead?
>>
>> We collected the data below on a platform with a release image and
>> Intel VTd enabled.
>>
>> The image size of EhciDxe or XhciDxe is reduced by about 120+ bytes.
>>
>> EHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D1DF0      446        2150         4         2       963
>>
>> EHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D1DF0      270         742         2         2        41
>>
>> XHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D14F0      215         603         2         2        52
>>
>> XHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D14F0       95         294         3         2        52
>>
>> I believe the performance data really depends on:
>> 1. How many AsyncInterruptTransfer handlers are registered (the number
>>    of USB keyboards and/or USB Bluetooth keyboards?).
>> 2. The data size (for flushing data from the PCI controller specific
>>    address to the mapped system memory address *in the original code*).
>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>    operations on the Intel VTd engine caused by the unmap and map used
>>    for flushing data *in the original code*; SetAttribute on the Intel
>>    VTd engine involves FlushPageTableMemory, InvalidatePageEntry, etc.).
>>
>>> On an unrelated note to the concerns above:
>>> Why has a fundamental change to the behaviour of one of the industry
>>> standard drivers been pushed at the very end of the stable cycle?
>>
>> We thought it was a simple improvement, not a fundamental change, until
>> Eugene and Ard raised their concerns.
>
> Understood. Thanks. :)
>
> However, as it is changing the memory management behaviour of a core
> driver, I think it automatically qualifies as something that should
> only go in the week after a stable tag.
>
> We will need to have a closer look at the non-coherent case when Ard
> gets back (Monday).

You mean Ard is on vacation and will be back next Monday?

> If this version causes issues with non-coherent systems, we will need
> to revert it before the stable tag. We would then need to look into
> the best way to deal with the performance issues quoted above.

I am glad to revert it if it has side effects.
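
To make the trade-off concrete: the per-poll "flush" in the original
driver re-establishes the DMA mapping each time, roughly as sketched
below (simplified from the EhcFlushAsyncIntMap logic; the Urb fields and
PhyAddr are paraphrased, and most error handling is trimmed):

  //
  // Tear down and re-create the BusMasterWrite mapping so the data the
  // controller wrote becomes visible to the CPU. With an IOMMU, both
  // Unmap and Map go through IoMmu->SetAttribute, which on the Intel
  // VTd engine flushes and invalidates page-table entries; that is the
  // overhead the numbers above are measuring.
  //
  Status = PciIo->Unmap (PciIo, Urb->DataMap);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  Urb->DataMap  = NULL;
  NumberOfBytes = Urb->DataLen;
  Status = PciIo->Map (
                    PciIo,
                    EfiPciIoOperationBusMasterWrite,
                    Urb->Data,
                    &NumberOfBytes,
                    &PhyAddr,
                    &Urb->DataMap
                    );

With a common buffer, this whole sequence disappears from the poll path,
which matches the drop in cumulative time in the data above (2150 us to
742 us for EHCI, 603 us to 294 us for XHCI).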
Is it possible someone could have a quick check?

Thanks,
Star

> Best Regards,
>
> Leif