From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=2a00:1450:4864:20::343; helo=mail-wm1-x343.google.com; envelope-from=leif.lindholm@linaro.org; receiver=edk2-devel@lists.01.org Received: from mail-wm1-x343.google.com (mail-wm1-x343.google.com [IPv6:2a00:1450:4864:20::343]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 27AAD21CAD998 for ; Wed, 31 Oct 2018 05:08:21 -0700 (PDT) Received: by mail-wm1-x343.google.com with SMTP id y144-v6so14253072wmd.4 for ; Wed, 31 Oct 2018 05:08:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=tB/fsbEihkBNOyTPIIpfSqSRPTUI2XlzuRaYi3k7bwA=; b=CLhMFG2c0Il2j3B8p/BJ1spYVwB9v6qyVlQvKDVJeC7WxI4IakH86Pn1luGh5MiDd5 BmjNOhU6ab6F++NThouj022SbIARkPx06u0dfLc3U58zhI/5cz28lxArvHI6cK1wxnja AvbCDZVmbu4IUoZgE7zjE1pkbmVo2uFtPl8KE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=tB/fsbEihkBNOyTPIIpfSqSRPTUI2XlzuRaYi3k7bwA=; b=hrTQVBo4U0X1li0zr8ZEUALpAxsZq058d3P28LEki3uO9Z0vLiNRoCkrQuk4rG5O5/ +H+G9NnakgVinNh516HJAeD/ajpjUvbwsBuvB9qmdpnHcq02E7DmNV/17hQd8nqoRDmt rZa1gaWSaX97ulpXtKAqWOISgGWjHb6D9ED+I13qmBMfHI2QYZ9+s93+Stn9zOXlet3v rJi8YArSnAAxT+6qtNsO+YPSk+Nfp+G+RsnSHRHQ2DC6/0SqAJQbxKcSgZTZpH52m3Eg zFHdB79Gpgiho196JMc2Yi1AZYEFG38xfTDfxeGLP4dMv4FafpgbOfEO7jB47YWe6a5j zA9A== X-Gm-Message-State: AGRZ1gI+JhHdMXWNtgw3VjDo3vJI7L5AxhmZoi+TMA1PtqBWLQhOlq0N seOGmv+j/nidJY8YHZx76HTH0Q== X-Google-Smtp-Source: AJdET5f8Firi0WtmWHGZ2jFiMgLxUXSv2e5pIWcuIbYex2vMvPBtwNxhqGf/qxC6bQ7epQn7Gdiq3g== X-Received: by 2002:a1c:ab54:: with SMTP id u81-v6mr2119369wme.45.1540987699704; Wed, 31 Oct 2018 05:08:19 -0700 (PDT) Received: from bivouac.eciton.net (bivouac.eciton.net. [2a00:1098:0:86:1000:23:0:2]) by smtp.gmail.com with ESMTPSA id 7-v6sm9049126wrl.27.2018.10.31.05.08.18 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 31 Oct 2018 05:08:18 -0700 (PDT) Date: Wed, 31 Oct 2018 12:08:16 +0000 From: Leif Lindholm To: "Zeng, Star" Cc: Ard Biesheuvel , "Cohen, Eugene" , edk2-devel-01 Message-ID: <20181031120816.jmzeo67l2ij7da23@bivouac.eciton.net> References: <1540561286-112684-1-git-send-email-star.zeng@intel.com> <1540561286-112684-5-git-send-email-star.zeng@intel.com> <20181030125006.4deveknlhrwehllb@bivouac.eciton.net> <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com> MIME-Version: 1.0 In-Reply-To: <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com> User-Agent: NeoMutt/20170113 (1.7.2) Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Oct 2018 12:08:22 -0000 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Oct 31, 2018 at 12:38:43PM +0800, Zeng, Star wrote: > Good feedback. > > On 2018/10/30 20:50, Leif Lindholm wrote: > > On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote: > > > (add back the list) > > > > Oi! Go back on holiday! > > > > > On 30 October 2018 at 09:07, Cohen, Eugene wrote: > > > > Has this patch been tested on a system that does not have coherent DMA? > > > > > > > > It's not clear that this change would actually be faster on a system of that > > > > type since using common buffers imply access to uncached memory. Depending > > > > on the access patterns the uncached memory access could be more time > > > > consuming than cache maintenance operations. > > The change/idea was based on the statement below. > /// > /// Provides both read and write access to system memory by both the > processor and a > /// bus master. The buffer is coherent from both the processor's and the > bus master's point of view. > /// > EfiPciIoOperationBusMasterCommonBuffer, > > Thanks for raising case about uncached memory access. But after checking the > code, for Intel VTd case https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460 > (or no IOMMU case https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567), > the common buffer is just normal memory buffer. > If someone can help do some test/collect some data on a system using common > buffers imply access to uncached memory, that will be great. > > > > > > > I haven't had time to look at these patches yet. > > > > > > I agree with Eugene's concern: the directional DMA routines are much > > > more performant on implementations with non-coherent DMA, and so > > > common buffers should be avoided unless we are dealing with data > > > structures that are truly shared between the CPU and the device. > > > > > > Since this is obviously not the case here, could we please have some > > > numbers about the performance improvement we are talking about here? > > > Would it be possible to improve the IOMMU handling code instead? > > We collected the data below on a platform with release image and Intel VTd > enabled. > > The image size of EhciDxe or XhciDxe can reduce about 120+ bytes. > > EHCI without the patch: > ==[ Cumulative ]======== > (Times in microsec.) Cumulative Average Shortest Longest > Name Count Duration Duration Duration Duration > ------------------------------------------------------------------------------- > S0000B00D1DF0 446 2150 4 2 963 > > EHCI with the patch: > ==[ Cumulative ]======== > (Times in microsec.) Cumulative Average Shortest Longest > Name Count Duration Duration Duration Duration > ------------------------------------------------------------------------------- > S0000B00D1DF0 270 742 2 2 41 > > XHCI without the patch: > ==[ Cumulative ]======== > (Times in microsec.) Cumulative Average Shortest Longest > Name Count Duration Duration Duration Duration > ------------------------------------------------------------------------------- > S0000B00D14F0 215 603 2 2 52 > > XHCI with the patch: > ==[ Cumulative ]======== > (Times in microsec.) Cumulative Average Shortest Longest > Name Count Duration Duration Duration Duration > ------------------------------------------------------------------------------- > S0000B00D14F0 95 294 3 2 52 > > I believe the performance data really depends on > 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard > and/or USB bluetooth keyboard?) > 2. Data size (for flushing data from PCI controller specific address to > mapped system memory address *in original code*) > 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute > operation on Intel VTd engine caused by the unmap and map for flushing data > *in original code*, the SetAttribute operation on IntelVTd engine will > involve FlushPageTableMemory, InvalidatePageEntry and etc) > > > On an unrelated note to the concerns above: > > Why has a fundamental change to the behaviour of one of the industry > > standard drivers been pushed at the very end of the stable cycle? > > We thought it was a simple improvement but not fundamental change before > Eugene and Ard raised the concern. Understood. However, as it is changing the memory management behaviour of a core driver, I think it automatically qualifies as something that should only go in the week after a stable tag. We will need to have a closer look at the non-coherent case when Ard gets back (Monday). If this version causes issues with non-coherent systems, we will need to revert it before the stable tag. We would then need to look into the best way to deal with the performance issues quoted above. Best Regards, Leif