From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ard Biesheuvel
Date: Tue, 6 Nov 2018 10:49:00 +0100
Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
To: "Zeng, Star"
Cc: Leif Lindholm, "Cohen, Eugene", edk2-devel-01
In-Reply-To: <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com>
References: <1540561286-112684-1-git-send-email-star.zeng@intel.com>
 <1540561286-112684-5-git-send-email-star.zeng@intel.com>
 <20181030125006.4deveknlhrwehllb@bivouac.eciton.net>
 <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com>
Content-Type: text/plain; charset="UTF-8"

On 31 October 2018 at 05:38, Zeng, Star wrote:
> Good feedback.
>
> On 2018/10/30 20:50, Leif Lindholm wrote:
>>
>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>
>>> (add back the list)
>>
>> Oi! Go back on holiday!
>>
>>> On 30 October 2018 at 09:07, Cohen, Eugene wrote:
>>>>
>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>
>>>> It's not clear that this change would actually be faster on a system of
>>>> that type, since using common buffers implies access to uncached memory.
>>>> Depending on the access patterns, the uncached memory access could be
>>>> more time consuming than cache maintenance operations.
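(As an aside, on a platform with non-coherent DMA the trade-off Eugene
describes boils down to roughly the following. This is an illustrative
sketch only -- the function name is made up, and it is not the actual
DmaLib/PciIo implementation.)

#include <Uefi.h>
#include <Library/CacheMaintenanceLib.h>

//
// Directional transfer (e.g. EfiPciIoOperationBusMasterWrite): the data
// buffer stays in normal cached memory, and only the transferred range
// pays a cache maintenance cost around each DMA operation.
//
VOID
SketchAfterDeviceWrite (
  IN VOID   *Buffer,
  IN UINTN  Length
  )
{
  InvalidateDataCacheRange (Buffer, Length);
}

//
// A common buffer (EfiPciIoOperationBusMasterCommonBuffer), by contrast,
// is typically remapped uncached so that CPU and device always agree,
// which makes every CPU access to it slower -- not just the DMA window.
//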
>
> The change/idea was based on the statement below.
>   ///
>   /// Provides both read and write access to system memory by both the
>   /// processor and a bus master. The buffer is coherent from both the
>   /// processor's and the bus master's point of view.
>   ///
>   EfiPciIoOperationBusMasterCommonBuffer,
>
> Thanks for raising the concern about uncached memory access. But after
> checking the code, for the Intel VTd case
> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
> (or the no-IOMMU case
> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
> the common buffer is just a normal memory buffer.
> If someone can help run some tests/collect some data on a system where
> common buffers imply access to uncached memory, that would be great.
>

OK, so first of all, can anyone explain to me under which circumstances
interrupt transfers are a bottleneck? I'd assume that anything
throughput-bound would use bulk endpoints.

Also, since the Map/Unmap calls are only costly when using an IOMMU,
could we simply revert to the old behavior if mIoMmu == NULL?
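Something along these lines, perhaps (rough sketch only, not the actual
EhciDxe code; both SketchAllocateAsyncIntBuffer and IsIoMmuActive () are
hypothetical names -- in the current tree the mIoMmu pointer is private to
the host bridge / VTd DMA code, so the USB driver would need some other
way to detect it):

#include <Uefi.h>
#include <Protocol/PciIo.h>
#include <Library/MemoryAllocationLib.h>

BOOLEAN IsIoMmuActive (VOID);   // hypothetical helper -- see note above

EFI_STATUS
SketchAllocateAsyncIntBuffer (
  IN  EFI_PCI_IO_PROTOCOL   *PciIo,
  IN  UINTN                 Bytes,
  OUT VOID                  **HostAddress,
  OUT EFI_PHYSICAL_ADDRESS  *DeviceAddress,
  OUT VOID                  **Mapping
  )
{
  EFI_STATUS  Status;
  UINTN       MappedBytes;

  MappedBytes = Bytes;

  if (IsIoMmuActive ()) {
    //
    // IOMMU present: Map/Unmap per poll is expensive (page table updates
    // and flushes), so keep one common buffer mapped for the lifetime of
    // the async interrupt transfer.
    //
    Status = PciIo->AllocateBuffer (
                      PciIo,
                      AllocateAnyPages,
                      EfiBootServicesData,
                      EFI_SIZE_TO_PAGES (Bytes),
                      HostAddress,
                      0
                      );
    if (EFI_ERROR (Status)) {
      return Status;
    }

    Status = PciIo->Map (
                      PciIo,
                      EfiPciIoOperationBusMasterCommonBuffer,
                      *HostAddress,
                      &MappedBytes,
                      DeviceAddress,
                      Mapping
                      );
  } else {
    //
    // No IOMMU: keep the old behavior -- an ordinary buffer mapped for
    // bus-master writes, so non-coherent platforms keep using cache
    // maintenance on the transferred range instead of uncached
    // common-buffer memory.
    //
    *HostAddress = AllocatePool (Bytes);
    if (*HostAddress == NULL) {
      return EFI_OUT_OF_RESOURCES;
    }

    Status = PciIo->Map (
                      PciIo,
                      EfiPciIoOperationBusMasterWrite,
                      *HostAddress,
                      &MappedBytes,
                      DeviceAddress,
                      Mapping
                      );
  }

  return Status;
}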
>>>
>>> I haven't had time to look at these patches yet.
>>>
>>> I agree with Eugene's concern: the directional DMA routines are much
>>> more performant on implementations with non-coherent DMA, and so
>>> common buffers should be avoided unless we are dealing with data
>>> structures that are truly shared between the CPU and the device.
>>>
>>> Since this is obviously not the case here, could we please have some
>>> numbers about the performance improvement we are talking about here?
>>> Would it be possible to improve the IOMMU handling code instead?
>
> We collected the data below on a platform with a release image and Intel
> VTd enabled.
>
> The image size of EhciDxe or XhciDxe is reduced by about 120+ bytes.
>
> EHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D1DF0    446         2150           4           2         963
>
> EHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D1DF0    270          742           2           2          41
>
> XHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D14F0    215          603           2           2          52
>
> XHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D14F0     95          294           3           2          52
>
> I believe the performance data really depends on:
> 1. How many AsyncInterruptTransfer handlers there are (the number of USB
>    keyboards and/or USB Bluetooth keyboards?)
> 2. The data size (for flushing data from the PCI controller specific
>    address to the mapped system memory address *in the original code*)
> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>    operation on the Intel VTd engine caused by the unmap and map used to
>    flush data *in the original code*; on the Intel VTd engine,
>    SetAttribute involves FlushPageTableMemory, InvalidatePageEntry, etc.)

OK, so there is room for improvement here: there is no reason the IOMMU
driver couldn't cache mappings, or do some other optimizations that would
make mapping the same memory repeatedly less costly. (A rough sketch of
what such a cache could look like is appended at the end of this mail.)

>>
>> On an unrelated note to the concerns above:
>> Why has a fundamental change to the behaviour of one of the industry
>> standard drivers been pushed at the very end of the stable cycle?
>
> We thought it was a simple improvement rather than a fundamental change
> before Eugene and Ard raised their concerns.
>
> Thanks,
> Star
>
>>
>> Regards,
>>
>> Leif
>>
>
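P.S. To make the mapping-cache idea above a bit more concrete, something
along these lines could work inside the IOMMU DMA code. This is a very
rough sketch with hypothetical names and structure, not the actual
IntelVTdDxe/BmDma code:

#include <Uefi.h>

//
// Hypothetical cache of recently established DMA mappings. Re-mapping a
// host buffer that is already in the cache can then skip the IOMMU page
// table update and the associated flushes; entries are torn down lazily
// rather than on every Unmap().
//
typedef struct {
  VOID                    *HostAddress;
  UINTN                   NumberOfBytes;
  EFI_PHYSICAL_ADDRESS    DeviceAddress;
  VOID                    *Mapping;      // handle from the underlying Map()
  BOOLEAN                 Valid;
} CACHED_MAPPING;

#define MAX_CACHED_MAPPINGS  8
STATIC CACHED_MAPPING  mMappingCache[MAX_CACHED_MAPPINGS];

//
// Return a previously established mapping for this exact buffer, if any,
// so the caller can reuse its DeviceAddress without touching the page
// tables again.
//
STATIC
CACHED_MAPPING *
LookupCachedMapping (
  IN VOID   *HostAddress,
  IN UINTN  NumberOfBytes
  )
{
  UINTN  Index;

  for (Index = 0; Index < MAX_CACHED_MAPPINGS; Index++) {
    if (mMappingCache[Index].Valid &&
        (mMappingCache[Index].HostAddress == HostAddress) &&
        (mMappingCache[Index].NumberOfBytes == NumberOfBytes)) {
      return &mMappingCache[Index];
    }
  }

  return NULL;
}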