From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ard Biesheuvel
Date: Tue, 6 Nov 2018 10:49:00 +0100
Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
To: "Zeng, Star"
Cc: Leif Lindholm, "Cohen, Eugene", edk2-devel-01
In-Reply-To: <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com>
References: <1540561286-112684-1-git-send-email-star.zeng@intel.com>
 <1540561286-112684-5-git-send-email-star.zeng@intel.com>
 <20181030125006.4deveknlhrwehllb@bivouac.eciton.net>
 <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com>
Content-Type: text/plain; charset="UTF-8"

On 31 October 2018 at 05:38, Zeng, Star wrote:
> Good feedback.
>
> On 2018/10/30 20:50, Leif Lindholm wrote:
>>
>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>
>>> (add back the list)
>>
>> Oi! Go back on holiday!
>>
>>> On 30 October 2018 at 09:07, Cohen, Eugene wrote:
>>>>
>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>
>>>> It's not clear that this change would actually be faster on a system of
>>>> that type, since using common buffers implies access to uncached memory.
>>>> Depending on the access patterns, the uncached memory access could be
>>>> more time consuming than cache maintenance operations.
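(As an aside, on a platform with non-coherent DMA the trade-off Eugene
describes boils down to roughly the following. This is an illustrative
sketch only -- the function name is made up, and it is not the actual
DmaLib/PciIo implementation.)

#include <Uefi.h>
#include <Library/CacheMaintenanceLib.h>

//
// Directional transfer (e.g. EfiPciIoOperationBusMasterWrite): the data
// buffer stays in normal cached memory, and only the transferred range
// pays a cache maintenance cost around each DMA operation.
//
VOID
SketchAfterDeviceWrite (
  IN VOID   *Buffer,
  IN UINTN  Length
  )
{
  InvalidateDataCacheRange (Buffer, Length);
}

//
// A common buffer (EfiPciIoOperationBusMasterCommonBuffer), by contrast,
// is typically remapped uncached so that CPU and device always agree,
// which makes every CPU access to it slower -- not just the DMA window.
//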
>
> The change/idea was based on the statement below.
>   ///
>   /// Provides both read and write access to system memory by both the
>   /// processor and a bus master. The buffer is coherent from both the
>   /// processor's and the bus master's point of view.
>   ///
>   EfiPciIoOperationBusMasterCommonBuffer,
>
> Thanks for raising the concern about uncached memory access. But after
> checking the code, for the Intel VTd case
> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
> (or the no-IOMMU case
> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
> the common buffer is just a normal memory buffer.
> If someone can help run some tests/collect some data on a system where
> common buffers imply access to uncached memory, that would be great.
>

OK, so first of all, can anyone explain to me under which circumstances
interrupt transfers are a bottleneck? I'd assume that anything
throughput-bound would use bulk endpoints.

Also, since the Map/Unmap calls are only costly when using an IOMMU,
could we simply revert to the old behavior if mIoMmu == NULL?
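Something along these lines, perhaps (rough sketch only, not the actual
EhciDxe code; both SketchAllocateAsyncIntBuffer and IsIoMmuActive () are
hypothetical names -- in the current tree the mIoMmu pointer is private to
the host bridge / VTd DMA code, so the USB driver would need some other
way to detect it):

#include <Uefi.h>
#include <Protocol/PciIo.h>
#include <Library/MemoryAllocationLib.h>

BOOLEAN IsIoMmuActive (VOID);   // hypothetical helper -- see note above

EFI_STATUS
SketchAllocateAsyncIntBuffer (
  IN  EFI_PCI_IO_PROTOCOL   *PciIo,
  IN  UINTN                 Bytes,
  OUT VOID                  **HostAddress,
  OUT EFI_PHYSICAL_ADDRESS  *DeviceAddress,
  OUT VOID                  **Mapping
  )
{
  EFI_STATUS  Status;
  UINTN       MappedBytes;

  MappedBytes = Bytes;

  if (IsIoMmuActive ()) {
    //
    // IOMMU present: Map/Unmap per poll is expensive (page table updates
    // and flushes), so keep one common buffer mapped for the lifetime of
    // the async interrupt transfer.
    //
    Status = PciIo->AllocateBuffer (
                      PciIo,
                      AllocateAnyPages,
                      EfiBootServicesData,
                      EFI_SIZE_TO_PAGES (Bytes),
                      HostAddress,
                      0
                      );
    if (EFI_ERROR (Status)) {
      return Status;
    }

    Status = PciIo->Map (
                      PciIo,
                      EfiPciIoOperationBusMasterCommonBuffer,
                      *HostAddress,
                      &MappedBytes,
                      DeviceAddress,
                      Mapping
                      );
  } else {
    //
    // No IOMMU: keep the old behavior -- an ordinary buffer mapped for
    // bus-master writes, so non-coherent platforms keep using cache
    // maintenance on the transferred range instead of uncached
    // common-buffer memory.
    //
    *HostAddress = AllocatePool (Bytes);
    if (*HostAddress == NULL) {
      return EFI_OUT_OF_RESOURCES;
    }

    Status = PciIo->Map (
                      PciIo,
                      EfiPciIoOperationBusMasterWrite,
                      *HostAddress,
                      &MappedBytes,
                      DeviceAddress,
                      Mapping
                      );
  }

  return Status;
}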
>>>
>>> I haven't had time to look at these patches yet.
>>>
>>> I agree with Eugene's concern: the directional DMA routines are much
>>> more performant on implementations with non-coherent DMA, and so
>>> common buffers should be avoided unless we are dealing with data
>>> structures that are truly shared between the CPU and the device.
>>>
>>> Since this is obviously not the case here, could we please have some
>>> numbers about the performance improvement we are talking about here?
>>> Would it be possible to improve the IOMMU handling code instead?
>
> We collected the data below on a platform with a release image and Intel
> VTd enabled.
>
> The image size of EhciDxe or XhciDxe is reduced by about 120+ bytes.
>
> EHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D1DF0    446         2150           4           2         963
>
> EHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D1DF0    270          742           2           2          41
>
> XHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D14F0    215          603           2           2          52
>
> XHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative    Average    Shortest    Longest
>  Name           Count     Duration    Duration    Duration    Duration
> ----------------------------------------------------------------------
>  S0000B00D14F0     95          294           3           2          52
>
> I believe the performance data really depends on:
> 1. How many AsyncInterruptTransfer handlers there are (the number of USB
>    keyboards and/or USB Bluetooth keyboards?)
> 2. The data size (for flushing data from the PCI controller specific
>    address to the mapped system memory address *in the original code*)
> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>    operation on the Intel VTd engine caused by the unmap and map used to
>    flush data *in the original code*; on the Intel VTd engine,
>    SetAttribute involves FlushPageTableMemory, InvalidatePageEntry, etc.)

OK, so there is room for improvement here: there is no reason the IOMMU
driver couldn't cache mappings, or do some other optimizations that would
make mapping the same memory repeatedly less costly. (A rough sketch of
what such a cache could look like is appended at the end of this mail.)

>>
>> On an unrelated note to the concerns above:
>> Why has a fundamental change to the behaviour of one of the industry
>> standard drivers been pushed at the very end of the stable cycle?
>
> We thought it was a simple improvement rather than a fundamental change
> before Eugene and Ard raised their concerns.
>
> Thanks,
> Star
>
>>
>> Regards,
>>
>> Leif
>>
>
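P.S. To make the mapping-cache idea above a bit more concrete, something
along these lines could work inside the IOMMU DMA code. This is a very
rough sketch with hypothetical names and structure, not the actual
IntelVTdDxe/BmDma code:

#include <Uefi.h>

//
// Hypothetical cache of recently established DMA mappings. Re-mapping a
// host buffer that is already in the cache can then skip the IOMMU page
// table update and the associated flushes; entries are torn down lazily
// rather than on every Unmap().
//
typedef struct {
  VOID                    *HostAddress;
  UINTN                   NumberOfBytes;
  EFI_PHYSICAL_ADDRESS    DeviceAddress;
  VOID                    *Mapping;      // handle from the underlying Map()
  BOOLEAN                 Valid;
} CACHED_MAPPING;

#define MAX_CACHED_MAPPINGS  8
STATIC CACHED_MAPPING  mMappingCache[MAX_CACHED_MAPPINGS];

//
// Return a previously established mapping for this exact buffer, if any,
// so the caller can reuse its DeviceAddress without touching the page
// tables again.
//
STATIC
CACHED_MAPPING *
LookupCachedMapping (
  IN VOID   *HostAddress,
  IN UINTN  NumberOfBytes
  )
{
  UINTN  Index;

  for (Index = 0; Index < MAX_CACHED_MAPPINGS; Index++) {
    if (mMappingCache[Index].Valid &&
        (mMappingCache[Index].HostAddress == HostAddress) &&
        (mMappingCache[Index].NumberOfBytes == NumberOfBytes)) {
      return &mMappingCache[Index];
    }
  }

  return NULL;
}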