From: "Zeng, Star" <star.zeng@intel.com>
To: Leif Lindholm
Cc: edk2-devel-01, star.zeng@intel.com
Date: Thu, 1 Nov 2018 09:12:04 +0800
Message-ID: <90e7a9d5-0ece-a12c-0730-e67b6dbf6505@intel.com>
In-Reply-To: <20181031120816.jmzeo67l2ij7da23@bivouac.eciton.net>
References: <1540561286-112684-1-git-send-email-star.zeng@intel.com>
 <1540561286-112684-5-git-send-email-star.zeng@intel.com>
 <20181030125006.4deveknlhrwehllb@bivouac.eciton.net>
 <962a2a90-2783-5fd1-25d2-6a834daa3f26@intel.com>
 <20181031120816.jmzeo67l2ij7da23@bivouac.eciton.net>
Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
List-Id: EDK II Development <edk2-devel@lists.01.org>

On 2018/10/31 20:08, Leif Lindholm wrote:
> On Wed, Oct 31, 2018 at 12:38:43PM +0800, Zeng, Star wrote:
>> Good feedback.
>>
>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>> (add back the list)
>>>
>>> Oi! Go back on holiday!
>>>
>>>> On 30 October 2018 at 09:07, Cohen, Eugene wrote:
>>>>> Has this patch been tested on a system that does not have coherent
>>>>> DMA?
>>>>>
>>>>> It's not clear that this change would actually be faster on a system
>>>>> of that type, since using common buffers implies access to uncached
>>>>> memory. Depending on the access patterns, the uncached memory accesses
>>>>> could be more time consuming than the cache maintenance operations.
>>
>> The change/idea was based on the statement below:
>>
>>   ///
>>   /// Provides both read and write access to system memory by both the
>>   /// processor and a bus master. The buffer is coherent from both the
>>   /// processor's and the bus master's point of view.
>>   ///
>>   EfiPciIoOperationBusMasterCommonBuffer,
>>
>> Thanks for raising the concern about uncached memory access. After
>> checking the code, though, for the Intel VTd case
>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>> (and the no-IOMMU case
>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>> the common buffer is just a normal memory buffer.
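
For reference, the common-buffer pattern the patch switches to looks
roughly like this (a minimal sketch against EFI_PCI_IO_PROTOCOL; PciIo
and DataLength are illustrative names, and error handling is omitted):

  VOID                  *HostAddress;   // CPU-visible address
  EFI_PHYSICAL_ADDRESS  DeviceAddress;  // bus-master-visible address
  UINTN                 NumberOfBytes;
  VOID                  *Mapping;
  EFI_STATUS            Status;

  //
  // Allocate memory suitable for common-buffer DMA, then map it once.
  // The mapping stays live for the lifetime of the async transfer, so
  // no per-poll unmap/remap is needed to see the controller's writes.
  //
  Status = PciIo->AllocateBuffer (
                    PciIo,
                    AllocateAnyPages,
                    EfiBootServicesData,
                    EFI_SIZE_TO_PAGES (DataLength),
                    &HostAddress,
                    0
                    );
  NumberOfBytes = DataLength;
  Status = PciIo->Map (
                    PciIo,
                    EfiPciIoOperationBusMasterCommonBuffer,
                    HostAddress,
                    &NumberOfBytes,
                    &DeviceAddress,
                    &Mapping
                    );

The controller then DMAs through DeviceAddress while the CPU reads
HostAddress directly. On a cache-coherent platform both views stay
consistent for free; on a non-coherent platform the buffer has to be
mapped uncached, which is exactly the cost being discussed above.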
>> If someone could help run some tests and collect some data on a system
>> where common buffers do imply access to uncached memory, that would be
>> great.
>>
>>>> I haven't had time to look at these patches yet.
>>>>
>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>> more performant on implementations with non-coherent DMA, and so
>>>> common buffers should be avoided unless we are dealing with data
>>>> structures that are truly shared between the CPU and the device.
>>>>
>>>> Since this is obviously not the case here, could we please have some
>>>> numbers about the performance improvement we are talking about here?
>>>> Would it be possible to improve the IOMMU handling code instead?
>>
>> We collected the data below on a platform with a release image and
>> Intel VTd enabled.
>>
>> The image size of EhciDxe or XhciDxe is reduced by about 120+ bytes.
>>
>> EHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D1DF0      446        2150         4         2       963
>>
>> EHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D1DF0      270         742         2         2        41
>>
>> XHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D14F0      215         603         2         2        52
>>
>> XHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)    Cumulative   Average  Shortest   Longest
>> Name             Count    Duration  Duration  Duration  Duration
>> ----------------------------------------------------------------
>> S0000B00D14F0       95         294         3         2        52
>>
>> I believe the performance data really depends on:
>> 1. How many AsyncInterruptTransfer handlers are registered (the number
>>    of USB keyboards and/or USB Bluetooth keyboards?).
>> 2. The data size (for flushing data from the PCI controller specific
>>    address to the mapped system memory address *in the original code*).
>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>    operations on the Intel VTd engine caused by the unmap and map used
>>    for flushing data *in the original code*; SetAttribute on the Intel
>>    VTd engine involves FlushPageTableMemory, InvalidatePageEntry, etc.).
>>
>>> On an unrelated note to the concerns above:
>>> Why has a fundamental change to the behaviour of one of the industry
>>> standard drivers been pushed at the very end of the stable cycle?
>>
>> We thought it was a simple improvement, not a fundamental change, until
>> Eugene and Ard raised their concerns.
>
> Understood. Thanks. :)
>
> However, as it is changing the memory management behaviour of a core
> driver, I think it automatically qualifies as something that should
> only go in the week after a stable tag.
>
> We will need to have a closer look at the non-coherent case when Ard
> gets back (Monday).

You mean Ard is on vacation and will be back next Monday?

> If this version causes issues with non-coherent systems, we will need
> to revert it before the stable tag. We would then need to look into
> the best way to deal with the performance issues quoted above.

I am glad to revert it if it has side effects.
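
To make the trade-off concrete: the per-poll "flush" in the original
driver re-establishes the DMA mapping each time, roughly as sketched
below (simplified from the EhcFlushAsyncIntMap logic; the Urb fields and
PhyAddr are paraphrased, and most error handling is trimmed):

  //
  // Tear down and re-create the BusMasterWrite mapping so the data the
  // controller wrote becomes visible to the CPU. With an IOMMU, both
  // Unmap and Map go through IoMmu->SetAttribute, which on the Intel
  // VTd engine flushes and invalidates page-table entries; that is the
  // overhead the numbers above are measuring.
  //
  Status = PciIo->Unmap (PciIo, Urb->DataMap);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  Urb->DataMap  = NULL;
  NumberOfBytes = Urb->DataLen;
  Status = PciIo->Map (
                    PciIo,
                    EfiPciIoOperationBusMasterWrite,
                    Urb->Data,
                    &NumberOfBytes,
                    &PhyAddr,
                    &Urb->DataMap
                    );

With a common buffer, this whole sequence disappears from the poll path,
which matches the drop in cumulative time in the data above (2150 us to
742 us for EHCI, 603 us to 294 us for XHCI).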
Is it possible someone could have a quick check?

Thanks,
Star

> Best Regards,
>
> Leif