public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Zeng, Star" <star.zeng@intel.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: edk2-devel-01 <edk2-devel@lists.01.org>, star.zeng@intel.com
Subject: Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
Date: Tue, 6 Nov 2018 22:37:37 +0800	[thread overview]
Message-ID: <42216d80-d5c0-d071-aa54-932138a05078@intel.com> (raw)
In-Reply-To: <CAKv+Gu8hSmrEOAgTn91PPa+oX-6o+HWRjef6jdEPzwBOvr-vdQ@mail.gmail.com>

On 2018/11/6 17:49, Ard Biesheuvel wrote:
> On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
>> Good feedback.
>>
>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>
>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>
>>>> (add back the list)
>>>
>>>
>>> Oi! Go back on holiday!
>>>
>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>>
>>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>>
>>>>> It's not clear that this change would actually be faster on a system of
>>>>> that
>>>>> type since using common buffers imply access to uncached memory.
>>>>> Depending
>>>>> on the access patterns the uncached memory access could be more time
>>>>> consuming than cache maintenance operations.
>>
>>
>> The change/idea was based on the statement below.
>>    ///
>>    /// Provides both read and write access to system memory by both the
>> processor and a
>>    /// bus master. The buffer is coherent from both the processor's and the
>> bus master's point of view.
>>    ///
>>    EfiPciIoOperationBusMasterCommonBuffer,
>>
>> Thanks for raising case about uncached memory access. But after checking the
>> code, for Intel VTd case
>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>> (or no IOMMU case
>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>> the common buffer is just normal memory buffer.
>> If someone can help do some test/collect some data on a system using common
>> buffers imply access to uncached memory, that will be great.
>>
> 
> OK, so first of all, can anyone explain to me under which
> circumstances interrupt transfers are a bottleneck? I'd assume that
> anything throughput bound would use bulk endpoints.
> 
> Also, since the Map/Unmap calls are only costly when using an IOMMU,
> could we simply revert to the old behavior if mIoMmu == NULL?
> 
>>>>
>>>> I haven't had time to look at these patches yet.
>>>>
>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>> more performant on implementations with non-coherent DMA, and so
>>>> common buffers should be avoided unless we are dealing with data
>>>> structures that are truly shared between the CPU and the device.
>>>>
>>>> Since this is obviously not the case here, could we please have some
>>>> numbers about the performance improvement we are talking about here?
>>>> Would it be possible to improve the IOMMU handling code instead?
>>
>>
>> We collected the data below on a platform with release image and Intel VTd
>> enabled.
>>
>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>
>> EHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0        446        2150           4           2         963
>>
>> EHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0        270         742           2           2          41
>>
>> XHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0        215         603           2           2          52
>>
>> XHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0         95         294           3           2          52
>>
>> I believe the performance data really depends on
>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>> and/or USB bluetooth keyboard?)
>> 2. Data size (for flushing data from PCI controller specific address to
>> mapped system memory address *in original code*)
>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>> operation on Intel VTd engine caused by the unmap and map for flushing data
>> *in original code*, the SetAttribute operation on IntelVTd engine will
>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>
> 
> OK, so there is room for improvement here: there is no reason the
> IOMMU driver couldn't cache mappings, or do some other optimizations
> that would make mapping the same memory repeatedly less costly.

The unmap/map with IOMMU will direct to SetAttribute that will 
disallow/allow DMA memory access. The IOMMU driver is hard to predict 
the sequence of unmap/map operations. Do you have more detail about the 
optimizations?

Could you take a try with the patch on the platform for the case you and 
Eugene mentioned?

Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4 
have no functionality impact) since the time point is a little sensitive 
as it is near edk2-stable201811.

Thanks,
Star

> 
>>>
>>> On an unrelated note to the concerns above:
>>> Why has a fundamental change to the behaviour of one of the industry
>>> standard drivers been pushed at the very end of the stable cycle?
>>
>>
>> We thought it was a simple improvement but not fundamental change before
>> Eugene and Ard raised the concern.
>>
>>
>> Thanks,
>> Star
>>
>>>
>>> Regards,
>>>
>>> Leif
>>>
>>



  reply	other threads:[~2018-11-06 14:38 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
2018-10-26 13:41 ` [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function Star Zeng
2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
2018-10-27  7:47   ` Wu, Hao A
2018-10-26 13:41 ` [PATCH V3 4/4] MdeModulePkg EhciDxe: " Star Zeng
     [not found]   ` <CS1PR8401MB1189428C2915107C583C0A42B4CC0@CS1PR8401MB1189.NAMPRD84.PROD.OUTLOOK.COM>
2018-10-30 12:39     ` Ard Biesheuvel
2018-10-30 12:50       ` Leif Lindholm
2018-10-31  4:38         ` Zeng, Star
2018-10-31 12:08           ` Leif Lindholm
2018-11-01  1:12             ` Zeng, Star
2018-11-01 10:32               ` Leif Lindholm
2018-11-06  9:49           ` Ard Biesheuvel
2018-11-06 14:37             ` Zeng, Star [this message]
2018-11-07 15:00               ` Zeng, Star
2018-11-07 15:14                 ` Ard Biesheuvel
2018-10-28 13:35 ` [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe " Zeng, Star
2018-10-29  2:19 ` Ni, Ruiyu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42216d80-d5c0-d071-aa54-932138a05078@intel.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox