public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer
@ 2018-10-26 13:41 Star Zeng
  2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
  To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao

V3: Thanks for Hao's feedback.
1. Match function parameter name and description between
XhciSched.c and XhciSched.h, EhciSched.c and EhciSched.h.
2. Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.


V2: Thanks for Ray's feedback.
1. Add the missing "FreePool (Data);".
2. Remove the unnecessary indentation change.

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274

Please refer to the log message of each commit for more details.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>

Star Zeng (4):
  MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
  MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
  MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
  MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer

 MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c      |  28 +----
 MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 114 +++++++++----------
 MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h |  35 +++++-
 MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c   |  38 ++++++-
 MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h   |  33 +++---
 MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c      |  19 +---
 MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 185 +++++++++++++++++--------------
 MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h |  55 +++++++--
 8 files changed, 296 insertions(+), 211 deletions(-)

-- 
2.7.0.windows.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
  2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
  2018-10-26 13:41 ` [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function Star Zeng
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
  To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao

V3:
Match function parameter name and description between
XhciSched.c and XhciSched.h.

V2:
Add the missing "FreePool (Data);".
Remove the unnecessary indentation change.

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274

Extract new XhciInsertAsyncIntTransfer function from
XhcAsyncInterruptTransfer.

It is code preparation for following patch,
no essential functional change.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Hao Wu <hao.a.wu@intel.com>
---
 MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c      | 18 +--------
 MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 65 ++++++++++++++++++++++++++++++++
 MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h | 28 ++++++++++++++
 3 files changed, 94 insertions(+), 17 deletions(-)

diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
index f1c60bef01c0..7f64f9c7c982 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
@@ -1346,7 +1346,6 @@ XhcAsyncInterruptTransfer (
   EFI_STATUS              Status;
   UINT8                   SlotId;
   UINT8                   Index;
-  UINT8                   *Data;
   EFI_TPL                 OldTpl;
 
   //
@@ -1413,36 +1412,21 @@ XhcAsyncInterruptTransfer (
     goto ON_EXIT;
   }
 
-  Data = AllocateZeroPool (DataLength);
-
-  if (Data == NULL) {
-    DEBUG ((EFI_D_ERROR, "XhcAsyncInterruptTransfer: failed to allocate buffer\n"));
-    Status = EFI_OUT_OF_RESOURCES;
-    goto ON_EXIT;
-  }
-
-  Urb = XhcCreateUrb (
+  Urb = XhciInsertAsyncIntTransfer (
           Xhc,
           DeviceAddress,
           EndPointAddress,
           DeviceSpeed,
           MaximumPacketLength,
-          XHC_INT_TRANSFER_ASYNC,
-          NULL,
-          Data,
           DataLength,
           CallBackFunction,
           Context
           );
-
   if (Urb == NULL) {
-    DEBUG ((EFI_D_ERROR, "XhcAsyncInterruptTransfer: failed to create URB\n"));
-    FreePool (Data);
     Status = EFI_OUT_OF_RESOURCES;
     goto ON_EXIT;
   }
 
-  InsertHeadList (&Xhc->AsyncIntTransfers, &Urb->UrbList);
   //
   // Ring the doorbell
   //
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
index 166c44bf5e66..75959ae08363 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
@@ -1411,6 +1411,71 @@ XhciDelAllAsyncIntTransfers (
 }
 
 /**
+  Insert a single asynchronous interrupt transfer for
+  the device and endpoint.
+
+  @param Xhc            The XHCI Instance
+  @param BusAddr        The logical device address assigned by UsbBus driver
+  @param EpAddr         Endpoint addrress
+  @param DevSpeed       The device speed
+  @param MaxPacket      The max packet length of the endpoint
+  @param DataLen        The length of data buffer
+  @param Callback       The function to call when data is transferred
+  @param Context        The context to the callback
+
+  @return Created URB or NULL
+
+**/
+URB *
+XhciInsertAsyncIntTransfer (
+  IN USB_XHCI_INSTANCE                  *Xhc,
+  IN UINT8                              BusAddr,
+  IN UINT8                              EpAddr,
+  IN UINT8                              DevSpeed,
+  IN UINTN                              MaxPacket,
+  IN UINTN                              DataLen,
+  IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
+  IN VOID                               *Context
+  )
+{
+  VOID      *Data;
+  URB       *Urb;
+
+  Data = AllocateZeroPool (DataLen);
+  if (Data == NULL) {
+    DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
+    return NULL;
+  }
+
+  Urb = XhcCreateUrb (
+          Xhc,
+          BusAddr,
+          EpAddr,
+          DevSpeed,
+          MaxPacket,
+          XHC_INT_TRANSFER_ASYNC,
+          NULL,
+          Data,
+          DataLen,
+          Callback,
+          Context
+          );
+  if (Urb == NULL) {
+    DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
+    FreePool (Data);
+    return NULL;
+  }
+
+  //
+  // New asynchronous transfer must inserted to the head.
+  // Check the comments in XhcMoniteAsyncRequests
+  //
+  InsertHeadList (&Xhc->AsyncIntTransfers, &Urb->UrbList);
+
+  return Urb;
+}
+
+/**
   Update the queue head for next round of asynchronous transfer
 
   @param  Xhc     The XHCI Instance.
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
index 097408828a1f..b5e192c3b589 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
@@ -853,6 +853,34 @@ XhciDelAllAsyncIntTransfers (
   );
 
 /**
+  Insert a single asynchronous interrupt transfer for
+  the device and endpoint.
+
+  @param Xhc            The XHCI Instance
+  @param BusAddr        The logical device address assigned by UsbBus driver
+  @param EpAddr         Endpoint addrress
+  @param DevSpeed       The device speed
+  @param MaxPacket      The max packet length of the endpoint
+  @param DataLen        The length of data buffer
+  @param Callback       The function to call when data is transferred
+  @param Context        The context to the callback
+
+  @return Created URB or NULL
+
+**/
+URB *
+XhciInsertAsyncIntTransfer (
+  IN USB_XHCI_INSTANCE                  *Xhc,
+  IN UINT8                              BusAddr,
+  IN UINT8                              EpAddr,
+  IN UINT8                              DevSpeed,
+  IN UINTN                              MaxPacket,
+  IN UINTN                              DataLen,
+  IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
+  IN VOID                               *Context
+  );
+
+/**
   Set Bios Ownership
 
   @param  Xhc          The XHCI Instance.
-- 
2.7.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
  2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
  2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
  2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
  To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao

V3:
Match function parameter name and description between
EhciSched.c and EhciSched.h.

V2:
Add the missing "gBS->FreePool (Data);".

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274

Extract new EhciInsertAsyncIntTransfer function from
EhcAsyncInterruptTransfer.

It is code preparation for following patch,
no essential functional change.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Hao Wu <hao.a.wu@intel.com>
---
 MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c      | 25 +----------
 MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 76 ++++++++++++++++++++++++++++++++
 MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h | 35 ++++++++++++++-
 3 files changed, 111 insertions(+), 25 deletions(-)

diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
index 50b5598df4fb..5569f4f9618b 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
@@ -997,7 +997,6 @@ EhcAsyncInterruptTransfer (
   URB                     *Urb;
   EFI_TPL                 OldTpl;
   EFI_STATUS              Status;
-  UINT8                   *Data;
 
   //
   // Validate parameters
@@ -1046,16 +1045,7 @@ EhcAsyncInterruptTransfer (
 
   EhcAckAllInterrupt (Ehc);
 
-  Data = AllocatePool (DataLength);
-
-  if (Data == NULL) {
-    DEBUG ((EFI_D_ERROR, "EhcAsyncInterruptTransfer: failed to allocate buffer\n"));
-
-    Status = EFI_OUT_OF_RESOURCES;
-    goto ON_EXIT;
-  }
-
-  Urb = EhcCreateUrb (
+  Urb = EhciInsertAsyncIntTransfer (
           Ehc,
           DeviceAddress,
           EndPointAddress,
@@ -1063,9 +1053,6 @@ EhcAsyncInterruptTransfer (
           *DataToggle,
           MaximumPacketLength,
           Translator,
-          EHC_INT_TRANSFER_ASYNC,
-          NULL,
-          Data,
           DataLength,
           CallBackFunction,
           Context,
@@ -1073,20 +1060,10 @@ EhcAsyncInterruptTransfer (
           );
 
   if (Urb == NULL) {
-    DEBUG ((EFI_D_ERROR, "EhcAsyncInterruptTransfer: failed to create URB\n"));
-
-    gBS->FreePool (Data);
     Status = EFI_OUT_OF_RESOURCES;
     goto ON_EXIT;
   }
 
-  //
-  // New asynchronous transfer must inserted to the head.
-  // Check the comments in EhcMoniteAsyncRequests
-  //
-  EhcLinkQhToPeriod (Ehc, Urb->Qh);
-  InsertHeadList (&Ehc->AsyncIntTransfers, &Urb->UrbList);
-
 ON_EXIT:
   Ehc->PciIo->Flush (Ehc->PciIo);
   gBS->RestoreTPL (OldTpl);
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
index 168280be81d7..ec8d796fab11 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
@@ -814,6 +814,82 @@ EhciDelAllAsyncIntTransfers (
   }
 }
 
+/**
+  Insert a single asynchronous interrupt transfer for
+  the device and endpoint.
+
+  @param  Ehc               The EHCI device.
+  @param  DevAddr           The device address.
+  @param  EpAddr            Endpoint addrress & its direction.
+  @param  DevSpeed          The device speed.
+  @param  Toggle            Initial data toggle to use.
+  @param  MaxPacket         The max packet length of the endpoint.
+  @param  Hub               The transaction translator to use.
+  @param  DataLen           The length of data buffer.
+  @param  Callback          The function to call when data is transferred.
+  @param  Context           The context to the callback.
+  @param  Interval          The interval for interrupt transfer.
+
+  @return Created URB or NULL.
+
+**/
+URB *
+EhciInsertAsyncIntTransfer (
+  IN USB2_HC_DEV                        *Ehc,
+  IN UINT8                              DevAddr,
+  IN UINT8                              EpAddr,
+  IN UINT8                              DevSpeed,
+  IN UINT8                              Toggle,
+  IN UINTN                              MaxPacket,
+  IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
+  IN UINTN                              DataLen,
+  IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
+  IN VOID                               *Context,
+  IN UINTN                              Interval
+  )
+{
+  VOID      *Data;
+  URB       *Urb;
+
+  Data = AllocatePool (DataLen);
+
+  if (Data == NULL) {
+    DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
+    return NULL;
+  }
+
+  Urb = EhcCreateUrb (
+          Ehc,
+          DevAddr,
+          EpAddr,
+          DevSpeed,
+          Toggle,
+          MaxPacket,
+          Hub,
+          EHC_INT_TRANSFER_ASYNC,
+          NULL,
+          Data,
+          DataLen,
+          Callback,
+          Context,
+          Interval
+          );
+
+  if (Urb == NULL) {
+    DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
+    gBS->FreePool (Data);
+    return NULL;
+  }
+
+  //
+  // New asynchronous transfer must inserted to the head.
+  // Check the comments in EhcMoniteAsyncRequests
+  //
+  EhcLinkQhToPeriod (Ehc, Urb->Qh);
+  InsertHeadList (&Ehc->AsyncIntTransfers, &Urb->UrbList);
+
+  return Urb;
+}
 
 /**
   Flush data from PCI controller specific address to mapped system
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h
index c03bd619d7e8..e0c430531faf 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h
@@ -2,7 +2,7 @@
 
   This file contains the definination for host controller schedule routines.
 
-Copyright (c) 2007 - 2009, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2007 - 2018, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -162,6 +162,39 @@ EhciDelAllAsyncIntTransfers (
   IN USB2_HC_DEV          *Ehc
   );
 
+/**
+  Insert a single asynchronous interrupt transfer for
+  the device and endpoint.
+
+  @param  Ehc               The EHCI device.
+  @param  DevAddr           The device address.
+  @param  EpAddr            Endpoint addrress & its direction.
+  @param  DevSpeed          The device speed.
+  @param  Toggle            Initial data toggle to use.
+  @param  MaxPacket         The max packet length of the endpoint.
+  @param  Hub               The transaction translator to use.
+  @param  DataLen           The length of data buffer.
+  @param  Callback          The function to call when data is transferred.
+  @param  Context           The context to the callback.
+  @param  Interval          The interval for interrupt transfer.
+
+  @return Created URB or NULL.
+
+**/
+URB *
+EhciInsertAsyncIntTransfer (
+  IN USB2_HC_DEV                        *Ehc,
+  IN UINT8                              DevAddr,
+  IN UINT8                              EpAddr,
+  IN UINT8                              DevSpeed,
+  IN UINT8                              Toggle,
+  IN UINTN                              MaxPacket,
+  IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
+  IN UINTN                              DataLen,
+  IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
+  IN VOID                               *Context,
+  IN UINTN                              Interval
+  );
 
 /**
   Interrupt transfer periodic check handler.
-- 
2.7.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
  2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
  2018-10-26 13:41 ` [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
  2018-10-27  7:47   ` Wu, Hao A
  2018-10-26 13:41 ` [PATCH V3 4/4] MdeModulePkg EhciDxe: " Star Zeng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
  To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao

V3:
Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274

In current code, XhcMonitorAsyncRequests (timer handler) will do
unmap and map operations for AsyncIntTransfers to "Flush data from
PCI controller specific address to mapped system memory address".
XhcMonitorAsyncRequests
  XhcFlushAsyncIntMap
    PciIo->Unmap
      IoMmu->SetAttribute
    PciIo->Map
      IoMmu->SetAttribute

This may impact the boot performance.

Since the data buffer for XhcMonitorAsyncRequests is internal
buffer, we can allocate common buffer by PciIo->AllocateBuffer
and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
then the unmap and map operations can be removed.

///
/// Provides both read and write access to system memory by
/// both the processor and a bus master. The buffer is coherent
/// from both the processor's and the bus master's point of view.
///
EfiPciIoOperationBusMasterCommonBuffer,

Test done:
USB KB works normally.
USB disk read/write works normally.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
---
 MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c      |   1 +
 MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 138 +++++++++++--------------------
 MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h |  27 +++---
 3 files changed, 63 insertions(+), 103 deletions(-)

diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
index 7f64f9c7c982..64855a4c158c 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
@@ -769,6 +769,7 @@ XhcTransfer (
           MaximumPacketLength,
           Type,
           Request,
+          FALSE,
           Data,
           *DataLength,
           NULL,
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
index 75959ae08363..9dd45e93a272 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
@@ -118,17 +118,18 @@ ON_EXIT:
 /**
   Create a new URB for a new transaction.
 
-  @param  Xhc       The XHCI Instance
-  @param  BusAddr   The logical device address assigned by UsbBus driver
-  @param  EpAddr    Endpoint addrress
-  @param  DevSpeed  The device speed
-  @param  MaxPacket The max packet length of the endpoint
-  @param  Type      The transaction type
-  @param  Request   The standard USB request for control transfer
-  @param  Data      The user data to transfer
-  @param  DataLen   The length of data buffer
-  @param  Callback  The function to call when data is transferred
-  @param  Context   The context to the callback
+  @param  Xhc                   The XHCI Instance
+  @param  BusAddr               The logical device address assigned by UsbBus driver
+  @param  EpAddr                Endpoint addrress
+  @param  DevSpeed              The device speed
+  @param  MaxPacket             The max packet length of the endpoint
+  @param  Type                  The transaction type
+  @param  Request               The standard USB request for control transfer
+  @param  AllocateCommonBuffer  Indicate whether need to allocate common buffer for data transfer
+  @param  Data                  The user data to transfer, NULL if AllocateCommonBuffer is TRUE
+  @param  DataLen               The length of data buffer
+  @param  Callback              The function to call when data is transferred
+  @param  Context               The context to the callback
 
   @return Created URB or NULL
 
@@ -142,6 +143,7 @@ XhcCreateUrb (
   IN UINTN                              MaxPacket,
   IN UINTN                              Type,
   IN EFI_USB_DEVICE_REQUEST             *Request,
+  IN BOOLEAN                            AllocateCommonBuffer,
   IN VOID                               *Data,
   IN UINTN                              DataLen,
   IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
@@ -169,8 +171,24 @@ XhcCreateUrb (
   Ep->Type      = Type;
 
   Urb->Request  = Request;
+  if (AllocateCommonBuffer) {
+    ASSERT (Data == NULL);
+    Status = Xhc->PciIo->AllocateBuffer (
+                           Xhc->PciIo,
+                           AllocateAnyPages,
+                           EfiBootServicesData,
+                           EFI_SIZE_TO_PAGES (DataLen),
+                           &Data,
+                           0
+                           );
+    if (EFI_ERROR (Status) || (Data == NULL)) {
+      FreePool (Urb);
+      return NULL;
+    }
+  }
   Urb->Data     = Data;
   Urb->DataLen  = DataLen;
+  Urb->AllocateCommonBuffer = AllocateCommonBuffer;
   Urb->Callback = Callback;
   Urb->Context  = Context;
 
@@ -178,7 +196,7 @@ XhcCreateUrb (
   ASSERT_EFI_ERROR (Status);
   if (EFI_ERROR (Status)) {
     DEBUG ((EFI_D_ERROR, "XhcCreateUrb: XhcCreateTransferTrb Failed, Status = %r\n", Status));
-    FreePool (Urb);
+    XhcFreeUrb (Xhc, Urb);
     Urb = NULL;
   }
 
@@ -206,6 +224,14 @@ XhcFreeUrb (
     Xhc->PciIo->Unmap (Xhc->PciIo, Urb->DataMap);
   }
 
+  if (Urb->AllocateCommonBuffer) {
+    Xhc->PciIo->FreeBuffer (
+                  Xhc->PciIo,
+                  EFI_SIZE_TO_PAGES (Urb->DataLen),
+                  Urb->Data
+                  );
+  }
+
   FreePool (Urb);
 }
 
@@ -264,10 +290,14 @@ XhcCreateTransferTrb (
   // No need to remap.
   //
   if ((Urb->Data != NULL) && (Urb->DataMap == NULL)) {
-    if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
-      MapOp = EfiPciIoOperationBusMasterWrite;
+    if (Urb->AllocateCommonBuffer) {
+      MapOp = EfiPciIoOperationBusMasterCommonBuffer;
     } else {
-      MapOp = EfiPciIoOperationBusMasterRead;
+      if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
+        MapOp = EfiPciIoOperationBusMasterWrite;
+      } else {
+        MapOp = EfiPciIoOperationBusMasterRead;
+      }
     }
 
     Len = Urb->DataLen;
@@ -1367,7 +1397,6 @@ XhciDelAsyncIntTransfer (
       }
 
       RemoveEntryList (&Urb->UrbList);
-      FreePool (Urb->Data);
       XhcFreeUrb (Xhc, Urb);
       return EFI_SUCCESS;
     }
@@ -1405,7 +1434,6 @@ XhciDelAllAsyncIntTransfers (
     }
 
     RemoveEntryList (&Urb->UrbList);
-    FreePool (Urb->Data);
     XhcFreeUrb (Xhc, Urb);
   }
 }
@@ -1438,15 +1466,8 @@ XhciInsertAsyncIntTransfer (
   IN VOID                               *Context
   )
 {
-  VOID      *Data;
   URB       *Urb;
 
-  Data = AllocateZeroPool (DataLen);
-  if (Data == NULL) {
-    DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
-    return NULL;
-  }
-
   Urb = XhcCreateUrb (
           Xhc,
           BusAddr,
@@ -1455,14 +1476,14 @@ XhciInsertAsyncIntTransfer (
           MaxPacket,
           XHC_INT_TRANSFER_ASYNC,
           NULL,
-          Data,
+          TRUE,
+          NULL,
           DataLen,
           Callback,
           Context
           );
   if (Urb == NULL) {
     DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
-    FreePool (Data);
     return NULL;
   }
 
@@ -1503,61 +1524,6 @@ XhcUpdateAsyncRequest (
 }
 
 /**
-  Flush data from PCI controller specific address to mapped system
-  memory address.
-
-  @param  Xhc                The XHCI device.
-  @param  Urb                The URB to unmap.
-
-  @retval EFI_SUCCESS        Success to flush data to mapped system memory.
-  @retval EFI_DEVICE_ERROR   Fail to flush data to mapped system memory.
-
-**/
-EFI_STATUS
-XhcFlushAsyncIntMap (
-  IN  USB_XHCI_INSTANCE   *Xhc,
-  IN  URB                 *Urb
-  )
-{
-  EFI_STATUS                    Status;
-  EFI_PHYSICAL_ADDRESS          PhyAddr;
-  EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
-  EFI_PCI_IO_PROTOCOL           *PciIo;
-  UINTN                         Len;
-  VOID                          *Map;
-
-  PciIo = Xhc->PciIo;
-  Len   = Urb->DataLen;
-
-  if (Urb->Ep.Direction == EfiUsbDataIn) {
-    MapOp = EfiPciIoOperationBusMasterWrite;
-  } else {
-    MapOp = EfiPciIoOperationBusMasterRead;
-  }
-
-  if (Urb->DataMap != NULL) {
-    Status = PciIo->Unmap (PciIo, Urb->DataMap);
-    if (EFI_ERROR (Status)) {
-      goto ON_ERROR;
-    }
-  }
-
-  Urb->DataMap = NULL;
-
-  Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
-  if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
-    goto ON_ERROR;
-  }
-
-  Urb->DataPhy  = (VOID *) ((UINTN) PhyAddr);
-  Urb->DataMap  = Map;
-  return EFI_SUCCESS;
-
-ON_ERROR:
-  return EFI_DEVICE_ERROR;
-}
-
-/**
   Interrupt transfer periodic check handler.
 
   @param  Event                 Interrupt event.
@@ -1577,7 +1543,6 @@ XhcMonitorAsyncRequests (
   UINT8                   *ProcBuf;
   URB                     *Urb;
   UINT8                   SlotId;
-  EFI_STATUS              Status;
   EFI_TPL                 OldTpl;
 
   OldTpl = gBS->RaiseTPL (XHC_TPL);
@@ -1606,15 +1571,6 @@ XhcMonitorAsyncRequests (
     }
 
     //
-    // Flush any PCI posted write transactions from a PCI host
-    // bridge to system memory.
-    //
-    Status = XhcFlushAsyncIntMap (Xhc, Urb);
-    if (EFI_ERROR (Status)) {
-      DEBUG ((EFI_D_ERROR, "XhcMonitorAsyncRequests: Fail to Flush AsyncInt Mapped Memeory\n"));
-    }
-
-    //
     // Allocate a buffer then copy the transferred data for user.
     // If failed to allocate the buffer, update the URB for next
     // round of transfer. Ignore the data of this round.
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
index b5e192c3b589..1a95d98996c1 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
@@ -172,6 +172,7 @@ typedef struct _URB {
   //
   USB_ENDPOINT                    Ep;
   EFI_USB_DEVICE_REQUEST          *Request;
+  BOOLEAN                         AllocateCommonBuffer;
   VOID                            *Data;
   UINTN                           DataLen;
   VOID                            *DataPhy;
@@ -1432,17 +1433,18 @@ XhcSetTrDequeuePointer (
 /**
   Create a new URB for a new transaction.
 
-  @param  Xhc       The XHCI Instance
-  @param  DevAddr   The device address
-  @param  EpAddr    Endpoint addrress
-  @param  DevSpeed  The device speed
-  @param  MaxPacket The max packet length of the endpoint
-  @param  Type      The transaction type
-  @param  Request   The standard USB request for control transfer
-  @param  Data      The user data to transfer
-  @param  DataLen   The length of data buffer
-  @param  Callback  The function to call when data is transferred
-  @param  Context   The context to the callback
+  @param  Xhc                   The XHCI Instance
+  @param  BusAddr               The logical device address assigned by UsbBus driver
+  @param  EpAddr                Endpoint addrress
+  @param  DevSpeed              The device speed
+  @param  MaxPacket             The max packet length of the endpoint
+  @param  Type                  The transaction type
+  @param  Request               The standard USB request for control transfer
+  @param  AllocateCommonBuffer  Indicate whether need to allocate common buffer for data transfer
+  @param  Data                  The user data to transfer, NULL if AllocateCommonBuffer is TRUE
+  @param  DataLen               The length of data buffer
+  @param  Callback              The function to call when data is transferred
+  @param  Context               The context to the callback
 
   @return Created URB or NULL
 
@@ -1450,12 +1452,13 @@ XhcSetTrDequeuePointer (
 URB*
 XhcCreateUrb (
   IN USB_XHCI_INSTANCE                  *Xhc,
-  IN UINT8                              DevAddr,
+  IN UINT8                              BusAddr,
   IN UINT8                              EpAddr,
   IN UINT8                              DevSpeed,
   IN UINTN                              MaxPacket,
   IN UINTN                              Type,
   IN EFI_USB_DEVICE_REQUEST             *Request,
+  IN BOOLEAN                            AllocateCommonBuffer,
   IN VOID                               *Data,
   IN UINTN                              DataLen,
   IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
-- 
2.7.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
                   ` (2 preceding siblings ...)
  2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
       [not found]   ` <CS1PR8401MB1189428C2915107C583C0A42B4CC0@CS1PR8401MB1189.NAMPRD84.PROD.OUTLOOK.COM>
  2018-10-28 13:35 ` [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe " Zeng, Star
  2018-10-29  2:19 ` Ni, Ruiyu
  5 siblings, 1 reply; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
  To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274

In current code, EhcMonitorAsyncRequests (timer handler) will do
unmap and map operations for AsyncIntTransfers to "Flush data from
PCI controller specific address to mapped system memory address".
EhcMonitorAsyncRequests
  EhcFlushAsyncIntMap
    PciIo->Unmap
      IoMmu->SetAttribute
    PciIo->Map
      IoMmu->SetAttribute

This may impact the boot performance.

Since the data buffer for EhcMonitorAsyncRequests is internal
buffer, we can allocate common buffer by PciIo->AllocateBuffer
and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
then the unmap and map operations can be removed.

///
/// Provides both read and write access to system memory by
/// both the processor and a bus master. The buffer is coherent
/// from both the processor's and the bus master's point of view.
///
EfiPciIoOperationBusMasterCommonBuffer,

Test done:
USB KB works normally.
USB disk read/write works normally.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
Reviewed-by: Hao Wu <hao.a.wu@intel.com>
---
 MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c      |  3 ++
 MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 78 +-------------------------------
 MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c   | 38 ++++++++++++++--
 MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h   | 33 ++++++++------
 4 files changed, 57 insertions(+), 95 deletions(-)

diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
index 5569f4f9618b..764eeda58ba1 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
@@ -763,6 +763,7 @@ EhcControlTransfer (
           Translator,
           EHC_CTRL_TRANSFER,
           Request,
+          FALSE,
           Data,
           *DataLength,
           NULL,
@@ -906,6 +907,7 @@ EhcBulkTransfer (
           Translator,
           EHC_BULK_TRANSFER,
           NULL,
+          FALSE,
           Data[0],
           *DataLength,
           NULL,
@@ -1163,6 +1165,7 @@ EhcSyncInterruptTransfer (
           Translator,
           EHC_INT_TRANSFER_SYNC,
           NULL,
+          FALSE,
           Data,
           *DataLength,
           NULL,
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
index ec8d796fab11..b067fd02d1ce 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
@@ -778,7 +778,6 @@ EhciDelAsyncIntTransfer (
       EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
       RemoveEntryList (&Urb->UrbList);
 
-      gBS->FreePool (Urb->Data);
       EhcFreeUrb (Ehc, Urb);
       return EFI_SUCCESS;
     }
@@ -809,7 +808,6 @@ EhciDelAllAsyncIntTransfers (
     EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
     RemoveEntryList (&Urb->UrbList);
 
-    gBS->FreePool (Urb->Data);
     EhcFreeUrb (Ehc, Urb);
   }
 }
@@ -848,16 +846,8 @@ EhciInsertAsyncIntTransfer (
   IN UINTN                              Interval
   )
 {
-  VOID      *Data;
   URB       *Urb;
 
-  Data = AllocatePool (DataLen);
-
-  if (Data == NULL) {
-    DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
-    return NULL;
-  }
-
   Urb = EhcCreateUrb (
           Ehc,
           DevAddr,
@@ -868,7 +858,8 @@ EhciInsertAsyncIntTransfer (
           Hub,
           EHC_INT_TRANSFER_ASYNC,
           NULL,
-          Data,
+          TRUE,
+          NULL,
           DataLen,
           Callback,
           Context,
@@ -877,7 +868,6 @@ EhciInsertAsyncIntTransfer (
 
   if (Urb == NULL) {
     DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
-    gBS->FreePool (Data);
     return NULL;
   }
 
@@ -892,60 +882,6 @@ EhciInsertAsyncIntTransfer (
 }
 
 /**
-  Flush data from PCI controller specific address to mapped system
-  memory address.
-
-  @param  Ehc                The EHCI device.
-  @param  Urb                The URB to unmap.
-
-  @retval EFI_SUCCESS        Success to flush data to mapped system memory.
-  @retval EFI_DEVICE_ERROR   Fail to flush data to mapped system memory.
-
-**/
-EFI_STATUS
-EhcFlushAsyncIntMap (
-  IN  USB2_HC_DEV         *Ehc,
-  IN  URB                 *Urb
-  )
-{
-  EFI_STATUS                    Status;
-  EFI_PHYSICAL_ADDRESS          PhyAddr;
-  EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
-  EFI_PCI_IO_PROTOCOL           *PciIo;
-  UINTN                         Len;
-  VOID                          *Map;
-
-  PciIo = Ehc->PciIo;
-  Len   = Urb->DataLen;
-
-  if (Urb->Ep.Direction == EfiUsbDataIn) {
-    MapOp = EfiPciIoOperationBusMasterWrite;
-  } else {
-    MapOp = EfiPciIoOperationBusMasterRead;
-  }
-
-  Status = PciIo->Unmap (PciIo, Urb->DataMap);
-  if (EFI_ERROR (Status)) {
-    goto ON_ERROR;
-  }
-
-  Urb->DataMap = NULL;
-
-  Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
-  if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
-    goto ON_ERROR;
-  }
-
-  Urb->DataPhy  = (VOID *) ((UINTN) PhyAddr);
-  Urb->DataMap  = Map;
-  return EFI_SUCCESS;
-
-ON_ERROR:
-  return EFI_DEVICE_ERROR;
-}
-
-
-/**
   Update the queue head for next round of asynchronous transfer.
 
   @param  Ehc                   The EHCI device.
@@ -1050,7 +986,6 @@ EhcMonitorAsyncRequests (
   BOOLEAN                 Finished;
   UINT8                   *ProcBuf;
   URB                     *Urb;
-  EFI_STATUS              Status;
 
   OldTpl  = gBS->RaiseTPL (EHC_TPL);
   Ehc     = (USB2_HC_DEV *) Context;
@@ -1069,15 +1004,6 @@ EhcMonitorAsyncRequests (
     }
 
     //
-    // Flush any PCI posted write transactions from a PCI host
-    // bridge to system memory.
-    //
-    Status = EhcFlushAsyncIntMap (Ehc, Urb);
-    if (EFI_ERROR (Status)) {
-      DEBUG ((EFI_D_ERROR, "EhcMonitorAsyncRequests: Fail to Flush AsyncInt Mapped Memeory\n"));
-    }
-
-    //
     // Allocate a buffer then copy the transferred data for user.
     // If failed to allocate the buffer, update the URB for next
     // round of transfer. Ignore the data of this round.
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
index 6afb327df165..09c12776ecdb 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
@@ -339,6 +339,14 @@ EhcFreeUrb (
     PciIo->Unmap (PciIo, Urb->DataMap);
   }
 
+  if (Urb->AllocateCommonBuffer) {
+    PciIo->FreeBuffer (
+             PciIo,
+             EFI_SIZE_TO_PAGES (Urb->DataLen),
+             Urb->Data
+             );
+  }
+
   if (Urb->Qh != NULL) {
     //
     // Ensure that this queue head has been unlinked from the
@@ -529,7 +537,8 @@ ON_ERROR:
   @param  Hub                   The transaction translator to use.
   @param  Type                  The transaction type.
   @param  Request               The standard USB request for control transfer.
-  @param  Data                  The user data to transfer.
+  @param  AllocateCommonBuffer  Indicate whether need to allocate common buffer for data transfer.
+  @param  Data                  The user data to transfer, NULL if AllocateCommonBuffer is TRUE.
   @param  DataLen               The length of data buffer.
   @param  Callback              The function to call when data is transferred.
   @param  Context               The context to the callback.
@@ -549,6 +558,7 @@ EhcCreateUrb (
   IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
   IN UINTN                              Type,
   IN EFI_USB_DEVICE_REQUEST             *Request,
+  IN BOOLEAN                            AllocateCommonBuffer,
   IN VOID                               *Data,
   IN UINTN                              DataLen,
   IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
@@ -596,8 +606,24 @@ EhcCreateUrb (
   Ep->PollRate    = EhcConvertPollRate (Interval);
 
   Urb->Request    = Request;
+  if (AllocateCommonBuffer) {
+    ASSERT (Data == NULL);
+    Status = Ehc->PciIo->AllocateBuffer (
+                           Ehc->PciIo,
+                           AllocateAnyPages,
+                           EfiBootServicesData,
+                           EFI_SIZE_TO_PAGES (DataLen),
+                           &Data,
+                           0
+                           );
+    if (EFI_ERROR (Status) || (Data == NULL)) {
+      FreePool (Urb);
+      return NULL;
+    }
+  }
   Urb->Data       = Data;
   Urb->DataLen    = DataLen;
+  Urb->AllocateCommonBuffer = AllocateCommonBuffer;
   Urb->Callback   = Callback;
   Urb->Context    = Context;
 
@@ -627,10 +653,14 @@ EhcCreateUrb (
   if (Data != NULL) {
     Len     = DataLen;
 
-    if (Ep->Direction == EfiUsbDataIn) {
-      MapOp = EfiPciIoOperationBusMasterWrite;
+    if (Urb->AllocateCommonBuffer) {
+      MapOp = EfiPciIoOperationBusMasterCommonBuffer;
     } else {
-      MapOp = EfiPciIoOperationBusMasterRead;
+      if (Ep->Direction == EfiUsbDataIn) {
+        MapOp = EfiPciIoOperationBusMasterWrite;
+      } else {
+        MapOp = EfiPciIoOperationBusMasterRead;
+      }
     }
 
     Status  = PciIo->Map (PciIo, MapOp, Data, &Len, &PhyAddr, &Map);
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
index 02e9af81be39..cb3599f9d361 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
@@ -3,7 +3,7 @@
     This file contains URB request, each request is warpped in a
     URB (Usb Request Block).
 
-Copyright (c) 2007 - 2010, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2007 - 2018, Intel Corporation. All rights reserved.<BR>
 This program and the accompanying materials
 are licensed and made available under the terms and conditions of the BSD License
 which accompanies this distribution.  The full text of the license may be found at
@@ -216,6 +216,7 @@ struct _URB {
   EFI_USB_DEVICE_REQUEST          *Request;     // Control transfer only
   VOID                            *RequestPhy;  // Address of the mapped request
   VOID                            *RequestMap;
+  BOOLEAN                         AllocateCommonBuffer;
   VOID                            *Data;
   UINTN                           DataLen;
   VOID                            *DataPhy;     // Address of the mapped user data
@@ -298,20 +299,21 @@ EhcFreeUrb (
 /**
   Create a new URB and its associated QTD.
 
-  @param  Ehc        The EHCI device.
-  @param  DevAddr    The device address.
-  @param  EpAddr     Endpoint addrress & its direction.
-  @param  DevSpeed   The device speed.
-  @param  Toggle     Initial data toggle to use.
-  @param  MaxPacket  The max packet length of the endpoint.
-  @param  Hub        The transaction translator to use.
-  @param  Type       The transaction type.
-  @param  Request    The standard USB request for control transfer.
-  @param  Data       The user data to transfer.
-  @param  DataLen    The length of data buffer.
-  @param  Callback   The function to call when data is transferred.
-  @param  Context    The context to the callback.
-  @param  Interval   The interval for interrupt transfer.
+  @param  Ehc                   The EHCI device.
+  @param  DevAddr               The device address.
+  @param  EpAddr                Endpoint addrress & its direction.
+  @param  DevSpeed              The device speed.
+  @param  Toggle                Initial data toggle to use.
+  @param  MaxPacket             The max packet length of the endpoint.
+  @param  Hub                   The transaction translator to use.
+  @param  Type                  The transaction type.
+  @param  Request               The standard USB request for control transfer.
+  @param  AllocateCommonBuffer  Indicate whether need to allocate common buffer for data transfer.
+  @param  Data                  The user data to transfer, NULL if AllocateCommonBuffer is TRUE.
+  @param  DataLen               The length of data buffer.
+  @param  Callback              The function to call when data is transferred.
+  @param  Context               The context to the callback.
+  @param  Interval              The interval for interrupt transfer.
 
   @return Created URB or NULL.
 
@@ -327,6 +329,7 @@ EhcCreateUrb (
   IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
   IN UINTN                              Type,
   IN EFI_USB_DEVICE_REQUEST             *Request,
+  IN BOOLEAN                            AllocateCommonBuffer,
   IN VOID                               *Data,
   IN UINTN                              DataLen,
   IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
-- 
2.7.0.windows.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
@ 2018-10-27  7:47   ` Wu, Hao A
  0 siblings, 0 replies; 18+ messages in thread
From: Wu, Hao A @ 2018-10-27  7:47 UTC (permalink / raw)
  To: Zeng, Star, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu, Wang, Jian J, Yao, Jiewen

Reviewed-by: Hao Wu <hao.a.wu@intel.com>

Best Regards,
Hao Wu


> -----Original Message-----
> From: Zeng, Star
> Sent: Friday, October 26, 2018 9:41 PM
> To: edk2-devel@lists.01.org
> Cc: Zeng, Star; Ni, Ruiyu; Wu, Hao A; Wang, Jian J; Yao, Jiewen
> Subject: [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for
> AsyncInterruptTransfer
> 
> V3:
> Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.
> 
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
> 
> In current code, XhcMonitorAsyncRequests (timer handler) will do
> unmap and map operations for AsyncIntTransfers to "Flush data from
> PCI controller specific address to mapped system memory address".
> XhcMonitorAsyncRequests
>   XhcFlushAsyncIntMap
>     PciIo->Unmap
>       IoMmu->SetAttribute
>     PciIo->Map
>       IoMmu->SetAttribute
> 
> This may impact the boot performance.
> 
> Since the data buffer for XhcMonitorAsyncRequests is internal
> buffer, we can allocate common buffer by PciIo->AllocateBuffer
> and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
> then the unmap and map operations can be removed.
> 
> ///
> /// Provides both read and write access to system memory by
> /// both the processor and a bus master. The buffer is coherent
> /// from both the processor's and the bus master's point of view.
> ///
> EfiPciIoOperationBusMasterCommonBuffer,
> 
> Test done:
> USB KB works normally.
> USB disk read/write works normally.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Hao Wu <hao.a.wu@intel.com>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Star Zeng <star.zeng@intel.com>
> Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
> ---
>  MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c      |   1 +
>  MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 138 +++++++++++-------------
> -------
>  MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h |  27 +++---
>  3 files changed, 63 insertions(+), 103 deletions(-)
> 
> diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> index 7f64f9c7c982..64855a4c158c 100644
> --- a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> +++ b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> @@ -769,6 +769,7 @@ XhcTransfer (
>            MaximumPacketLength,
>            Type,
>            Request,
> +          FALSE,
>            Data,
>            *DataLength,
>            NULL,
> diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> index 75959ae08363..9dd45e93a272 100644
> --- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> +++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> @@ -118,17 +118,18 @@ ON_EXIT:
>  /**
>    Create a new URB for a new transaction.
> 
> -  @param  Xhc       The XHCI Instance
> -  @param  BusAddr   The logical device address assigned by UsbBus driver
> -  @param  EpAddr    Endpoint addrress
> -  @param  DevSpeed  The device speed
> -  @param  MaxPacket The max packet length of the endpoint
> -  @param  Type      The transaction type
> -  @param  Request   The standard USB request for control transfer
> -  @param  Data      The user data to transfer
> -  @param  DataLen   The length of data buffer
> -  @param  Callback  The function to call when data is transferred
> -  @param  Context   The context to the callback
> +  @param  Xhc                   The XHCI Instance
> +  @param  BusAddr               The logical device address assigned by UsbBus
> driver
> +  @param  EpAddr                Endpoint addrress
> +  @param  DevSpeed              The device speed
> +  @param  MaxPacket             The max packet length of the endpoint
> +  @param  Type                  The transaction type
> +  @param  Request               The standard USB request for control transfer
> +  @param  AllocateCommonBuffer  Indicate whether need to allocate
> common buffer for data transfer
> +  @param  Data                  The user data to transfer, NULL if
> AllocateCommonBuffer is TRUE
> +  @param  DataLen               The length of data buffer
> +  @param  Callback              The function to call when data is transferred
> +  @param  Context               The context to the callback
> 
>    @return Created URB or NULL
> 
> @@ -142,6 +143,7 @@ XhcCreateUrb (
>    IN UINTN                              MaxPacket,
>    IN UINTN                              Type,
>    IN EFI_USB_DEVICE_REQUEST             *Request,
> +  IN BOOLEAN                            AllocateCommonBuffer,
>    IN VOID                               *Data,
>    IN UINTN                              DataLen,
>    IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
> @@ -169,8 +171,24 @@ XhcCreateUrb (
>    Ep->Type      = Type;
> 
>    Urb->Request  = Request;
> +  if (AllocateCommonBuffer) {
> +    ASSERT (Data == NULL);
> +    Status = Xhc->PciIo->AllocateBuffer (
> +                           Xhc->PciIo,
> +                           AllocateAnyPages,
> +                           EfiBootServicesData,
> +                           EFI_SIZE_TO_PAGES (DataLen),
> +                           &Data,
> +                           0
> +                           );
> +    if (EFI_ERROR (Status) || (Data == NULL)) {
> +      FreePool (Urb);
> +      return NULL;
> +    }
> +  }
>    Urb->Data     = Data;
>    Urb->DataLen  = DataLen;
> +  Urb->AllocateCommonBuffer = AllocateCommonBuffer;
>    Urb->Callback = Callback;
>    Urb->Context  = Context;
> 
> @@ -178,7 +196,7 @@ XhcCreateUrb (
>    ASSERT_EFI_ERROR (Status);
>    if (EFI_ERROR (Status)) {
>      DEBUG ((EFI_D_ERROR, "XhcCreateUrb: XhcCreateTransferTrb Failed,
> Status = %r\n", Status));
> -    FreePool (Urb);
> +    XhcFreeUrb (Xhc, Urb);
>      Urb = NULL;
>    }
> 
> @@ -206,6 +224,14 @@ XhcFreeUrb (
>      Xhc->PciIo->Unmap (Xhc->PciIo, Urb->DataMap);
>    }
> 
> +  if (Urb->AllocateCommonBuffer) {
> +    Xhc->PciIo->FreeBuffer (
> +                  Xhc->PciIo,
> +                  EFI_SIZE_TO_PAGES (Urb->DataLen),
> +                  Urb->Data
> +                  );
> +  }
> +
>    FreePool (Urb);
>  }
> 
> @@ -264,10 +290,14 @@ XhcCreateTransferTrb (
>    // No need to remap.
>    //
>    if ((Urb->Data != NULL) && (Urb->DataMap == NULL)) {
> -    if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
> -      MapOp = EfiPciIoOperationBusMasterWrite;
> +    if (Urb->AllocateCommonBuffer) {
> +      MapOp = EfiPciIoOperationBusMasterCommonBuffer;
>      } else {
> -      MapOp = EfiPciIoOperationBusMasterRead;
> +      if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
> +        MapOp = EfiPciIoOperationBusMasterWrite;
> +      } else {
> +        MapOp = EfiPciIoOperationBusMasterRead;
> +      }
>      }
> 
>      Len = Urb->DataLen;
> @@ -1367,7 +1397,6 @@ XhciDelAsyncIntTransfer (
>        }
> 
>        RemoveEntryList (&Urb->UrbList);
> -      FreePool (Urb->Data);
>        XhcFreeUrb (Xhc, Urb);
>        return EFI_SUCCESS;
>      }
> @@ -1405,7 +1434,6 @@ XhciDelAllAsyncIntTransfers (
>      }
> 
>      RemoveEntryList (&Urb->UrbList);
> -    FreePool (Urb->Data);
>      XhcFreeUrb (Xhc, Urb);
>    }
>  }
> @@ -1438,15 +1466,8 @@ XhciInsertAsyncIntTransfer (
>    IN VOID                               *Context
>    )
>  {
> -  VOID      *Data;
>    URB       *Urb;
> 
> -  Data = AllocateZeroPool (DataLen);
> -  if (Data == NULL) {
> -    DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n",
> __FUNCTION__));
> -    return NULL;
> -  }
> -
>    Urb = XhcCreateUrb (
>            Xhc,
>            BusAddr,
> @@ -1455,14 +1476,14 @@ XhciInsertAsyncIntTransfer (
>            MaxPacket,
>            XHC_INT_TRANSFER_ASYNC,
>            NULL,
> -          Data,
> +          TRUE,
> +          NULL,
>            DataLen,
>            Callback,
>            Context
>            );
>    if (Urb == NULL) {
>      DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
> -    FreePool (Data);
>      return NULL;
>    }
> 
> @@ -1503,61 +1524,6 @@ XhcUpdateAsyncRequest (
>  }
> 
>  /**
> -  Flush data from PCI controller specific address to mapped system
> -  memory address.
> -
> -  @param  Xhc                The XHCI device.
> -  @param  Urb                The URB to unmap.
> -
> -  @retval EFI_SUCCESS        Success to flush data to mapped system memory.
> -  @retval EFI_DEVICE_ERROR   Fail to flush data to mapped system memory.
> -
> -**/
> -EFI_STATUS
> -XhcFlushAsyncIntMap (
> -  IN  USB_XHCI_INSTANCE   *Xhc,
> -  IN  URB                 *Urb
> -  )
> -{
> -  EFI_STATUS                    Status;
> -  EFI_PHYSICAL_ADDRESS          PhyAddr;
> -  EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
> -  EFI_PCI_IO_PROTOCOL           *PciIo;
> -  UINTN                         Len;
> -  VOID                          *Map;
> -
> -  PciIo = Xhc->PciIo;
> -  Len   = Urb->DataLen;
> -
> -  if (Urb->Ep.Direction == EfiUsbDataIn) {
> -    MapOp = EfiPciIoOperationBusMasterWrite;
> -  } else {
> -    MapOp = EfiPciIoOperationBusMasterRead;
> -  }
> -
> -  if (Urb->DataMap != NULL) {
> -    Status = PciIo->Unmap (PciIo, Urb->DataMap);
> -    if (EFI_ERROR (Status)) {
> -      goto ON_ERROR;
> -    }
> -  }
> -
> -  Urb->DataMap = NULL;
> -
> -  Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
> -  if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
> -    goto ON_ERROR;
> -  }
> -
> -  Urb->DataPhy  = (VOID *) ((UINTN) PhyAddr);
> -  Urb->DataMap  = Map;
> -  return EFI_SUCCESS;
> -
> -ON_ERROR:
> -  return EFI_DEVICE_ERROR;
> -}
> -
> -/**
>    Interrupt transfer periodic check handler.
> 
>    @param  Event                 Interrupt event.
> @@ -1577,7 +1543,6 @@ XhcMonitorAsyncRequests (
>    UINT8                   *ProcBuf;
>    URB                     *Urb;
>    UINT8                   SlotId;
> -  EFI_STATUS              Status;
>    EFI_TPL                 OldTpl;
> 
>    OldTpl = gBS->RaiseTPL (XHC_TPL);
> @@ -1606,15 +1571,6 @@ XhcMonitorAsyncRequests (
>      }
> 
>      //
> -    // Flush any PCI posted write transactions from a PCI host
> -    // bridge to system memory.
> -    //
> -    Status = XhcFlushAsyncIntMap (Xhc, Urb);
> -    if (EFI_ERROR (Status)) {
> -      DEBUG ((EFI_D_ERROR, "XhcMonitorAsyncRequests: Fail to Flush
> AsyncInt Mapped Memeory\n"));
> -    }
> -
> -    //
>      // Allocate a buffer then copy the transferred data for user.
>      // If failed to allocate the buffer, update the URB for next
>      // round of transfer. Ignore the data of this round.
> diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> index b5e192c3b589..1a95d98996c1 100644
> --- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> +++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> @@ -172,6 +172,7 @@ typedef struct _URB {
>    //
>    USB_ENDPOINT                    Ep;
>    EFI_USB_DEVICE_REQUEST          *Request;
> +  BOOLEAN                         AllocateCommonBuffer;
>    VOID                            *Data;
>    UINTN                           DataLen;
>    VOID                            *DataPhy;
> @@ -1432,17 +1433,18 @@ XhcSetTrDequeuePointer (
>  /**
>    Create a new URB for a new transaction.
> 
> -  @param  Xhc       The XHCI Instance
> -  @param  DevAddr   The device address
> -  @param  EpAddr    Endpoint addrress
> -  @param  DevSpeed  The device speed
> -  @param  MaxPacket The max packet length of the endpoint
> -  @param  Type      The transaction type
> -  @param  Request   The standard USB request for control transfer
> -  @param  Data      The user data to transfer
> -  @param  DataLen   The length of data buffer
> -  @param  Callback  The function to call when data is transferred
> -  @param  Context   The context to the callback
> +  @param  Xhc                   The XHCI Instance
> +  @param  BusAddr               The logical device address assigned by UsbBus
> driver
> +  @param  EpAddr                Endpoint addrress
> +  @param  DevSpeed              The device speed
> +  @param  MaxPacket             The max packet length of the endpoint
> +  @param  Type                  The transaction type
> +  @param  Request               The standard USB request for control transfer
> +  @param  AllocateCommonBuffer  Indicate whether need to allocate
> common buffer for data transfer
> +  @param  Data                  The user data to transfer, NULL if
> AllocateCommonBuffer is TRUE
> +  @param  DataLen               The length of data buffer
> +  @param  Callback              The function to call when data is transferred
> +  @param  Context               The context to the callback
> 
>    @return Created URB or NULL
> 
> @@ -1450,12 +1452,13 @@ XhcSetTrDequeuePointer (
>  URB*
>  XhcCreateUrb (
>    IN USB_XHCI_INSTANCE                  *Xhc,
> -  IN UINT8                              DevAddr,
> +  IN UINT8                              BusAddr,
>    IN UINT8                              EpAddr,
>    IN UINT8                              DevSpeed,
>    IN UINTN                              MaxPacket,
>    IN UINTN                              Type,
>    IN EFI_USB_DEVICE_REQUEST             *Request,
> +  IN BOOLEAN                            AllocateCommonBuffer,
>    IN VOID                               *Data,
>    IN UINTN                              DataLen,
>    IN EFI_ASYNC_USB_TRANSFER_CALLBACK    Callback,
> --
> 2.7.0.windows.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer
  2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
                   ` (3 preceding siblings ...)
  2018-10-26 13:41 ` [PATCH V3 4/4] MdeModulePkg EhciDxe: " Star Zeng
@ 2018-10-28 13:35 ` Zeng, Star
  2018-10-29  2:19 ` Ni, Ruiyu
  5 siblings, 0 replies; 18+ messages in thread
From: Zeng, Star @ 2018-10-28 13:35 UTC (permalink / raw)
  To: edk2-devel@lists.01.org
  Cc: Ni, Ruiyu, Wu, Hao A, Wang, Jian J, Yao, Jiewen, Zeng, Star

Thanks for all the reviews, I just pushed the patches at https://github.com/tianocore/edk2/compare/a6eb94e...0cd6452.

Star
-----Original Message-----
From: Zeng, Star 
Sent: Friday, October 26, 2018 9:41 PM
To: edk2-devel@lists.01.org
Cc: Zeng, Star <star.zeng@intel.com>; Ni, Ruiyu <ruiyu.ni@intel.com>; Wu, Hao A <hao.a.wu@intel.com>; Wang, Jian J <jian.j.wang@intel.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer

V3: Thanks for Hao's feedback.
1. Match function parameter name and description between XhciSched.c and XhciSched.h, EhciSched.c and EhciSched.h.
2. Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.


V2: Thanks for Ray's feedback.
1. Add the missing "FreePool (Data);".
2. Remove the unnecessary indentation change.

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274

Please refer to the log message of each commit for more details.

Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>

Star Zeng (4):
  MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
  MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
  MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
  MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer

 MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c      |  28 +----
 MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 114 +++++++++----------  MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h |  35 +++++-
 MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c   |  38 ++++++-
 MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h   |  33 +++---
 MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c      |  19 +---
 MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 185 +++++++++++++++++--------------  MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h |  55 +++++++--
 8 files changed, 296 insertions(+), 211 deletions(-)

--
2.7.0.windows.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer
  2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
                   ` (4 preceding siblings ...)
  2018-10-28 13:35 ` [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe " Zeng, Star
@ 2018-10-29  2:19 ` Ni, Ruiyu
  5 siblings, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-29  2:19 UTC (permalink / raw)
  To: Star Zeng, edk2-devel; +Cc: Hao Wu, Jian J Wang, Jiewen Yao

On 10/26/2018 9:41 PM, Star Zeng wrote:
> V3: Thanks for Hao's feedback.
> 1. Match function parameter name and description between
> XhciSched.c and XhciSched.h, EhciSched.c and EhciSched.h.
> 2. Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.
> 
> 
> V2: Thanks for Ray's feedback.
> 1. Add the missing "FreePool (Data);".
> 2. Remove the unnecessary indentation change.
> 
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
> 
> Please refer to the log message of each commit for more details.
> 
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Hao Wu <hao.a.wu@intel.com>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> 
> Star Zeng (4):
>    MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
>    MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
>    MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
>    MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
> 
>   MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c      |  28 +----
>   MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 114 +++++++++----------
>   MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h |  35 +++++-
>   MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c   |  38 ++++++-
>   MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h   |  33 +++---
>   MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c      |  19 +---
>   MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 185 +++++++++++++++++--------------
>   MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h |  55 +++++++--
>   8 files changed, 296 insertions(+), 211 deletions(-)
> 
Reviewed-by: Ruiyu Ni <ruiyu.ni@Intel.com>

-- 
Thanks,
Ray


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
       [not found]   ` <CS1PR8401MB1189428C2915107C583C0A42B4CC0@CS1PR8401MB1189.NAMPRD84.PROD.OUTLOOK.COM>
@ 2018-10-30 12:39     ` Ard Biesheuvel
  2018-10-30 12:50       ` Leif Lindholm
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2018-10-30 12:39 UTC (permalink / raw)
  To: Cohen, Eugene, edk2-devel-01; +Cc: Star Zeng, Leif Lindholm

(add back the list)

On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
> Has this patch been tested on a system that does not have coherent DMA?
>
>
>
> It's not clear that this change would actually be faster on a system of that
> type since using common buffers imply access to uncached memory.  Depending
> on the access patterns the uncached memory access could be more time
> consuming than cache maintenance operations.
>

I haven't had time to look at these patches yet.

I agree with Eugene's concern: the directional DMA routines are much
more performant on implementations with non-coherent DMA, and so
common buffers should be avoided unless we are dealing with data
structures that are truly shared between the CPU and the device.

Since this is obviously not the case here, could we please have some
numbers about the performance improvement we are talking about here?
Would it be possible to improve the IOMMU handling code instead?


>
>
> From: edk2-devel <edk2-devel-bounces@lists.01.org> On Behalf Of Star Zeng
> Sent: Friday, October 26, 2018 7:41 AM
> To: edk2-devel@lists.01.org
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>; Hao Wu <hao.a.wu@intel.com>; Jiewen Yao
> <jiewen.yao@intel.com>; Star Zeng <star.zeng@intel.com>
> Subject: [edk2] [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for
> AsyncInterruptTransfer
>
>
>
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
>
> In current code, EhcMonitorAsyncRequests (timer handler) will do
> unmap and map operations for AsyncIntTransfers to "Flush data from
> PCI controller specific address to mapped system memory address".
> EhcMonitorAsyncRequests
> EhcFlushAsyncIntMap
> PciIo->Unmap
> IoMmu->SetAttribute
> PciIo->Map
> IoMmu->SetAttribute
>
> This may impact the boot performance.
>
> Since the data buffer for EhcMonitorAsyncRequests is internal
> buffer, we can allocate common buffer by PciIo->AllocateBuffer
> and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
> then the unmap and map operations can be removed.
>
> ///
> /// Provides both read and write access to system memory by
> /// both the processor and a bus master. The buffer is coherent
> /// from both the processor's and the bus master's point of view.
> ///
> EfiPciIoOperationBusMasterCommonBuffer,
>
> Test done:
> USB KB works normally.
> USB disk read/write works normally.
>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Hao Wu <hao.a.wu@intel.com>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Star Zeng <star.zeng@intel.com>
> Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
> Reviewed-by: Hao Wu <hao.a.wu@intel.com>
> ---
> MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 3 ++
> MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 78
> +-------------------------------
> MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c | 38 ++++++++++++++--
> MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h | 33 ++++++++------
> 4 files changed, 57 insertions(+), 95 deletions(-)
>
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> index 5569f4f9618b..764eeda58ba1 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> @@ -763,6 +763,7 @@ EhcControlTransfer (
> Translator,
> EHC_CTRL_TRANSFER,
> Request,
> + FALSE,
> Data,
> *DataLength,
> NULL,
> @@ -906,6 +907,7 @@ EhcBulkTransfer (
> Translator,
> EHC_BULK_TRANSFER,
> NULL,
> + FALSE,
> Data[0],
> *DataLength,
> NULL,
> @@ -1163,6 +1165,7 @@ EhcSyncInterruptTransfer (
> Translator,
> EHC_INT_TRANSFER_SYNC,
> NULL,
> + FALSE,
> Data,
> *DataLength,
> NULL,
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> index ec8d796fab11..b067fd02d1ce 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> @@ -778,7 +778,6 @@ EhciDelAsyncIntTransfer (
> EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
> RemoveEntryList (&Urb->UrbList);
>
> - gBS->FreePool (Urb->Data);
> EhcFreeUrb (Ehc, Urb);
> return EFI_SUCCESS;
> }
> @@ -809,7 +808,6 @@ EhciDelAllAsyncIntTransfers (
> EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
> RemoveEntryList (&Urb->UrbList);
>
> - gBS->FreePool (Urb->Data);
> EhcFreeUrb (Ehc, Urb);
> }
> }
> @@ -848,16 +846,8 @@ EhciInsertAsyncIntTransfer (
> IN UINTN Interval
> )
> {
> - VOID *Data;
> URB *Urb;
>
> - Data = AllocatePool (DataLen);
> -
> - if (Data == NULL) {
> - DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
> - return NULL;
> - }
> -
> Urb = EhcCreateUrb (
> Ehc,
> DevAddr,
> @@ -868,7 +858,8 @@ EhciInsertAsyncIntTransfer (
> Hub,
> EHC_INT_TRANSFER_ASYNC,
> NULL,
> - Data,
> + TRUE,
> + NULL,
> DataLen,
> Callback,
> Context,
> @@ -877,7 +868,6 @@ EhciInsertAsyncIntTransfer (
>
> if (Urb == NULL) {
> DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
> - gBS->FreePool (Data);
> return NULL;
> }
>
> @@ -892,60 +882,6 @@ EhciInsertAsyncIntTransfer (
> }
>
> /**
> - Flush data from PCI controller specific address to mapped system
> - memory address.
> -
> - @param Ehc The EHCI device.
> - @param Urb The URB to unmap.
> -
> - @retval EFI_SUCCESS Success to flush data to mapped system memory.
> - @retval EFI_DEVICE_ERROR Fail to flush data to mapped system memory.
> -
> -**/
> -EFI_STATUS
> -EhcFlushAsyncIntMap (
> - IN USB2_HC_DEV *Ehc,
> - IN URB *Urb
> - )
> -{
> - EFI_STATUS Status;
> - EFI_PHYSICAL_ADDRESS PhyAddr;
> - EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
> - EFI_PCI_IO_PROTOCOL *PciIo;
> - UINTN Len;
> - VOID *Map;
> -
> - PciIo = Ehc->PciIo;
> - Len = Urb->DataLen;
> -
> - if (Urb->Ep.Direction == EfiUsbDataIn) {
> - MapOp = EfiPciIoOperationBusMasterWrite;
> - } else {
> - MapOp = EfiPciIoOperationBusMasterRead;
> - }
> -
> - Status = PciIo->Unmap (PciIo, Urb->DataMap);
> - if (EFI_ERROR (Status)) {
> - goto ON_ERROR;
> - }
> -
> - Urb->DataMap = NULL;
> -
> - Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
> - if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
> - goto ON_ERROR;
> - }
> -
> - Urb->DataPhy = (VOID *) ((UINTN) PhyAddr);
> - Urb->DataMap = Map;
> - return EFI_SUCCESS;
> -
> -ON_ERROR:
> - return EFI_DEVICE_ERROR;
> -}
> -
> -
> -/**
> Update the queue head for next round of asynchronous transfer.
>
> @param Ehc The EHCI device.
> @@ -1050,7 +986,6 @@ EhcMonitorAsyncRequests (
> BOOLEAN Finished;
> UINT8 *ProcBuf;
> URB *Urb;
> - EFI_STATUS Status;
>
> OldTpl = gBS->RaiseTPL (EHC_TPL);
> Ehc = (USB2_HC_DEV *) Context;
> @@ -1069,15 +1004,6 @@ EhcMonitorAsyncRequests (
> }
>
> //
> - // Flush any PCI posted write transactions from a PCI host
> - // bridge to system memory.
> - //
> - Status = EhcFlushAsyncIntMap (Ehc, Urb);
> - if (EFI_ERROR (Status)) {
> - DEBUG ((EFI_D_ERROR, "EhcMonitorAsyncRequests: Fail to Flush AsyncInt
> Mapped Memeory\n"));
> - }
> -
> - //
> // Allocate a buffer then copy the transferred data for user.
> // If failed to allocate the buffer, update the URB for next
> // round of transfer. Ignore the data of this round.
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> index 6afb327df165..09c12776ecdb 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> @@ -339,6 +339,14 @@ EhcFreeUrb (
> PciIo->Unmap (PciIo, Urb->DataMap);
> }
>
> + if (Urb->AllocateCommonBuffer) {
> + PciIo->FreeBuffer (
> + PciIo,
> + EFI_SIZE_TO_PAGES (Urb->DataLen),
> + Urb->Data
> + );
> + }
> +
> if (Urb->Qh != NULL) {
> //
> // Ensure that this queue head has been unlinked from the
> @@ -529,7 +537,8 @@ ON_ERROR:
> @param Hub The transaction translator to use.
> @param Type The transaction type.
> @param Request The standard USB request for control transfer.
> - @param Data The user data to transfer.
> + @param AllocateCommonBuffer Indicate whether need to allocate common
> buffer for data transfer.
> + @param Data The user data to transfer, NULL if AllocateCommonBuffer is
> TRUE.
> @param DataLen The length of data buffer.
> @param Callback The function to call when data is transferred.
> @param Context The context to the callback.
> @@ -549,6 +558,7 @@ EhcCreateUrb (
> IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
> IN UINTN Type,
> IN EFI_USB_DEVICE_REQUEST *Request,
> + IN BOOLEAN AllocateCommonBuffer,
> IN VOID *Data,
> IN UINTN DataLen,
> IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
> @@ -596,8 +606,24 @@ EhcCreateUrb (
> Ep->PollRate = EhcConvertPollRate (Interval);
>
> Urb->Request = Request;
> + if (AllocateCommonBuffer) {
> + ASSERT (Data == NULL);
> + Status = Ehc->PciIo->AllocateBuffer (
> + Ehc->PciIo,
> + AllocateAnyPages,
> + EfiBootServicesData,
> + EFI_SIZE_TO_PAGES (DataLen),
> + &Data,
> + 0
> + );
> + if (EFI_ERROR (Status) || (Data == NULL)) {
> + FreePool (Urb);
> + return NULL;
> + }
> + }
> Urb->Data = Data;
> Urb->DataLen = DataLen;
> + Urb->AllocateCommonBuffer = AllocateCommonBuffer;
> Urb->Callback = Callback;
> Urb->Context = Context;
>
> @@ -627,10 +653,14 @@ EhcCreateUrb (
> if (Data != NULL) {
> Len = DataLen;
>
> - if (Ep->Direction == EfiUsbDataIn) {
> - MapOp = EfiPciIoOperationBusMasterWrite;
> + if (Urb->AllocateCommonBuffer) {
> + MapOp = EfiPciIoOperationBusMasterCommonBuffer;
> } else {
> - MapOp = EfiPciIoOperationBusMasterRead;
> + if (Ep->Direction == EfiUsbDataIn) {
> + MapOp = EfiPciIoOperationBusMasterWrite;
> + } else {
> + MapOp = EfiPciIoOperationBusMasterRead;
> + }
> }
>
> Status = PciIo->Map (PciIo, MapOp, Data, &Len, &PhyAddr, &Map);
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> index 02e9af81be39..cb3599f9d361 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> @@ -3,7 +3,7 @@
> This file contains URB request, each request is warpped in a
> URB (Usb Request Block).
>
> -Copyright (c) 2007 - 2010, Intel Corporation. All rights reserved.<BR>
> +Copyright (c) 2007 - 2018, Intel Corporation. All rights reserved.<BR>
> This program and the accompanying materials
> are licensed and made available under the terms and conditions of the BSD
> License
> which accompanies this distribution. The full text of the license may be
> found at
> @@ -216,6 +216,7 @@ struct _URB {
> EFI_USB_DEVICE_REQUEST *Request; // Control transfer only
> VOID *RequestPhy; // Address of the mapped request
> VOID *RequestMap;
> + BOOLEAN AllocateCommonBuffer;
> VOID *Data;
> UINTN DataLen;
> VOID *DataPhy; // Address of the mapped user data
> @@ -298,20 +299,21 @@ EhcFreeUrb (
> /**
> Create a new URB and its associated QTD.
>
> - @param Ehc The EHCI device.
> - @param DevAddr The device address.
> - @param EpAddr Endpoint addrress & its direction.
> - @param DevSpeed The device speed.
> - @param Toggle Initial data toggle to use.
> - @param MaxPacket The max packet length of the endpoint.
> - @param Hub The transaction translator to use.
> - @param Type The transaction type.
> - @param Request The standard USB request for control transfer.
> - @param Data The user data to transfer.
> - @param DataLen The length of data buffer.
> - @param Callback The function to call when data is transferred.
> - @param Context The context to the callback.
> - @param Interval The interval for interrupt transfer.
> + @param Ehc The EHCI device.
> + @param DevAddr The device address.
> + @param EpAddr Endpoint addrress & its direction.
> + @param DevSpeed The device speed.
> + @param Toggle Initial data toggle to use.
> + @param MaxPacket The max packet length of the endpoint.
> + @param Hub The transaction translator to use.
> + @param Type The transaction type.
> + @param Request The standard USB request for control transfer.
> + @param AllocateCommonBuffer Indicate whether need to allocate common
> buffer for data transfer.
> + @param Data The user data to transfer, NULL if AllocateCommonBuffer is
> TRUE.
> + @param DataLen The length of data buffer.
> + @param Callback The function to call when data is transferred.
> + @param Context The context to the callback.
> + @param Interval The interval for interrupt transfer.
>
> @return Created URB or NULL.
>
> @@ -327,6 +329,7 @@ EhcCreateUrb (
> IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
> IN UINTN Type,
> IN EFI_USB_DEVICE_REQUEST *Request,
> + IN BOOLEAN AllocateCommonBuffer,
> IN VOID *Data,
> IN UINTN DataLen,
> IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
> --
> 2.7.0.windows.1
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-30 12:39     ` Ard Biesheuvel
@ 2018-10-30 12:50       ` Leif Lindholm
  2018-10-31  4:38         ` Zeng, Star
  0 siblings, 1 reply; 18+ messages in thread
From: Leif Lindholm @ 2018-10-30 12:50 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Cohen, Eugene, edk2-devel-01, Star Zeng

On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
> (add back the list)

Oi! Go back on holiday!

> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
> > Has this patch been tested on a system that does not have coherent DMA?
> >
> > It's not clear that this change would actually be faster on a system of that
> > type since using common buffers imply access to uncached memory.  Depending
> > on the access patterns the uncached memory access could be more time
> > consuming than cache maintenance operations.
> 
> I haven't had time to look at these patches yet.
> 
> I agree with Eugene's concern: the directional DMA routines are much
> more performant on implementations with non-coherent DMA, and so
> common buffers should be avoided unless we are dealing with data
> structures that are truly shared between the CPU and the device.
> 
> Since this is obviously not the case here, could we please have some
> numbers about the performance improvement we are talking about here?
> Would it be possible to improve the IOMMU handling code instead?

On an unrelated note to the concerns above:
Why has a fundamental change to the behaviour of one of the industry
standard drivers been pushed at the very end of the stable cycle?

Regards,

Leif


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-30 12:50       ` Leif Lindholm
@ 2018-10-31  4:38         ` Zeng, Star
  2018-10-31 12:08           ` Leif Lindholm
  2018-11-06  9:49           ` Ard Biesheuvel
  0 siblings, 2 replies; 18+ messages in thread
From: Zeng, Star @ 2018-10-31  4:38 UTC (permalink / raw)
  To: Leif Lindholm, Ard Biesheuvel, Cohen, Eugene; +Cc: edk2-devel-01, star.zeng

Good feedback.

On 2018/10/30 20:50, Leif Lindholm wrote:
> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>> (add back the list)
> 
> Oi! Go back on holiday!
> 
>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>> Has this patch been tested on a system that does not have coherent DMA?
>>>
>>> It's not clear that this change would actually be faster on a system of that
>>> type since using common buffers imply access to uncached memory.  Depending
>>> on the access patterns the uncached memory access could be more time
>>> consuming than cache maintenance operations.

The change/idea was based on the statement below.
   ///
   /// Provides both read and write access to system memory by both the 
processor and a
   /// bus master. The buffer is coherent from both the processor's and 
the bus master's point of view.
   ///
   EfiPciIoOperationBusMasterCommonBuffer,

Thanks for raising case about uncached memory access. But after checking 
the code, for Intel VTd case 
https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460 
(or no IOMMU case 
https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567), 
the common buffer is just normal memory buffer.
If someone can help do some test/collect some data on a system using 
common buffers imply access to uncached memory, that will be great.

>>
>> I haven't had time to look at these patches yet.
>>
>> I agree with Eugene's concern: the directional DMA routines are much
>> more performant on implementations with non-coherent DMA, and so
>> common buffers should be avoided unless we are dealing with data
>> structures that are truly shared between the CPU and the device.
>>
>> Since this is obviously not the case here, could we please have some
>> numbers about the performance improvement we are talking about here?
>> Would it be possible to improve the IOMMU handling code instead?

We collected the data below on a platform with release image and Intel 
VTd enabled.

The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.

EHCI without the patch:
==[ Cumulative ]========
(Times in microsec.)     Cumulative   Average     Shortest    Longest
    Name         Count     Duration    Duration    Duration    Duration
-------------------------------------------------------------------------------
S0000B00D1DF0        446        2150           4           2         963

EHCI with the patch:
==[ Cumulative ]========
(Times in microsec.)     Cumulative   Average     Shortest    Longest
    Name         Count     Duration    Duration    Duration    Duration
-------------------------------------------------------------------------------
S0000B00D1DF0        270         742           2           2          41

XHCI without the patch:
==[ Cumulative ]========
(Times in microsec.)     Cumulative   Average     Shortest    Longest
    Name         Count     Duration    Duration    Duration    Duration
-------------------------------------------------------------------------------
S0000B00D14F0        215         603           2           2          52

XHCI with the patch:
==[ Cumulative ]========
(Times in microsec.)     Cumulative   Average     Shortest    Longest
    Name         Count     Duration    Duration    Duration    Duration
-------------------------------------------------------------------------------
S0000B00D14F0         95         294           3           2          52

I believe the performance data really depends on
1. How many AsyncInterruptTransfer handlers (the number of USB keyboard 
and/or USB bluetooth keyboard?)
2. Data size (for flushing data from PCI controller specific address to 
mapped system memory address *in original code*)
3. The performance of IoMmu->SetAttribute (for example, the SetAttribute 
operation on Intel VTd engine caused by the unmap and map for flushing 
data *in original code*, the SetAttribute operation on IntelVTd engine 
will involve FlushPageTableMemory, InvalidatePageEntry and etc)

> 
> On an unrelated note to the concerns above:
> Why has a fundamental change to the behaviour of one of the industry
> standard drivers been pushed at the very end of the stable cycle?

We thought it was a simple improvement but not fundamental change before 
Eugene and Ard raised the concern.


Thanks,
Star

> 
> Regards,
> 
> Leif
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-31  4:38         ` Zeng, Star
@ 2018-10-31 12:08           ` Leif Lindholm
  2018-11-01  1:12             ` Zeng, Star
  2018-11-06  9:49           ` Ard Biesheuvel
  1 sibling, 1 reply; 18+ messages in thread
From: Leif Lindholm @ 2018-10-31 12:08 UTC (permalink / raw)
  To: Zeng, Star; +Cc: Ard Biesheuvel, Cohen, Eugene, edk2-devel-01

On Wed, Oct 31, 2018 at 12:38:43PM +0800, Zeng, Star wrote:
> Good feedback.
> 
> On 2018/10/30 20:50, Leif Lindholm wrote:
> > On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
> > > (add back the list)
> > 
> > Oi! Go back on holiday!
> > 
> > > On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
> > > > Has this patch been tested on a system that does not have coherent DMA?
> > > > 
> > > > It's not clear that this change would actually be faster on a system of that
> > > > type since using common buffers imply access to uncached memory.  Depending
> > > > on the access patterns the uncached memory access could be more time
> > > > consuming than cache maintenance operations.
> 
> The change/idea was based on the statement below.
>   ///
>   /// Provides both read and write access to system memory by both the
> processor and a
>   /// bus master. The buffer is coherent from both the processor's and the
> bus master's point of view.
>   ///
>   EfiPciIoOperationBusMasterCommonBuffer,
> 
> Thanks for raising case about uncached memory access. But after checking the
> code, for Intel VTd case https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
> (or no IOMMU case https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
> the common buffer is just normal memory buffer.
> If someone can help do some test/collect some data on a system using common
> buffers imply access to uncached memory, that will be great.
> 
> > > 
> > > I haven't had time to look at these patches yet.
> > > 
> > > I agree with Eugene's concern: the directional DMA routines are much
> > > more performant on implementations with non-coherent DMA, and so
> > > common buffers should be avoided unless we are dealing with data
> > > structures that are truly shared between the CPU and the device.
> > > 
> > > Since this is obviously not the case here, could we please have some
> > > numbers about the performance improvement we are talking about here?
> > > Would it be possible to improve the IOMMU handling code instead?
> 
> We collected the data below on a platform with release image and Intel VTd
> enabled.
> 
> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
> 
> EHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0        446        2150           4           2         963
> 
> EHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0        270         742           2           2          41
> 
> XHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0        215         603           2           2          52
> 
> XHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0         95         294           3           2          52
> 
> I believe the performance data really depends on
> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
> and/or USB bluetooth keyboard?)
> 2. Data size (for flushing data from PCI controller specific address to
> mapped system memory address *in original code*)
> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
> operation on Intel VTd engine caused by the unmap and map for flushing data
> *in original code*, the SetAttribute operation on IntelVTd engine will
> involve FlushPageTableMemory, InvalidatePageEntry and etc)
> 
> > On an unrelated note to the concerns above:
> > Why has a fundamental change to the behaviour of one of the industry
> > standard drivers been pushed at the very end of the stable cycle?
> 
> We thought it was a simple improvement but not fundamental change before
> Eugene and Ard raised the concern.

Understood.

However, as it is changing the memory management behaviour of a core
driver, I think it automatically qualifies as something that should
only go in the week after a stable tag.

We will need to have a closer look at the non-coherent case when Ard
gets back (Monday).

If this version causes issues with non-coherent systems, we will need
to revert it before the stable tag. We would then need to look into
the best way to deal with the performance issues quoted above.

Best Regards,

Leif


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-31 12:08           ` Leif Lindholm
@ 2018-11-01  1:12             ` Zeng, Star
  2018-11-01 10:32               ` Leif Lindholm
  0 siblings, 1 reply; 18+ messages in thread
From: Zeng, Star @ 2018-11-01  1:12 UTC (permalink / raw)
  To: Leif Lindholm; +Cc: edk2-devel-01, star.zeng

On 2018/10/31 20:08, Leif Lindholm wrote:
> On Wed, Oct 31, 2018 at 12:38:43PM +0800, Zeng, Star wrote:
>> Good feedback.
>>
>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>> (add back the list)
>>>
>>> Oi! Go back on holiday!
>>>
>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>>
>>>>> It's not clear that this change would actually be faster on a system of that
>>>>> type since using common buffers imply access to uncached memory.  Depending
>>>>> on the access patterns the uncached memory access could be more time
>>>>> consuming than cache maintenance operations.
>>
>> The change/idea was based on the statement below.
>>    ///
>>    /// Provides both read and write access to system memory by both the
>> processor and a
>>    /// bus master. The buffer is coherent from both the processor's and the
>> bus master's point of view.
>>    ///
>>    EfiPciIoOperationBusMasterCommonBuffer,
>>
>> Thanks for raising case about uncached memory access. But after checking the
>> code, for Intel VTd case https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>> (or no IOMMU case https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>> the common buffer is just normal memory buffer.
>> If someone can help do some test/collect some data on a system using common
>> buffers imply access to uncached memory, that will be great.
>>
>>>>
>>>> I haven't had time to look at these patches yet.
>>>>
>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>> more performant on implementations with non-coherent DMA, and so
>>>> common buffers should be avoided unless we are dealing with data
>>>> structures that are truly shared between the CPU and the device.
>>>>
>>>> Since this is obviously not the case here, could we please have some
>>>> numbers about the performance improvement we are talking about here?
>>>> Would it be possible to improve the IOMMU handling code instead?
>>
>> We collected the data below on a platform with release image and Intel VTd
>> enabled.
>>
>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>
>> EHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0        446        2150           4           2         963
>>
>> EHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0        270         742           2           2          41
>>
>> XHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0        215         603           2           2          52
>>
>> XHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0         95         294           3           2          52
>>
>> I believe the performance data really depends on
>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>> and/or USB bluetooth keyboard?)
>> 2. Data size (for flushing data from PCI controller specific address to
>> mapped system memory address *in original code*)
>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>> operation on Intel VTd engine caused by the unmap and map for flushing data
>> *in original code*, the SetAttribute operation on IntelVTd engine will
>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>
>>> On an unrelated note to the concerns above:
>>> Why has a fundamental change to the behaviour of one of the industry
>>> standard drivers been pushed at the very end of the stable cycle?
>>
>> We thought it was a simple improvement but not fundamental change before
>> Eugene and Ard raised the concern.
> 
> Understood.

Thanks. :)

> 
> However, as it is changing the memory management behaviour of a core
> driver, I think it automatically qualifies as something that should
> only go in the week after a stable tag.
> 
> We will need to have a closer look at the non-coherent case when Ard
> gets back (Monday).

You mean Ard is on vacation and will be back next Monday.

> 
> If this version causes issues with non-coherent systems, we will need
> to revert it before the stable tag. We would then need to look into
> the best way to deal with the performance issues quoted above.

I am glad to revert it if it has side effect. Is it possible someone 
could have a quick check?


Thanks,
Star

> 
> Best Regards,
> 
> Leif
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-11-01  1:12             ` Zeng, Star
@ 2018-11-01 10:32               ` Leif Lindholm
  0 siblings, 0 replies; 18+ messages in thread
From: Leif Lindholm @ 2018-11-01 10:32 UTC (permalink / raw)
  To: Zeng, Star; +Cc: edk2-devel-01

On Thu, Nov 01, 2018 at 09:12:04AM +0800, Zeng, Star wrote:
> > > I believe the performance data really depends on
> > > 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
> > > and/or USB bluetooth keyboard?)
> > > 2. Data size (for flushing data from PCI controller specific address to
> > > mapped system memory address *in original code*)
> > > 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
> > > operation on Intel VTd engine caused by the unmap and map for flushing data
> > > *in original code*, the SetAttribute operation on IntelVTd engine will
> > > involve FlushPageTableMemory, InvalidatePageEntry and etc)
> > > 
> > > > On an unrelated note to the concerns above:
> > > > Why has a fundamental change to the behaviour of one of the industry
> > > > standard drivers been pushed at the very end of the stable cycle?
> > > 
> > > We thought it was a simple improvement but not fundamental change before
> > > Eugene and Ard raised the concern.
> > 
> > Understood.
> 
> Thanks. :)
> 
> > 
> > However, as it is changing the memory management behaviour of a core
> > driver, I think it automatically qualifies as something that should
> > only go in the week after a stable tag.
> > 
> > We will need to have a closer look at the non-coherent case when Ard
> > gets back (Monday).
> 
> You mean Ard is on vacation and will be back next Monday.

Yes.

> > If this version causes issues with non-coherent systems, we will need
> > to revert it before the stable tag. We would then need to look into
> > the best way to deal with the performance issues quoted above.
> 
> I am glad to revert it if it has side effect. Is it possible someone could
> have a quick check?

Ard could. It would take me much longer :)
Which is why I would prefer to wait until next week.

Regards,

Leif


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-10-31  4:38         ` Zeng, Star
  2018-10-31 12:08           ` Leif Lindholm
@ 2018-11-06  9:49           ` Ard Biesheuvel
  2018-11-06 14:37             ` Zeng, Star
  1 sibling, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2018-11-06  9:49 UTC (permalink / raw)
  To: Zeng, Star; +Cc: Leif Lindholm, Cohen, Eugene, edk2-devel-01

On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
> Good feedback.
>
> On 2018/10/30 20:50, Leif Lindholm wrote:
>>
>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>
>>> (add back the list)
>>
>>
>> Oi! Go back on holiday!
>>
>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>
>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>
>>>> It's not clear that this change would actually be faster on a system of
>>>> that
>>>> type since using common buffers imply access to uncached memory.
>>>> Depending
>>>> on the access patterns the uncached memory access could be more time
>>>> consuming than cache maintenance operations.
>
>
> The change/idea was based on the statement below.
>   ///
>   /// Provides both read and write access to system memory by both the
> processor and a
>   /// bus master. The buffer is coherent from both the processor's and the
> bus master's point of view.
>   ///
>   EfiPciIoOperationBusMasterCommonBuffer,
>
> Thanks for raising case about uncached memory access. But after checking the
> code, for Intel VTd case
> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
> (or no IOMMU case
> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
> the common buffer is just normal memory buffer.
> If someone can help do some test/collect some data on a system using common
> buffers imply access to uncached memory, that will be great.
>

OK, so first of all, can anyone explain to me under which
circumstances interrupt transfers are a bottleneck? I'd assume that
anything throughput bound would use bulk endpoints.

Also, since the Map/Unmap calls are only costly when using an IOMMU,
could we simply revert to the old behavior if mIoMmu == NULL?

>>>
>>> I haven't had time to look at these patches yet.
>>>
>>> I agree with Eugene's concern: the directional DMA routines are much
>>> more performant on implementations with non-coherent DMA, and so
>>> common buffers should be avoided unless we are dealing with data
>>> structures that are truly shared between the CPU and the device.
>>>
>>> Since this is obviously not the case here, could we please have some
>>> numbers about the performance improvement we are talking about here?
>>> Would it be possible to improve the IOMMU handling code instead?
>
>
> We collected the data below on a platform with release image and Intel VTd
> enabled.
>
> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>
> EHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0        446        2150           4           2         963
>
> EHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0        270         742           2           2          41
>
> XHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0        215         603           2           2          52
>
> XHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0         95         294           3           2          52
>
> I believe the performance data really depends on
> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
> and/or USB bluetooth keyboard?)
> 2. Data size (for flushing data from PCI controller specific address to
> mapped system memory address *in original code*)
> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
> operation on Intel VTd engine caused by the unmap and map for flushing data
> *in original code*, the SetAttribute operation on IntelVTd engine will
> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>

OK, so there is room for improvement here: there is no reason the
IOMMU driver couldn't cache mappings, or do some other optimizations
that would make mapping the same memory repeatedly less costly.

>>
>> On an unrelated note to the concerns above:
>> Why has a fundamental change to the behaviour of one of the industry
>> standard drivers been pushed at the very end of the stable cycle?
>
>
> We thought it was a simple improvement but not fundamental change before
> Eugene and Ard raised the concern.
>
>
> Thanks,
> Star
>
>>
>> Regards,
>>
>> Leif
>>
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-11-06  9:49           ` Ard Biesheuvel
@ 2018-11-06 14:37             ` Zeng, Star
  2018-11-07 15:00               ` Zeng, Star
  0 siblings, 1 reply; 18+ messages in thread
From: Zeng, Star @ 2018-11-06 14:37 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: edk2-devel-01, star.zeng

On 2018/11/6 17:49, Ard Biesheuvel wrote:
> On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
>> Good feedback.
>>
>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>
>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>
>>>> (add back the list)
>>>
>>>
>>> Oi! Go back on holiday!
>>>
>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>>
>>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>>
>>>>> It's not clear that this change would actually be faster on a system of
>>>>> that
>>>>> type since using common buffers imply access to uncached memory.
>>>>> Depending
>>>>> on the access patterns the uncached memory access could be more time
>>>>> consuming than cache maintenance operations.
>>
>>
>> The change/idea was based on the statement below.
>>    ///
>>    /// Provides both read and write access to system memory by both the
>> processor and a
>>    /// bus master. The buffer is coherent from both the processor's and the
>> bus master's point of view.
>>    ///
>>    EfiPciIoOperationBusMasterCommonBuffer,
>>
>> Thanks for raising case about uncached memory access. But after checking the
>> code, for Intel VTd case
>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>> (or no IOMMU case
>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>> the common buffer is just normal memory buffer.
>> If someone can help do some test/collect some data on a system using common
>> buffers imply access to uncached memory, that will be great.
>>
> 
> OK, so first of all, can anyone explain to me under which
> circumstances interrupt transfers are a bottleneck? I'd assume that
> anything throughput bound would use bulk endpoints.
> 
> Also, since the Map/Unmap calls are only costly when using an IOMMU,
> could we simply revert to the old behavior if mIoMmu == NULL?
> 
>>>>
>>>> I haven't had time to look at these patches yet.
>>>>
>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>> more performant on implementations with non-coherent DMA, and so
>>>> common buffers should be avoided unless we are dealing with data
>>>> structures that are truly shared between the CPU and the device.
>>>>
>>>> Since this is obviously not the case here, could we please have some
>>>> numbers about the performance improvement we are talking about here?
>>>> Would it be possible to improve the IOMMU handling code instead?
>>
>>
>> We collected the data below on a platform with release image and Intel VTd
>> enabled.
>>
>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>
>> EHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0        446        2150           4           2         963
>>
>> EHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0        270         742           2           2          41
>>
>> XHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0        215         603           2           2          52
>>
>> XHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>     Name         Count     Duration    Duration    Duration    Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0         95         294           3           2          52
>>
>> I believe the performance data really depends on
>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>> and/or USB bluetooth keyboard?)
>> 2. Data size (for flushing data from PCI controller specific address to
>> mapped system memory address *in original code*)
>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>> operation on Intel VTd engine caused by the unmap and map for flushing data
>> *in original code*, the SetAttribute operation on IntelVTd engine will
>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>
> 
> OK, so there is room for improvement here: there is no reason the
> IOMMU driver couldn't cache mappings, or do some other optimizations
> that would make mapping the same memory repeatedly less costly.

The unmap/map with IOMMU will direct to SetAttribute that will 
disallow/allow DMA memory access. The IOMMU driver is hard to predict 
the sequence of unmap/map operations. Do you have more detail about the 
optimizations?

Could you take a try with the patch on the platform for the case you and 
Eugene mentioned?

Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4 
have no functionality impact) since the time point is a little sensitive 
as it is near edk2-stable201811.

Thanks,
Star

> 
>>>
>>> On an unrelated note to the concerns above:
>>> Why has a fundamental change to the behaviour of one of the industry
>>> standard drivers been pushed at the very end of the stable cycle?
>>
>>
>> We thought it was a simple improvement but not fundamental change before
>> Eugene and Ard raised the concern.
>>
>>
>> Thanks,
>> Star
>>
>>>
>>> Regards,
>>>
>>> Leif
>>>
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-11-06 14:37             ` Zeng, Star
@ 2018-11-07 15:00               ` Zeng, Star
  2018-11-07 15:14                 ` Ard Biesheuvel
  0 siblings, 1 reply; 18+ messages in thread
From: Zeng, Star @ 2018-11-07 15:00 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: edk2-devel-01, star.zeng

On 2018/11/6 22:37, Zeng, Star wrote:
> On 2018/11/6 17:49, Ard Biesheuvel wrote:
>> On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
>>> Good feedback.
>>>
>>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>>
>>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>>
>>>>> (add back the list)
>>>>
>>>>
>>>> Oi! Go back on holiday!
>>>>
>>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>>>
>>>>>> Has this patch been tested on a system that does not have coherent 
>>>>>> DMA?
>>>>>>
>>>>>> It's not clear that this change would actually be faster on a 
>>>>>> system of
>>>>>> that
>>>>>> type since using common buffers imply access to uncached memory.
>>>>>> Depending
>>>>>> on the access patterns the uncached memory access could be more time
>>>>>> consuming than cache maintenance operations.
>>>
>>>
>>> The change/idea was based on the statement below.
>>>    ///
>>>    /// Provides both read and write access to system memory by both the
>>> processor and a
>>>    /// bus master. The buffer is coherent from both the processor's 
>>> and the
>>> bus master's point of view.
>>>    ///
>>>    EfiPciIoOperationBusMasterCommonBuffer,
>>>
>>> Thanks for raising case about uncached memory access. But after 
>>> checking the
>>> code, for Intel VTd case
>>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460 
>>>
>>> (or no IOMMU case
>>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567), 
>>>
>>> the common buffer is just normal memory buffer.
>>> If someone can help do some test/collect some data on a system using 
>>> common
>>> buffers imply access to uncached memory, that will be great.
>>>
>>
>> OK, so first of all, can anyone explain to me under which
>> circumstances interrupt transfers are a bottleneck? I'd assume that
>> anything throughput bound would use bulk endpoints.
>>
>> Also, since the Map/Unmap calls are only costly when using an IOMMU,
>> could we simply revert to the old behavior if mIoMmu == NULL?
>>
>>>>>
>>>>> I haven't had time to look at these patches yet.
>>>>>
>>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>>> more performant on implementations with non-coherent DMA, and so
>>>>> common buffers should be avoided unless we are dealing with data
>>>>> structures that are truly shared between the CPU and the device.
>>>>>
>>>>> Since this is obviously not the case here, could we please have some
>>>>> numbers about the performance improvement we are talking about here?
>>>>> Would it be possible to improve the IOMMU handling code instead?
>>>
>>>
>>> We collected the data below on a platform with release image and 
>>> Intel VTd
>>> enabled.
>>>
>>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>>
>>> EHCI without the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>     Name         Count     Duration    Duration    Duration    Duration
>>> ------------------------------------------------------------------------------- 
>>>
>>> S0000B00D1DF0        446        2150           4           2         963
>>>
>>> EHCI with the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>     Name         Count     Duration    Duration    Duration    Duration
>>> ------------------------------------------------------------------------------- 
>>>
>>> S0000B00D1DF0        270         742           2           2          41
>>>
>>> XHCI without the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>     Name         Count     Duration    Duration    Duration    Duration
>>> ------------------------------------------------------------------------------- 
>>>
>>> S0000B00D14F0        215         603           2           2          52
>>>
>>> XHCI with the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>     Name         Count     Duration    Duration    Duration    Duration
>>> ------------------------------------------------------------------------------- 
>>>
>>> S0000B00D14F0         95         294           3           2          52
>>>
>>> I believe the performance data really depends on
>>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>>> and/or USB bluetooth keyboard?)
>>> 2. Data size (for flushing data from PCI controller specific address to
>>> mapped system memory address *in original code*)
>>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>> operation on Intel VTd engine caused by the unmap and map for 
>>> flushing data
>>> *in original code*, the SetAttribute operation on IntelVTd engine will
>>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>>
>>
>> OK, so there is room for improvement here: there is no reason the
>> IOMMU driver couldn't cache mappings, or do some other optimizations
>> that would make mapping the same memory repeatedly less costly.
> 
> The unmap/map with IOMMU will direct to SetAttribute that will 
> disallow/allow DMA memory access. The IOMMU driver is hard to predict 
> the sequence of unmap/map operations. Do you have more detail about the 
> optimizations?
> 
> Could you take a try with the patch on the platform for the case you and 
> Eugene mentioned?
> 
> Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4 
> have no functionality impact) since the time point is a little sensitive 
> as it is near edk2-stable201811.

I have reverted the patch 3/4 and 4/4 at 
https://github.com/tianocore/edk2/compare/1ed6498...d98fc9a, and we can 
continue the discussion.


> 
> Thanks,
> Star
> 
>>
>>>>
>>>> On an unrelated note to the concerns above:
>>>> Why has a fundamental change to the behaviour of one of the industry
>>>> standard drivers been pushed at the very end of the stable cycle?
>>>
>>>
>>> We thought it was a simple improvement but not fundamental change before
>>> Eugene and Ard raised the concern.
>>>
>>>
>>> Thanks,
>>> Star
>>>
>>>>
>>>> Regards,
>>>>
>>>> Leif
>>>>
>>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
  2018-11-07 15:00               ` Zeng, Star
@ 2018-11-07 15:14                 ` Ard Biesheuvel
  0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2018-11-07 15:14 UTC (permalink / raw)
  To: Zeng, Star; +Cc: edk2-devel-01

On 7 November 2018 at 16:00, Zeng, Star <star.zeng@intel.com> wrote:
> On 2018/11/6 22:37, Zeng, Star wrote:
>>
>> On 2018/11/6 17:49, Ard Biesheuvel wrote:
>>>
>>> On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
>>>>
>>>> Good feedback.
>>>>
>>>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>>>
>>>>>
>>>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>>>
>>>>>>
>>>>>> (add back the list)
>>>>>
>>>>>
>>>>>
>>>>> Oi! Go back on holiday!
>>>>>
>>>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Has this patch been tested on a system that does not have coherent
>>>>>>> DMA?
>>>>>>>
>>>>>>> It's not clear that this change would actually be faster on a system
>>>>>>> of
>>>>>>> that
>>>>>>> type since using common buffers imply access to uncached memory.
>>>>>>> Depending
>>>>>>> on the access patterns the uncached memory access could be more time
>>>>>>> consuming than cache maintenance operations.
>>>>
>>>>
>>>>
>>>> The change/idea was based on the statement below.
>>>>    ///
>>>>    /// Provides both read and write access to system memory by both the
>>>> processor and a
>>>>    /// bus master. The buffer is coherent from both the processor's and
>>>> the
>>>> bus master's point of view.
>>>>    ///
>>>>    EfiPciIoOperationBusMasterCommonBuffer,
>>>>
>>>> Thanks for raising case about uncached memory access. But after checking
>>>> the
>>>> code, for Intel VTd case
>>>>
>>>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>>>> (or no IOMMU case
>>>>
>>>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>>>> the common buffer is just normal memory buffer.
>>>> If someone can help do some test/collect some data on a system using
>>>> common
>>>> buffers imply access to uncached memory, that will be great.
>>>>
>>>
>>> OK, so first of all, can anyone explain to me under which
>>> circumstances interrupt transfers are a bottleneck? I'd assume that
>>> anything throughput bound would use bulk endpoints.
>>>
>>> Also, since the Map/Unmap calls are only costly when using an IOMMU,
>>> could we simply revert to the old behavior if mIoMmu == NULL?
>>>
>>>>>>
>>>>>> I haven't had time to look at these patches yet.
>>>>>>
>>>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>>>> more performant on implementations with non-coherent DMA, and so
>>>>>> common buffers should be avoided unless we are dealing with data
>>>>>> structures that are truly shared between the CPU and the device.
>>>>>>
>>>>>> Since this is obviously not the case here, could we please have some
>>>>>> numbers about the performance improvement we are talking about here?
>>>>>> Would it be possible to improve the IOMMU handling code instead?
>>>>
>>>>
>>>>
>>>> We collected the data below on a platform with release image and Intel
>>>> VTd
>>>> enabled.
>>>>
>>>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>>>
>>>> EHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>>     Name         Count     Duration    Duration    Duration    Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D1DF0        446        2150           4           2         963
>>>>
>>>> EHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>>     Name         Count     Duration    Duration    Duration    Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D1DF0        270         742           2           2          41
>>>>
>>>> XHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>>     Name         Count     Duration    Duration    Duration    Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D14F0        215         603           2           2          52
>>>>
>>>> XHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>>>>     Name         Count     Duration    Duration    Duration    Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D14F0         95         294           3           2          52
>>>>
>>>> I believe the performance data really depends on
>>>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>>>> and/or USB bluetooth keyboard?)
>>>> 2. Data size (for flushing data from PCI controller specific address to
>>>> mapped system memory address *in original code*)
>>>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>>> operation on Intel VTd engine caused by the unmap and map for flushing
>>>> data
>>>> *in original code*, the SetAttribute operation on IntelVTd engine will
>>>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>>>
>>>
>>> OK, so there is room for improvement here: there is no reason the
>>> IOMMU driver couldn't cache mappings, or do some other optimizations
>>> that would make mapping the same memory repeatedly less costly.
>>
>>
>> The unmap/map with IOMMU will direct to SetAttribute that will
>> disallow/allow DMA memory access. The IOMMU driver is hard to predict the
>> sequence of unmap/map operations. Do you have more detail about the
>> optimizations?
>>
>> Could you take a try with the patch on the platform for the case you and
>> Eugene mentioned?
>>
>> Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4
>> have no functionality impact) since the time point is a little sensitive as
>> it is near edk2-stable201811.
>
>
> I have reverted the patch 3/4 and 4/4 at
> https://github.com/tianocore/edk2/compare/1ed6498...d98fc9a, and we can
> continue the discussion.
>

Thanks Star


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-11-07 15:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
2018-10-26 13:41 ` [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function Star Zeng
2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
2018-10-27  7:47   ` Wu, Hao A
2018-10-26 13:41 ` [PATCH V3 4/4] MdeModulePkg EhciDxe: " Star Zeng
     [not found]   ` <CS1PR8401MB1189428C2915107C583C0A42B4CC0@CS1PR8401MB1189.NAMPRD84.PROD.OUTLOOK.COM>
2018-10-30 12:39     ` Ard Biesheuvel
2018-10-30 12:50       ` Leif Lindholm
2018-10-31  4:38         ` Zeng, Star
2018-10-31 12:08           ` Leif Lindholm
2018-11-01  1:12             ` Zeng, Star
2018-11-01 10:32               ` Leif Lindholm
2018-11-06  9:49           ` Ard Biesheuvel
2018-11-06 14:37             ` Zeng, Star
2018-11-07 15:00               ` Zeng, Star
2018-11-07 15:14                 ` Ard Biesheuvel
2018-10-28 13:35 ` [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe " Zeng, Star
2018-10-29  2:19 ` Ni, Ruiyu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox