* [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer
@ 2018-10-26 13:41 Star Zeng
2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao
V3: Thanks for Hao's feedback.
1. Match function parameter name and description between
XhciSched.c and XhciSched.h, EhciSched.c and EhciSched.h.
2. Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.
V2: Thanks for Ray's feedback.
1. Add the missing "FreePool (Data);".
2. Remove the unnecessary indentation change.
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
Please refer to the log message of each commit for more details.
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Star Zeng (4):
MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 28 +----
MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 114 +++++++++----------
MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h | 35 +++++-
MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c | 38 ++++++-
MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h | 33 +++---
MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 19 +---
MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 185 +++++++++++++++++--------------
MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h | 55 +++++++--
8 files changed, 296 insertions(+), 211 deletions(-)
--
2.7.0.windows.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
2018-10-26 13:41 ` [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function Star Zeng
` (4 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao
V3:
Match function parameter name and description between
XhciSched.c and XhciSched.h.
V2:
Add the missing "FreePool (Data);".
Remove the unnecessary indentation change.
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
Extract new XhciInsertAsyncIntTransfer function from
XhcAsyncInterruptTransfer.
It is code preparation for following patch,
no essential functional change.
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Hao Wu <hao.a.wu@intel.com>
---
MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 18 +--------
MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 65 ++++++++++++++++++++++++++++++++
MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h | 28 ++++++++++++++
3 files changed, 94 insertions(+), 17 deletions(-)
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
index f1c60bef01c0..7f64f9c7c982 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
@@ -1346,7 +1346,6 @@ XhcAsyncInterruptTransfer (
EFI_STATUS Status;
UINT8 SlotId;
UINT8 Index;
- UINT8 *Data;
EFI_TPL OldTpl;
//
@@ -1413,36 +1412,21 @@ XhcAsyncInterruptTransfer (
goto ON_EXIT;
}
- Data = AllocateZeroPool (DataLength);
-
- if (Data == NULL) {
- DEBUG ((EFI_D_ERROR, "XhcAsyncInterruptTransfer: failed to allocate buffer\n"));
- Status = EFI_OUT_OF_RESOURCES;
- goto ON_EXIT;
- }
-
- Urb = XhcCreateUrb (
+ Urb = XhciInsertAsyncIntTransfer (
Xhc,
DeviceAddress,
EndPointAddress,
DeviceSpeed,
MaximumPacketLength,
- XHC_INT_TRANSFER_ASYNC,
- NULL,
- Data,
DataLength,
CallBackFunction,
Context
);
-
if (Urb == NULL) {
- DEBUG ((EFI_D_ERROR, "XhcAsyncInterruptTransfer: failed to create URB\n"));
- FreePool (Data);
Status = EFI_OUT_OF_RESOURCES;
goto ON_EXIT;
}
- InsertHeadList (&Xhc->AsyncIntTransfers, &Urb->UrbList);
//
// Ring the doorbell
//
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
index 166c44bf5e66..75959ae08363 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
@@ -1411,6 +1411,71 @@ XhciDelAllAsyncIntTransfers (
}
/**
+ Insert a single asynchronous interrupt transfer for
+ the device and endpoint.
+
+ @param Xhc The XHCI Instance
+ @param BusAddr The logical device address assigned by UsbBus driver
+ @param EpAddr Endpoint addrress
+ @param DevSpeed The device speed
+ @param MaxPacket The max packet length of the endpoint
+ @param DataLen The length of data buffer
+ @param Callback The function to call when data is transferred
+ @param Context The context to the callback
+
+ @return Created URB or NULL
+
+**/
+URB *
+XhciInsertAsyncIntTransfer (
+ IN USB_XHCI_INSTANCE *Xhc,
+ IN UINT8 BusAddr,
+ IN UINT8 EpAddr,
+ IN UINT8 DevSpeed,
+ IN UINTN MaxPacket,
+ IN UINTN DataLen,
+ IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
+ IN VOID *Context
+ )
+{
+ VOID *Data;
+ URB *Urb;
+
+ Data = AllocateZeroPool (DataLen);
+ if (Data == NULL) {
+ DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
+ return NULL;
+ }
+
+ Urb = XhcCreateUrb (
+ Xhc,
+ BusAddr,
+ EpAddr,
+ DevSpeed,
+ MaxPacket,
+ XHC_INT_TRANSFER_ASYNC,
+ NULL,
+ Data,
+ DataLen,
+ Callback,
+ Context
+ );
+ if (Urb == NULL) {
+ DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
+ FreePool (Data);
+ return NULL;
+ }
+
+ //
+ // New asynchronous transfer must inserted to the head.
+ // Check the comments in XhcMoniteAsyncRequests
+ //
+ InsertHeadList (&Xhc->AsyncIntTransfers, &Urb->UrbList);
+
+ return Urb;
+}
+
+/**
Update the queue head for next round of asynchronous transfer
@param Xhc The XHCI Instance.
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
index 097408828a1f..b5e192c3b589 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
@@ -853,6 +853,34 @@ XhciDelAllAsyncIntTransfers (
);
/**
+ Insert a single asynchronous interrupt transfer for
+ the device and endpoint.
+
+ @param Xhc The XHCI Instance
+ @param BusAddr The logical device address assigned by UsbBus driver
+ @param EpAddr Endpoint addrress
+ @param DevSpeed The device speed
+ @param MaxPacket The max packet length of the endpoint
+ @param DataLen The length of data buffer
+ @param Callback The function to call when data is transferred
+ @param Context The context to the callback
+
+ @return Created URB or NULL
+
+**/
+URB *
+XhciInsertAsyncIntTransfer (
+ IN USB_XHCI_INSTANCE *Xhc,
+ IN UINT8 BusAddr,
+ IN UINT8 EpAddr,
+ IN UINT8 DevSpeed,
+ IN UINTN MaxPacket,
+ IN UINTN DataLen,
+ IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
+ IN VOID *Context
+ );
+
+/**
Set Bios Ownership
@param Xhc The XHCI Instance.
--
2.7.0.windows.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
` (3 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao
V3:
Match function parameter name and description between
EhciSched.c and EhciSched.h.
V2:
Add the missing "gBS->FreePool (Data);".
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
Extract new EhciInsertAsyncIntTransfer function from
EhcAsyncInterruptTransfer.
It is code preparation for following patch,
no essential functional change.
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Hao Wu <hao.a.wu@intel.com>
---
MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 25 +----------
MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 76 ++++++++++++++++++++++++++++++++
MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h | 35 ++++++++++++++-
3 files changed, 111 insertions(+), 25 deletions(-)
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
index 50b5598df4fb..5569f4f9618b 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
@@ -997,7 +997,6 @@ EhcAsyncInterruptTransfer (
URB *Urb;
EFI_TPL OldTpl;
EFI_STATUS Status;
- UINT8 *Data;
//
// Validate parameters
@@ -1046,16 +1045,7 @@ EhcAsyncInterruptTransfer (
EhcAckAllInterrupt (Ehc);
- Data = AllocatePool (DataLength);
-
- if (Data == NULL) {
- DEBUG ((EFI_D_ERROR, "EhcAsyncInterruptTransfer: failed to allocate buffer\n"));
-
- Status = EFI_OUT_OF_RESOURCES;
- goto ON_EXIT;
- }
-
- Urb = EhcCreateUrb (
+ Urb = EhciInsertAsyncIntTransfer (
Ehc,
DeviceAddress,
EndPointAddress,
@@ -1063,9 +1053,6 @@ EhcAsyncInterruptTransfer (
*DataToggle,
MaximumPacketLength,
Translator,
- EHC_INT_TRANSFER_ASYNC,
- NULL,
- Data,
DataLength,
CallBackFunction,
Context,
@@ -1073,20 +1060,10 @@ EhcAsyncInterruptTransfer (
);
if (Urb == NULL) {
- DEBUG ((EFI_D_ERROR, "EhcAsyncInterruptTransfer: failed to create URB\n"));
-
- gBS->FreePool (Data);
Status = EFI_OUT_OF_RESOURCES;
goto ON_EXIT;
}
- //
- // New asynchronous transfer must inserted to the head.
- // Check the comments in EhcMoniteAsyncRequests
- //
- EhcLinkQhToPeriod (Ehc, Urb->Qh);
- InsertHeadList (&Ehc->AsyncIntTransfers, &Urb->UrbList);
-
ON_EXIT:
Ehc->PciIo->Flush (Ehc->PciIo);
gBS->RestoreTPL (OldTpl);
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
index 168280be81d7..ec8d796fab11 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
@@ -814,6 +814,82 @@ EhciDelAllAsyncIntTransfers (
}
}
+/**
+ Insert a single asynchronous interrupt transfer for
+ the device and endpoint.
+
+ @param Ehc The EHCI device.
+ @param DevAddr The device address.
+ @param EpAddr Endpoint addrress & its direction.
+ @param DevSpeed The device speed.
+ @param Toggle Initial data toggle to use.
+ @param MaxPacket The max packet length of the endpoint.
+ @param Hub The transaction translator to use.
+ @param DataLen The length of data buffer.
+ @param Callback The function to call when data is transferred.
+ @param Context The context to the callback.
+ @param Interval The interval for interrupt transfer.
+
+ @return Created URB or NULL.
+
+**/
+URB *
+EhciInsertAsyncIntTransfer (
+ IN USB2_HC_DEV *Ehc,
+ IN UINT8 DevAddr,
+ IN UINT8 EpAddr,
+ IN UINT8 DevSpeed,
+ IN UINT8 Toggle,
+ IN UINTN MaxPacket,
+ IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
+ IN UINTN DataLen,
+ IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
+ IN VOID *Context,
+ IN UINTN Interval
+ )
+{
+ VOID *Data;
+ URB *Urb;
+
+ Data = AllocatePool (DataLen);
+
+ if (Data == NULL) {
+ DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
+ return NULL;
+ }
+
+ Urb = EhcCreateUrb (
+ Ehc,
+ DevAddr,
+ EpAddr,
+ DevSpeed,
+ Toggle,
+ MaxPacket,
+ Hub,
+ EHC_INT_TRANSFER_ASYNC,
+ NULL,
+ Data,
+ DataLen,
+ Callback,
+ Context,
+ Interval
+ );
+
+ if (Urb == NULL) {
+ DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
+ gBS->FreePool (Data);
+ return NULL;
+ }
+
+ //
+ // New asynchronous transfer must inserted to the head.
+ // Check the comments in EhcMoniteAsyncRequests
+ //
+ EhcLinkQhToPeriod (Ehc, Urb->Qh);
+ InsertHeadList (&Ehc->AsyncIntTransfers, &Urb->UrbList);
+
+ return Urb;
+}
/**
Flush data from PCI controller specific address to mapped system
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h
index c03bd619d7e8..e0c430531faf 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h
@@ -2,7 +2,7 @@
This file contains the definination for host controller schedule routines.
-Copyright (c) 2007 - 2009, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2007 - 2018, Intel Corporation. All rights reserved.<BR>
This program and the accompanying materials
are licensed and made available under the terms and conditions of the BSD License
which accompanies this distribution. The full text of the license may be found at
@@ -162,6 +162,39 @@ EhciDelAllAsyncIntTransfers (
IN USB2_HC_DEV *Ehc
);
+/**
+ Insert a single asynchronous interrupt transfer for
+ the device and endpoint.
+
+ @param Ehc The EHCI device.
+ @param DevAddr The device address.
+ @param EpAddr Endpoint addrress & its direction.
+ @param DevSpeed The device speed.
+ @param Toggle Initial data toggle to use.
+ @param MaxPacket The max packet length of the endpoint.
+ @param Hub The transaction translator to use.
+ @param DataLen The length of data buffer.
+ @param Callback The function to call when data is transferred.
+ @param Context The context to the callback.
+ @param Interval The interval for interrupt transfer.
+
+ @return Created URB or NULL.
+
+**/
+URB *
+EhciInsertAsyncIntTransfer (
+ IN USB2_HC_DEV *Ehc,
+ IN UINT8 DevAddr,
+ IN UINT8 EpAddr,
+ IN UINT8 DevSpeed,
+ IN UINT8 Toggle,
+ IN UINTN MaxPacket,
+ IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
+ IN UINTN DataLen,
+ IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
+ IN VOID *Context,
+ IN UINTN Interval
+ );
/**
Interrupt transfer periodic check handler.
--
2.7.0.windows.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
2018-10-26 13:41 ` [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
2018-10-27 7:47 ` Wu, Hao A
2018-10-26 13:41 ` [PATCH V3 4/4] MdeModulePkg EhciDxe: " Star Zeng
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao
V3:
Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
In current code, XhcMonitorAsyncRequests (timer handler) will do
unmap and map operations for AsyncIntTransfers to "Flush data from
PCI controller specific address to mapped system memory address".
XhcMonitorAsyncRequests
XhcFlushAsyncIntMap
PciIo->Unmap
IoMmu->SetAttribute
PciIo->Map
IoMmu->SetAttribute
This may impact the boot performance.
Since the data buffer for XhcMonitorAsyncRequests is internal
buffer, we can allocate common buffer by PciIo->AllocateBuffer
and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
then the unmap and map operations can be removed.
///
/// Provides both read and write access to system memory by
/// both the processor and a bus master. The buffer is coherent
/// from both the processor's and the bus master's point of view.
///
EfiPciIoOperationBusMasterCommonBuffer,
Test done:
USB KB works normally.
USB disk read/write works normally.
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
---
MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 1 +
MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 138 +++++++++++--------------------
MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h | 27 +++---
3 files changed, 63 insertions(+), 103 deletions(-)
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
index 7f64f9c7c982..64855a4c158c 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
@@ -769,6 +769,7 @@ XhcTransfer (
MaximumPacketLength,
Type,
Request,
+ FALSE,
Data,
*DataLength,
NULL,
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
index 75959ae08363..9dd45e93a272 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
@@ -118,17 +118,18 @@ ON_EXIT:
/**
Create a new URB for a new transaction.
- @param Xhc The XHCI Instance
- @param BusAddr The logical device address assigned by UsbBus driver
- @param EpAddr Endpoint addrress
- @param DevSpeed The device speed
- @param MaxPacket The max packet length of the endpoint
- @param Type The transaction type
- @param Request The standard USB request for control transfer
- @param Data The user data to transfer
- @param DataLen The length of data buffer
- @param Callback The function to call when data is transferred
- @param Context The context to the callback
+ @param Xhc The XHCI Instance
+ @param BusAddr The logical device address assigned by UsbBus driver
+ @param EpAddr Endpoint addrress
+ @param DevSpeed The device speed
+ @param MaxPacket The max packet length of the endpoint
+ @param Type The transaction type
+ @param Request The standard USB request for control transfer
+ @param AllocateCommonBuffer Indicate whether need to allocate common buffer for data transfer
+ @param Data The user data to transfer, NULL if AllocateCommonBuffer is TRUE
+ @param DataLen The length of data buffer
+ @param Callback The function to call when data is transferred
+ @param Context The context to the callback
@return Created URB or NULL
@@ -142,6 +143,7 @@ XhcCreateUrb (
IN UINTN MaxPacket,
IN UINTN Type,
IN EFI_USB_DEVICE_REQUEST *Request,
+ IN BOOLEAN AllocateCommonBuffer,
IN VOID *Data,
IN UINTN DataLen,
IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
@@ -169,8 +171,24 @@ XhcCreateUrb (
Ep->Type = Type;
Urb->Request = Request;
+ if (AllocateCommonBuffer) {
+ ASSERT (Data == NULL);
+ Status = Xhc->PciIo->AllocateBuffer (
+ Xhc->PciIo,
+ AllocateAnyPages,
+ EfiBootServicesData,
+ EFI_SIZE_TO_PAGES (DataLen),
+ &Data,
+ 0
+ );
+ if (EFI_ERROR (Status) || (Data == NULL)) {
+ FreePool (Urb);
+ return NULL;
+ }
+ }
Urb->Data = Data;
Urb->DataLen = DataLen;
+ Urb->AllocateCommonBuffer = AllocateCommonBuffer;
Urb->Callback = Callback;
Urb->Context = Context;
@@ -178,7 +196,7 @@ XhcCreateUrb (
ASSERT_EFI_ERROR (Status);
if (EFI_ERROR (Status)) {
DEBUG ((EFI_D_ERROR, "XhcCreateUrb: XhcCreateTransferTrb Failed, Status = %r\n", Status));
- FreePool (Urb);
+ XhcFreeUrb (Xhc, Urb);
Urb = NULL;
}
@@ -206,6 +224,14 @@ XhcFreeUrb (
Xhc->PciIo->Unmap (Xhc->PciIo, Urb->DataMap);
}
+ if (Urb->AllocateCommonBuffer) {
+ Xhc->PciIo->FreeBuffer (
+ Xhc->PciIo,
+ EFI_SIZE_TO_PAGES (Urb->DataLen),
+ Urb->Data
+ );
+ }
+
FreePool (Urb);
}
@@ -264,10 +290,14 @@ XhcCreateTransferTrb (
// No need to remap.
//
if ((Urb->Data != NULL) && (Urb->DataMap == NULL)) {
- if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
- MapOp = EfiPciIoOperationBusMasterWrite;
+ if (Urb->AllocateCommonBuffer) {
+ MapOp = EfiPciIoOperationBusMasterCommonBuffer;
} else {
- MapOp = EfiPciIoOperationBusMasterRead;
+ if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
+ MapOp = EfiPciIoOperationBusMasterWrite;
+ } else {
+ MapOp = EfiPciIoOperationBusMasterRead;
+ }
}
Len = Urb->DataLen;
@@ -1367,7 +1397,6 @@ XhciDelAsyncIntTransfer (
}
RemoveEntryList (&Urb->UrbList);
- FreePool (Urb->Data);
XhcFreeUrb (Xhc, Urb);
return EFI_SUCCESS;
}
@@ -1405,7 +1434,6 @@ XhciDelAllAsyncIntTransfers (
}
RemoveEntryList (&Urb->UrbList);
- FreePool (Urb->Data);
XhcFreeUrb (Xhc, Urb);
}
}
@@ -1438,15 +1466,8 @@ XhciInsertAsyncIntTransfer (
IN VOID *Context
)
{
- VOID *Data;
URB *Urb;
- Data = AllocateZeroPool (DataLen);
- if (Data == NULL) {
- DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
- return NULL;
- }
-
Urb = XhcCreateUrb (
Xhc,
BusAddr,
@@ -1455,14 +1476,14 @@ XhciInsertAsyncIntTransfer (
MaxPacket,
XHC_INT_TRANSFER_ASYNC,
NULL,
- Data,
+ TRUE,
+ NULL,
DataLen,
Callback,
Context
);
if (Urb == NULL) {
DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
- FreePool (Data);
return NULL;
}
@@ -1503,61 +1524,6 @@ XhcUpdateAsyncRequest (
}
/**
- Flush data from PCI controller specific address to mapped system
- memory address.
-
- @param Xhc The XHCI device.
- @param Urb The URB to unmap.
-
- @retval EFI_SUCCESS Success to flush data to mapped system memory.
- @retval EFI_DEVICE_ERROR Fail to flush data to mapped system memory.
-
-**/
-EFI_STATUS
-XhcFlushAsyncIntMap (
- IN USB_XHCI_INSTANCE *Xhc,
- IN URB *Urb
- )
-{
- EFI_STATUS Status;
- EFI_PHYSICAL_ADDRESS PhyAddr;
- EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
- EFI_PCI_IO_PROTOCOL *PciIo;
- UINTN Len;
- VOID *Map;
-
- PciIo = Xhc->PciIo;
- Len = Urb->DataLen;
-
- if (Urb->Ep.Direction == EfiUsbDataIn) {
- MapOp = EfiPciIoOperationBusMasterWrite;
- } else {
- MapOp = EfiPciIoOperationBusMasterRead;
- }
-
- if (Urb->DataMap != NULL) {
- Status = PciIo->Unmap (PciIo, Urb->DataMap);
- if (EFI_ERROR (Status)) {
- goto ON_ERROR;
- }
- }
-
- Urb->DataMap = NULL;
-
- Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
- if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
- goto ON_ERROR;
- }
-
- Urb->DataPhy = (VOID *) ((UINTN) PhyAddr);
- Urb->DataMap = Map;
- return EFI_SUCCESS;
-
-ON_ERROR:
- return EFI_DEVICE_ERROR;
-}
-
-/**
Interrupt transfer periodic check handler.
@param Event Interrupt event.
@@ -1577,7 +1543,6 @@ XhcMonitorAsyncRequests (
UINT8 *ProcBuf;
URB *Urb;
UINT8 SlotId;
- EFI_STATUS Status;
EFI_TPL OldTpl;
OldTpl = gBS->RaiseTPL (XHC_TPL);
@@ -1606,15 +1571,6 @@ XhcMonitorAsyncRequests (
}
//
- // Flush any PCI posted write transactions from a PCI host
- // bridge to system memory.
- //
- Status = XhcFlushAsyncIntMap (Xhc, Urb);
- if (EFI_ERROR (Status)) {
- DEBUG ((EFI_D_ERROR, "XhcMonitorAsyncRequests: Fail to Flush AsyncInt Mapped Memeory\n"));
- }
-
- //
// Allocate a buffer then copy the transferred data for user.
// If failed to allocate the buffer, update the URB for next
// round of transfer. Ignore the data of this round.
diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
index b5e192c3b589..1a95d98996c1 100644
--- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
+++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
@@ -172,6 +172,7 @@ typedef struct _URB {
//
USB_ENDPOINT Ep;
EFI_USB_DEVICE_REQUEST *Request;
+ BOOLEAN AllocateCommonBuffer;
VOID *Data;
UINTN DataLen;
VOID *DataPhy;
@@ -1432,17 +1433,18 @@ XhcSetTrDequeuePointer (
/**
Create a new URB for a new transaction.
- @param Xhc The XHCI Instance
- @param DevAddr The device address
- @param EpAddr Endpoint addrress
- @param DevSpeed The device speed
- @param MaxPacket The max packet length of the endpoint
- @param Type The transaction type
- @param Request The standard USB request for control transfer
- @param Data The user data to transfer
- @param DataLen The length of data buffer
- @param Callback The function to call when data is transferred
- @param Context The context to the callback
+ @param Xhc The XHCI Instance
+ @param BusAddr The logical device address assigned by UsbBus driver
+ @param EpAddr Endpoint addrress
+ @param DevSpeed The device speed
+ @param MaxPacket The max packet length of the endpoint
+ @param Type The transaction type
+ @param Request The standard USB request for control transfer
+ @param AllocateCommonBuffer Indicate whether need to allocate common buffer for data transfer
+ @param Data The user data to transfer, NULL if AllocateCommonBuffer is TRUE
+ @param DataLen The length of data buffer
+ @param Callback The function to call when data is transferred
+ @param Context The context to the callback
@return Created URB or NULL
@@ -1450,12 +1452,13 @@ XhcSetTrDequeuePointer (
URB*
XhcCreateUrb (
IN USB_XHCI_INSTANCE *Xhc,
- IN UINT8 DevAddr,
+ IN UINT8 BusAddr,
IN UINT8 EpAddr,
IN UINT8 DevSpeed,
IN UINTN MaxPacket,
IN UINTN Type,
IN EFI_USB_DEVICE_REQUEST *Request,
+ IN BOOLEAN AllocateCommonBuffer,
IN VOID *Data,
IN UINTN DataLen,
IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
--
2.7.0.windows.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
` (2 preceding siblings ...)
2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
@ 2018-10-26 13:41 ` Star Zeng
[not found] ` <CS1PR8401MB1189428C2915107C583C0A42B4CC0@CS1PR8401MB1189.NAMPRD84.PROD.OUTLOOK.COM>
2018-10-28 13:35 ` [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe " Zeng, Star
2018-10-29 2:19 ` Ni, Ruiyu
5 siblings, 1 reply; 18+ messages in thread
From: Star Zeng @ 2018-10-26 13:41 UTC (permalink / raw)
To: edk2-devel; +Cc: Star Zeng, Ruiyu Ni, Hao Wu, Jian J Wang, Jiewen Yao
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
In current code, EhcMonitorAsyncRequests (timer handler) will do
unmap and map operations for AsyncIntTransfers to "Flush data from
PCI controller specific address to mapped system memory address".
EhcMonitorAsyncRequests
EhcFlushAsyncIntMap
PciIo->Unmap
IoMmu->SetAttribute
PciIo->Map
IoMmu->SetAttribute
This may impact the boot performance.
Since the data buffer for EhcMonitorAsyncRequests is internal
buffer, we can allocate common buffer by PciIo->AllocateBuffer
and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
then the unmap and map operations can be removed.
///
/// Provides both read and write access to system memory by
/// both the processor and a bus master. The buffer is coherent
/// from both the processor's and the bus master's point of view.
///
EfiPciIoOperationBusMasterCommonBuffer,
Test done:
USB KB works normally.
USB disk read/write works normally.
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Star Zeng <star.zeng@intel.com>
Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
Reviewed-by: Hao Wu <hao.a.wu@intel.com>
---
MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 3 ++
MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 78 +-------------------------------
MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c | 38 ++++++++++++++--
MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h | 33 ++++++++------
4 files changed, 57 insertions(+), 95 deletions(-)
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
index 5569f4f9618b..764eeda58ba1 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
@@ -763,6 +763,7 @@ EhcControlTransfer (
Translator,
EHC_CTRL_TRANSFER,
Request,
+ FALSE,
Data,
*DataLength,
NULL,
@@ -906,6 +907,7 @@ EhcBulkTransfer (
Translator,
EHC_BULK_TRANSFER,
NULL,
+ FALSE,
Data[0],
*DataLength,
NULL,
@@ -1163,6 +1165,7 @@ EhcSyncInterruptTransfer (
Translator,
EHC_INT_TRANSFER_SYNC,
NULL,
+ FALSE,
Data,
*DataLength,
NULL,
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
index ec8d796fab11..b067fd02d1ce 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
@@ -778,7 +778,6 @@ EhciDelAsyncIntTransfer (
EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
RemoveEntryList (&Urb->UrbList);
- gBS->FreePool (Urb->Data);
EhcFreeUrb (Ehc, Urb);
return EFI_SUCCESS;
}
@@ -809,7 +808,6 @@ EhciDelAllAsyncIntTransfers (
EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
RemoveEntryList (&Urb->UrbList);
- gBS->FreePool (Urb->Data);
EhcFreeUrb (Ehc, Urb);
}
}
@@ -848,16 +846,8 @@ EhciInsertAsyncIntTransfer (
IN UINTN Interval
)
{
- VOID *Data;
URB *Urb;
- Data = AllocatePool (DataLen);
-
- if (Data == NULL) {
- DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
- return NULL;
- }
-
Urb = EhcCreateUrb (
Ehc,
DevAddr,
@@ -868,7 +858,8 @@ EhciInsertAsyncIntTransfer (
Hub,
EHC_INT_TRANSFER_ASYNC,
NULL,
- Data,
+ TRUE,
+ NULL,
DataLen,
Callback,
Context,
@@ -877,7 +868,6 @@ EhciInsertAsyncIntTransfer (
if (Urb == NULL) {
DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
- gBS->FreePool (Data);
return NULL;
}
@@ -892,60 +882,6 @@ EhciInsertAsyncIntTransfer (
}
/**
- Flush data from PCI controller specific address to mapped system
- memory address.
-
- @param Ehc The EHCI device.
- @param Urb The URB to unmap.
-
- @retval EFI_SUCCESS Success to flush data to mapped system memory.
- @retval EFI_DEVICE_ERROR Fail to flush data to mapped system memory.
-
-**/
-EFI_STATUS
-EhcFlushAsyncIntMap (
- IN USB2_HC_DEV *Ehc,
- IN URB *Urb
- )
-{
- EFI_STATUS Status;
- EFI_PHYSICAL_ADDRESS PhyAddr;
- EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
- EFI_PCI_IO_PROTOCOL *PciIo;
- UINTN Len;
- VOID *Map;
-
- PciIo = Ehc->PciIo;
- Len = Urb->DataLen;
-
- if (Urb->Ep.Direction == EfiUsbDataIn) {
- MapOp = EfiPciIoOperationBusMasterWrite;
- } else {
- MapOp = EfiPciIoOperationBusMasterRead;
- }
-
- Status = PciIo->Unmap (PciIo, Urb->DataMap);
- if (EFI_ERROR (Status)) {
- goto ON_ERROR;
- }
-
- Urb->DataMap = NULL;
-
- Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
- if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
- goto ON_ERROR;
- }
-
- Urb->DataPhy = (VOID *) ((UINTN) PhyAddr);
- Urb->DataMap = Map;
- return EFI_SUCCESS;
-
-ON_ERROR:
- return EFI_DEVICE_ERROR;
-}
-
-
-/**
Update the queue head for next round of asynchronous transfer.
@param Ehc The EHCI device.
@@ -1050,7 +986,6 @@ EhcMonitorAsyncRequests (
BOOLEAN Finished;
UINT8 *ProcBuf;
URB *Urb;
- EFI_STATUS Status;
OldTpl = gBS->RaiseTPL (EHC_TPL);
Ehc = (USB2_HC_DEV *) Context;
@@ -1069,15 +1004,6 @@ EhcMonitorAsyncRequests (
}
//
- // Flush any PCI posted write transactions from a PCI host
- // bridge to system memory.
- //
- Status = EhcFlushAsyncIntMap (Ehc, Urb);
- if (EFI_ERROR (Status)) {
- DEBUG ((EFI_D_ERROR, "EhcMonitorAsyncRequests: Fail to Flush AsyncInt Mapped Memeory\n"));
- }
-
- //
// Allocate a buffer then copy the transferred data for user.
// If failed to allocate the buffer, update the URB for next
// round of transfer. Ignore the data of this round.
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
index 6afb327df165..09c12776ecdb 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
@@ -339,6 +339,14 @@ EhcFreeUrb (
PciIo->Unmap (PciIo, Urb->DataMap);
}
+ if (Urb->AllocateCommonBuffer) {
+ PciIo->FreeBuffer (
+ PciIo,
+ EFI_SIZE_TO_PAGES (Urb->DataLen),
+ Urb->Data
+ );
+ }
+
if (Urb->Qh != NULL) {
//
// Ensure that this queue head has been unlinked from the
@@ -529,7 +537,8 @@ ON_ERROR:
@param Hub The transaction translator to use.
@param Type The transaction type.
@param Request The standard USB request for control transfer.
- @param Data The user data to transfer.
+ @param AllocateCommonBuffer Indicate whether need to allocate common buffer for data transfer.
+ @param Data The user data to transfer, NULL if AllocateCommonBuffer is TRUE.
@param DataLen The length of data buffer.
@param Callback The function to call when data is transferred.
@param Context The context to the callback.
@@ -549,6 +558,7 @@ EhcCreateUrb (
IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
IN UINTN Type,
IN EFI_USB_DEVICE_REQUEST *Request,
+ IN BOOLEAN AllocateCommonBuffer,
IN VOID *Data,
IN UINTN DataLen,
IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
@@ -596,8 +606,24 @@ EhcCreateUrb (
Ep->PollRate = EhcConvertPollRate (Interval);
Urb->Request = Request;
+ if (AllocateCommonBuffer) {
+ ASSERT (Data == NULL);
+ Status = Ehc->PciIo->AllocateBuffer (
+ Ehc->PciIo,
+ AllocateAnyPages,
+ EfiBootServicesData,
+ EFI_SIZE_TO_PAGES (DataLen),
+ &Data,
+ 0
+ );
+ if (EFI_ERROR (Status) || (Data == NULL)) {
+ FreePool (Urb);
+ return NULL;
+ }
+ }
Urb->Data = Data;
Urb->DataLen = DataLen;
+ Urb->AllocateCommonBuffer = AllocateCommonBuffer;
Urb->Callback = Callback;
Urb->Context = Context;
@@ -627,10 +653,14 @@ EhcCreateUrb (
if (Data != NULL) {
Len = DataLen;
- if (Ep->Direction == EfiUsbDataIn) {
- MapOp = EfiPciIoOperationBusMasterWrite;
+ if (Urb->AllocateCommonBuffer) {
+ MapOp = EfiPciIoOperationBusMasterCommonBuffer;
} else {
- MapOp = EfiPciIoOperationBusMasterRead;
+ if (Ep->Direction == EfiUsbDataIn) {
+ MapOp = EfiPciIoOperationBusMasterWrite;
+ } else {
+ MapOp = EfiPciIoOperationBusMasterRead;
+ }
}
Status = PciIo->Map (PciIo, MapOp, Data, &Len, &PhyAddr, &Map);
diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
index 02e9af81be39..cb3599f9d361 100644
--- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
+++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
@@ -3,7 +3,7 @@
This file contains URB request, each request is warpped in a
URB (Usb Request Block).
-Copyright (c) 2007 - 2010, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2007 - 2018, Intel Corporation. All rights reserved.<BR>
This program and the accompanying materials
are licensed and made available under the terms and conditions of the BSD License
which accompanies this distribution. The full text of the license may be found at
@@ -216,6 +216,7 @@ struct _URB {
EFI_USB_DEVICE_REQUEST *Request; // Control transfer only
VOID *RequestPhy; // Address of the mapped request
VOID *RequestMap;
+ BOOLEAN AllocateCommonBuffer;
VOID *Data;
UINTN DataLen;
VOID *DataPhy; // Address of the mapped user data
@@ -298,20 +299,21 @@ EhcFreeUrb (
/**
Create a new URB and its associated QTD.
- @param Ehc The EHCI device.
- @param DevAddr The device address.
- @param EpAddr Endpoint addrress & its direction.
- @param DevSpeed The device speed.
- @param Toggle Initial data toggle to use.
- @param MaxPacket The max packet length of the endpoint.
- @param Hub The transaction translator to use.
- @param Type The transaction type.
- @param Request The standard USB request for control transfer.
- @param Data The user data to transfer.
- @param DataLen The length of data buffer.
- @param Callback The function to call when data is transferred.
- @param Context The context to the callback.
- @param Interval The interval for interrupt transfer.
+ @param Ehc The EHCI device.
+ @param DevAddr The device address.
+ @param EpAddr Endpoint addrress & its direction.
+ @param DevSpeed The device speed.
+ @param Toggle Initial data toggle to use.
+ @param MaxPacket The max packet length of the endpoint.
+ @param Hub The transaction translator to use.
+ @param Type The transaction type.
+ @param Request The standard USB request for control transfer.
+ @param AllocateCommonBuffer Indicate whether need to allocate common buffer for data transfer.
+ @param Data The user data to transfer, NULL if AllocateCommonBuffer is TRUE.
+ @param DataLen The length of data buffer.
+ @param Callback The function to call when data is transferred.
+ @param Context The context to the callback.
+ @param Interval The interval for interrupt transfer.
@return Created URB or NULL.
@@ -327,6 +329,7 @@ EhcCreateUrb (
IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
IN UINTN Type,
IN EFI_USB_DEVICE_REQUEST *Request,
+ IN BOOLEAN AllocateCommonBuffer,
IN VOID *Data,
IN UINTN DataLen,
IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
--
2.7.0.windows.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
@ 2018-10-27 7:47 ` Wu, Hao A
0 siblings, 0 replies; 18+ messages in thread
From: Wu, Hao A @ 2018-10-27 7:47 UTC (permalink / raw)
To: Zeng, Star, edk2-devel@lists.01.org; +Cc: Ni, Ruiyu, Wang, Jian J, Yao, Jiewen
Reviewed-by: Hao Wu <hao.a.wu@intel.com>
Best Regards,
Hao Wu
> -----Original Message-----
> From: Zeng, Star
> Sent: Friday, October 26, 2018 9:41 PM
> To: edk2-devel@lists.01.org
> Cc: Zeng, Star; Ni, Ruiyu; Wu, Hao A; Wang, Jian J; Yao, Jiewen
> Subject: [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for
> AsyncInterruptTransfer
>
> V3:
> Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.
>
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
>
> In current code, XhcMonitorAsyncRequests (timer handler) will do
> unmap and map operations for AsyncIntTransfers to "Flush data from
> PCI controller specific address to mapped system memory address".
> XhcMonitorAsyncRequests
> XhcFlushAsyncIntMap
> PciIo->Unmap
> IoMmu->SetAttribute
> PciIo->Map
> IoMmu->SetAttribute
>
> This may impact the boot performance.
>
> Since the data buffer for XhcMonitorAsyncRequests is internal
> buffer, we can allocate common buffer by PciIo->AllocateBuffer
> and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
> then the unmap and map operations can be removed.
>
> ///
> /// Provides both read and write access to system memory by
> /// both the processor and a bus master. The buffer is coherent
> /// from both the processor's and the bus master's point of view.
> ///
> EfiPciIoOperationBusMasterCommonBuffer,
>
> Test done:
> USB KB works normally.
> USB disk read/write works normally.
>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Hao Wu <hao.a.wu@intel.com>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Star Zeng <star.zeng@intel.com>
> Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
> ---
> MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 1 +
> MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 138 +++++++++++-------------
> -------
> MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h | 27 +++---
> 3 files changed, 63 insertions(+), 103 deletions(-)
>
> diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> index 7f64f9c7c982..64855a4c158c 100644
> --- a/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> +++ b/MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c
> @@ -769,6 +769,7 @@ XhcTransfer (
> MaximumPacketLength,
> Type,
> Request,
> + FALSE,
> Data,
> *DataLength,
> NULL,
> diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> index 75959ae08363..9dd45e93a272 100644
> --- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> +++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c
> @@ -118,17 +118,18 @@ ON_EXIT:
> /**
> Create a new URB for a new transaction.
>
> - @param Xhc The XHCI Instance
> - @param BusAddr The logical device address assigned by UsbBus driver
> - @param EpAddr Endpoint addrress
> - @param DevSpeed The device speed
> - @param MaxPacket The max packet length of the endpoint
> - @param Type The transaction type
> - @param Request The standard USB request for control transfer
> - @param Data The user data to transfer
> - @param DataLen The length of data buffer
> - @param Callback The function to call when data is transferred
> - @param Context The context to the callback
> + @param Xhc The XHCI Instance
> + @param BusAddr The logical device address assigned by UsbBus
> driver
> + @param EpAddr Endpoint addrress
> + @param DevSpeed The device speed
> + @param MaxPacket The max packet length of the endpoint
> + @param Type The transaction type
> + @param Request The standard USB request for control transfer
> + @param AllocateCommonBuffer Indicate whether need to allocate
> common buffer for data transfer
> + @param Data The user data to transfer, NULL if
> AllocateCommonBuffer is TRUE
> + @param DataLen The length of data buffer
> + @param Callback The function to call when data is transferred
> + @param Context The context to the callback
>
> @return Created URB or NULL
>
> @@ -142,6 +143,7 @@ XhcCreateUrb (
> IN UINTN MaxPacket,
> IN UINTN Type,
> IN EFI_USB_DEVICE_REQUEST *Request,
> + IN BOOLEAN AllocateCommonBuffer,
> IN VOID *Data,
> IN UINTN DataLen,
> IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
> @@ -169,8 +171,24 @@ XhcCreateUrb (
> Ep->Type = Type;
>
> Urb->Request = Request;
> + if (AllocateCommonBuffer) {
> + ASSERT (Data == NULL);
> + Status = Xhc->PciIo->AllocateBuffer (
> + Xhc->PciIo,
> + AllocateAnyPages,
> + EfiBootServicesData,
> + EFI_SIZE_TO_PAGES (DataLen),
> + &Data,
> + 0
> + );
> + if (EFI_ERROR (Status) || (Data == NULL)) {
> + FreePool (Urb);
> + return NULL;
> + }
> + }
> Urb->Data = Data;
> Urb->DataLen = DataLen;
> + Urb->AllocateCommonBuffer = AllocateCommonBuffer;
> Urb->Callback = Callback;
> Urb->Context = Context;
>
> @@ -178,7 +196,7 @@ XhcCreateUrb (
> ASSERT_EFI_ERROR (Status);
> if (EFI_ERROR (Status)) {
> DEBUG ((EFI_D_ERROR, "XhcCreateUrb: XhcCreateTransferTrb Failed,
> Status = %r\n", Status));
> - FreePool (Urb);
> + XhcFreeUrb (Xhc, Urb);
> Urb = NULL;
> }
>
> @@ -206,6 +224,14 @@ XhcFreeUrb (
> Xhc->PciIo->Unmap (Xhc->PciIo, Urb->DataMap);
> }
>
> + if (Urb->AllocateCommonBuffer) {
> + Xhc->PciIo->FreeBuffer (
> + Xhc->PciIo,
> + EFI_SIZE_TO_PAGES (Urb->DataLen),
> + Urb->Data
> + );
> + }
> +
> FreePool (Urb);
> }
>
> @@ -264,10 +290,14 @@ XhcCreateTransferTrb (
> // No need to remap.
> //
> if ((Urb->Data != NULL) && (Urb->DataMap == NULL)) {
> - if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
> - MapOp = EfiPciIoOperationBusMasterWrite;
> + if (Urb->AllocateCommonBuffer) {
> + MapOp = EfiPciIoOperationBusMasterCommonBuffer;
> } else {
> - MapOp = EfiPciIoOperationBusMasterRead;
> + if (((UINT8) (Urb->Ep.Direction)) == EfiUsbDataIn) {
> + MapOp = EfiPciIoOperationBusMasterWrite;
> + } else {
> + MapOp = EfiPciIoOperationBusMasterRead;
> + }
> }
>
> Len = Urb->DataLen;
> @@ -1367,7 +1397,6 @@ XhciDelAsyncIntTransfer (
> }
>
> RemoveEntryList (&Urb->UrbList);
> - FreePool (Urb->Data);
> XhcFreeUrb (Xhc, Urb);
> return EFI_SUCCESS;
> }
> @@ -1405,7 +1434,6 @@ XhciDelAllAsyncIntTransfers (
> }
>
> RemoveEntryList (&Urb->UrbList);
> - FreePool (Urb->Data);
> XhcFreeUrb (Xhc, Urb);
> }
> }
> @@ -1438,15 +1466,8 @@ XhciInsertAsyncIntTransfer (
> IN VOID *Context
> )
> {
> - VOID *Data;
> URB *Urb;
>
> - Data = AllocateZeroPool (DataLen);
> - if (Data == NULL) {
> - DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n",
> __FUNCTION__));
> - return NULL;
> - }
> -
> Urb = XhcCreateUrb (
> Xhc,
> BusAddr,
> @@ -1455,14 +1476,14 @@ XhciInsertAsyncIntTransfer (
> MaxPacket,
> XHC_INT_TRANSFER_ASYNC,
> NULL,
> - Data,
> + TRUE,
> + NULL,
> DataLen,
> Callback,
> Context
> );
> if (Urb == NULL) {
> DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
> - FreePool (Data);
> return NULL;
> }
>
> @@ -1503,61 +1524,6 @@ XhcUpdateAsyncRequest (
> }
>
> /**
> - Flush data from PCI controller specific address to mapped system
> - memory address.
> -
> - @param Xhc The XHCI device.
> - @param Urb The URB to unmap.
> -
> - @retval EFI_SUCCESS Success to flush data to mapped system memory.
> - @retval EFI_DEVICE_ERROR Fail to flush data to mapped system memory.
> -
> -**/
> -EFI_STATUS
> -XhcFlushAsyncIntMap (
> - IN USB_XHCI_INSTANCE *Xhc,
> - IN URB *Urb
> - )
> -{
> - EFI_STATUS Status;
> - EFI_PHYSICAL_ADDRESS PhyAddr;
> - EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
> - EFI_PCI_IO_PROTOCOL *PciIo;
> - UINTN Len;
> - VOID *Map;
> -
> - PciIo = Xhc->PciIo;
> - Len = Urb->DataLen;
> -
> - if (Urb->Ep.Direction == EfiUsbDataIn) {
> - MapOp = EfiPciIoOperationBusMasterWrite;
> - } else {
> - MapOp = EfiPciIoOperationBusMasterRead;
> - }
> -
> - if (Urb->DataMap != NULL) {
> - Status = PciIo->Unmap (PciIo, Urb->DataMap);
> - if (EFI_ERROR (Status)) {
> - goto ON_ERROR;
> - }
> - }
> -
> - Urb->DataMap = NULL;
> -
> - Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
> - if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
> - goto ON_ERROR;
> - }
> -
> - Urb->DataPhy = (VOID *) ((UINTN) PhyAddr);
> - Urb->DataMap = Map;
> - return EFI_SUCCESS;
> -
> -ON_ERROR:
> - return EFI_DEVICE_ERROR;
> -}
> -
> -/**
> Interrupt transfer periodic check handler.
>
> @param Event Interrupt event.
> @@ -1577,7 +1543,6 @@ XhcMonitorAsyncRequests (
> UINT8 *ProcBuf;
> URB *Urb;
> UINT8 SlotId;
> - EFI_STATUS Status;
> EFI_TPL OldTpl;
>
> OldTpl = gBS->RaiseTPL (XHC_TPL);
> @@ -1606,15 +1571,6 @@ XhcMonitorAsyncRequests (
> }
>
> //
> - // Flush any PCI posted write transactions from a PCI host
> - // bridge to system memory.
> - //
> - Status = XhcFlushAsyncIntMap (Xhc, Urb);
> - if (EFI_ERROR (Status)) {
> - DEBUG ((EFI_D_ERROR, "XhcMonitorAsyncRequests: Fail to Flush
> AsyncInt Mapped Memeory\n"));
> - }
> -
> - //
> // Allocate a buffer then copy the transferred data for user.
> // If failed to allocate the buffer, update the URB for next
> // round of transfer. Ignore the data of this round.
> diff --git a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> index b5e192c3b589..1a95d98996c1 100644
> --- a/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> +++ b/MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h
> @@ -172,6 +172,7 @@ typedef struct _URB {
> //
> USB_ENDPOINT Ep;
> EFI_USB_DEVICE_REQUEST *Request;
> + BOOLEAN AllocateCommonBuffer;
> VOID *Data;
> UINTN DataLen;
> VOID *DataPhy;
> @@ -1432,17 +1433,18 @@ XhcSetTrDequeuePointer (
> /**
> Create a new URB for a new transaction.
>
> - @param Xhc The XHCI Instance
> - @param DevAddr The device address
> - @param EpAddr Endpoint addrress
> - @param DevSpeed The device speed
> - @param MaxPacket The max packet length of the endpoint
> - @param Type The transaction type
> - @param Request The standard USB request for control transfer
> - @param Data The user data to transfer
> - @param DataLen The length of data buffer
> - @param Callback The function to call when data is transferred
> - @param Context The context to the callback
> + @param Xhc The XHCI Instance
> + @param BusAddr The logical device address assigned by UsbBus
> driver
> + @param EpAddr Endpoint addrress
> + @param DevSpeed The device speed
> + @param MaxPacket The max packet length of the endpoint
> + @param Type The transaction type
> + @param Request The standard USB request for control transfer
> + @param AllocateCommonBuffer Indicate whether need to allocate
> common buffer for data transfer
> + @param Data The user data to transfer, NULL if
> AllocateCommonBuffer is TRUE
> + @param DataLen The length of data buffer
> + @param Callback The function to call when data is transferred
> + @param Context The context to the callback
>
> @return Created URB or NULL
>
> @@ -1450,12 +1452,13 @@ XhcSetTrDequeuePointer (
> URB*
> XhcCreateUrb (
> IN USB_XHCI_INSTANCE *Xhc,
> - IN UINT8 DevAddr,
> + IN UINT8 BusAddr,
> IN UINT8 EpAddr,
> IN UINT8 DevSpeed,
> IN UINTN MaxPacket,
> IN UINTN Type,
> IN EFI_USB_DEVICE_REQUEST *Request,
> + IN BOOLEAN AllocateCommonBuffer,
> IN VOID *Data,
> IN UINTN DataLen,
> IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
> --
> 2.7.0.windows.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
` (3 preceding siblings ...)
2018-10-26 13:41 ` [PATCH V3 4/4] MdeModulePkg EhciDxe: " Star Zeng
@ 2018-10-28 13:35 ` Zeng, Star
2018-10-29 2:19 ` Ni, Ruiyu
5 siblings, 0 replies; 18+ messages in thread
From: Zeng, Star @ 2018-10-28 13:35 UTC (permalink / raw)
To: edk2-devel@lists.01.org
Cc: Ni, Ruiyu, Wu, Hao A, Wang, Jian J, Yao, Jiewen, Zeng, Star
Thanks for all the reviews, I just pushed the patches at https://github.com/tianocore/edk2/compare/a6eb94e...0cd6452.
Star
-----Original Message-----
From: Zeng, Star
Sent: Friday, October 26, 2018 9:41 PM
To: edk2-devel@lists.01.org
Cc: Zeng, Star <star.zeng@intel.com>; Ni, Ruiyu <ruiyu.ni@intel.com>; Wu, Hao A <hao.a.wu@intel.com>; Wang, Jian J <jian.j.wang@intel.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer
V3: Thanks for Hao's feedback.
1. Match function parameter name and description between XhciSched.c and XhciSched.h, EhciSched.c and EhciSched.h.
2. Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.
V2: Thanks for Ray's feedback.
1. Add the missing "FreePool (Data);".
2. Remove the unnecessary indentation change.
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
Please refer to the log message of each commit for more details.
Cc: Ruiyu Ni <ruiyu.ni@intel.com>
Cc: Hao Wu <hao.a.wu@intel.com>
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Star Zeng (4):
MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 28 +----
MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 114 +++++++++---------- MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h | 35 +++++-
MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c | 38 ++++++-
MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h | 33 +++---
MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 19 +---
MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 185 +++++++++++++++++-------------- MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h | 55 +++++++--
8 files changed, 296 insertions(+), 211 deletions(-)
--
2.7.0.windows.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
` (4 preceding siblings ...)
2018-10-28 13:35 ` [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe " Zeng, Star
@ 2018-10-29 2:19 ` Ni, Ruiyu
5 siblings, 0 replies; 18+ messages in thread
From: Ni, Ruiyu @ 2018-10-29 2:19 UTC (permalink / raw)
To: Star Zeng, edk2-devel; +Cc: Hao Wu, Jian J Wang, Jiewen Yao
On 10/26/2018 9:41 PM, Star Zeng wrote:
> V3: Thanks for Hao's feedback.
> 1. Match function parameter name and description between
> XhciSched.c and XhciSched.h, EhciSched.c and EhciSched.h.
> 2. Call XhcFreeUrb after XhcCreateTransferTrb fails in XhcCreateTrb.
>
>
> V2: Thanks for Ray's feedback.
> 1. Add the missing "FreePool (Data);".
> 2. Remove the unnecessary indentation change.
>
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
>
> Please refer to the log message of each commit for more details.
>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Hao Wu <hao.a.wu@intel.com>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
>
> Star Zeng (4):
> MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function
> MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function
> MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer
> MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
>
> MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 28 +----
> MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 114 +++++++++----------
> MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.h | 35 +++++-
> MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c | 38 ++++++-
> MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h | 33 +++---
> MdeModulePkg/Bus/Pci/XhciDxe/Xhci.c | 19 +---
> MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c | 185 +++++++++++++++++--------------
> MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.h | 55 +++++++--
> 8 files changed, 296 insertions(+), 211 deletions(-)
>
Reviewed-by: Ruiyu Ni <ruiyu.ni@Intel.com>
--
Thanks,
Ray
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
[not found] ` <CS1PR8401MB1189428C2915107C583C0A42B4CC0@CS1PR8401MB1189.NAMPRD84.PROD.OUTLOOK.COM>
@ 2018-10-30 12:39 ` Ard Biesheuvel
2018-10-30 12:50 ` Leif Lindholm
0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2018-10-30 12:39 UTC (permalink / raw)
To: Cohen, Eugene, edk2-devel-01; +Cc: Star Zeng, Leif Lindholm
(add back the list)
On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
> Has this patch been tested on a system that does not have coherent DMA?
>
>
>
> It's not clear that this change would actually be faster on a system of that
> type since using common buffers imply access to uncached memory. Depending
> on the access patterns the uncached memory access could be more time
> consuming than cache maintenance operations.
>
I haven't had time to look at these patches yet.
I agree with Eugene's concern: the directional DMA routines are much
more performant on implementations with non-coherent DMA, and so
common buffers should be avoided unless we are dealing with data
structures that are truly shared between the CPU and the device.
Since this is obviously not the case here, could we please have some
numbers about the performance improvement we are talking about here?
Would it be possible to improve the IOMMU handling code instead?
>
>
> From: edk2-devel <edk2-devel-bounces@lists.01.org> On Behalf Of Star Zeng
> Sent: Friday, October 26, 2018 7:41 AM
> To: edk2-devel@lists.01.org
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>; Hao Wu <hao.a.wu@intel.com>; Jiewen Yao
> <jiewen.yao@intel.com>; Star Zeng <star.zeng@intel.com>
> Subject: [edk2] [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for
> AsyncInterruptTransfer
>
>
>
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1274
>
> In current code, EhcMonitorAsyncRequests (timer handler) will do
> unmap and map operations for AsyncIntTransfers to "Flush data from
> PCI controller specific address to mapped system memory address".
> EhcMonitorAsyncRequests
> EhcFlushAsyncIntMap
> PciIo->Unmap
> IoMmu->SetAttribute
> PciIo->Map
> IoMmu->SetAttribute
>
> This may impact the boot performance.
>
> Since the data buffer for EhcMonitorAsyncRequests is internal
> buffer, we can allocate common buffer by PciIo->AllocateBuffer
> and map the buffer with EfiPciIoOperationBusMasterCommonBuffer,
> then the unmap and map operations can be removed.
>
> ///
> /// Provides both read and write access to system memory by
> /// both the processor and a bus master. The buffer is coherent
> /// from both the processor's and the bus master's point of view.
> ///
> EfiPciIoOperationBusMasterCommonBuffer,
>
> Test done:
> USB KB works normally.
> USB disk read/write works normally.
>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Hao Wu <hao.a.wu@intel.com>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Star Zeng <star.zeng@intel.com>
> Reviewed-by: Ruiyu Ni <ruiyu.ni@intel.com>
> Reviewed-by: Hao Wu <hao.a.wu@intel.com>
> ---
> MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c | 3 ++
> MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c | 78
> +-------------------------------
> MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c | 38 ++++++++++++++--
> MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h | 33 ++++++++------
> 4 files changed, 57 insertions(+), 95 deletions(-)
>
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> index 5569f4f9618b..764eeda58ba1 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/Ehci.c
> @@ -763,6 +763,7 @@ EhcControlTransfer (
> Translator,
> EHC_CTRL_TRANSFER,
> Request,
> + FALSE,
> Data,
> *DataLength,
> NULL,
> @@ -906,6 +907,7 @@ EhcBulkTransfer (
> Translator,
> EHC_BULK_TRANSFER,
> NULL,
> + FALSE,
> Data[0],
> *DataLength,
> NULL,
> @@ -1163,6 +1165,7 @@ EhcSyncInterruptTransfer (
> Translator,
> EHC_INT_TRANSFER_SYNC,
> NULL,
> + FALSE,
> Data,
> *DataLength,
> NULL,
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> index ec8d796fab11..b067fd02d1ce 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciSched.c
> @@ -778,7 +778,6 @@ EhciDelAsyncIntTransfer (
> EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
> RemoveEntryList (&Urb->UrbList);
>
> - gBS->FreePool (Urb->Data);
> EhcFreeUrb (Ehc, Urb);
> return EFI_SUCCESS;
> }
> @@ -809,7 +808,6 @@ EhciDelAllAsyncIntTransfers (
> EhcUnlinkQhFromPeriod (Ehc, Urb->Qh);
> RemoveEntryList (&Urb->UrbList);
>
> - gBS->FreePool (Urb->Data);
> EhcFreeUrb (Ehc, Urb);
> }
> }
> @@ -848,16 +846,8 @@ EhciInsertAsyncIntTransfer (
> IN UINTN Interval
> )
> {
> - VOID *Data;
> URB *Urb;
>
> - Data = AllocatePool (DataLen);
> -
> - if (Data == NULL) {
> - DEBUG ((DEBUG_ERROR, "%a: failed to allocate buffer\n", __FUNCTION__));
> - return NULL;
> - }
> -
> Urb = EhcCreateUrb (
> Ehc,
> DevAddr,
> @@ -868,7 +858,8 @@ EhciInsertAsyncIntTransfer (
> Hub,
> EHC_INT_TRANSFER_ASYNC,
> NULL,
> - Data,
> + TRUE,
> + NULL,
> DataLen,
> Callback,
> Context,
> @@ -877,7 +868,6 @@ EhciInsertAsyncIntTransfer (
>
> if (Urb == NULL) {
> DEBUG ((DEBUG_ERROR, "%a: failed to create URB\n", __FUNCTION__));
> - gBS->FreePool (Data);
> return NULL;
> }
>
> @@ -892,60 +882,6 @@ EhciInsertAsyncIntTransfer (
> }
>
> /**
> - Flush data from PCI controller specific address to mapped system
> - memory address.
> -
> - @param Ehc The EHCI device.
> - @param Urb The URB to unmap.
> -
> - @retval EFI_SUCCESS Success to flush data to mapped system memory.
> - @retval EFI_DEVICE_ERROR Fail to flush data to mapped system memory.
> -
> -**/
> -EFI_STATUS
> -EhcFlushAsyncIntMap (
> - IN USB2_HC_DEV *Ehc,
> - IN URB *Urb
> - )
> -{
> - EFI_STATUS Status;
> - EFI_PHYSICAL_ADDRESS PhyAddr;
> - EFI_PCI_IO_PROTOCOL_OPERATION MapOp;
> - EFI_PCI_IO_PROTOCOL *PciIo;
> - UINTN Len;
> - VOID *Map;
> -
> - PciIo = Ehc->PciIo;
> - Len = Urb->DataLen;
> -
> - if (Urb->Ep.Direction == EfiUsbDataIn) {
> - MapOp = EfiPciIoOperationBusMasterWrite;
> - } else {
> - MapOp = EfiPciIoOperationBusMasterRead;
> - }
> -
> - Status = PciIo->Unmap (PciIo, Urb->DataMap);
> - if (EFI_ERROR (Status)) {
> - goto ON_ERROR;
> - }
> -
> - Urb->DataMap = NULL;
> -
> - Status = PciIo->Map (PciIo, MapOp, Urb->Data, &Len, &PhyAddr, &Map);
> - if (EFI_ERROR (Status) || (Len != Urb->DataLen)) {
> - goto ON_ERROR;
> - }
> -
> - Urb->DataPhy = (VOID *) ((UINTN) PhyAddr);
> - Urb->DataMap = Map;
> - return EFI_SUCCESS;
> -
> -ON_ERROR:
> - return EFI_DEVICE_ERROR;
> -}
> -
> -
> -/**
> Update the queue head for next round of asynchronous transfer.
>
> @param Ehc The EHCI device.
> @@ -1050,7 +986,6 @@ EhcMonitorAsyncRequests (
> BOOLEAN Finished;
> UINT8 *ProcBuf;
> URB *Urb;
> - EFI_STATUS Status;
>
> OldTpl = gBS->RaiseTPL (EHC_TPL);
> Ehc = (USB2_HC_DEV *) Context;
> @@ -1069,15 +1004,6 @@ EhcMonitorAsyncRequests (
> }
>
> //
> - // Flush any PCI posted write transactions from a PCI host
> - // bridge to system memory.
> - //
> - Status = EhcFlushAsyncIntMap (Ehc, Urb);
> - if (EFI_ERROR (Status)) {
> - DEBUG ((EFI_D_ERROR, "EhcMonitorAsyncRequests: Fail to Flush AsyncInt
> Mapped Memeory\n"));
> - }
> -
> - //
> // Allocate a buffer then copy the transferred data for user.
> // If failed to allocate the buffer, update the URB for next
> // round of transfer. Ignore the data of this round.
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> index 6afb327df165..09c12776ecdb 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.c
> @@ -339,6 +339,14 @@ EhcFreeUrb (
> PciIo->Unmap (PciIo, Urb->DataMap);
> }
>
> + if (Urb->AllocateCommonBuffer) {
> + PciIo->FreeBuffer (
> + PciIo,
> + EFI_SIZE_TO_PAGES (Urb->DataLen),
> + Urb->Data
> + );
> + }
> +
> if (Urb->Qh != NULL) {
> //
> // Ensure that this queue head has been unlinked from the
> @@ -529,7 +537,8 @@ ON_ERROR:
> @param Hub The transaction translator to use.
> @param Type The transaction type.
> @param Request The standard USB request for control transfer.
> - @param Data The user data to transfer.
> + @param AllocateCommonBuffer Indicate whether need to allocate common
> buffer for data transfer.
> + @param Data The user data to transfer, NULL if AllocateCommonBuffer is
> TRUE.
> @param DataLen The length of data buffer.
> @param Callback The function to call when data is transferred.
> @param Context The context to the callback.
> @@ -549,6 +558,7 @@ EhcCreateUrb (
> IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
> IN UINTN Type,
> IN EFI_USB_DEVICE_REQUEST *Request,
> + IN BOOLEAN AllocateCommonBuffer,
> IN VOID *Data,
> IN UINTN DataLen,
> IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
> @@ -596,8 +606,24 @@ EhcCreateUrb (
> Ep->PollRate = EhcConvertPollRate (Interval);
>
> Urb->Request = Request;
> + if (AllocateCommonBuffer) {
> + ASSERT (Data == NULL);
> + Status = Ehc->PciIo->AllocateBuffer (
> + Ehc->PciIo,
> + AllocateAnyPages,
> + EfiBootServicesData,
> + EFI_SIZE_TO_PAGES (DataLen),
> + &Data,
> + 0
> + );
> + if (EFI_ERROR (Status) || (Data == NULL)) {
> + FreePool (Urb);
> + return NULL;
> + }
> + }
> Urb->Data = Data;
> Urb->DataLen = DataLen;
> + Urb->AllocateCommonBuffer = AllocateCommonBuffer;
> Urb->Callback = Callback;
> Urb->Context = Context;
>
> @@ -627,10 +653,14 @@ EhcCreateUrb (
> if (Data != NULL) {
> Len = DataLen;
>
> - if (Ep->Direction == EfiUsbDataIn) {
> - MapOp = EfiPciIoOperationBusMasterWrite;
> + if (Urb->AllocateCommonBuffer) {
> + MapOp = EfiPciIoOperationBusMasterCommonBuffer;
> } else {
> - MapOp = EfiPciIoOperationBusMasterRead;
> + if (Ep->Direction == EfiUsbDataIn) {
> + MapOp = EfiPciIoOperationBusMasterWrite;
> + } else {
> + MapOp = EfiPciIoOperationBusMasterRead;
> + }
> }
>
> Status = PciIo->Map (PciIo, MapOp, Data, &Len, &PhyAddr, &Map);
> diff --git a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> index 02e9af81be39..cb3599f9d361 100644
> --- a/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> +++ b/MdeModulePkg/Bus/Pci/EhciDxe/EhciUrb.h
> @@ -3,7 +3,7 @@
> This file contains URB request, each request is warpped in a
> URB (Usb Request Block).
>
> -Copyright (c) 2007 - 2010, Intel Corporation. All rights reserved.<BR>
> +Copyright (c) 2007 - 2018, Intel Corporation. All rights reserved.<BR>
> This program and the accompanying materials
> are licensed and made available under the terms and conditions of the BSD
> License
> which accompanies this distribution. The full text of the license may be
> found at
> @@ -216,6 +216,7 @@ struct _URB {
> EFI_USB_DEVICE_REQUEST *Request; // Control transfer only
> VOID *RequestPhy; // Address of the mapped request
> VOID *RequestMap;
> + BOOLEAN AllocateCommonBuffer;
> VOID *Data;
> UINTN DataLen;
> VOID *DataPhy; // Address of the mapped user data
> @@ -298,20 +299,21 @@ EhcFreeUrb (
> /**
> Create a new URB and its associated QTD.
>
> - @param Ehc The EHCI device.
> - @param DevAddr The device address.
> - @param EpAddr Endpoint addrress & its direction.
> - @param DevSpeed The device speed.
> - @param Toggle Initial data toggle to use.
> - @param MaxPacket The max packet length of the endpoint.
> - @param Hub The transaction translator to use.
> - @param Type The transaction type.
> - @param Request The standard USB request for control transfer.
> - @param Data The user data to transfer.
> - @param DataLen The length of data buffer.
> - @param Callback The function to call when data is transferred.
> - @param Context The context to the callback.
> - @param Interval The interval for interrupt transfer.
> + @param Ehc The EHCI device.
> + @param DevAddr The device address.
> + @param EpAddr Endpoint addrress & its direction.
> + @param DevSpeed The device speed.
> + @param Toggle Initial data toggle to use.
> + @param MaxPacket The max packet length of the endpoint.
> + @param Hub The transaction translator to use.
> + @param Type The transaction type.
> + @param Request The standard USB request for control transfer.
> + @param AllocateCommonBuffer Indicate whether need to allocate common
> buffer for data transfer.
> + @param Data The user data to transfer, NULL if AllocateCommonBuffer is
> TRUE.
> + @param DataLen The length of data buffer.
> + @param Callback The function to call when data is transferred.
> + @param Context The context to the callback.
> + @param Interval The interval for interrupt transfer.
>
> @return Created URB or NULL.
>
> @@ -327,6 +329,7 @@ EhcCreateUrb (
> IN EFI_USB2_HC_TRANSACTION_TRANSLATOR *Hub,
> IN UINTN Type,
> IN EFI_USB_DEVICE_REQUEST *Request,
> + IN BOOLEAN AllocateCommonBuffer,
> IN VOID *Data,
> IN UINTN DataLen,
> IN EFI_ASYNC_USB_TRANSFER_CALLBACK Callback,
> --
> 2.7.0.windows.1
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-30 12:39 ` Ard Biesheuvel
@ 2018-10-30 12:50 ` Leif Lindholm
2018-10-31 4:38 ` Zeng, Star
0 siblings, 1 reply; 18+ messages in thread
From: Leif Lindholm @ 2018-10-30 12:50 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: Cohen, Eugene, edk2-devel-01, Star Zeng
On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
> (add back the list)
Oi! Go back on holiday!
> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
> > Has this patch been tested on a system that does not have coherent DMA?
> >
> > It's not clear that this change would actually be faster on a system of that
> > type since using common buffers imply access to uncached memory. Depending
> > on the access patterns the uncached memory access could be more time
> > consuming than cache maintenance operations.
>
> I haven't had time to look at these patches yet.
>
> I agree with Eugene's concern: the directional DMA routines are much
> more performant on implementations with non-coherent DMA, and so
> common buffers should be avoided unless we are dealing with data
> structures that are truly shared between the CPU and the device.
>
> Since this is obviously not the case here, could we please have some
> numbers about the performance improvement we are talking about here?
> Would it be possible to improve the IOMMU handling code instead?
On an unrelated note to the concerns above:
Why has a fundamental change to the behaviour of one of the industry
standard drivers been pushed at the very end of the stable cycle?
Regards,
Leif
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-30 12:50 ` Leif Lindholm
@ 2018-10-31 4:38 ` Zeng, Star
2018-10-31 12:08 ` Leif Lindholm
2018-11-06 9:49 ` Ard Biesheuvel
0 siblings, 2 replies; 18+ messages in thread
From: Zeng, Star @ 2018-10-31 4:38 UTC (permalink / raw)
To: Leif Lindholm, Ard Biesheuvel, Cohen, Eugene; +Cc: edk2-devel-01, star.zeng
Good feedback.
On 2018/10/30 20:50, Leif Lindholm wrote:
> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>> (add back the list)
>
> Oi! Go back on holiday!
>
>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>> Has this patch been tested on a system that does not have coherent DMA?
>>>
>>> It's not clear that this change would actually be faster on a system of that
>>> type since using common buffers imply access to uncached memory. Depending
>>> on the access patterns the uncached memory access could be more time
>>> consuming than cache maintenance operations.
The change/idea was based on the statement below.
///
/// Provides both read and write access to system memory by both the
processor and a
/// bus master. The buffer is coherent from both the processor's and
the bus master's point of view.
///
EfiPciIoOperationBusMasterCommonBuffer,
Thanks for raising case about uncached memory access. But after checking
the code, for Intel VTd case
https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
(or no IOMMU case
https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
the common buffer is just normal memory buffer.
If someone can help do some test/collect some data on a system using
common buffers imply access to uncached memory, that will be great.
>>
>> I haven't had time to look at these patches yet.
>>
>> I agree with Eugene's concern: the directional DMA routines are much
>> more performant on implementations with non-coherent DMA, and so
>> common buffers should be avoided unless we are dealing with data
>> structures that are truly shared between the CPU and the device.
>>
>> Since this is obviously not the case here, could we please have some
>> numbers about the performance improvement we are talking about here?
>> Would it be possible to improve the IOMMU handling code instead?
We collected the data below on a platform with release image and Intel
VTd enabled.
The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
EHCI without the patch:
==[ Cumulative ]========
(Times in microsec.) Cumulative Average Shortest Longest
Name Count Duration Duration Duration Duration
-------------------------------------------------------------------------------
S0000B00D1DF0 446 2150 4 2 963
EHCI with the patch:
==[ Cumulative ]========
(Times in microsec.) Cumulative Average Shortest Longest
Name Count Duration Duration Duration Duration
-------------------------------------------------------------------------------
S0000B00D1DF0 270 742 2 2 41
XHCI without the patch:
==[ Cumulative ]========
(Times in microsec.) Cumulative Average Shortest Longest
Name Count Duration Duration Duration Duration
-------------------------------------------------------------------------------
S0000B00D14F0 215 603 2 2 52
XHCI with the patch:
==[ Cumulative ]========
(Times in microsec.) Cumulative Average Shortest Longest
Name Count Duration Duration Duration Duration
-------------------------------------------------------------------------------
S0000B00D14F0 95 294 3 2 52
I believe the performance data really depends on
1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
and/or USB bluetooth keyboard?)
2. Data size (for flushing data from PCI controller specific address to
mapped system memory address *in original code*)
3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
operation on Intel VTd engine caused by the unmap and map for flushing
data *in original code*, the SetAttribute operation on IntelVTd engine
will involve FlushPageTableMemory, InvalidatePageEntry and etc)
>
> On an unrelated note to the concerns above:
> Why has a fundamental change to the behaviour of one of the industry
> standard drivers been pushed at the very end of the stable cycle?
We thought it was a simple improvement but not fundamental change before
Eugene and Ard raised the concern.
Thanks,
Star
>
> Regards,
>
> Leif
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-31 4:38 ` Zeng, Star
@ 2018-10-31 12:08 ` Leif Lindholm
2018-11-01 1:12 ` Zeng, Star
2018-11-06 9:49 ` Ard Biesheuvel
1 sibling, 1 reply; 18+ messages in thread
From: Leif Lindholm @ 2018-10-31 12:08 UTC (permalink / raw)
To: Zeng, Star; +Cc: Ard Biesheuvel, Cohen, Eugene, edk2-devel-01
On Wed, Oct 31, 2018 at 12:38:43PM +0800, Zeng, Star wrote:
> Good feedback.
>
> On 2018/10/30 20:50, Leif Lindholm wrote:
> > On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
> > > (add back the list)
> >
> > Oi! Go back on holiday!
> >
> > > On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
> > > > Has this patch been tested on a system that does not have coherent DMA?
> > > >
> > > > It's not clear that this change would actually be faster on a system of that
> > > > type since using common buffers imply access to uncached memory. Depending
> > > > on the access patterns the uncached memory access could be more time
> > > > consuming than cache maintenance operations.
>
> The change/idea was based on the statement below.
> ///
> /// Provides both read and write access to system memory by both the
> processor and a
> /// bus master. The buffer is coherent from both the processor's and the
> bus master's point of view.
> ///
> EfiPciIoOperationBusMasterCommonBuffer,
>
> Thanks for raising case about uncached memory access. But after checking the
> code, for Intel VTd case https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
> (or no IOMMU case https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
> the common buffer is just normal memory buffer.
> If someone can help do some test/collect some data on a system using common
> buffers imply access to uncached memory, that will be great.
>
> > >
> > > I haven't had time to look at these patches yet.
> > >
> > > I agree with Eugene's concern: the directional DMA routines are much
> > > more performant on implementations with non-coherent DMA, and so
> > > common buffers should be avoided unless we are dealing with data
> > > structures that are truly shared between the CPU and the device.
> > >
> > > Since this is obviously not the case here, could we please have some
> > > numbers about the performance improvement we are talking about here?
> > > Would it be possible to improve the IOMMU handling code instead?
>
> We collected the data below on a platform with release image and Intel VTd
> enabled.
>
> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>
> EHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0 446 2150 4 2 963
>
> EHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0 270 742 2 2 41
>
> XHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0 215 603 2 2 52
>
> XHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0 95 294 3 2 52
>
> I believe the performance data really depends on
> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
> and/or USB bluetooth keyboard?)
> 2. Data size (for flushing data from PCI controller specific address to
> mapped system memory address *in original code*)
> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
> operation on Intel VTd engine caused by the unmap and map for flushing data
> *in original code*, the SetAttribute operation on IntelVTd engine will
> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>
> > On an unrelated note to the concerns above:
> > Why has a fundamental change to the behaviour of one of the industry
> > standard drivers been pushed at the very end of the stable cycle?
>
> We thought it was a simple improvement but not fundamental change before
> Eugene and Ard raised the concern.
Understood.
However, as it is changing the memory management behaviour of a core
driver, I think it automatically qualifies as something that should
only go in the week after a stable tag.
We will need to have a closer look at the non-coherent case when Ard
gets back (Monday).
If this version causes issues with non-coherent systems, we will need
to revert it before the stable tag. We would then need to look into
the best way to deal with the performance issues quoted above.
Best Regards,
Leif
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-31 12:08 ` Leif Lindholm
@ 2018-11-01 1:12 ` Zeng, Star
2018-11-01 10:32 ` Leif Lindholm
0 siblings, 1 reply; 18+ messages in thread
From: Zeng, Star @ 2018-11-01 1:12 UTC (permalink / raw)
To: Leif Lindholm; +Cc: edk2-devel-01, star.zeng
On 2018/10/31 20:08, Leif Lindholm wrote:
> On Wed, Oct 31, 2018 at 12:38:43PM +0800, Zeng, Star wrote:
>> Good feedback.
>>
>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>> (add back the list)
>>>
>>> Oi! Go back on holiday!
>>>
>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>>
>>>>> It's not clear that this change would actually be faster on a system of that
>>>>> type since using common buffers imply access to uncached memory. Depending
>>>>> on the access patterns the uncached memory access could be more time
>>>>> consuming than cache maintenance operations.
>>
>> The change/idea was based on the statement below.
>> ///
>> /// Provides both read and write access to system memory by both the
>> processor and a
>> /// bus master. The buffer is coherent from both the processor's and the
>> bus master's point of view.
>> ///
>> EfiPciIoOperationBusMasterCommonBuffer,
>>
>> Thanks for raising case about uncached memory access. But after checking the
>> code, for Intel VTd case https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>> (or no IOMMU case https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>> the common buffer is just normal memory buffer.
>> If someone can help do some test/collect some data on a system using common
>> buffers imply access to uncached memory, that will be great.
>>
>>>>
>>>> I haven't had time to look at these patches yet.
>>>>
>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>> more performant on implementations with non-coherent DMA, and so
>>>> common buffers should be avoided unless we are dealing with data
>>>> structures that are truly shared between the CPU and the device.
>>>>
>>>> Since this is obviously not the case here, could we please have some
>>>> numbers about the performance improvement we are talking about here?
>>>> Would it be possible to improve the IOMMU handling code instead?
>>
>> We collected the data below on a platform with release image and Intel VTd
>> enabled.
>>
>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>
>> EHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0 446 2150 4 2 963
>>
>> EHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0 270 742 2 2 41
>>
>> XHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0 215 603 2 2 52
>>
>> XHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0 95 294 3 2 52
>>
>> I believe the performance data really depends on
>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>> and/or USB bluetooth keyboard?)
>> 2. Data size (for flushing data from PCI controller specific address to
>> mapped system memory address *in original code*)
>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>> operation on Intel VTd engine caused by the unmap and map for flushing data
>> *in original code*, the SetAttribute operation on IntelVTd engine will
>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>
>>> On an unrelated note to the concerns above:
>>> Why has a fundamental change to the behaviour of one of the industry
>>> standard drivers been pushed at the very end of the stable cycle?
>>
>> We thought it was a simple improvement but not fundamental change before
>> Eugene and Ard raised the concern.
>
> Understood.
Thanks. :)
>
> However, as it is changing the memory management behaviour of a core
> driver, I think it automatically qualifies as something that should
> only go in the week after a stable tag.
>
> We will need to have a closer look at the non-coherent case when Ard
> gets back (Monday).
You mean Ard is on vacation and will be back next Monday.
>
> If this version causes issues with non-coherent systems, we will need
> to revert it before the stable tag. We would then need to look into
> the best way to deal with the performance issues quoted above.
I am glad to revert it if it has side effect. Is it possible someone
could have a quick check?
Thanks,
Star
>
> Best Regards,
>
> Leif
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-11-01 1:12 ` Zeng, Star
@ 2018-11-01 10:32 ` Leif Lindholm
0 siblings, 0 replies; 18+ messages in thread
From: Leif Lindholm @ 2018-11-01 10:32 UTC (permalink / raw)
To: Zeng, Star; +Cc: edk2-devel-01
On Thu, Nov 01, 2018 at 09:12:04AM +0800, Zeng, Star wrote:
> > > I believe the performance data really depends on
> > > 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
> > > and/or USB bluetooth keyboard?)
> > > 2. Data size (for flushing data from PCI controller specific address to
> > > mapped system memory address *in original code*)
> > > 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
> > > operation on Intel VTd engine caused by the unmap and map for flushing data
> > > *in original code*, the SetAttribute operation on IntelVTd engine will
> > > involve FlushPageTableMemory, InvalidatePageEntry and etc)
> > >
> > > > On an unrelated note to the concerns above:
> > > > Why has a fundamental change to the behaviour of one of the industry
> > > > standard drivers been pushed at the very end of the stable cycle?
> > >
> > > We thought it was a simple improvement but not fundamental change before
> > > Eugene and Ard raised the concern.
> >
> > Understood.
>
> Thanks. :)
>
> >
> > However, as it is changing the memory management behaviour of a core
> > driver, I think it automatically qualifies as something that should
> > only go in the week after a stable tag.
> >
> > We will need to have a closer look at the non-coherent case when Ard
> > gets back (Monday).
>
> You mean Ard is on vacation and will be back next Monday.
Yes.
> > If this version causes issues with non-coherent systems, we will need
> > to revert it before the stable tag. We would then need to look into
> > the best way to deal with the performance issues quoted above.
>
> I am glad to revert it if it has side effect. Is it possible someone could
> have a quick check?
Ard could. It would take me much longer :)
Which is why I would prefer to wait until next week.
Regards,
Leif
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-10-31 4:38 ` Zeng, Star
2018-10-31 12:08 ` Leif Lindholm
@ 2018-11-06 9:49 ` Ard Biesheuvel
2018-11-06 14:37 ` Zeng, Star
1 sibling, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2018-11-06 9:49 UTC (permalink / raw)
To: Zeng, Star; +Cc: Leif Lindholm, Cohen, Eugene, edk2-devel-01
On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
> Good feedback.
>
> On 2018/10/30 20:50, Leif Lindholm wrote:
>>
>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>
>>> (add back the list)
>>
>>
>> Oi! Go back on holiday!
>>
>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>
>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>
>>>> It's not clear that this change would actually be faster on a system of
>>>> that
>>>> type since using common buffers imply access to uncached memory.
>>>> Depending
>>>> on the access patterns the uncached memory access could be more time
>>>> consuming than cache maintenance operations.
>
>
> The change/idea was based on the statement below.
> ///
> /// Provides both read and write access to system memory by both the
> processor and a
> /// bus master. The buffer is coherent from both the processor's and the
> bus master's point of view.
> ///
> EfiPciIoOperationBusMasterCommonBuffer,
>
> Thanks for raising case about uncached memory access. But after checking the
> code, for Intel VTd case
> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
> (or no IOMMU case
> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
> the common buffer is just normal memory buffer.
> If someone can help do some test/collect some data on a system using common
> buffers imply access to uncached memory, that will be great.
>
OK, so first of all, can anyone explain to me under which
circumstances interrupt transfers are a bottleneck? I'd assume that
anything throughput bound would use bulk endpoints.
Also, since the Map/Unmap calls are only costly when using an IOMMU,
could we simply revert to the old behavior if mIoMmu == NULL?
>>>
>>> I haven't had time to look at these patches yet.
>>>
>>> I agree with Eugene's concern: the directional DMA routines are much
>>> more performant on implementations with non-coherent DMA, and so
>>> common buffers should be avoided unless we are dealing with data
>>> structures that are truly shared between the CPU and the device.
>>>
>>> Since this is obviously not the case here, could we please have some
>>> numbers about the performance improvement we are talking about here?
>>> Would it be possible to improve the IOMMU handling code instead?
>
>
> We collected the data below on a platform with release image and Intel VTd
> enabled.
>
> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>
> EHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0 446 2150 4 2 963
>
> EHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D1DF0 270 742 2 2 41
>
> XHCI without the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0 215 603 2 2 52
>
> XHCI with the patch:
> ==[ Cumulative ]========
> (Times in microsec.) Cumulative Average Shortest Longest
> Name Count Duration Duration Duration Duration
> -------------------------------------------------------------------------------
> S0000B00D14F0 95 294 3 2 52
>
> I believe the performance data really depends on
> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
> and/or USB bluetooth keyboard?)
> 2. Data size (for flushing data from PCI controller specific address to
> mapped system memory address *in original code*)
> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
> operation on Intel VTd engine caused by the unmap and map for flushing data
> *in original code*, the SetAttribute operation on IntelVTd engine will
> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>
OK, so there is room for improvement here: there is no reason the
IOMMU driver couldn't cache mappings, or do some other optimizations
that would make mapping the same memory repeatedly less costly.
>>
>> On an unrelated note to the concerns above:
>> Why has a fundamental change to the behaviour of one of the industry
>> standard drivers been pushed at the very end of the stable cycle?
>
>
> We thought it was a simple improvement but not fundamental change before
> Eugene and Ard raised the concern.
>
>
> Thanks,
> Star
>
>>
>> Regards,
>>
>> Leif
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-11-06 9:49 ` Ard Biesheuvel
@ 2018-11-06 14:37 ` Zeng, Star
2018-11-07 15:00 ` Zeng, Star
0 siblings, 1 reply; 18+ messages in thread
From: Zeng, Star @ 2018-11-06 14:37 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: edk2-devel-01, star.zeng
On 2018/11/6 17:49, Ard Biesheuvel wrote:
> On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
>> Good feedback.
>>
>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>
>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>
>>>> (add back the list)
>>>
>>>
>>> Oi! Go back on holiday!
>>>
>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>>
>>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>>
>>>>> It's not clear that this change would actually be faster on a system of
>>>>> that
>>>>> type since using common buffers imply access to uncached memory.
>>>>> Depending
>>>>> on the access patterns the uncached memory access could be more time
>>>>> consuming than cache maintenance operations.
>>
>>
>> The change/idea was based on the statement below.
>> ///
>> /// Provides both read and write access to system memory by both the
>> processor and a
>> /// bus master. The buffer is coherent from both the processor's and the
>> bus master's point of view.
>> ///
>> EfiPciIoOperationBusMasterCommonBuffer,
>>
>> Thanks for raising case about uncached memory access. But after checking the
>> code, for Intel VTd case
>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>> (or no IOMMU case
>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>> the common buffer is just normal memory buffer.
>> If someone can help do some test/collect some data on a system using common
>> buffers imply access to uncached memory, that will be great.
>>
>
> OK, so first of all, can anyone explain to me under which
> circumstances interrupt transfers are a bottleneck? I'd assume that
> anything throughput bound would use bulk endpoints.
>
> Also, since the Map/Unmap calls are only costly when using an IOMMU,
> could we simply revert to the old behavior if mIoMmu == NULL?
>
>>>>
>>>> I haven't had time to look at these patches yet.
>>>>
>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>> more performant on implementations with non-coherent DMA, and so
>>>> common buffers should be avoided unless we are dealing with data
>>>> structures that are truly shared between the CPU and the device.
>>>>
>>>> Since this is obviously not the case here, could we please have some
>>>> numbers about the performance improvement we are talking about here?
>>>> Would it be possible to improve the IOMMU handling code instead?
>>
>>
>> We collected the data below on a platform with release image and Intel VTd
>> enabled.
>>
>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>
>> EHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0 446 2150 4 2 963
>>
>> EHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D1DF0 270 742 2 2 41
>>
>> XHCI without the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0 215 603 2 2 52
>>
>> XHCI with the patch:
>> ==[ Cumulative ]========
>> (Times in microsec.) Cumulative Average Shortest Longest
>> Name Count Duration Duration Duration Duration
>> -------------------------------------------------------------------------------
>> S0000B00D14F0 95 294 3 2 52
>>
>> I believe the performance data really depends on
>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>> and/or USB bluetooth keyboard?)
>> 2. Data size (for flushing data from PCI controller specific address to
>> mapped system memory address *in original code*)
>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>> operation on Intel VTd engine caused by the unmap and map for flushing data
>> *in original code*, the SetAttribute operation on IntelVTd engine will
>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>
>
> OK, so there is room for improvement here: there is no reason the
> IOMMU driver couldn't cache mappings, or do some other optimizations
> that would make mapping the same memory repeatedly less costly.
The unmap/map with IOMMU will direct to SetAttribute that will
disallow/allow DMA memory access. The IOMMU driver is hard to predict
the sequence of unmap/map operations. Do you have more detail about the
optimizations?
Could you take a try with the patch on the platform for the case you and
Eugene mentioned?
Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4
have no functionality impact) since the time point is a little sensitive
as it is near edk2-stable201811.
Thanks,
Star
>
>>>
>>> On an unrelated note to the concerns above:
>>> Why has a fundamental change to the behaviour of one of the industry
>>> standard drivers been pushed at the very end of the stable cycle?
>>
>>
>> We thought it was a simple improvement but not fundamental change before
>> Eugene and Ard raised the concern.
>>
>>
>> Thanks,
>> Star
>>
>>>
>>> Regards,
>>>
>>> Leif
>>>
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-11-06 14:37 ` Zeng, Star
@ 2018-11-07 15:00 ` Zeng, Star
2018-11-07 15:14 ` Ard Biesheuvel
0 siblings, 1 reply; 18+ messages in thread
From: Zeng, Star @ 2018-11-07 15:00 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: edk2-devel-01, star.zeng
On 2018/11/6 22:37, Zeng, Star wrote:
> On 2018/11/6 17:49, Ard Biesheuvel wrote:
>> On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
>>> Good feedback.
>>>
>>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>>
>>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>>
>>>>> (add back the list)
>>>>
>>>>
>>>> Oi! Go back on holiday!
>>>>
>>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>>>
>>>>>> Has this patch been tested on a system that does not have coherent
>>>>>> DMA?
>>>>>>
>>>>>> It's not clear that this change would actually be faster on a
>>>>>> system of
>>>>>> that
>>>>>> type since using common buffers imply access to uncached memory.
>>>>>> Depending
>>>>>> on the access patterns the uncached memory access could be more time
>>>>>> consuming than cache maintenance operations.
>>>
>>>
>>> The change/idea was based on the statement below.
>>> ///
>>> /// Provides both read and write access to system memory by both the
>>> processor and a
>>> /// bus master. The buffer is coherent from both the processor's
>>> and the
>>> bus master's point of view.
>>> ///
>>> EfiPciIoOperationBusMasterCommonBuffer,
>>>
>>> Thanks for raising case about uncached memory access. But after
>>> checking the
>>> code, for Intel VTd case
>>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>>>
>>> (or no IOMMU case
>>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>>>
>>> the common buffer is just normal memory buffer.
>>> If someone can help do some test/collect some data on a system using
>>> common
>>> buffers imply access to uncached memory, that will be great.
>>>
>>
>> OK, so first of all, can anyone explain to me under which
>> circumstances interrupt transfers are a bottleneck? I'd assume that
>> anything throughput bound would use bulk endpoints.
>>
>> Also, since the Map/Unmap calls are only costly when using an IOMMU,
>> could we simply revert to the old behavior if mIoMmu == NULL?
>>
>>>>>
>>>>> I haven't had time to look at these patches yet.
>>>>>
>>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>>> more performant on implementations with non-coherent DMA, and so
>>>>> common buffers should be avoided unless we are dealing with data
>>>>> structures that are truly shared between the CPU and the device.
>>>>>
>>>>> Since this is obviously not the case here, could we please have some
>>>>> numbers about the performance improvement we are talking about here?
>>>>> Would it be possible to improve the IOMMU handling code instead?
>>>
>>>
>>> We collected the data below on a platform with release image and
>>> Intel VTd
>>> enabled.
>>>
>>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>>
>>> EHCI without the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.) Cumulative Average Shortest Longest
>>> Name Count Duration Duration Duration Duration
>>> -------------------------------------------------------------------------------
>>>
>>> S0000B00D1DF0 446 2150 4 2 963
>>>
>>> EHCI with the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.) Cumulative Average Shortest Longest
>>> Name Count Duration Duration Duration Duration
>>> -------------------------------------------------------------------------------
>>>
>>> S0000B00D1DF0 270 742 2 2 41
>>>
>>> XHCI without the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.) Cumulative Average Shortest Longest
>>> Name Count Duration Duration Duration Duration
>>> -------------------------------------------------------------------------------
>>>
>>> S0000B00D14F0 215 603 2 2 52
>>>
>>> XHCI with the patch:
>>> ==[ Cumulative ]========
>>> (Times in microsec.) Cumulative Average Shortest Longest
>>> Name Count Duration Duration Duration Duration
>>> -------------------------------------------------------------------------------
>>>
>>> S0000B00D14F0 95 294 3 2 52
>>>
>>> I believe the performance data really depends on
>>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>>> and/or USB bluetooth keyboard?)
>>> 2. Data size (for flushing data from PCI controller specific address to
>>> mapped system memory address *in original code*)
>>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>> operation on Intel VTd engine caused by the unmap and map for
>>> flushing data
>>> *in original code*, the SetAttribute operation on IntelVTd engine will
>>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>>
>>
>> OK, so there is room for improvement here: there is no reason the
>> IOMMU driver couldn't cache mappings, or do some other optimizations
>> that would make mapping the same memory repeatedly less costly.
>
> The unmap/map with IOMMU will direct to SetAttribute that will
> disallow/allow DMA memory access. The IOMMU driver is hard to predict
> the sequence of unmap/map operations. Do you have more detail about the
> optimizations?
>
> Could you take a try with the patch on the platform for the case you and
> Eugene mentioned?
>
> Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4
> have no functionality impact) since the time point is a little sensitive
> as it is near edk2-stable201811.
I have reverted the patch 3/4 and 4/4 at
https://github.com/tianocore/edk2/compare/1ed6498...d98fc9a, and we can
continue the discussion.
>
> Thanks,
> Star
>
>>
>>>>
>>>> On an unrelated note to the concerns above:
>>>> Why has a fundamental change to the behaviour of one of the industry
>>>> standard drivers been pushed at the very end of the stable cycle?
>>>
>>>
>>> We thought it was a simple improvement but not fundamental change before
>>> Eugene and Ard raised the concern.
>>>
>>>
>>> Thanks,
>>> Star
>>>
>>>>
>>>> Regards,
>>>>
>>>> Leif
>>>>
>>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V3 4/4] MdeModulePkg EhciDxe: Use common buffer for AsyncInterruptTransfer
2018-11-07 15:00 ` Zeng, Star
@ 2018-11-07 15:14 ` Ard Biesheuvel
0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2018-11-07 15:14 UTC (permalink / raw)
To: Zeng, Star; +Cc: edk2-devel-01
On 7 November 2018 at 16:00, Zeng, Star <star.zeng@intel.com> wrote:
> On 2018/11/6 22:37, Zeng, Star wrote:
>>
>> On 2018/11/6 17:49, Ard Biesheuvel wrote:
>>>
>>> On 31 October 2018 at 05:38, Zeng, Star <star.zeng@intel.com> wrote:
>>>>
>>>> Good feedback.
>>>>
>>>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>>>
>>>>>
>>>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>>>
>>>>>>
>>>>>> (add back the list)
>>>>>
>>>>>
>>>>>
>>>>> Oi! Go back on holiday!
>>>>>
>>>>>> On 30 October 2018 at 09:07, Cohen, Eugene <eugene@hp.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Has this patch been tested on a system that does not have coherent
>>>>>>> DMA?
>>>>>>>
>>>>>>> It's not clear that this change would actually be faster on a system
>>>>>>> of
>>>>>>> that
>>>>>>> type since using common buffers imply access to uncached memory.
>>>>>>> Depending
>>>>>>> on the access patterns the uncached memory access could be more time
>>>>>>> consuming than cache maintenance operations.
>>>>
>>>>
>>>>
>>>> The change/idea was based on the statement below.
>>>> ///
>>>> /// Provides both read and write access to system memory by both the
>>>> processor and a
>>>> /// bus master. The buffer is coherent from both the processor's and
>>>> the
>>>> bus master's point of view.
>>>> ///
>>>> EfiPciIoOperationBusMasterCommonBuffer,
>>>>
>>>> Thanks for raising case about uncached memory access. But after checking
>>>> the
>>>> code, for Intel VTd case
>>>>
>>>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>>>> (or no IOMMU case
>>>>
>>>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>>>> the common buffer is just normal memory buffer.
>>>> If someone can help do some test/collect some data on a system using
>>>> common
>>>> buffers imply access to uncached memory, that will be great.
>>>>
>>>
>>> OK, so first of all, can anyone explain to me under which
>>> circumstances interrupt transfers are a bottleneck? I'd assume that
>>> anything throughput bound would use bulk endpoints.
>>>
>>> Also, since the Map/Unmap calls are only costly when using an IOMMU,
>>> could we simply revert to the old behavior if mIoMmu == NULL?
>>>
>>>>>>
>>>>>> I haven't had time to look at these patches yet.
>>>>>>
>>>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>>>> more performant on implementations with non-coherent DMA, and so
>>>>>> common buffers should be avoided unless we are dealing with data
>>>>>> structures that are truly shared between the CPU and the device.
>>>>>>
>>>>>> Since this is obviously not the case here, could we please have some
>>>>>> numbers about the performance improvement we are talking about here?
>>>>>> Would it be possible to improve the IOMMU handling code instead?
>>>>
>>>>
>>>>
>>>> We collected the data below on a platform with release image and Intel
>>>> VTd
>>>> enabled.
>>>>
>>>> The image size of EhciDxe or XhciDxe can reduce about 120+ bytes.
>>>>
>>>> EHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.) Cumulative Average Shortest Longest
>>>> Name Count Duration Duration Duration Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D1DF0 446 2150 4 2 963
>>>>
>>>> EHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.) Cumulative Average Shortest Longest
>>>> Name Count Duration Duration Duration Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D1DF0 270 742 2 2 41
>>>>
>>>> XHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.) Cumulative Average Shortest Longest
>>>> Name Count Duration Duration Duration Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D14F0 215 603 2 2 52
>>>>
>>>> XHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.) Cumulative Average Shortest Longest
>>>> Name Count Duration Duration Duration Duration
>>>>
>>>> -------------------------------------------------------------------------------
>>>> S0000B00D14F0 95 294 3 2 52
>>>>
>>>> I believe the performance data really depends on
>>>> 1. How many AsyncInterruptTransfer handlers (the number of USB keyboard
>>>> and/or USB bluetooth keyboard?)
>>>> 2. Data size (for flushing data from PCI controller specific address to
>>>> mapped system memory address *in original code*)
>>>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>>> operation on Intel VTd engine caused by the unmap and map for flushing
>>>> data
>>>> *in original code*, the SetAttribute operation on IntelVTd engine will
>>>> involve FlushPageTableMemory, InvalidatePageEntry and etc)
>>>>
>>>
>>> OK, so there is room for improvement here: there is no reason the
>>> IOMMU driver couldn't cache mappings, or do some other optimizations
>>> that would make mapping the same memory repeatedly less costly.
>>
>>
>> The unmap/map with IOMMU will direct to SetAttribute that will
>> disallow/allow DMA memory access. The IOMMU driver is hard to predict the
>> sequence of unmap/map operations. Do you have more detail about the
>> optimizations?
>>
>> Could you take a try with the patch on the platform for the case you and
>> Eugene mentioned?
>>
>> Anyway, I am going to revert the patches (3/4 and 4/4, since 1/4 and 2/4
>> have no functionality impact) since the time point is a little sensitive as
>> it is near edk2-stable201811.
>
>
> I have reverted the patch 3/4 and 4/4 at
> https://github.com/tianocore/edk2/compare/1ed6498...d98fc9a, and we can
> continue the discussion.
>
Thanks Star
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2018-11-07 15:14 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-26 13:41 [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe for AsyncInterruptTransfer Star Zeng
2018-10-26 13:41 ` [PATCH V3 1/4] MdeModulePkg XhciDxe: Extract new XhciInsertAsyncIntTransfer function Star Zeng
2018-10-26 13:41 ` [PATCH V3 2/4] MdeModulePkg EhciDxe: Extract new EhciInsertAsyncIntTransfer function Star Zeng
2018-10-26 13:41 ` [PATCH V3 3/4] MdeModulePkg XhciDxe: Use common buffer for AsyncInterruptTransfer Star Zeng
2018-10-27 7:47 ` Wu, Hao A
2018-10-26 13:41 ` [PATCH V3 4/4] MdeModulePkg EhciDxe: " Star Zeng
[not found] ` <CS1PR8401MB1189428C2915107C583C0A42B4CC0@CS1PR8401MB1189.NAMPRD84.PROD.OUTLOOK.COM>
2018-10-30 12:39 ` Ard Biesheuvel
2018-10-30 12:50 ` Leif Lindholm
2018-10-31 4:38 ` Zeng, Star
2018-10-31 12:08 ` Leif Lindholm
2018-11-01 1:12 ` Zeng, Star
2018-11-01 10:32 ` Leif Lindholm
2018-11-06 9:49 ` Ard Biesheuvel
2018-11-06 14:37 ` Zeng, Star
2018-11-07 15:00 ` Zeng, Star
2018-11-07 15:14 ` Ard Biesheuvel
2018-10-28 13:35 ` [PATCH V3 0/4] Remove unnecessary Map/Unmap in XhciDxe/EhciDxe " Zeng, Star
2018-10-29 2:19 ` Ni, Ruiyu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox