public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error
@ 2020-01-07 11:06 Albecki, Mateusz
  2020-01-07 11:06 ` [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection Albecki, Mateusz
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Albecki, Mateusz @ 2020-01-07 11:06 UTC (permalink / raw)
  To: devel; +Cc: Mateusz Albecki, Hao A Wu, Marcin Wojtas, Zhichao Gao, Liming Gao

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140

Some of the boards report that just after we change the clock frequency to 200MHz link is
unable to stabilize fast enough and when driver sends the CMD13 it will often fail
randomly with CRC error. To protect against this kind of random failures this patch
series will make the driver retry the commands that failed due to random CRC errors.

Since async code has not yet been tested it has been put into separate patch. That patch
is not needed to solve most pressing CMD13 issues.

Tets performed:
-Boot eMMC in HS400
-Boot eMMC in HS400 with simulated CRC error on every first CMD13

Cc: Hao A Wu <hao.a.wu@intel.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Zhichao Gao <zhichao.gao@intel.com>
Cc: Liming Gao <liming.gao@intel.com>

Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>

Mateusz Albecki (3):
  MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection
  MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands
  MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands

 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c |  89 +++++---
 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h |   5 +-
 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c   | 235 ++++++++++++++-------
 3 files changed, 221 insertions(+), 108 deletions(-)

-- 
2.14.1.windows.1

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection
  2020-01-07 11:06 [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
@ 2020-01-07 11:06 ` Albecki, Mateusz
  2020-01-10  5:37   ` Wu, Hao A
  2020-01-07 11:06 ` [PATCH 2/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands Albecki, Mateusz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Albecki, Mateusz @ 2020-01-07 11:06 UTC (permalink / raw)
  To: devel; +Cc: Mateusz Albecki, Hao A Wu, Marcin Wojtas, Zhichao Gao, Liming Gao

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140

Error detection function will now check if the command
failure has been caused by one of the errors that can
appear randomly on link(CRC error + end bit error). If
such an error has been a cause of failure function will
return EFI_CRC_ERROR instead of EFI_DEVICE_ERROR to indicate
to the higher level that command has a chance of succeeding if
resent. In addition this patch also fixes 2 small bugs. First one
is DAT lane being reset on current limit error. Second one is
data timeout error not being cleared after transfer has been completed.

Cc: Hao A Wu <hao.a.wu@intel.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Zhichao Gao <zhichao.gao@intel.com>
Cc: Liming Gao <liming.gao@intel.com>

Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
---
 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c | 234 +++++++++++++++--------
 1 file changed, 158 insertions(+), 76 deletions(-)

diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
index e7f2fac69b..8b5e54f321 100644
--- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
+++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
@@ -7,7 +7,7 @@
   It would expose EFI_SD_MMC_PASS_THRU_PROTOCOL for upper layer use.
 
   Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
-  Copyright (c) 2015 - 2019, Intel Corporation. All rights reserved.<BR>
+  Copyright (c) 2015 - 2020, Intel Corporation. All rights reserved.<BR>
   SPDX-License-Identifier: BSD-2-Clause-Patent
 
 **/
@@ -2137,6 +2137,154 @@ SdMmcExecTrb (
   return Status;
 }
 
+/**
+  Performs SW reset based on passed error status mask.
+
+  @param[in]  Private       Pointer to driver private data.
+  @param[in]  Slot          Index of the slot to reset.
+  @param[in]  ErrIntStatus  Error interrupt status mask.
+
+  @retval EFI_SUCCESS  Software reset performed successfully.
+  @retval Other        Software reset failed.
+**/
+EFI_STATUS
+SdMmcSoftwareReset (
+  IN SD_MMC_HC_PRIVATE_DATA  *Private,
+  IN UINT8                   Slot,
+  IN UINT16                  ErrIntStatus
+  )
+{
+  UINT8       SwReset;
+  EFI_STATUS  Status;
+
+  SwReset = 0;
+  if ((ErrIntStatus & 0x0F) != 0) {
+    SwReset |= BIT1;
+  }
+  if ((ErrIntStatus & 0x70) != 0) {
+    SwReset |= BIT2;
+  }
+
+  Status  = SdMmcHcRwMmio (
+              Private->PciIo,
+              Slot,
+              SD_MMC_HC_SW_RST,
+              FALSE,
+              sizeof (SwReset),
+              &SwReset
+              );
+  if (EFI_ERROR (Status)) {
+    return Status;
+  }
+
+  Status = SdMmcHcWaitMmioSet (
+             Private->PciIo,
+             Slot,
+             SD_MMC_HC_SW_RST,
+             sizeof (SwReset),
+             0xFF,
+             0,
+             SD_MMC_HC_GENERIC_TIMEOUT
+             );
+  if (EFI_ERROR (Status)) {
+    return Status;
+  }
+
+  return EFI_SUCCESS;
+}
+
+/**
+  Checks the error status in error status register
+  and issues appropriate software reset as described in
+  SD specification section 3.10.
+
+  @param[in] Private  Pointer to driver private data.
+  @param[in] Trb      Pointer to currently executing TRB.
+
+  @retval EFI_CRC_ERROR  CRC error happened during CMD execution.
+  @retval EFI_SUCCESS    No error reported.
+  @retval Others         Some other error happened.
+
+**/
+EFI_STATUS
+SdMmcCheckAndRecoverErrors (
+  IN SD_MMC_HC_PRIVATE_DATA  *Private,
+  IN UINT8                   Slot
+  )
+{
+  UINT16      IntStatus;
+  UINT16      ErrIntStatus;
+  UINT16      ErrIntStatusOr;
+  EFI_STATUS  Status;
+  EFI_STATUS  ErrorStatus;
+
+  Status = SdMmcHcRwMmio (
+             Private->PciIo,
+             Slot,
+             SD_MMC_HC_NOR_INT_STS,
+             TRUE,
+             sizeof (IntStatus),
+             &IntStatus
+             );
+  if (EFI_ERROR (Status)) {
+    return Status;
+  }
+
+  if ((IntStatus & BIT15) == 0) {
+    return EFI_SUCCESS;
+  }
+
+  Status = SdMmcHcRwMmio (
+             Private->PciIo,
+             Slot,
+             SD_MMC_HC_ERR_INT_STS,
+             TRUE,
+             sizeof (ErrIntStatus),
+             &ErrIntStatus
+             );
+  if (EFI_ERROR (Status)) {
+    return Status;
+  }
+
+  //
+  // We treat both CMD and DAT CRC errors and
+  // end bits errors as EFI_CRC_ERROR. This will
+  // let higher layer know that the error possibly
+  // happened due to random bus condition and the
+  // command can be retried.
+  //
+  if (ErrIntStatus & (BIT1 | BIT2 | BIT5 | BIT6)) {
+    ErrorStatus = EFI_CRC_ERROR;
+  } else if ((ErrIntStatus & BIT4) && (IntStatus & BIT1)){
+    //
+    // If the data timeout error is reported
+    // but data transfer is signaled as completed we
+    // have to ignore data timeout.
+    //
+    ErrorStatus = EFI_SUCCESS;
+    ErrIntStatusOr = BIT4;
+    Status = SdMmcHcOrMmio (
+               Private->PciIo,
+               Slot,
+               SD_MMC_HC_ERR_INT_STS,
+               sizeof (ErrIntStatus),
+               &ErrIntStatusOr
+               );
+    if (EFI_ERROR (Status)) {
+      return Status;
+    }
+  } else {
+    ErrorStatus = EFI_DEVICE_ERROR;
+  }
+
+  Status = SdMmcSoftwareReset (Private, Slot, ErrIntStatus);
+  if (EFI_ERROR (Status)) {
+    return Status;
+  }
+
+  return ErrorStatus;
+}
+
 /**
   Check the TRB execution result.
 
@@ -2160,10 +2308,8 @@ SdMmcCheckTrbResult (
   UINT32                              Response[4];
   UINT64                              SdmaAddr;
   UINT8                               Index;
-  UINT8                               SwReset;
   UINT32                              PioLength;
 
-  SwReset = 0;
   Packet  = Trb->Packet;
   //
   // Check Trb execution result by reading Normal Interrupt Status register.
@@ -2179,87 +2325,23 @@ SdMmcCheckTrbResult (
   if (EFI_ERROR (Status)) {
     goto Done;
   }
+
   //
-  // Check Transfer Complete bit is set or not.
+  // Check if there are any errors reported by host controller
+  // and if neccessary recover the controller before next command is executed.
   //
-  if ((IntStatus & BIT1) == BIT1) {
-    if ((IntStatus & BIT15) == BIT15) {
-      //
-      // Read Error Interrupt Status register to check if the error is
-      // Data Timeout Error.
-      // If yes, treat it as success as Transfer Complete has higher
-      // priority than Data Timeout Error.
-      //
-      Status = SdMmcHcRwMmio (
-                 Private->PciIo,
-                 Trb->Slot,
-                 SD_MMC_HC_ERR_INT_STS,
-                 TRUE,
-                 sizeof (IntStatus),
-                 &IntStatus
-                 );
-      if (!EFI_ERROR (Status)) {
-        if ((IntStatus & BIT4) == BIT4) {
-          Status = EFI_SUCCESS;
-        } else {
-          Status = EFI_DEVICE_ERROR;
-        }
-      }
-    }
-
+  Status = SdMmcCheckAndRecoverErrors (Private, Trb->Slot);
+  if (EFI_ERROR (Status)) {
     goto Done;
   }
+
   //
-  // Check if there is a error happened during cmd execution.
-  // If yes, then do error recovery procedure to follow SD Host Controller
-  // Simplified Spec 3.0 section 3.10.1.
+  // Check Transfer Complete bit is set or not.
   //
-  if ((IntStatus & BIT15) == BIT15) {
-    Status = SdMmcHcRwMmio (
-               Private->PciIo,
-               Trb->Slot,
-               SD_MMC_HC_ERR_INT_STS,
-               TRUE,
-               sizeof (IntStatus),
-               &IntStatus
-               );
-    if (EFI_ERROR (Status)) {
-      goto Done;
-    }
-    if ((IntStatus & 0x0F) != 0) {
-      SwReset |= BIT1;
-    }
-    if ((IntStatus & 0xF0) != 0) {
-      SwReset |= BIT2;
-    }
-
-    Status  = SdMmcHcRwMmio (
-                Private->PciIo,
-                Trb->Slot,
-                SD_MMC_HC_SW_RST,
-                FALSE,
-                sizeof (SwReset),
-                &SwReset
-                );
-    if (EFI_ERROR (Status)) {
-      goto Done;
-    }
-    Status = SdMmcHcWaitMmioSet (
-               Private->PciIo,
-               Trb->Slot,
-               SD_MMC_HC_SW_RST,
-               sizeof (SwReset),
-               0xFF,
-               0,
-               SD_MMC_HC_GENERIC_TIMEOUT
-               );
-    if (EFI_ERROR (Status)) {
-      goto Done;
-    }
-
-    Status = EFI_DEVICE_ERROR;
+  if ((IntStatus & BIT1) == BIT1) {
     goto Done;
   }
+
   //
   // Check if DMA interrupt is signalled for the SDMA transfer.
   //
-- 
2.14.1.windows.1

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands
  2020-01-07 11:06 [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
  2020-01-07 11:06 ` [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection Albecki, Mateusz
@ 2020-01-07 11:06 ` Albecki, Mateusz
  2020-01-10  5:37   ` Wu, Hao A
  2020-01-07 11:06 ` [PATCH 3/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands Albecki, Mateusz
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Albecki, Mateusz @ 2020-01-07 11:06 UTC (permalink / raw)
  To: devel; +Cc: Mateusz Albecki, Hao A Wu, Marcin Wojtas, Zhichao Gao, Liming Gao

REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140

To increase the resiliency driver will now attempt to
retry the commands that failed due to the CRC error up
to 5 times. This should address the problems with the commands
that fail due to random condition on links. This should also
help the boards on which CMD13 is particularly unstable after
switching the link frequency.

Cc: Hao A Wu <hao.a.wu@intel.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Zhichao Gao <zhichao.gao@intel.com>
Cc: Liming Gao <liming.gao@intel.com>

Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
---
 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c | 83 ++++++++++++++--------
 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h |  5 +-
 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c   |  1 +
 3 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
index 373f1bed45..193b0f24e2 100644
--- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
+++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
@@ -7,7 +7,7 @@
   It would expose EFI_SD_MMC_PASS_THRU_PROTOCOL for upper layer use.
 
   Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
-  Copyright (c) 2015 - 2019, Intel Corporation. All rights reserved.<BR>
+  Copyright (c) 2015 - 2020, Intel Corporation. All rights reserved.<BR>
   SPDX-License-Identifier: BSD-2-Clause-Patent
 
 **/
@@ -974,6 +974,58 @@ SdMmcPciHcDriverBindingStop (
   return Status;
 }
 
+/**
+  Execute TRB synchronously.
+
+  @param[in] Private  Pointer to driver private data.
+  @param[in] Trb      Pointer to TRB to execute.
+
+  @retval EFI_SUCCESS  TRB executed successfully.
+  @retval Other        TRB failed.
+**/
+EFI_STATUS
+SdMmcPassThruExecSyncTrb (
+  IN SD_MMC_HC_PRIVATE_DATA  *Private,
+  IN SD_MMC_HC_TRB           *Trb
+  )
+{
+  EFI_STATUS  Status;
+  EFI_TPL     OldTpl;
+
+  //
+  // Wait async I/O list is empty before execute sync I/O operation.
+  //
+  while (TRUE) {
+    OldTpl = gBS->RaiseTPL (TPL_NOTIFY);
+    if (IsListEmpty (&Private->Queue)) {
+      gBS->RestoreTPL (OldTpl);
+      break;
+    }
+    gBS->RestoreTPL (OldTpl);
+  }
+
+  while (Trb->Retries) {
+    Status = SdMmcWaitTrbEnv (Private, Trb);
+    if (EFI_ERROR (Status)) {
+      return Status;
+    }
+
+    Status = SdMmcExecTrb (Private, Trb);
+    if (EFI_ERROR (Status)) {
+      return Status;
+    }
+
+    Status = SdMmcWaitTrbResult (Private, Trb);
+    if (Status == EFI_CRC_ERROR) {
+      Trb->Retries--;
+    } else {
+      return Status;
+    }
+  }
+
+  return Status;
+}
+
 /**
   Sends SD command to an SD card that is attached to the SD controller.
 
@@ -1023,7 +1075,6 @@ SdMmcPassThruPassThru (
   EFI_STATUS                      Status;
   SD_MMC_HC_PRIVATE_DATA          *Private;
   SD_MMC_HC_TRB                   *Trb;
-  EFI_TPL                         OldTpl;
 
   if ((This == NULL) || (Packet == NULL)) {
     return EFI_INVALID_PARAMETER;
@@ -1066,34 +1117,8 @@ SdMmcPassThruPassThru (
     return EFI_SUCCESS;
   }
 
-  //
-  // Wait async I/O list is empty before execute sync I/O operation.
-  //
-  while (TRUE) {
-    OldTpl = gBS->RaiseTPL (TPL_NOTIFY);
-    if (IsListEmpty (&Private->Queue)) {
-      gBS->RestoreTPL (OldTpl);
-      break;
-    }
-    gBS->RestoreTPL (OldTpl);
-  }
-
-  Status = SdMmcWaitTrbEnv (Private, Trb);
-  if (EFI_ERROR (Status)) {
-    goto Done;
-  }
-
-  Status = SdMmcExecTrb (Private, Trb);
-  if (EFI_ERROR (Status)) {
-    goto Done;
-  }
+  Status = SdMmcPassThruExecSyncTrb (Private, Trb);
 
-  Status = SdMmcWaitTrbResult (Private, Trb);
-  if (EFI_ERROR (Status)) {
-    goto Done;
-  }
-
-Done:
   SdMmcFreeTrb (Trb);
 
   return Status;
diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h
index 0304960132..5bc3577ba2 100644
--- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h
+++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h
@@ -3,7 +3,7 @@
   Provides some data structure definitions used by the SD/MMC host controller driver.
 
 Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
-Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
+Copyright (c) 2015 - 2020, Intel Corporation. All rights reserved.<BR>
 SPDX-License-Identifier: BSD-2-Clause-Patent
 
 **/
@@ -130,6 +130,8 @@ typedef struct {
 
 #define SD_MMC_HC_TRB_SIG             SIGNATURE_32 ('T', 'R', 'B', 'T')
 
+#define SD_MMC_TRB_RETRIES            5
+
 //
 // TRB (Transfer Request Block) contains information for the cmd request.
 //
@@ -152,6 +154,7 @@ typedef struct {
   EFI_EVENT                           Event;
   BOOLEAN                             Started;
   UINT64                              Timeout;
+  UINT32                              Retries;
 
   SD_MMC_HC_ADMA_32_DESC_LINE         *Adma32Desc;
   SD_MMC_HC_ADMA_64_V3_DESC_LINE      *Adma64V3Desc;
diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
index 8b5e54f321..676ace847b 100644
--- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
+++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
@@ -1683,6 +1683,7 @@ SdMmcCreateTrb (
   Trb->Event     = Event;
   Trb->Started   = FALSE;
   Trb->Timeout   = Packet->Timeout;
+  Trb->Retries   = SD_MMC_TRB_RETRIES;
   Trb->Private   = Private;
 
   if ((Packet->InTransferLength != 0) && (Packet->InDataBuffer != NULL)) {
-- 
2.14.1.windows.1

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands
  2020-01-07 11:06 [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
  2020-01-07 11:06 ` [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection Albecki, Mateusz
  2020-01-07 11:06 ` [PATCH 2/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands Albecki, Mateusz
@ 2020-01-07 11:06 ` Albecki, Mateusz
  2020-01-10  5:37   ` Wu, Hao A
  2020-01-07 11:09 ` [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
  2020-01-08  7:38 ` Wu, Hao A
  4 siblings, 1 reply; 11+ messages in thread
From: Albecki, Mateusz @ 2020-01-07 11:06 UTC (permalink / raw)
  To: devel; +Cc: Mateusz Albecki, Hao A Wu, Marcin Wojtas, Zhichao Gao, Liming Gao

This patch adds retries for async execution for commands that
failed due to the CRC errors.

Cc: Hao A Wu <hao.a.wu@intel.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Zhichao Gao <zhichao.gao@intel.com>
Cc: Liming Gao <liming.gao@intel.com>

Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
---
 MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
index 193b0f24e2..b18ff3e972 100644
--- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
+++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
@@ -211,8 +211,10 @@ Done:
       gBS->SignalEvent (TrbEvent);
       return;
     }
-  }
-  if ((Trb != NULL) && (Status != EFI_NOT_READY)) {
+  } else if ((Trb != NULL) && (Status == EFI_CRC_ERROR) && (Trb->Retries > 0)) {
+    Trb->Retries--;
+    Trb->Started = FALSE;
+  } else if ((Trb != NULL)) {
     RemoveEntryList (Link);
     Trb->Packet->TransactionStatus = Status;
     TrbEvent = Trb->Event;
-- 
2.14.1.windows.1

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error
  2020-01-07 11:06 [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
                   ` (2 preceding siblings ...)
  2020-01-07 11:06 ` [PATCH 3/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands Albecki, Mateusz
@ 2020-01-07 11:09 ` Albecki, Mateusz
  2020-01-08  7:38 ` Wu, Hao A
  4 siblings, 0 replies; 11+ messages in thread
From: Albecki, Mateusz @ 2020-01-07 11:09 UTC (permalink / raw)
  To: devel@edk2.groups.io; +Cc: Wu, Hao A, Marcin Wojtas, Gao, Zhichao, Gao, Liming

Missed github info:

Clean patch series is here: https://github.com/malbecki/edk2/tree/sdmmc_retries
Patch series with the applied CRC simulation code is here: https://github.com/malbecki/edk2/tree/sdmmc_retries_with_test_code

Thanks,
Mateusz

> -----Original Message-----
> From: Albecki, Mateusz <mateusz.albecki@intel.com>
> Sent: Tuesday, January 7, 2020 12:06 PM
> To: devel@edk2.groups.io
> Cc: Albecki, Mateusz <mateusz.albecki@intel.com>; Wu, Hao A
> <hao.a.wu@intel.com>; Marcin Wojtas <mw@semihalf.com>; Gao, Zhichao
> <zhichao.gao@intel.com>; Gao, Liming <liming.gao@intel.com>
> Subject: [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the
> commands that failed due to CRC error
> 
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140
> 
> Some of the boards report that just after we change the clock frequency to
> 200MHz link is unable to stabilize fast enough and when driver sends the
> CMD13 it will often fail randomly with CRC error. To protect against this kind
> of random failures this patch series will make the driver retry the commands
> that failed due to random CRC errors.
> 
> Since async code has not yet been tested it has been put into separate patch.
> That patch is not needed to solve most pressing CMD13 issues.
> 
> Tets performed:
> -Boot eMMC in HS400
> -Boot eMMC in HS400 with simulated CRC error on every first CMD13
> 
> Cc: Hao A Wu <hao.a.wu@intel.com>
> Cc: Marcin Wojtas <mw@semihalf.com>
> Cc: Zhichao Gao <zhichao.gao@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> 
> Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
> 
> Mateusz Albecki (3):
>   MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection
>   MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands
>   MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands
> 
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c |  89 +++++---
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h |   5 +-
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c   | 235
> ++++++++++++++-------
>  3 files changed, 221 insertions(+), 108 deletions(-)
> 
> --
> 2.14.1.windows.1

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error
  2020-01-07 11:06 [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
                   ` (3 preceding siblings ...)
  2020-01-07 11:09 ` [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
@ 2020-01-08  7:38 ` Wu, Hao A
  4 siblings, 0 replies; 11+ messages in thread
From: Wu, Hao A @ 2020-01-08  7:38 UTC (permalink / raw)
  To: Albecki, Mateusz, devel@edk2.groups.io
  Cc: Marcin Wojtas, Gao, Zhichao, Gao, Liming

> -----Original Message-----
> From: Albecki, Mateusz
> Sent: Tuesday, January 07, 2020 7:06 PM
> To: devel@edk2.groups.io
> Cc: Albecki, Mateusz; Wu, Hao A; Marcin Wojtas; Gao, Zhichao; Gao, Liming
> Subject: [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the
> commands that failed due to CRC error
> 
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140
> 
> Some of the boards report that just after we change the clock frequency to
> 200MHz link is
> unable to stabilize fast enough and when driver sends the CMD13 it will often
> fail
> randomly with CRC error. To protect against this kind of random failures this
> patch
> series will make the driver retry the commands that failed due to random
> CRC errors.
> 
> Since async code has not yet been tested it has been put into separate patch.
> That patch
> is not needed to solve most pressing CMD13 issues.
> 
> Tets performed:
> -Boot eMMC in HS400
> -Boot eMMC in HS400 with simulated CRC error on every first CMD13


Hello Mateusz,

Thanks for the contribution.
Please grant me some time to review and validate the series.
I will try to give my feedbacks no later than early next week.

Best Regards,
Hao Wu


> 
> Cc: Hao A Wu <hao.a.wu@intel.com>
> Cc: Marcin Wojtas <mw@semihalf.com>
> Cc: Zhichao Gao <zhichao.gao@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> 
> Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
> 
> Mateusz Albecki (3):
>   MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection
>   MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands
>   MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands
> 
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c |  89 +++++---
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h |   5 +-
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c   | 235
> ++++++++++++++-------
>  3 files changed, 221 insertions(+), 108 deletions(-)
> 
> --
> 2.14.1.windows.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection
  2020-01-07 11:06 ` [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection Albecki, Mateusz
@ 2020-01-10  5:37   ` Wu, Hao A
  2020-01-13 13:48     ` Albecki, Mateusz
  0 siblings, 1 reply; 11+ messages in thread
From: Wu, Hao A @ 2020-01-10  5:37 UTC (permalink / raw)
  To: Albecki, Mateusz, devel@edk2.groups.io
  Cc: Marcin Wojtas, Gao, Zhichao, Gao, Liming

Hello Mateusz,

Some inline comments below:


> -----Original Message-----
> From: Albecki, Mateusz
> Sent: Tuesday, January 07, 2020 7:06 PM
> To: devel@edk2.groups.io
> Cc: Albecki, Mateusz; Wu, Hao A; Marcin Wojtas; Gao, Zhichao; Gao, Liming
> Subject: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command
> error detection
> 
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140
> 
> Error detection function will now check if the command
> failure has been caused by one of the errors that can
> appear randomly on link(CRC error + end bit error). If
> such an error has been a cause of failure function will
> return EFI_CRC_ERROR instead of EFI_DEVICE_ERROR to indicate
> to the higher level that command has a chance of succeeding if
> resent. In addition this patch also fixes 2 small bugs. First one
> is DAT lane being reset on current limit error. Second one is
> data timeout error not being cleared after transfer has been completed.


For the 2 small issues, I would suggest to split them into separate patches,
which would make this patch into 3 patches. You can either do the refactoring
or fixing the 2 bugs first.


> 
> Cc: Hao A Wu <hao.a.wu@intel.com>
> Cc: Marcin Wojtas <mw@semihalf.com>
> Cc: Zhichao Gao <zhichao.gao@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> 
> Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
> ---
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c | 234
> +++++++++++++++--------
>  1 file changed, 158 insertions(+), 76 deletions(-)
> 
> diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> index e7f2fac69b..8b5e54f321 100644
> --- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> +++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> @@ -7,7 +7,7 @@
>    It would expose EFI_SD_MMC_PASS_THRU_PROTOCOL for upper layer use.
> 
>    Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
> -  Copyright (c) 2015 - 2019, Intel Corporation. All rights reserved.<BR>
> +  Copyright (c) 2015 - 2020, Intel Corporation. All rights reserved.<BR>
>    SPDX-License-Identifier: BSD-2-Clause-Patent
> 
>  **/
> @@ -2137,6 +2137,154 @@ SdMmcExecTrb (
>    return Status;
>  }
> 
> +/**
> +  Performs SW reset based on passed error status mask.
> +
> +  @param[in]  Private       Pointer to driver private data.
> +  @param[in]  Slot          Index of the slot to reset.
> +  @param[in]  ErrIntStatus  Error interrupt status mask.
> +
> +  @retval EFI_SUCCESS  Software reset performed successfully.
> +  @retval Other        Software reset failed.
> +**/
> +EFI_STATUS
> +SdMmcSoftwareReset (
> +  IN SD_MMC_HC_PRIVATE_DATA  *Private,
> +  IN UINT8                   Slot,
> +  IN UINT16                  ErrIntStatus
> +  )
> +{
> +  UINT8       SwReset;
> +  EFI_STATUS  Status;
> +
> +  SwReset = 0;
> +  if ((ErrIntStatus & 0x0F) != 0) {
> +    SwReset |= BIT1;
> +  }
> +  if ((ErrIntStatus & 0x70) != 0) {


Thanks for this catch.
Could you help to separate this fix to another patch?


> +    SwReset |= BIT2;
> +  }
> +
> +  Status  = SdMmcHcRwMmio (
> +              Private->PciIo,
> +              Slot,
> +              SD_MMC_HC_SW_RST,
> +              FALSE,
> +              sizeof (SwReset),
> +              &SwReset
> +              );
> +  if (EFI_ERROR (Status)) {
> +    return Status;
> +  }
> +
> +  Status = SdMmcHcWaitMmioSet (
> +             Private->PciIo,
> +             Slot,
> +             SD_MMC_HC_SW_RST,
> +             sizeof (SwReset),
> +             0xFF,
> +             0,
> +             SD_MMC_HC_GENERIC_TIMEOUT
> +             );
> +  if (EFI_ERROR (Status)) {
> +    return Status;
> +  }
> +
> +  return EFI_SUCCESS;
> +}
> +
> +/**
> +  Checks the error status in error status register
> +  and issues appropriate software reset as described in
> +  SD specification section 3.10.
> +
> +  @param[in] Private  Pointer to driver private data.
> +  @param[in] Trb      Pointer to currently executing TRB.
> +
> +  @retval EFI_CRC_ERROR  CRC error happened during CMD execution.
> +  @retval EFI_SUCCESS    No error reported.
> +  @retval Others         Some other error happened.
> +
> +**/
> +EFI_STATUS
> +SdMmcCheckAndRecoverErrors (
> +  IN SD_MMC_HC_PRIVATE_DATA  *Private,
> +  IN UINT8                   Slot
> +  )
> +{
> +  UINT16      IntStatus;
> +  UINT16      ErrIntStatus;
> +  UINT16      ErrIntStatusOr;
> +  EFI_STATUS  Status;
> +  EFI_STATUS  ErrorStatus;
> +
> +  Status = SdMmcHcRwMmio (
> +             Private->PciIo,
> +             Slot,
> +             SD_MMC_HC_NOR_INT_STS,
> +             TRUE,
> +             sizeof (IntStatus),
> +             &IntStatus
> +             );
> +  if (EFI_ERROR (Status)) {
> +    return Status;
> +  }
> +
> +  if ((IntStatus & BIT15) == 0) {
> +    return EFI_SUCCESS;
> +  }
> +
> +  Status = SdMmcHcRwMmio (
> +             Private->PciIo,
> +             Slot,
> +             SD_MMC_HC_ERR_INT_STS,
> +             TRUE,
> +             sizeof (ErrIntStatus),
> +             &ErrIntStatus
> +             );
> +  if (EFI_ERROR (Status)) {
> +    return Status;
> +  }
> +
> +  //
> +  // We treat both CMD and DAT CRC errors and
> +  // end bits errors as EFI_CRC_ERROR. This will
> +  // let higher layer know that the error possibly
> +  // happened due to random bus condition and the
> +  // command can be retried.
> +  //
> +  if (ErrIntStatus & (BIT1 | BIT2 | BIT5 | BIT6)) {
> +    ErrorStatus = EFI_CRC_ERROR;
> +  } else if ((ErrIntStatus & BIT4) && (IntStatus & BIT1)){


A coding style comment here, please help to explicitly compare the '&' operation
result with 0 for the above checks, like:

  if ((ErrIntStatus & (BIT1 | BIT2 | BIT5 | BIT6)) != 0 ) {


> +    //
> +    // If the data timeout error is reported
> +    // but data transfer is signaled as completed we
> +    // have to ignore data timeout.
> +    //
> +    ErrorStatus = EFI_SUCCESS;


I am thinking it might be more clean to directly return for this case. Since
for this case, a recovery procedure is not required, doing so can avoid
explicitly clearing the Timeout error bit in the 'SD_MMC_HC_ERR_INT_STS'
register.

We can still leave the clear of the bits in 'SD_MMC_HC_ERR_INT_STS' register to
function SdMmcExecTrb() before the execution of next command.


> +    ErrIntStatusOr = BIT4;
> +    Status = SdMmcHcOrMmio (
> +               Private->PciIo,
> +               Slot,
> +               SD_MMC_HC_ERR_INT_STS,
> +               sizeof (ErrIntStatus),
> +               &ErrIntStatusOr
> +               );
> +    if (EFI_ERROR (Status)) {
> +      return Status;
> +    }
> +  } else {
> +    ErrorStatus = EFI_DEVICE_ERROR;
> +  }
> +
> +  Status = SdMmcSoftwareReset (Private, Slot, ErrIntStatus);
> +  if (EFI_ERROR (Status)) {
> +    return Status;
> +  }
> +
> +  return ErrorStatus;
> +}
> +
>  /**
>    Check the TRB execution result.
> 
> @@ -2160,10 +2308,8 @@ SdMmcCheckTrbResult (
>    UINT32                              Response[4];
>    UINT64                              SdmaAddr;
>    UINT8                               Index;
> -  UINT8                               SwReset;
>    UINT32                              PioLength;
> 
> -  SwReset = 0;
>    Packet  = Trb->Packet;
>    //
>    // Check Trb execution result by reading Normal Interrupt Status register.
> @@ -2179,87 +2325,23 @@ SdMmcCheckTrbResult (
>    if (EFI_ERROR (Status)) {
>      goto Done;
>    }


I think the below call in SdMmcCheckTrbResult():

  Status = SdMmcHcRwMmio (
             Private->PciIo,
             Trb->Slot,
             SD_MMC_HC_NOR_INT_STS,
             TRUE,
             sizeof (IntStatus),
             &IntStatus
             );

can be removed. I found that the same call will be made in
SdMmcCheckAndRecoverErrors().

Best Regards,
Hao Wu


> +
>    //
> -  // Check Transfer Complete bit is set or not.
> +  // Check if there are any errors reported by host controller
> +  // and if neccessary recover the controller before next command is
> executed.
>    //
> -  if ((IntStatus & BIT1) == BIT1) {
> -    if ((IntStatus & BIT15) == BIT15) {
> -      //
> -      // Read Error Interrupt Status register to check if the error is
> -      // Data Timeout Error.
> -      // If yes, treat it as success as Transfer Complete has higher
> -      // priority than Data Timeout Error.
> -      //
> -      Status = SdMmcHcRwMmio (
> -                 Private->PciIo,
> -                 Trb->Slot,
> -                 SD_MMC_HC_ERR_INT_STS,
> -                 TRUE,
> -                 sizeof (IntStatus),
> -                 &IntStatus
> -                 );
> -      if (!EFI_ERROR (Status)) {
> -        if ((IntStatus & BIT4) == BIT4) {
> -          Status = EFI_SUCCESS;
> -        } else {
> -          Status = EFI_DEVICE_ERROR;
> -        }
> -      }
> -    }
> -
> +  Status = SdMmcCheckAndRecoverErrors (Private, Trb->Slot);
> +  if (EFI_ERROR (Status)) {
>      goto Done;
>    }
> +
>    //
> -  // Check if there is a error happened during cmd execution.
> -  // If yes, then do error recovery procedure to follow SD Host Controller
> -  // Simplified Spec 3.0 section 3.10.1.
> +  // Check Transfer Complete bit is set or not.
>    //
> -  if ((IntStatus & BIT15) == BIT15) {
> -    Status = SdMmcHcRwMmio (
> -               Private->PciIo,
> -               Trb->Slot,
> -               SD_MMC_HC_ERR_INT_STS,
> -               TRUE,
> -               sizeof (IntStatus),
> -               &IntStatus
> -               );
> -    if (EFI_ERROR (Status)) {
> -      goto Done;
> -    }
> -    if ((IntStatus & 0x0F) != 0) {
> -      SwReset |= BIT1;
> -    }
> -    if ((IntStatus & 0xF0) != 0) {
> -      SwReset |= BIT2;
> -    }
> -
> -    Status  = SdMmcHcRwMmio (
> -                Private->PciIo,
> -                Trb->Slot,
> -                SD_MMC_HC_SW_RST,
> -                FALSE,
> -                sizeof (SwReset),
> -                &SwReset
> -                );
> -    if (EFI_ERROR (Status)) {
> -      goto Done;
> -    }
> -    Status = SdMmcHcWaitMmioSet (
> -               Private->PciIo,
> -               Trb->Slot,
> -               SD_MMC_HC_SW_RST,
> -               sizeof (SwReset),
> -               0xFF,
> -               0,
> -               SD_MMC_HC_GENERIC_TIMEOUT
> -               );
> -    if (EFI_ERROR (Status)) {
> -      goto Done;
> -    }
> -
> -    Status = EFI_DEVICE_ERROR;
> +  if ((IntStatus & BIT1) == BIT1) {
>      goto Done;
>    }
> +
>    //
>    // Check if DMA interrupt is signalled for the SDMA transfer.
>    //
> --
> 2.14.1.windows.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands
  2020-01-07 11:06 ` [PATCH 2/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands Albecki, Mateusz
@ 2020-01-10  5:37   ` Wu, Hao A
  0 siblings, 0 replies; 11+ messages in thread
From: Wu, Hao A @ 2020-01-10  5:37 UTC (permalink / raw)
  To: Albecki, Mateusz, devel@edk2.groups.io
  Cc: Marcin Wojtas, Gao, Zhichao, Gao, Liming

> -----Original Message-----
> From: Albecki, Mateusz
> Sent: Tuesday, January 07, 2020 7:06 PM
> To: devel@edk2.groups.io
> Cc: Albecki, Mateusz; Wu, Hao A; Marcin Wojtas; Gao, Zhichao; Gao, Liming
> Subject: [PATCH 2/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for sync
> commands
> 
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140
> 
> To increase the resiliency driver will now attempt to
> retry the commands that failed due to the CRC error up
> to 5 times. This should address the problems with the commands
> that fail due to random condition on links. This should also
> help the boards on which CMD13 is particularly unstable after
> switching the link frequency.


Reviewed-by: Hao A Wu <hao.a.wu@intel.com>

Best Regards,
Hao Wu


> 
> Cc: Hao A Wu <hao.a.wu@intel.com>
> Cc: Marcin Wojtas <mw@semihalf.com>
> Cc: Zhichao Gao <zhichao.gao@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> 
> Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
> ---
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c | 83
> ++++++++++++++--------
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h |  5 +-
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c   |  1 +
>  3 files changed, 59 insertions(+), 30 deletions(-)
> 
> diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> index 373f1bed45..193b0f24e2 100644
> --- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> +++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> @@ -7,7 +7,7 @@
>    It would expose EFI_SD_MMC_PASS_THRU_PROTOCOL for upper layer use.
> 
>    Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
> -  Copyright (c) 2015 - 2019, Intel Corporation. All rights reserved.<BR>
> +  Copyright (c) 2015 - 2020, Intel Corporation. All rights reserved.<BR>
>    SPDX-License-Identifier: BSD-2-Clause-Patent
> 
>  **/
> @@ -974,6 +974,58 @@ SdMmcPciHcDriverBindingStop (
>    return Status;
>  }
> 
> +/**
> +  Execute TRB synchronously.
> +
> +  @param[in] Private  Pointer to driver private data.
> +  @param[in] Trb      Pointer to TRB to execute.
> +
> +  @retval EFI_SUCCESS  TRB executed successfully.
> +  @retval Other        TRB failed.
> +**/
> +EFI_STATUS
> +SdMmcPassThruExecSyncTrb (
> +  IN SD_MMC_HC_PRIVATE_DATA  *Private,
> +  IN SD_MMC_HC_TRB           *Trb
> +  )
> +{
> +  EFI_STATUS  Status;
> +  EFI_TPL     OldTpl;
> +
> +  //
> +  // Wait async I/O list is empty before execute sync I/O operation.
> +  //
> +  while (TRUE) {
> +    OldTpl = gBS->RaiseTPL (TPL_NOTIFY);
> +    if (IsListEmpty (&Private->Queue)) {
> +      gBS->RestoreTPL (OldTpl);
> +      break;
> +    }
> +    gBS->RestoreTPL (OldTpl);
> +  }
> +
> +  while (Trb->Retries) {
> +    Status = SdMmcWaitTrbEnv (Private, Trb);
> +    if (EFI_ERROR (Status)) {
> +      return Status;
> +    }
> +
> +    Status = SdMmcExecTrb (Private, Trb);
> +    if (EFI_ERROR (Status)) {
> +      return Status;
> +    }
> +
> +    Status = SdMmcWaitTrbResult (Private, Trb);
> +    if (Status == EFI_CRC_ERROR) {
> +      Trb->Retries--;
> +    } else {
> +      return Status;
> +    }
> +  }
> +
> +  return Status;
> +}
> +
>  /**
>    Sends SD command to an SD card that is attached to the SD controller.
> 
> @@ -1023,7 +1075,6 @@ SdMmcPassThruPassThru (
>    EFI_STATUS                      Status;
>    SD_MMC_HC_PRIVATE_DATA          *Private;
>    SD_MMC_HC_TRB                   *Trb;
> -  EFI_TPL                         OldTpl;
> 
>    if ((This == NULL) || (Packet == NULL)) {
>      return EFI_INVALID_PARAMETER;
> @@ -1066,34 +1117,8 @@ SdMmcPassThruPassThru (
>      return EFI_SUCCESS;
>    }
> 
> -  //
> -  // Wait async I/O list is empty before execute sync I/O operation.
> -  //
> -  while (TRUE) {
> -    OldTpl = gBS->RaiseTPL (TPL_NOTIFY);
> -    if (IsListEmpty (&Private->Queue)) {
> -      gBS->RestoreTPL (OldTpl);
> -      break;
> -    }
> -    gBS->RestoreTPL (OldTpl);
> -  }
> -
> -  Status = SdMmcWaitTrbEnv (Private, Trb);
> -  if (EFI_ERROR (Status)) {
> -    goto Done;
> -  }
> -
> -  Status = SdMmcExecTrb (Private, Trb);
> -  if (EFI_ERROR (Status)) {
> -    goto Done;
> -  }
> +  Status = SdMmcPassThruExecSyncTrb (Private, Trb);
> 
> -  Status = SdMmcWaitTrbResult (Private, Trb);
> -  if (EFI_ERROR (Status)) {
> -    goto Done;
> -  }
> -
> -Done:
>    SdMmcFreeTrb (Trb);
> 
>    return Status;
> diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h
> b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h
> index 0304960132..5bc3577ba2 100644
> --- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h
> +++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.h
> @@ -3,7 +3,7 @@
>    Provides some data structure definitions used by the SD/MMC host
> controller driver.
> 
>  Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
> -Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
> +Copyright (c) 2015 - 2020, Intel Corporation. All rights reserved.<BR>
>  SPDX-License-Identifier: BSD-2-Clause-Patent
> 
>  **/
> @@ -130,6 +130,8 @@ typedef struct {
> 
>  #define SD_MMC_HC_TRB_SIG             SIGNATURE_32 ('T', 'R', 'B', 'T')
> 
> +#define SD_MMC_TRB_RETRIES            5
> +
>  //
>  // TRB (Transfer Request Block) contains information for the cmd request.
>  //
> @@ -152,6 +154,7 @@ typedef struct {
>    EFI_EVENT                           Event;
>    BOOLEAN                             Started;
>    UINT64                              Timeout;
> +  UINT32                              Retries;
> 
>    SD_MMC_HC_ADMA_32_DESC_LINE         *Adma32Desc;
>    SD_MMC_HC_ADMA_64_V3_DESC_LINE      *Adma64V3Desc;
> diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> index 8b5e54f321..676ace847b 100644
> --- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> +++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> @@ -1683,6 +1683,7 @@ SdMmcCreateTrb (
>    Trb->Event     = Event;
>    Trb->Started   = FALSE;
>    Trb->Timeout   = Packet->Timeout;
> +  Trb->Retries   = SD_MMC_TRB_RETRIES;
>    Trb->Private   = Private;
> 
>    if ((Packet->InTransferLength != 0) && (Packet->InDataBuffer != NULL)) {
> --
> 2.14.1.windows.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands
  2020-01-07 11:06 ` [PATCH 3/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands Albecki, Mateusz
@ 2020-01-10  5:37   ` Wu, Hao A
  0 siblings, 0 replies; 11+ messages in thread
From: Wu, Hao A @ 2020-01-10  5:37 UTC (permalink / raw)
  To: Albecki, Mateusz, devel@edk2.groups.io
  Cc: Marcin Wojtas, Gao, Zhichao, Gao, Liming

> -----Original Message-----
> From: Albecki, Mateusz
> Sent: Tuesday, January 07, 2020 7:06 PM
> To: devel@edk2.groups.io
> Cc: Albecki, Mateusz; Wu, Hao A; Marcin Wojtas; Gao, Zhichao; Gao, Liming
> Subject: [PATCH 3/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for async
> commands
> 
> This patch adds retries for async execution for commands that
> failed due to the CRC errors.
> 
> Cc: Hao A Wu <hao.a.wu@intel.com>
> Cc: Marcin Wojtas <mw@semihalf.com>
> Cc: Zhichao Gao <zhichao.gao@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> 
> Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
> ---
>  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> index 193b0f24e2..b18ff3e972 100644
> --- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> +++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHcDxe.c
> @@ -211,8 +211,10 @@ Done:
>        gBS->SignalEvent (TrbEvent);
>        return;
>      }
> -  }
> -  if ((Trb != NULL) && (Status != EFI_NOT_READY)) {
> +  } else if ((Trb != NULL) && (Status == EFI_CRC_ERROR) && (Trb->Retries >
> 0)) {
> +    Trb->Retries--;
> +    Trb->Started = FALSE;


I think it is fine to let the 'Trb->Timeout' field not being reset here.
My take is that all the potential retries should be restricted by the timeout
value specified by the command packet.

Reviewed-by: Hao A Wu <hao.a.wu@intel.com>

Best Regards,
Hao Wu


> +  } else if ((Trb != NULL)) {
>      RemoveEntryList (Link);
>      Trb->Packet->TransactionStatus = Status;
>      TrbEvent = Trb->Event;
> --
> 2.14.1.windows.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection
  2020-01-10  5:37   ` Wu, Hao A
@ 2020-01-13 13:48     ` Albecki, Mateusz
  2020-01-14  4:16       ` Wu, Hao A
  0 siblings, 1 reply; 11+ messages in thread
From: Albecki, Mateusz @ 2020-01-13 13:48 UTC (permalink / raw)
  To: Wu, Hao A, devel@edk2.groups.io; +Cc: Marcin Wojtas, Gao, Zhichao, Gao, Liming

Hi,

I will fix the 2 issues in separate patch probably before doing the refactor to avoid reverting it if the refactor introduces some unexpected issues.

Please also see inline.

Thanks,
Mateusz

> -----Original Message-----
> From: Wu, Hao A <hao.a.wu@intel.com>
> Sent: Friday, January 10, 2020 6:38 AM
> To: Albecki, Mateusz <mateusz.albecki@intel.com>; devel@edk2.groups.io
> Cc: Marcin Wojtas <mw@semihalf.com>; Gao, Zhichao
> <zhichao.gao@intel.com>; Gao, Liming <liming.gao@intel.com>
> Subject: RE: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor
> command error detection
> 
> Hello Mateusz,
> 
> Some inline comments below:
> 
> 
> > -----Original Message-----
> > From: Albecki, Mateusz
> > Sent: Tuesday, January 07, 2020 7:06 PM
> > To: devel@edk2.groups.io
> > Cc: Albecki, Mateusz; Wu, Hao A; Marcin Wojtas; Gao, Zhichao; Gao,
> > Liming
> > Subject: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor
> command
> > error detection
> >
> > REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140
> >
> > Error detection function will now check if the command failure has
> > been caused by one of the errors that can appear randomly on link(CRC
> > error + end bit error). If such an error has been a cause of failure
> > function will return EFI_CRC_ERROR instead of EFI_DEVICE_ERROR to
> > indicate to the higher level that command has a chance of succeeding
> > if resent. In addition this patch also fixes 2 small bugs. First one
> > is DAT lane being reset on current limit error. Second one is data
> > timeout error not being cleared after transfer has been completed.
> 
> 
> For the 2 small issues, I would suggest to split them into separate patches,
> which would make this patch into 3 patches. You can either do the
> refactoring or fixing the 2 bugs first.
> 
> 
> >
> > Cc: Hao A Wu <hao.a.wu@intel.com>
> > Cc: Marcin Wojtas <mw@semihalf.com>
> > Cc: Zhichao Gao <zhichao.gao@intel.com>
> > Cc: Liming Gao <liming.gao@intel.com>
> >
> > Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
> > ---
> >  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c | 234
> > +++++++++++++++--------
> >  1 file changed, 158 insertions(+), 76 deletions(-)
> >
> > diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > index e7f2fac69b..8b5e54f321 100644
> > --- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > +++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > @@ -7,7 +7,7 @@
> >    It would expose EFI_SD_MMC_PASS_THRU_PROTOCOL for upper layer
> use.
> >
> >    Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
> > -  Copyright (c) 2015 - 2019, Intel Corporation. All rights
> > reserved.<BR>
> > +  Copyright (c) 2015 - 2020, Intel Corporation. All rights
> > + reserved.<BR>
> >    SPDX-License-Identifier: BSD-2-Clause-Patent
> >
> >  **/
> > @@ -2137,6 +2137,154 @@ SdMmcExecTrb (
> >    return Status;
> >  }
> >
> > +/**
> > +  Performs SW reset based on passed error status mask.
> > +
> > +  @param[in]  Private       Pointer to driver private data.
> > +  @param[in]  Slot          Index of the slot to reset.
> > +  @param[in]  ErrIntStatus  Error interrupt status mask.
> > +
> > +  @retval EFI_SUCCESS  Software reset performed successfully.
> > +  @retval Other        Software reset failed.
> > +**/
> > +EFI_STATUS
> > +SdMmcSoftwareReset (
> > +  IN SD_MMC_HC_PRIVATE_DATA  *Private,
> > +  IN UINT8                   Slot,
> > +  IN UINT16                  ErrIntStatus
> > +  )
> > +{
> > +  UINT8       SwReset;
> > +  EFI_STATUS  Status;
> > +
> > +  SwReset = 0;
> > +  if ((ErrIntStatus & 0x0F) != 0) {
> > +    SwReset |= BIT1;
> > +  }
> > +  if ((ErrIntStatus & 0x70) != 0) {
> 
> 
> Thanks for this catch.
> Could you help to separate this fix to another patch?
> 
> 
> > +    SwReset |= BIT2;
> > +  }
> > +
> > +  Status  = SdMmcHcRwMmio (
> > +              Private->PciIo,
> > +              Slot,
> > +              SD_MMC_HC_SW_RST,
> > +              FALSE,
> > +              sizeof (SwReset),
> > +              &SwReset
> > +              );
> > +  if (EFI_ERROR (Status)) {
> > +    return Status;
> > +  }
> > +
> > +  Status = SdMmcHcWaitMmioSet (
> > +             Private->PciIo,
> > +             Slot,
> > +             SD_MMC_HC_SW_RST,
> > +             sizeof (SwReset),
> > +             0xFF,
> > +             0,
> > +             SD_MMC_HC_GENERIC_TIMEOUT
> > +             );
> > +  if (EFI_ERROR (Status)) {
> > +    return Status;
> > +  }
> > +
> > +  return EFI_SUCCESS;
> > +}
> > +
> > +/**
> > +  Checks the error status in error status register
> > +  and issues appropriate software reset as described in
> > +  SD specification section 3.10.
> > +
> > +  @param[in] Private  Pointer to driver private data.
> > +  @param[in] Trb      Pointer to currently executing TRB.
> > +
> > +  @retval EFI_CRC_ERROR  CRC error happened during CMD execution.
> > +  @retval EFI_SUCCESS    No error reported.
> > +  @retval Others         Some other error happened.
> > +
> > +**/
> > +EFI_STATUS
> > +SdMmcCheckAndRecoverErrors (
> > +  IN SD_MMC_HC_PRIVATE_DATA  *Private,
> > +  IN UINT8                   Slot
> > +  )
> > +{
> > +  UINT16      IntStatus;
> > +  UINT16      ErrIntStatus;
> > +  UINT16      ErrIntStatusOr;
> > +  EFI_STATUS  Status;
> > +  EFI_STATUS  ErrorStatus;
> > +
> > +  Status = SdMmcHcRwMmio (
> > +             Private->PciIo,
> > +             Slot,
> > +             SD_MMC_HC_NOR_INT_STS,
> > +             TRUE,
> > +             sizeof (IntStatus),
> > +             &IntStatus
> > +             );
> > +  if (EFI_ERROR (Status)) {
> > +    return Status;
> > +  }
> > +
> > +  if ((IntStatus & BIT15) == 0) {
> > +    return EFI_SUCCESS;
> > +  }
> > +
> > +  Status = SdMmcHcRwMmio (
> > +             Private->PciIo,
> > +             Slot,
> > +             SD_MMC_HC_ERR_INT_STS,
> > +             TRUE,
> > +             sizeof (ErrIntStatus),
> > +             &ErrIntStatus
> > +             );
> > +  if (EFI_ERROR (Status)) {
> > +    return Status;
> > +  }
> > +
> > +  //
> > +  // We treat both CMD and DAT CRC errors and  // end bits errors as
> > + EFI_CRC_ERROR. This will  // let higher layer know that the error
> > + possibly  // happened due to random bus condition and the  //
> > + command can be retried.
> > +  //
> > +  if (ErrIntStatus & (BIT1 | BIT2 | BIT5 | BIT6)) {
> > +    ErrorStatus = EFI_CRC_ERROR;
> > +  } else if ((ErrIntStatus & BIT4) && (IntStatus & BIT1)){
> 
> 
> A coding style comment here, please help to explicitly compare the '&'
> operation
> result with 0 for the above checks, like:
> 
>   if ((ErrIntStatus & (BIT1 | BIT2 | BIT5 | BIT6)) != 0 ) {
> 
> 

OK

> > +    //
> > +    // If the data timeout error is reported
> > +    // but data transfer is signaled as completed we
> > +    // have to ignore data timeout.
> > +    //
> > +    ErrorStatus = EFI_SUCCESS;
> 
> 
> I am thinking it might be more clean to directly return for this case. Since
> for this case, a recovery procedure is not required, doing so can avoid
> explicitly clearing the Timeout error bit in the 'SD_MMC_HC_ERR_INT_STS'
> register.
> 
> We can still leave the clear of the bits in 'SD_MMC_HC_ERR_INT_STS' register
> to
> function SdMmcExecTrb() before the execution of next command.
> 
> 

You are right, both error and normal interrupt status are going to be reset during SdMmcExecTrb. I didn't notice it and that is why I thought it is a bug.
Reading SD host controller spec some more it seems to me like we should assume that if Transfer complete is set then no other error except for transfer timeout can be set and, as you proposed, return directly with EFI_SUCCESS. It would match previous implementation behavior.

> > +    ErrIntStatusOr = BIT4;
> > +    Status = SdMmcHcOrMmio (
> > +               Private->PciIo,
> > +               Slot,
> > +               SD_MMC_HC_ERR_INT_STS,
> > +               sizeof (ErrIntStatus),
> > +               &ErrIntStatusOr
> > +               );
> > +    if (EFI_ERROR (Status)) {
> > +      return Status;
> > +    }
> > +  } else {
> > +    ErrorStatus = EFI_DEVICE_ERROR;
> > +  }
> > +
> > +  Status = SdMmcSoftwareReset (Private, Slot, ErrIntStatus);
> > +  if (EFI_ERROR (Status)) {
> > +    return Status;
> > +  }
> > +
> > +  return ErrorStatus;
> > +}
> > +
> >  /**
> >    Check the TRB execution result.
> >
> > @@ -2160,10 +2308,8 @@ SdMmcCheckTrbResult (
> >    UINT32                              Response[4];
> >    UINT64                              SdmaAddr;
> >    UINT8                               Index;
> > -  UINT8                               SwReset;
> >    UINT32                              PioLength;
> >
> > -  SwReset = 0;
> >    Packet  = Trb->Packet;
> >    //
> >    // Check Trb execution result by reading Normal Interrupt Status register.
> > @@ -2179,87 +2325,23 @@ SdMmcCheckTrbResult (
> >    if (EFI_ERROR (Status)) {
> >      goto Done;
> >    }
> 
> 
> I think the below call in SdMmcCheckTrbResult():
> 
>   Status = SdMmcHcRwMmio (
>              Private->PciIo,
>              Trb->Slot,
>              SD_MMC_HC_NOR_INT_STS,
>              TRUE,
>              sizeof (IntStatus),
>              &IntStatus
>              );
> 
> can be removed. I found that the same call will be made in
> SdMmcCheckAndRecoverErrors().
> 
> Best Regards,
> Hao Wu
> 
> 

We still need to check in normal interrupt register the status of the transfer complete, DMA interrupt and buffer ready so I think the call has to stay there. That said I can pass the IntStatus to SdMmcCheckAndRecoverErrors or move the call to SdMmcHcRwMmio after the call to SdMmcCheckAndRecoverErrors. I don't really have a preference so let me know if you want me to make any changes.

> > +
> >    //
> > -  // Check Transfer Complete bit is set or not.
> > +  // Check if there are any errors reported by host controller
> > +  // and if neccessary recover the controller before next command is
> > executed.
> >    //
> > -  if ((IntStatus & BIT1) == BIT1) {
> > -    if ((IntStatus & BIT15) == BIT15) {
> > -      //
> > -      // Read Error Interrupt Status register to check if the error is
> > -      // Data Timeout Error.
> > -      // If yes, treat it as success as Transfer Complete has higher
> > -      // priority than Data Timeout Error.
> > -      //
> > -      Status = SdMmcHcRwMmio (
> > -                 Private->PciIo,
> > -                 Trb->Slot,
> > -                 SD_MMC_HC_ERR_INT_STS,
> > -                 TRUE,
> > -                 sizeof (IntStatus),
> > -                 &IntStatus
> > -                 );
> > -      if (!EFI_ERROR (Status)) {
> > -        if ((IntStatus & BIT4) == BIT4) {
> > -          Status = EFI_SUCCESS;
> > -        } else {
> > -          Status = EFI_DEVICE_ERROR;
> > -        }
> > -      }
> > -    }
> > -
> > +  Status = SdMmcCheckAndRecoverErrors (Private, Trb->Slot);
> > +  if (EFI_ERROR (Status)) {
> >      goto Done;
> >    }
> > +
> >    //
> > -  // Check if there is a error happened during cmd execution.
> > -  // If yes, then do error recovery procedure to follow SD Host Controller
> > -  // Simplified Spec 3.0 section 3.10.1.
> > +  // Check Transfer Complete bit is set or not.
> >    //
> > -  if ((IntStatus & BIT15) == BIT15) {
> > -    Status = SdMmcHcRwMmio (
> > -               Private->PciIo,
> > -               Trb->Slot,
> > -               SD_MMC_HC_ERR_INT_STS,
> > -               TRUE,
> > -               sizeof (IntStatus),
> > -               &IntStatus
> > -               );
> > -    if (EFI_ERROR (Status)) {
> > -      goto Done;
> > -    }
> > -    if ((IntStatus & 0x0F) != 0) {
> > -      SwReset |= BIT1;
> > -    }
> > -    if ((IntStatus & 0xF0) != 0) {
> > -      SwReset |= BIT2;
> > -    }
> > -
> > -    Status  = SdMmcHcRwMmio (
> > -                Private->PciIo,
> > -                Trb->Slot,
> > -                SD_MMC_HC_SW_RST,
> > -                FALSE,
> > -                sizeof (SwReset),
> > -                &SwReset
> > -                );
> > -    if (EFI_ERROR (Status)) {
> > -      goto Done;
> > -    }
> > -    Status = SdMmcHcWaitMmioSet (
> > -               Private->PciIo,
> > -               Trb->Slot,
> > -               SD_MMC_HC_SW_RST,
> > -               sizeof (SwReset),
> > -               0xFF,
> > -               0,
> > -               SD_MMC_HC_GENERIC_TIMEOUT
> > -               );
> > -    if (EFI_ERROR (Status)) {
> > -      goto Done;
> > -    }
> > -
> > -    Status = EFI_DEVICE_ERROR;
> > +  if ((IntStatus & BIT1) == BIT1) {
> >      goto Done;
> >    }
> > +
> >    //
> >    // Check if DMA interrupt is signalled for the SDMA transfer.
> >    //
> > --
> > 2.14.1.windows.1
> 

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection
  2020-01-13 13:48     ` Albecki, Mateusz
@ 2020-01-14  4:16       ` Wu, Hao A
  0 siblings, 0 replies; 11+ messages in thread
From: Wu, Hao A @ 2020-01-14  4:16 UTC (permalink / raw)
  To: Albecki, Mateusz, devel@edk2.groups.io
  Cc: Marcin Wojtas, Gao, Zhichao, Gao, Liming

> -----Original Message-----
> From: Albecki, Mateusz
> Sent: Monday, January 13, 2020 9:49 PM
> To: Wu, Hao A; devel@edk2.groups.io
> Cc: Marcin Wojtas; Gao, Zhichao; Gao, Liming
> Subject: RE: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor
> command error detection
> 
> Hi,
> 
> I will fix the 2 issues in separate patch probably before doing the refactor to
> avoid reverting it if the refactor introduces some unexpected issues.
> 
> Please also see inline.
> 
> Thanks,
> Mateusz
> 
> > -----Original Message-----
> > From: Wu, Hao A <hao.a.wu@intel.com>
> > Sent: Friday, January 10, 2020 6:38 AM
> > To: Albecki, Mateusz <mateusz.albecki@intel.com>;
> devel@edk2.groups.io
> > Cc: Marcin Wojtas <mw@semihalf.com>; Gao, Zhichao
> > <zhichao.gao@intel.com>; Gao, Liming <liming.gao@intel.com>
> > Subject: RE: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor
> > command error detection
> >
> > Hello Mateusz,
> >
> > Some inline comments below:
> >
> >
> > > -----Original Message-----
> > > From: Albecki, Mateusz
> > > Sent: Tuesday, January 07, 2020 7:06 PM
> > > To: devel@edk2.groups.io
> > > Cc: Albecki, Mateusz; Wu, Hao A; Marcin Wojtas; Gao, Zhichao; Gao,
> > > Liming
> > > Subject: [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor
> > command
> > > error detection
> > >
> > > REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1140
> > >
> > > Error detection function will now check if the command failure has
> > > been caused by one of the errors that can appear randomly on link(CRC
> > > error + end bit error). If such an error has been a cause of failure
> > > function will return EFI_CRC_ERROR instead of EFI_DEVICE_ERROR to
> > > indicate to the higher level that command has a chance of succeeding
> > > if resent. In addition this patch also fixes 2 small bugs. First one
> > > is DAT lane being reset on current limit error. Second one is data
> > > timeout error not being cleared after transfer has been completed.
> >
> >
> > For the 2 small issues, I would suggest to split them into separate patches,
> > which would make this patch into 3 patches. You can either do the
> > refactoring or fixing the 2 bugs first.
> >
> >
> > >
> > > Cc: Hao A Wu <hao.a.wu@intel.com>
> > > Cc: Marcin Wojtas <mw@semihalf.com>
> > > Cc: Zhichao Gao <zhichao.gao@intel.com>
> > > Cc: Liming Gao <liming.gao@intel.com>
> > >
> > > Signed-off-by: Mateusz Albecki <mateusz.albecki@intel.com>
> > > ---
> > >  MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c | 234
> > > +++++++++++++++--------
> > >  1 file changed, 158 insertions(+), 76 deletions(-)
> > >
> > > diff --git a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > > b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > > index e7f2fac69b..8b5e54f321 100644
> > > --- a/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > > +++ b/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/SdMmcPciHci.c
> > > @@ -7,7 +7,7 @@
> > >    It would expose EFI_SD_MMC_PASS_THRU_PROTOCOL for upper layer
> > use.
> > >
> > >    Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
> > > -  Copyright (c) 2015 - 2019, Intel Corporation. All rights
> > > reserved.<BR>
> > > +  Copyright (c) 2015 - 2020, Intel Corporation. All rights
> > > + reserved.<BR>
> > >    SPDX-License-Identifier: BSD-2-Clause-Patent
> > >
> > >  **/
> > > @@ -2137,6 +2137,154 @@ SdMmcExecTrb (
> > >    return Status;
> > >  }
> > >
> > > +/**
> > > +  Performs SW reset based on passed error status mask.
> > > +
> > > +  @param[in]  Private       Pointer to driver private data.
> > > +  @param[in]  Slot          Index of the slot to reset.
> > > +  @param[in]  ErrIntStatus  Error interrupt status mask.
> > > +
> > > +  @retval EFI_SUCCESS  Software reset performed successfully.
> > > +  @retval Other        Software reset failed.
> > > +**/
> > > +EFI_STATUS
> > > +SdMmcSoftwareReset (
> > > +  IN SD_MMC_HC_PRIVATE_DATA  *Private,
> > > +  IN UINT8                   Slot,
> > > +  IN UINT16                  ErrIntStatus
> > > +  )
> > > +{
> > > +  UINT8       SwReset;
> > > +  EFI_STATUS  Status;
> > > +
> > > +  SwReset = 0;
> > > +  if ((ErrIntStatus & 0x0F) != 0) {
> > > +    SwReset |= BIT1;
> > > +  }
> > > +  if ((ErrIntStatus & 0x70) != 0) {
> >
> >
> > Thanks for this catch.
> > Could you help to separate this fix to another patch?
> >
> >
> > > +    SwReset |= BIT2;
> > > +  }
> > > +
> > > +  Status  = SdMmcHcRwMmio (
> > > +              Private->PciIo,
> > > +              Slot,
> > > +              SD_MMC_HC_SW_RST,
> > > +              FALSE,
> > > +              sizeof (SwReset),
> > > +              &SwReset
> > > +              );
> > > +  if (EFI_ERROR (Status)) {
> > > +    return Status;
> > > +  }
> > > +
> > > +  Status = SdMmcHcWaitMmioSet (
> > > +             Private->PciIo,
> > > +             Slot,
> > > +             SD_MMC_HC_SW_RST,
> > > +             sizeof (SwReset),
> > > +             0xFF,
> > > +             0,
> > > +             SD_MMC_HC_GENERIC_TIMEOUT
> > > +             );
> > > +  if (EFI_ERROR (Status)) {
> > > +    return Status;
> > > +  }
> > > +
> > > +  return EFI_SUCCESS;
> > > +}
> > > +
> > > +/**
> > > +  Checks the error status in error status register
> > > +  and issues appropriate software reset as described in
> > > +  SD specification section 3.10.
> > > +
> > > +  @param[in] Private  Pointer to driver private data.
> > > +  @param[in] Trb      Pointer to currently executing TRB.
> > > +
> > > +  @retval EFI_CRC_ERROR  CRC error happened during CMD execution.
> > > +  @retval EFI_SUCCESS    No error reported.
> > > +  @retval Others         Some other error happened.
> > > +
> > > +**/
> > > +EFI_STATUS
> > > +SdMmcCheckAndRecoverErrors (
> > > +  IN SD_MMC_HC_PRIVATE_DATA  *Private,
> > > +  IN UINT8                   Slot
> > > +  )
> > > +{
> > > +  UINT16      IntStatus;
> > > +  UINT16      ErrIntStatus;
> > > +  UINT16      ErrIntStatusOr;
> > > +  EFI_STATUS  Status;
> > > +  EFI_STATUS  ErrorStatus;
> > > +
> > > +  Status = SdMmcHcRwMmio (
> > > +             Private->PciIo,
> > > +             Slot,
> > > +             SD_MMC_HC_NOR_INT_STS,
> > > +             TRUE,
> > > +             sizeof (IntStatus),
> > > +             &IntStatus
> > > +             );
> > > +  if (EFI_ERROR (Status)) {
> > > +    return Status;
> > > +  }
> > > +
> > > +  if ((IntStatus & BIT15) == 0) {
> > > +    return EFI_SUCCESS;
> > > +  }
> > > +
> > > +  Status = SdMmcHcRwMmio (
> > > +             Private->PciIo,
> > > +             Slot,
> > > +             SD_MMC_HC_ERR_INT_STS,
> > > +             TRUE,
> > > +             sizeof (ErrIntStatus),
> > > +             &ErrIntStatus
> > > +             );
> > > +  if (EFI_ERROR (Status)) {
> > > +    return Status;
> > > +  }
> > > +
> > > +  //
> > > +  // We treat both CMD and DAT CRC errors and  // end bits errors as
> > > + EFI_CRC_ERROR. This will  // let higher layer know that the error
> > > + possibly  // happened due to random bus condition and the  //
> > > + command can be retried.
> > > +  //
> > > +  if (ErrIntStatus & (BIT1 | BIT2 | BIT5 | BIT6)) {
> > > +    ErrorStatus = EFI_CRC_ERROR;
> > > +  } else if ((ErrIntStatus & BIT4) && (IntStatus & BIT1)){
> >
> >
> > A coding style comment here, please help to explicitly compare the '&'
> > operation
> > result with 0 for the above checks, like:
> >
> >   if ((ErrIntStatus & (BIT1 | BIT2 | BIT5 | BIT6)) != 0 ) {
> >
> >
> 
> OK
> 
> > > +    //
> > > +    // If the data timeout error is reported
> > > +    // but data transfer is signaled as completed we
> > > +    // have to ignore data timeout.
> > > +    //
> > > +    ErrorStatus = EFI_SUCCESS;
> >
> >
> > I am thinking it might be more clean to directly return for this case. Since
> > for this case, a recovery procedure is not required, doing so can avoid
> > explicitly clearing the Timeout error bit in the 'SD_MMC_HC_ERR_INT_STS'
> > register.
> >
> > We can still leave the clear of the bits in 'SD_MMC_HC_ERR_INT_STS'
> register
> > to
> > function SdMmcExecTrb() before the execution of next command.
> >
> >
> 
> You are right, both error and normal interrupt status are going to be reset
> during SdMmcExecTrb. I didn't notice it and that is why I thought it is a bug.
> Reading SD host controller spec some more it seems to me like we should
> assume that if Transfer complete is set then no other error except for
> transfer timeout can be set and, as you proposed, return directly with
> EFI_SUCCESS. It would match previous implementation behavior.
> 
> > > +    ErrIntStatusOr = BIT4;
> > > +    Status = SdMmcHcOrMmio (
> > > +               Private->PciIo,
> > > +               Slot,
> > > +               SD_MMC_HC_ERR_INT_STS,
> > > +               sizeof (ErrIntStatus),
> > > +               &ErrIntStatusOr
> > > +               );
> > > +    if (EFI_ERROR (Status)) {
> > > +      return Status;
> > > +    }
> > > +  } else {
> > > +    ErrorStatus = EFI_DEVICE_ERROR;
> > > +  }
> > > +
> > > +  Status = SdMmcSoftwareReset (Private, Slot, ErrIntStatus);
> > > +  if (EFI_ERROR (Status)) {
> > > +    return Status;
> > > +  }
> > > +
> > > +  return ErrorStatus;
> > > +}
> > > +
> > >  /**
> > >    Check the TRB execution result.
> > >
> > > @@ -2160,10 +2308,8 @@ SdMmcCheckTrbResult (
> > >    UINT32                              Response[4];
> > >    UINT64                              SdmaAddr;
> > >    UINT8                               Index;
> > > -  UINT8                               SwReset;
> > >    UINT32                              PioLength;
> > >
> > > -  SwReset = 0;
> > >    Packet  = Trb->Packet;
> > >    //
> > >    // Check Trb execution result by reading Normal Interrupt Status
> register.
> > > @@ -2179,87 +2325,23 @@ SdMmcCheckTrbResult (
> > >    if (EFI_ERROR (Status)) {
> > >      goto Done;
> > >    }
> >
> >
> > I think the below call in SdMmcCheckTrbResult():
> >
> >   Status = SdMmcHcRwMmio (
> >              Private->PciIo,
> >              Trb->Slot,
> >              SD_MMC_HC_NOR_INT_STS,
> >              TRUE,
> >              sizeof (IntStatus),
> >              &IntStatus
> >              );
> >
> > can be removed. I found that the same call will be made in
> > SdMmcCheckAndRecoverErrors().
> >
> > Best Regards,
> > Hao Wu
> >
> >
> 
> We still need to check in normal interrupt register the status of the transfer
> complete, DMA interrupt and buffer ready so I think the call has to stay
> there. That said I can pass the IntStatus to SdMmcCheckAndRecoverErrors or
> move the call to SdMmcHcRwMmio after the call to
> SdMmcCheckAndRecoverErrors. I don't really have a preference so let me
> know if you want me to make any changes.


Hello Mateusz,

I think you can pass the 'IntStatus' parameter into function
SdMmcCheckAndRecoverErrors(). Doing so can avoid reading the
'SD_MMC_HC_NOR_INT_STS' register multiple times.

Best Regards,
Hao Wu


> 
> > > +
> > >    //
> > > -  // Check Transfer Complete bit is set or not.
> > > +  // Check if there are any errors reported by host controller
> > > +  // and if neccessary recover the controller before next command is
> > > executed.
> > >    //
> > > -  if ((IntStatus & BIT1) == BIT1) {
> > > -    if ((IntStatus & BIT15) == BIT15) {
> > > -      //
> > > -      // Read Error Interrupt Status register to check if the error is
> > > -      // Data Timeout Error.
> > > -      // If yes, treat it as success as Transfer Complete has higher
> > > -      // priority than Data Timeout Error.
> > > -      //
> > > -      Status = SdMmcHcRwMmio (
> > > -                 Private->PciIo,
> > > -                 Trb->Slot,
> > > -                 SD_MMC_HC_ERR_INT_STS,
> > > -                 TRUE,
> > > -                 sizeof (IntStatus),
> > > -                 &IntStatus
> > > -                 );
> > > -      if (!EFI_ERROR (Status)) {
> > > -        if ((IntStatus & BIT4) == BIT4) {
> > > -          Status = EFI_SUCCESS;
> > > -        } else {
> > > -          Status = EFI_DEVICE_ERROR;
> > > -        }
> > > -      }
> > > -    }
> > > -
> > > +  Status = SdMmcCheckAndRecoverErrors (Private, Trb->Slot);
> > > +  if (EFI_ERROR (Status)) {
> > >      goto Done;
> > >    }
> > > +
> > >    //
> > > -  // Check if there is a error happened during cmd execution.
> > > -  // If yes, then do error recovery procedure to follow SD Host Controller
> > > -  // Simplified Spec 3.0 section 3.10.1.
> > > +  // Check Transfer Complete bit is set or not.
> > >    //
> > > -  if ((IntStatus & BIT15) == BIT15) {
> > > -    Status = SdMmcHcRwMmio (
> > > -               Private->PciIo,
> > > -               Trb->Slot,
> > > -               SD_MMC_HC_ERR_INT_STS,
> > > -               TRUE,
> > > -               sizeof (IntStatus),
> > > -               &IntStatus
> > > -               );
> > > -    if (EFI_ERROR (Status)) {
> > > -      goto Done;
> > > -    }
> > > -    if ((IntStatus & 0x0F) != 0) {
> > > -      SwReset |= BIT1;
> > > -    }
> > > -    if ((IntStatus & 0xF0) != 0) {
> > > -      SwReset |= BIT2;
> > > -    }
> > > -
> > > -    Status  = SdMmcHcRwMmio (
> > > -                Private->PciIo,
> > > -                Trb->Slot,
> > > -                SD_MMC_HC_SW_RST,
> > > -                FALSE,
> > > -                sizeof (SwReset),
> > > -                &SwReset
> > > -                );
> > > -    if (EFI_ERROR (Status)) {
> > > -      goto Done;
> > > -    }
> > > -    Status = SdMmcHcWaitMmioSet (
> > > -               Private->PciIo,
> > > -               Trb->Slot,
> > > -               SD_MMC_HC_SW_RST,
> > > -               sizeof (SwReset),
> > > -               0xFF,
> > > -               0,
> > > -               SD_MMC_HC_GENERIC_TIMEOUT
> > > -               );
> > > -    if (EFI_ERROR (Status)) {
> > > -      goto Done;
> > > -    }
> > > -
> > > -    Status = EFI_DEVICE_ERROR;
> > > +  if ((IntStatus & BIT1) == BIT1) {
> > >      goto Done;
> > >    }
> > > +
> > >    //
> > >    // Check if DMA interrupt is signalled for the SDMA transfer.
> > >    //
> > > --
> > > 2.14.1.windows.1
> >
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-01-14  4:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-01-07 11:06 [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
2020-01-07 11:06 ` [PATCH 1/3] MdeModulePkg/SdMmcPciHcDxe: Refactor command error detection Albecki, Mateusz
2020-01-10  5:37   ` Wu, Hao A
2020-01-13 13:48     ` Albecki, Mateusz
2020-01-14  4:16       ` Wu, Hao A
2020-01-07 11:06 ` [PATCH 2/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for sync commands Albecki, Mateusz
2020-01-10  5:37   ` Wu, Hao A
2020-01-07 11:06 ` [PATCH 3/3] MdeModulePkg/SdMmcPciHcDxe: Add retries for async commands Albecki, Mateusz
2020-01-10  5:37   ` Wu, Hao A
2020-01-07 11:09 ` [PATCH 0/3] MdeModulePkg/SdMmcPciHcDxe: Retry the commands that failed due to CRC error Albecki, Mateusz
2020-01-08  7:38 ` Wu, Hao A

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox