From: "Sami Mujawar" <sami.mujawar@arm.com>
To: Omkar Anand Kulkarni <omkar.kulkarni@arm.com>, devel@edk2.groups.io
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>, nd <nd@arm.com>
Subject: Re: [edk2-platforms][PATCH v3 1/5] Platform/ARM: Add DMC-620 ECC error handling driver
Date: Mon, 27 Sep 2021 18:30:01 +0100
Message-ID: <b2072b9f-e57e-f80d-5fac-72655978d1cc@arm.com>
In-Reply-To: <20210824060027.27246-2-omkar.kulkarni@arm.com>

Hi Omkar,

Thank you for this patch.

Please find my feedback marked inline as [SAMI].

Regards,

Sami Mujawar


On 24/08/2021 07:00 AM, Omkar Anand Kulkarni wrote:
> DMC-620 memory controller improves system reliability by generating
> interrupts on detecting ECC errors on the data. Add a initial DMC-620 MM
> driver that implements a MMI handler for handling single-bit ECC error
> events originating from the DRAM.
>
> The driver implements the HEST error source descriptor protocol in order
> to publish the GHES error source descriptor for single-bit DRAM errors.
> The GHES error source descriptor that is published is of type 'memory
> error'. A GHES error source descriptor is published for each instances
> if the DMC-620 controller in the system.
>
> The driver registers a MMI handler for handling 1-bit DRAM ECC error
> events. The MMI handler, when invoked, reads the DMC-620 error record
> registers and populates the EFI_PLATFORM_MEMORY_ERROR_DATA type error
> section information structure with the corresponding information read
> from the error record registers.
>
> Co-authored-by: Thomas Abraham <thomas.abraham@arm.com>
> Signed-off-by: Omkar Anand Kulkarni <omkar.kulkarni@arm.com>
> ---
>   Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.dec              |  30 ++
>   Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.inf              |  61 ++++
>   Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.h                | 174 ++++++++++
>   Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.c                | 362 ++++++++++++++++++++
>   Platform/ARM/Drivers/Dmc620Mm/Dmc620MmErrorSourceInfo.c | 194 +++++++++++
>   5 files changed, 821 insertions(+)
>
> diff --git a/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.dec b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.dec
> new file mode 100644
> index 000000000000..8f3508574203
> --- /dev/null
> +++ b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.dec
> @@ -0,0 +1,30 @@
> +## @file
> +#  DMC-620 MM driver specific declrations.
> +#
> +#  This file defines GUIDs and declares PCD values for DMC-620 MM driver.
> +#
> +#  Copyright (c) 2020 - 2021, ARM Limited. All rights reserved.
> +#  SPDX-License-Identifier: BSD-2-Clause-Patent
> +#
> +##
> +
> +[Defines]
> +  DEC_SPECIFICATION              = 0x0001001A
> +  PACKAGE_NAME                   = Dmc620Mm
> +  PACKAGE_GUID                   = 94110B10-8E72-42A0-8963-D2B57FCF0F38
> +  PACKAGE_VERSION                = 0.1
> +
> +[Guids]
> +  gDmc620MmTokenSpaceGuid = {0xc305f72a, 0xd10d, 0x45e8, { 0x81, 0x78, 0x51, 0x8b, 0x78, 0x62, 0x77, 0x79 } }
> +  gArmDmcEventHandlerGuid = { 0x5ef0afd5, 0xe01a, 0x4c30, { 0x86, 0x19, 0x45, 0x46, 0x26, 0x91, 0x80, 0x98 }}
> +
> +[PcdsFixedAtBuild.common]
> +  gDmc620MmTokenSpaceGuid.PcdDmc620CorrectableErrorThreshold|10|UINT32|0x00000004
> +  gDmc620MmTokenSpaceGuid.PcdDmc620CtrlSize|0x100000|UINT32|0x00000003
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramErrorSdeiEventBase|0|UINT32|0x00000006
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramOneBitErrorDataBase|0|UINT64|0x00000007
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramOneBitErrorDataSize|0|UINT64|0x00000008
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramOneBitErrorSourceId|0|UINT16|0x00000009
> +  gDmc620MmTokenSpaceGuid.PcdDmc620ErrSourceCount|1|UINT32|0x00000005
> +  gDmc620MmTokenSpaceGuid.PcdDmc620NumCtrl|2|UINT32|0x00000001
> +  gDmc620MmTokenSpaceGuid.PcdDmc620RegisterBase|0x4E000000|UINT64|0x00000002
> diff --git a/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.inf b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.inf
> new file mode 100644
> index 000000000000..8cad07749a23
> --- /dev/null
> +++ b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.inf
> @@ -0,0 +1,61 @@
> +## @file
> +#  StandaloneMM driver for the DMC620 Memory Controller.
> +#
> +#  Driver to handle 1-bit Corrected DRAM errors for DMC(s).
> +#
> +#  Copyright (c) 2020 - 2021, ARM Limited. All rights reserved.
> +#  SPDX-License-Identifier: BSD-2-Clause-Patent
> +#
> +##
> +
> +[Defines]
> +  INF_VERSION                    = 0x0001001A
[SAMI] Please use the latest INF version, i.e. 0x0001001B (see 
https://edk2-docs.gitbook.io/edk-ii-inf-specification/2_inf_overview/24_-defines-_section).
> +  BASE_NAME                      = StandaloneMmDmc620Driver
> +  FILE_GUID                      = CB53ACD9-A1A1-43B3-A638-AC74DA5D9DA2
> +  MODULE_TYPE                    = MM_STANDALONE
> +  VERSION_STRING                 = 1.0
> +  PI_SPECIFICATION_VERSION       = 0x00010032
> +  ENTRY_POINT                    = Dmc620MmDriverInitialize
> +
> +[Sources]
> +  Dmc620Mm.c
> +  Dmc620MmErrorSourceInfo.c
> +
> +[Packages]
> +  ArmPkg/ArmPkg.dec
> +  ArmPlatformPkg/ArmPlatformPkg.dec
> +  EmbeddedPkg/EmbeddedPkg.dec
> +  MdeModulePkg/MdeModulePkg.dec
> +  MdePkg/MdePkg.dec
> +  Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.dec
> +  StandaloneMmPkg/StandaloneMmPkg.dec
> +
> +[LibraryClasses]
> +  ArmLib
> +  ArmSvcLib
> +  BaseMemoryLib
> +  DebugLib
> +  StandaloneMmDriverEntryPoint
> +
> +[Protocols]
> +  gMmHestErrorSourceDescProtocolGuid      ##PRODUCES
> +
> +[FixedPcd]
> +  gArmPlatformTokenSpaceGuid.PcdGhesGenericErrorDataMmBufferBase
> +  gArmPlatformTokenSpaceGuid.PcdGhesGenericErrorDataMmBufferSize
> +
> +  gDmc620MmTokenSpaceGuid.PcdDmc620CorrectableErrorThreshold
> +  gDmc620MmTokenSpaceGuid.PcdDmc620CtrlSize
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramErrorSdeiEventBase
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramOneBitErrorDataBase
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramOneBitErrorDataSize
> +  gDmc620MmTokenSpaceGuid.PcdDmc620DramOneBitErrorSourceId
> +  gDmc620MmTokenSpaceGuid.PcdDmc620ErrSourceCount
> +  gDmc620MmTokenSpaceGuid.PcdDmc620NumCtrl
> +  gDmc620MmTokenSpaceGuid.PcdDmc620RegisterBase
> +
> +[Guids]
> +  gArmDmcEventHandlerGuid
> +
> +[Depex]
> +  TRUE
> diff --git a/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.h b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.h
> new file mode 100644
> index 000000000000..f5c96396b870
> --- /dev/null
> +++ b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.h
> @@ -0,0 +1,174 @@
> +/** @file
> +  DMC-620 memory controller MM driver definitions.
> +
> +  Macros and structure definitions for DMC-620 error handling MM driver.
> +
[SAMI] Is it possible to add the DMC specification reference here as 
well, please?
> +  Copyright (c) 2020 - 2021, ARM Limited. All rights reserved.
> +  SPDX-License-Identifier: BSD-2-Clause-Patent
> +**/
> +
> +#ifndef DMC620_MM_DRIVER_H_
> +#define DMC620_MM_DRIVER_H_
> +
> +#include <Base.h>
> +#include <Guid/Cper.h>
> +#include <IndustryStandard/Acpi.h>
> +#include <Library/ArmLib.h>
> +#include <Library/BaseMemoryLib.h>
> +#include <Library/DebugLib.h>
> +#include <Library/IoLib.h>
> +#include <Protocol/HestErrorSourceInfo.h>
> +
> +// DMC-620 memc register field values and masks.
> +#define DMC620_MEMC_STATUS_MASK       (BIT2|BIT1|BIT0)
> +#define DMC620_MEMC_STATUS_READY      (BIT1|BIT0)
> +#define DMC620_MEMC_CMD_EXECUTE_DRAIN (BIT2|BIT0)
> +
> +// DMC-620 Error Record Status register fields values and masks.
> +#define DMC620_ERR_STATUS_MV BIT26
> +#define DMC620_ERR_STATUS_AV BIT31
> +
> +// DMC-620 Error Record MISC-0 register fields values and masks.
> +#define DMC620_ERR_MISC0_COLUMN_MASK \
> +  (BIT9|BIT8|BIT7|BIT6|BIT5|BIT4|BIT3|BIT2|BIT1|BIT0)
> +#define DMC620_ERR_MISC0_ROW_MASK   (0x0FFFFC00)
> +#define DMC620_ERR_MISC0_ROW_SHIFT  10
> +#define DMC620_ERR_MISC0_RANK_MASK  (BIT30|BIT29|BIT28)
> +#define DMC620_ERR_MISC0_RANK_SHIFT 28
> +#define DMC620_ERR_MISC0_VAILD      BIT31
> +
> +// DMC-620 Error Record register fields values and mask.
> +#define DMC620_ERR_MISC1_VAILD     BIT31
> +#define DMC620_ERR_MISC1_BANK_MASK (BIT3|BIT2|BIT1|BIT0)
> +
> +// DMC-620 Error Record Global Status register bit field.
> +#define DMC620_ERR_GSR_ECC_CORRECTED_FH BIT1
> +
> +//
> +// DMC-620 Memory Mapped register definitions.
> +//
> +
> +// Unused DMC-620 register fields.
> +#define RESV_0 0x1BD
> +#define RESV_1 0x2C
> +#define RESV_2 0x8
> +#define RESV_3 0x58
> +
> +#pragma pack(1)
> +typedef struct {
[SAMI] Can you add documentation for what this structure describes, please?
> +  UINT32 MemcStatus;
> +  UINT32 MemcConfig;
> +  UINT32 MemcCmd;
> +  UINT32 Reserved[RESV_0];
> +  UINT32 Err0Fr;
> +  UINT32 Reserved1;
> +  UINT32 Err0Ctlr0;
> +  UINT32 Err0Ctlr1;
> +  UINT32 Err0Status;
> +  UINT8  Reserved2[RESV_1];
> +  UINT32 Err1Fr;
> +  UINT32 Reserved3;
> +  UINT32 Err1Ctlr;
> +  UINT32 Reserved4;
> +  UINT32 Err1Status;
> +  UINT32 Reserved5;
> +  UINT32 Err1Addr0;
> +  UINT32 Err1Addr1;
> +  UINT32 Err1Misc0;
> +  UINT32 Err1Misc1;
> +  UINT32 Err1Misc2;
> +  UINT32 Err1Misc3;
> +  UINT32 Err1Misc4;
> +  UINT32 Err1Misc5;
> +  UINT8  Reserved6[RESV_2];
> +  UINT32 Err2Fr;
> +  UINT32 Reserved7;
> +  UINT32 Err2Ctlr;
> +  UINT32 Reserved8;
> +  UINT32 Err2Status;
> +  UINT32 Reserved9;
> +  UINT32 Err2Addr0;
> +  UINT32 Err2Addr1;
> +  UINT32 Err2Misc0;
> +  UINT32 Err2Misc1;
> +  UINT32 Err2Misc2;
> +  UINT32 Err2Misc3;
> +  UINT32 Err2Misc4;
> +  UINT32 Err2Misc5;
> +  UINT8  Reserved10[RESV_2];
> +  UINT32 Reserved11[RESV_3];
> +  UINT32 Errgsr;
> +} DMC620_REGS_TYPE;
> +
> +// DMC-620 Typical Error Record register definition.
> +typedef struct {
> +  UINT32 ErrFr;
> +  UINT32 Reserved;
> +  UINT32 ErrCtlr;
> +  UINT32 Reserved1;
> +  UINT32 ErrStatus;
> +  UINT32 Reserved2;
> +  UINT32 ErrAddr0;
> +  UINT32 ErrAddr1;
> +  UINT32 ErrMisc0;
> +  UINT32 ErrMisc1;
> +  UINT32 ErrMisc2;
> +  UINT32 ErrMisc3;
> +  UINT32 ErrMisc4;
> +  UINT32 ErrMisc5;
> +  UINT8  Reserved3[RESV_2];
> +} DMC620_ERR_REGS_TYPE;
> +#pragma pack()
> +
> +// List of supported error sources by DMC-620.
> +typedef enum {
> +  DramEccCfh = 0,
> +  DramEccFh,
> +  ChiFh,
> +  SramEccCfh,
> +  SramEccFh,
> +  DmcErrRecovery
> +} DMC_ERR_SOURCES;
> +
> +/**
> +  MMI handler implementing the HEST error source desc protocol.
> +
> +  Returns the error source descriptor information for all DMC(s) error sources
> +  and also returns its count and length.
> +
> +  @param[in]   This                Pointer for this protocol.
> +  @param[out]  Buffer              HEST error source descriptor Information
> +                                   buffer.
> +  @param[out]  ErrorSourcesLength  Total length of Error Source Descriptors.
> +  @param[out]  ErrorSourceCount    Total number of supported error sources.
> +
> +  @retval  EFI_SUCCESS            Buffer has valid Error Source descriptor
> +                                  information.
> +  @retval  EFI_INVALID_PARAMETER  Buffer is NULL.
> +**/
> +EFI_STATUS
> +EFIAPI
> +DmcErrorSourceDescInfoGet (
> +  IN  MM_HEST_ERROR_SOURCE_DESC_PROTOCOL *This,
> +  OUT VOID                               **Buffer,
> +  OUT UINTN                              *ErrorSourcesLength,
> +  OUT UINTN                              *ErrorSourcesCount
> +  );
> +
> +/**
> +  Allow reporting of supported DMC-620 error sources.
> +
> +  Install the HEST Error Source Descriptor protocol handler to allow publishing
> +  of the supported DMC-620 memory controller error sources.
> +
> +  @param[in]  MmSystemTable  Pointer to System table.
> +
> +  @retval  EFI_SUCCESS            Protocol installation successful.
> +  @retval  EFI_INVALID_PARAMETER  Invalid system table parameter.
> +**/
> +EFI_STATUS
> +Dmc620InstallErrorSourceDescProtocol (
> +  IN EFI_MM_SYSTEM_TABLE *MmSystemTable
> +  );
> +
> +#endif // DMC620_MM_DRIVER_H_
> diff --git a/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.c b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.c
> new file mode 100644
> index 000000000000..91daf713f275
> --- /dev/null
> +++ b/Platform/ARM/Drivers/Dmc620Mm/Dmc620Mm.c
> @@ -0,0 +1,362 @@
> +/** @file
> +  DMC-620 Memory Controller error handling (Standalone MM) driver.
> +
> +  Supports 1-bit Bit DRAM error handling for multiple DMC instances. On a error
> +  event, publishes the CPER error record of Memory Error type.
> +
> +  Copyright (c) 2020 - 2021, ARM Limited. All rights reserved.
> +  SPDX-License-Identifier: BSD-2-Clause-Patent
> +
> +  @par Specification Reference
> +    - DMC620 Dynamic Memory Controller, revision r1p0.
> +    - UEFI Reference Specification 2.8, Section N.2.5 Memory Error Section
> +**/
> +
> +#include <Dmc620Mm.h>
> +
> +/**
> +  Helper function to handle the DMC-620 DRAM errors.
> +
> +  Reads the DRAM error record registers. Creates a CPER error record of type
> +  'Memory Error' and populates it with information collected from DRAM error
> +  record registers.
> +
> +  @param[in]  DmcCtrl                A pointer to DMC control registers.
> +  @param[in]  DmcInstance            DMC instance which raised the fault event.
> +  @param[in]  ErrRecType             A type of the DMC error record.
> +  @param[in]  ErrorBlockBaseAddress  Unique address for populating the error
> +                                     block status for given DMC error source.
> +**/
> +STATIC
> +VOID
> +Dmc620HandleDramError (
> +  IN DMC620_REGS_TYPE *DmcCtrl,
> +  IN UINTN            DmcInstance,
> +  IN UINTN            ErrRecType,
> +  IN UINTN            ErrorBlockBaseAddress
> +  )
> +{
> +  EFI_ACPI_6_3_GENERIC_ERROR_DATA_ENTRY_STRUCTURE *ErrBlockSectionDesc;
> +  EFI_ACPI_6_3_GENERIC_ERROR_STATUS_STRUCTURE     *ErrBlockStatusHeaderData;
> +  EFI_PLATFORM_MEMORY_ERROR_DATA                  MemorySectionInfo = {0};
[SAMI] Call ZeroMem() to initialise MemorySectionInfo structure.
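
As a rough illustration of the ZeroMem() suggestion (a sketch only, not the driver's code): the structure is declared without an initialiser and cleared explicitly afterwards. MEMORY_ERROR_DATA_SKETCH and ZeroMemSketch() are hypothetical stand-ins for EFI_PLATFORM_MEMORY_ERROR_DATA and BaseMemoryLib's ZeroMem(), so that the example compiles outside an EDK II tree.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed stand-in for EFI_PLATFORM_MEMORY_ERROR_DATA. */
typedef struct {
  uint64_t ValidFields;
  uint64_t PhysicalAddress;
  uint16_t Bank;
} MEMORY_ERROR_DATA_SKETCH;

/* Stand-in with the same shape as BaseMemoryLib's ZeroMem (Buffer, Length). */
static void *ZeroMemSketch(void *Buffer, size_t Length)
{
  return memset(Buffer, 0, Length);
}

/* Declares the structure without an initialiser, clears it explicitly,
   and returns the OR of its fields so a caller can confirm it is zeroed. */
uint64_t InitMemorySectionInfo(void)
{
  MEMORY_ERROR_DATA_SKETCH MemorySectionInfo;

  ZeroMemSketch(&MemorySectionInfo, sizeof(MemorySectionInfo));
  return MemorySectionInfo.ValidFields |
         MemorySectionInfo.PhysicalAddress |
         MemorySectionInfo.Bank;
}
```

In the driver itself the call would presumably be ZeroMem (&MemorySectionInfo, sizeof (MemorySectionInfo)) placed after the local declarations.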
> +  DMC620_ERR_REGS_TYPE                            *ErrRecord;
> +  EFI_GUID                                        SectionType;
> +  UINT32                                          ResetReg;
> +  VOID                                            *ErrBlockSectionData;
> +  UINTN                                           *ErrorStatusRegister;
> +  UINTN                                           *ReadAckRegister;
> +  UINTN                                           *ErrStatusBlock;
> +  UINTN                                           ErrStatus;
> +  UINTN                                           ErrAddr0;
> +  UINTN                                           ErrAddr1;
> +  UINTN                                           ErrMisc0;
> +  UINTN                                           ErrMisc1;
> +  UINT8                                           CorrectedError;
> +
> +  //
> +  // Check the type of DRAM error (1-bit or 2-bit) and accordingly select
> +  // error record to use.
> +  //
> +  if (ErrRecType == DMC620_ERR_GSR_ECC_CORRECTED_FH) {
> +    DEBUG ((
> +      DEBUG_INFO,
> +      "%a: DRAM ECC Corrected Fault (1-Bit ECC error)\n",
> +      __FUNCTION__
> +      ));
> +    ErrRecord = (DMC620_ERR_REGS_TYPE *)&DmcCtrl->Err1Fr;
> +    CorrectedError = 1;
> +  } else {
> +    DEBUG ((
> +      DEBUG_INFO,
> +      "%a: DRAM ECC Fault Handling (2-bit ECC error)\n",
> +      __FUNCTION__
> +      ));
> +    ErrRecord = (DMC620_ERR_REGS_TYPE *)&DmcCtrl->Err2Fr;
> +    CorrectedError = 0;
> +  }
> +
> +  // Read most recent DRAM error record registers.
> +  ErrStatus = MmioRead32 ((UINTN)&ErrRecord->ErrStatus);
[SAMI] Why is a typecast to UINTN required here? If ErrRecord->ErrStatus 
is the address of the ErrStatus register then this should have been 
UINTN in DMC620_ERR_REGS_TYPE, right?
The same question applies to the typecasts used with Mmio[Read|Write]32 in 
the rest of this patch.

I can see why you did this after having a look at 
https://developer.arm.com/documentation/100568/0100/programmers-model/register-summary?lang=en.
I think you should use #defines for the register offsets instead of 
defining the DMC620_ERR_REGS_TYPE structure.
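
A minimal sketch of the #define approach, under stated assumptions: the offsets below are computed from the packed DMC620_REGS_TYPE layout in this patch (Err1Fr lands at 0x740, Err2Fr at 0x780) and should be checked against the register summary linked above. MmioRead32Sketch() is a hypothetical stand-in for IoLib's MmioRead32(), and Dmc620ReadErrStatus() is an illustrative helper name, not an existing function.

```c
#include <stdint.h>

/* Error record base offsets, derived from the structure in the patch. */
#define DMC620_ERR1_RECORD_OFFSET 0x740u
#define DMC620_ERR2_RECORD_OFFSET 0x780u

/* Register offsets relative to the start of an error record. */
#define DMC620_ERR_STATUS_OFFSET  0x10u
#define DMC620_ERR_ADDR0_OFFSET   0x18u
#define DMC620_ERR_ADDR1_OFFSET   0x1Cu
#define DMC620_ERR_MISC0_OFFSET   0x20u
#define DMC620_ERR_MISC1_OFFSET   0x24u

/* Stand-in for MmioRead32 (): a single volatile 32-bit load. */
static uint32_t MmioRead32Sketch(uintptr_t Address)
{
  return *(volatile uint32_t *)Address;
}

/* Reads the status register of the selected error record. The address is
   plain integer arithmetic, so no pointer-to-integer cast is needed. */
uint32_t Dmc620ReadErrStatus(uintptr_t DmcBase, uint32_t RecordOffset)
{
  return MmioRead32Sketch(DmcBase + RecordOffset + DMC620_ERR_STATUS_OFFSET);
}
```

With this shape the handler would pass the controller base address and a record offset instead of a DMC620_ERR_REGS_TYPE pointer.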
> +  ErrAddr0  = MmioRead32 ((UINTN)&ErrRecord->ErrAddr0);
> +  ErrAddr1  = MmioRead32 ((UINTN)&ErrRecord->ErrAddr1);
> +  ErrMisc0  = MmioRead32 ((UINTN)&ErrRecord->ErrMisc0);
> +  ErrMisc1  = MmioRead32 ((UINTN)&ErrRecord->ErrMisc1);
> +
> +  // Clear the status register so that new error records are populated.
> +  ResetReg = MmioRead32 ((UINTN)&ErrRecord->ErrStatus);
> +  MmioWrite32 ((UINTN)&ErrRecord->ErrStatus, ResetReg);
> +
> +  //
> +  // Get Physical address of DRAM error from Error Record Address register
> +  // and populate Memory Error Section.
> +  //
> +  if (ErrStatus & DMC620_ERR_STATUS_AV) {
[SAMI] The if condition must evaluate a boolean expression. 
https://edk2-docs.gitbook.io/edk-ii-c-coding-standards-specification/5_source_files/57_c_programming#5-7-2-1-boolean-values-variable-type-boolean-do-not-require-explicit-comparisons-to-true-or-false
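
For illustration, a small sketch of the explicit comparison the coding standard asks for. The bit positions match the DMC620_ERR_STATUS_AV and DMC620_ERR_STATUS_MV macros in this patch; the helper name IsAddressValid() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMC620_ERR_STATUS_MV (1u << 26)
#define DMC620_ERR_STATUS_AV (1u << 31)

/* The masked value is an integer, so it is compared against 0 explicitly
   rather than being used directly as the condition. */
bool IsAddressValid(uint32_t ErrStatus)
{
  return (ErrStatus & DMC620_ERR_STATUS_AV) != 0;
}
```

In the patch this would read: if ((ErrStatus & DMC620_ERR_STATUS_AV) != 0) {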

> +    DEBUG ((
> +      DEBUG_INFO,
> +      "%a: DRAM Error: Address_0 : 0x%x Address_1 : 0x%x\n",
> +      __FUNCTION__,
> +      ErrAddr0,
> +      ErrAddr1
> +      ));
> +
> +    //
> +    // Populate Memory CPER section with DRAM error address (48 bits) and
> +    // address mask fields.
> +    //
> +    MemorySectionInfo.ValidFields |=
> +      EFI_PLATFORM_MEMORY_PHY_ADDRESS_MASK_VALID |
> +      EFI_PLATFORM_MEMORY_PHY_ADDRESS_VALID;
> +    MemorySectionInfo.PhysicalAddressMask = 0xFFFFFFFFFFFF;
> +    MemorySectionInfo.PhysicalAddress = (ErrAddr1 << 32) | ErrAddr0;
> +  }
> +
> +  //
> +  // Read the Error Record Misc registers and populate relevant fields in
> +  // Memory CPER error section.
> +  //
> +  if ((ErrStatus & DMC620_ERR_STATUS_MV)
> +      && (ErrMisc0 & DMC620_ERR_MISC0_VAILD))
> +  {
> +    // Populate Memory error section wih DRAM column information.
> +    MemorySectionInfo.ValidFields |= EFI_PLATFORM_MEMORY_COLUMN_VALID;
> +    MemorySectionInfo.Column = ErrMisc0 & DMC620_ERR_MISC0_COLUMN_MASK;
> +
> +    //
> +    // Populate Memory Error Section with DRAM row information.
> +    // Row bits (bit 16 and 17) are to be filled as extended.
> +    //
> +    MemorySectionInfo.ValidFields |=
> +      EFI_PLATFORM_MEMORY_ERROR_EXTENDED_ROW_BIT_16_17_VALID;
> +    MemorySectionInfo.Row =
> +      (ErrMisc0 & DMC620_ERR_MISC0_ROW_MASK) >> DMC620_ERR_MISC0_ROW_SHIFT;
> +    MemorySectionInfo.Extended =
> +      ((ErrMisc0 & DMC620_ERR_MISC0_ROW_MASK) >>
> +       (DMC620_ERR_MISC0_ROW_SHIFT + 16));
> +
> +    // Populate Memory Error Section wih DRAM rank information.
> +    MemorySectionInfo.ValidFields |= EFI_PLATFORM_MEMORY_ERROR_RANK_NUM_VALID;
> +    MemorySectionInfo.RankNum = (ErrMisc0 & DMC620_ERR_MISC0_RANK_MASK) >>
> +      DMC620_ERR_MISC0_RANK_SHIFT;
> +  }
> +
> +  // Read Error Record MISC1 register and populate the Memory Error Section.
> +  if ((ErrStatus & DMC620_ERR_STATUS_MV)
> +      && (ErrMisc1 & DMC620_ERR_MISC1_VAILD))
> +  {
> +    MemorySectionInfo.ValidFields |= EFI_PLATFORM_MEMORY_BANK_VALID;
> +    MemorySectionInfo.Bank = (ErrMisc1 & DMC620_ERR_MISC1_BANK_MASK);
> +  }
> +
> +  //
> +  // Misc registers 2..5 are not used and convey only the error counter
> +  // information. They are cleared as they do not contribute in Error
> +  // Record creation.
> +  //
> +  if (ErrStatus & DMC620_ERR_STATUS_MV) {
> +    ResetReg = 0x0;
> +    MmioWrite32 ((UINTN)&ErrRecord->ErrMisc2, ResetReg);
> +    MmioWrite32 ((UINTN)&ErrRecord->ErrMisc3, ResetReg);
> +    MmioWrite32 ((UINTN)&ErrRecord->ErrMisc4, ResetReg);
> +    MmioWrite32 ((UINTN)&ErrRecord->ErrMisc5, ResetReg);
> +  }
> +
> +  //
> +  // Reset error records Status register for recording new DRAM error syndrome
> +  // information.
> +  //
> +  ResetReg = MmioRead32 ((UINTN)&ErrRecord->ErrStatus);
> +  MmioWrite32 ((UINTN)&ErrRecord->ErrStatus, ResetReg);
> +
> +  //
> +  // Allocate memory for Error Acknowledge register, Error Status register and
> +  // Error status block data.
> +  //
> +  ReadAckRegister = (UINTN *)ErrorBlockBaseAddress;
> +  ErrorStatusRegister = (UINTN *)ErrorBlockBaseAddress + 1;
[SAMI] Can you check if the pointer math here is what you expect, 
please?  It will help if some explanation is added. Memory certainly is 
not being allocated here.
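
To make the pointer arithmetic concrete, here is a sketch of the layout the code appears to carve out of the reserved region, assuming UINTN is 64 bits wide (as on AArch64). Adding 1 to a UINTN pointer advances it by sizeof (UINTN), i.e. 8 bytes, which is consistent with the ErrorBlockData + 8 used for ErrorStatusAddress later in this patch. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef uint64_t UINTN_SKETCH; /* models a 64-bit UINTN */

typedef struct {
  uintptr_t ReadAck;             /* Base + 0  */
  uintptr_t ErrorStatusRegister; /* Base + 8  */
  uintptr_t ErrStatusBlock;      /* Base + 16 */
} ERROR_BLOCK_LAYOUT_SKETCH;

/* Computes the three addresses with explicit byte offsets instead of
   relying on implicit pointer scaling. Nothing is allocated here; the
   region at Base is assumed to be reserved by the platform. */
ERROR_BLOCK_LAYOUT_SKETCH CarveErrorBlock(uintptr_t Base)
{
  ERROR_BLOCK_LAYOUT_SKETCH Layout;

  Layout.ReadAck             = Base;
  Layout.ErrorStatusRegister = Base + sizeof(UINTN_SKETCH);
  Layout.ErrStatusBlock      = Layout.ErrorStatusRegister + sizeof(UINTN_SKETCH);
  return Layout;
}
```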
> +  ErrStatusBlock = (UINTN *)ErrorStatusRegister + 1;
> +
> +  // Initialize Error Status Register with Error Status Block address.
> +  *ErrorStatusRegister = (UINTN)ErrStatusBlock;
> +
> +  //
> +  // Locate Block Status Header base address and populate it with Error Status
> +  // Block Header information.
> +  //
> +  ErrBlockStatusHeaderData = (EFI_ACPI_6_3_GENERIC_ERROR_STATUS_STRUCTURE *)
> +                             ErrStatusBlock;
> +  *ErrBlockStatusHeaderData =
> +    (EFI_ACPI_6_3_GENERIC_ERROR_STATUS_STRUCTURE) {
> +      .BlockStatus = {
> +        .UncorrectableErrorValid     = ((CorrectedError == 0) ? 0 : 1),
> +        .CorrectableErrorValid       = ((CorrectedError == 1) ? 1 : 0),
> +        .MultipleUncorrectableErrors = 0x0,
> +        .MultipleCorrectableErrors   = 0x0,
> +        .ErrorDataEntryCount         = 0x1
> +       },
[SAMI] This initialisation format is not supported by all compilers. 
Can you fix this, please?
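
One portable alternative, sketched with stand-in types: clear the structure, then assign each field individually. STATUS_HEADER_SKETCH is a hypothetical, flattened stand-in for EFI_ACPI_6_3_GENERIC_ERROR_STATUS_STRUCTURE, and the severity values are placeholders. The sketch sets the uncorrectable flag as the complement of the corrected flag, which is an assumption about the intended behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, flattened stand-in for the generic error status header. */
typedef struct {
  uint32_t CorrectableErrorValid;
  uint32_t UncorrectableErrorValid;
  uint32_t ErrorDataEntryCount;
  uint32_t DataLength;
  uint32_t ErrorSeverity;
} STATUS_HEADER_SKETCH;

/* Placeholder severity encodings for the sketch. */
#define SEVERITY_FATAL_SKETCH     1u
#define SEVERITY_CORRECTED_SKETCH 2u

/* Clears the header, then fills it field by field; no compound literal
   or designated initialiser is involved. */
void FillStatusHeader(STATUS_HEADER_SKETCH *Header, int CorrectedError,
                      uint32_t DataLength)
{
  memset(Header, 0, sizeof(*Header));
  Header->CorrectableErrorValid   = CorrectedError ? 1u : 0u;
  Header->UncorrectableErrorValid = CorrectedError ? 0u : 1u;
  Header->ErrorDataEntryCount     = 1u;
  Header->DataLength              = DataLength;
  Header->ErrorSeverity           = CorrectedError ? SEVERITY_CORRECTED_SKETCH
                                                   : SEVERITY_FATAL_SKETCH;
}
```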
> +      .RawDataOffset =
> +        (sizeof (EFI_ACPI_6_3_GENERIC_ERROR_STATUS_STRUCTURE) +
> +         sizeof (EFI_ACPI_6_3_GENERIC_ERROR_DATA_ENTRY_STRUCTURE)),
> +      .RawDataLength = 0,
> +      .DataLength =
> +        (sizeof (EFI_ACPI_6_3_GENERIC_ERROR_DATA_ENTRY_STRUCTURE) +
> +         sizeof(EFI_PLATFORM_MEMORY_ERROR_DATA)),
> +      .ErrorSeverity = ((CorrectedError == 1) ?
> +                        EFI_ACPI_6_3_ERROR_SEVERITY_CORRECTED :
> +                        EFI_ACPI_6_3_ERROR_SEVERITY_FATAL),
> +    };
> +
> +  //
> +  // Locate Section Descriptor base address and populate Error Status Section
> +  // Descriptor data.
> +  //
> +  ErrBlockSectionDesc = (EFI_ACPI_6_3_GENERIC_ERROR_DATA_ENTRY_STRUCTURE *)
> +                        (ErrBlockStatusHeaderData + 1);
> +  *ErrBlockSectionDesc =
> +    (EFI_ACPI_6_3_GENERIC_ERROR_DATA_ENTRY_STRUCTURE) {
> +      .ErrorSeverity = ((CorrectedError == 1) ?
> +                        EFI_ACPI_6_3_ERROR_SEVERITY_CORRECTED :
> +                        EFI_ACPI_6_3_ERROR_SEVERITY_FATAL),
> +      .Revision = EFI_ACPI_6_3_GENERIC_ERROR_DATA_ENTRY_REVISION,
> +      .ValidationBits = 0,
> +      .Flags = 0,
> +      .ErrorDataLength = sizeof (EFI_PLATFORM_MEMORY_ERROR_DATA),
> +      .FruId = {0},
> +      .FruText = {0},
> +      .Timestamp = {0},
> +    };
> +  SectionType = (EFI_GUID) EFI_ERROR_SECTION_PLATFORM_MEMORY_GUID;
> +  CopyGuid ((EFI_GUID *)ErrBlockSectionDesc->SectionType, &SectionType);
> +
> +  // Locate Section base address and populate Memory Error Section(Cper) data.
> +  ErrBlockSectionData = (VOID *)(ErrBlockSectionDesc + 1);
> +  CopyMem (
> +    ErrBlockSectionData,
> +    (VOID *)&MemorySectionInfo,
> +    sizeof (EFI_PLATFORM_MEMORY_ERROR_DATA)
> +    );
> +}
> +
> +/**
> +  DMC-620 1-bit ECC event handler.
> +
> +  Supports multiple DMC error processing. Current implementation handles the
> +  DRAM ECC errors.
> +
> +  @param[in]  DispatchHandle       The unique handle assigned to this handler by
> +                                   MmiHandlerRegister().
> +  @param[in]  Context              Points to an optional handler context which
> +                                   was specified when the handler was
> +                                   registered.
> +  @param[in, out]  CommBuffer      Buffer passed from Non-MM to MM environmvent.
> +  @param[in, out]  CommBufferSize  The size of the CommBuffer.
> +
> +  @retval  EFI_SUCCESS  Event handler successful.
> +  @retval  Other        Failure of event handler.
> +**/
> +STATIC
> +EFI_STATUS
> +EFIAPI
> +Dmc620ErrorEventHandler (
> +  IN     EFI_HANDLE DispatchHandle,
> +  IN     CONST VOID *Context,       OPTIONAL
> +  IN OUT VOID       *CommBuffer,    OPTIONAL
> +  IN OUT UINTN      *CommBufferSize OPTIONAL
> +  )
> +{
> +  DMC620_REGS_TYPE *DmcCtrl;
> +  UINTN            DmcIdx;
> +  UINTN            ErrGsr;
> +
> +  // DMC instance which raised the error event.
> +  DmcIdx = *(UINTN *)CommBuffer;
[SAMI] Would it be good to add validation for the DMC instance index?
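
A possible shape for that validation, as a sketch under assumptions: the controller count of 2 mirrors the PcdDmc620NumCtrl default in the .dec above, the index is assumed to arrive as a 64-bit value in the communication buffer, and Dmc620GetValidatedIndex() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMC620_NUM_CTRL_SKETCH 2u /* mirrors the PcdDmc620NumCtrl default */

/* Returns 0 and stores the index on success, -1 if the buffer is missing,
   too small, or names a controller that does not exist. */
int Dmc620GetValidatedIndex(const void *CommBuffer,
                            const size_t *CommBufferSize,
                            uint64_t *DmcIdx)
{
  uint64_t Index;

  if (CommBuffer == NULL || CommBufferSize == NULL || DmcIdx == NULL) {
    return -1;
  }
  if (*CommBufferSize < sizeof(Index)) {
    return -1;
  }

  /* Copy rather than dereference, so an unaligned buffer is harmless. */
  memcpy(&Index, CommBuffer, sizeof(Index));
  if (Index >= DMC620_NUM_CTRL_SKETCH) {
    return -1;
  }

  *DmcIdx = Index;
  return 0;
}
```

In the handler, a failed check would presumably map to an EFI error status before any register address is computed from DmcIdx.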
> +  // Error Record Base address for that DMC instance.
> +  DmcCtrl = (DMC620_REGS_TYPE *)(FixedPcdGet64 (PcdDmc620RegisterBase) +
> +            (FixedPcdGet64 (PcdDmc620CtrlSize) * DmcIdx));
> +
> +  DEBUG ((
> +    DEBUG_INFO,
> +    "%a: DMC error event raised for DMC: %d with DmcBaseAddr: 0x%x \n",
> +    __FUNCTION__,
> +    DmcIdx,
> +    (UINTN)DmcCtrl
> +    ));
> +
> +  ErrGsr = MmioRead32 ((UINTN)&DmcCtrl->Errgsr);
> +
> +  if (ErrGsr & DMC620_ERR_GSR_ECC_CORRECTED_FH) {
[SAMI] Fix the if condition here as well; it must evaluate a boolean expression.
> +    // Handle corrected 1-bit DRAM ECC error.
> +    Dmc620HandleDramError (
> +      DmcCtrl,
> +      DmcIdx,
> +      DMC620_ERR_GSR_ECC_CORRECTED_FH,
> +      FixedPcdGet64 (
> +        PcdDmc620DramOneBitErrorDataBase) +
> +        (FixedPcdGet64 (PcdDmc620DramOneBitErrorDataSize) * DmcIdx)
> +        );
> +  } else {
> +    DEBUG ((
> +      DEBUG_ERROR,
> +      "%a: Unsupported DMC-620 error reported, ignoring\n",
> +      __FUNCTION__
> +      ));
> +  }
> +
> +  // No data to send using the MM communication buffer so clear the comm buffer
> +  // size.
> +  *CommBufferSize = 0;
> +
> +  return EFI_SUCCESS;
> +}
> +
> +/**
> +  Initialize function for the driver.
> +
> +  Registers MMI handlers to process fault events on DMC and installs required
> +  protocols to publish the error source descriptors.
> +
> +  @param[in]  ImageHandle  Handle to image.
> +  @param[in]  SystemTable  Pointer to System table.
> +
> +  @retval  EFI_SUCCESS  On successful installation of error event handler for
> +                        DMC.
> +  @retval  Other        Failure in installing error event handlers for DMC.
> +**/
> +EFI_STATUS
> +EFIAPI
> +Dmc620MmDriverInitialize (
> +  IN EFI_HANDLE          ImageHandle,
> +  IN EFI_MM_SYSTEM_TABLE *SystemTable
> +  )
> +{
> +  EFI_MM_SYSTEM_TABLE *mMmst;
> +  EFI_STATUS          Status;
> +  EFI_HANDLE          DispatchHandle;
> +
> +  ASSERT (SystemTable != NULL);
> +  mMmst = SystemTable;
> +
> +  // Register MMI handlers for DMC-620 error events.
> +  Status = mMmst->MmiHandlerRegister (
> +                    Dmc620ErrorEventHandler,
> +                    &gArmDmcEventHandlerGuid,
> +                    &DispatchHandle
> +                    );
> +  if (EFI_ERROR(Status)) {
[SAMI] A space is needed between EFI_ERROR and the opening bracket. The same 
comment applies to other places in this patch.
> +    DEBUG ((
> +      DEBUG_ERROR,
> +      "%a: Registration failed for DMC error event handler, Status:%r\n",
> +      __FUNCTION__,
> +      Status
> +      ));
> +
> +     return Status;
> +  }
> +
> +  // Installs the HEST error source descriptor protocol.
> +  Status = Dmc620InstallErrorSourceDescProtocol (SystemTable);
> +  if (EFI_ERROR(Status)) {
> +    mMmst->MmiHandlerUnRegister (DispatchHandle);
> +  }
> +
> +  return Status;
> +}
> diff --git a/Platform/ARM/Drivers/Dmc620Mm/Dmc620MmErrorSourceInfo.c b/Platform/ARM/Drivers/Dmc620Mm/Dmc620MmErrorSourceInfo.c
> new file mode 100644
> index 000000000000..59dcff019a07
> --- /dev/null
> +++ b/Platform/ARM/Drivers/Dmc620Mm/Dmc620MmErrorSourceInfo.c
> @@ -0,0 +1,194 @@
> +/** @file
> +  Create and populate DMC-620 HEST error source descriptors.
> +
> +  Implements the HEST Error Source Descriptor protocol. Creates the GHESv2
> +  type error source descriptors for supported hardware errors. Appends
> +  the created descriptors to the Buffer parameter of the protocol.
> +
> +  Copyright (c) 2020 - 2021, ARM Limited. All rights reserved.
> +  SPDX-License-Identifier: BSD-2-Clause-Patent
> +
> +  @par Specification Reference:
> +    - ACPI Reference Specification 6.3, Table 18-393 GHESv2 Structure.
> +**/
> +
> +#include <Library/AcpiLib.h>
> +#include <Dmc620Mm.h>
> +
> +/**
> +  Populate the DMC-620 DRAM Error Source Descriptor.
> +
> +  Creates error source descriptor of GHESv2 type to be appended to the Hest
> +  table. The error source descriptor is populated with appropriate values
> +  based on the instance number of DMC-620. Allocates and initializes memory
> +  for Error Status Block(Cper) section for each error source.
> +
> +  @param[in]  ErrorDesc  HEST error source descriptor Information.
> +  @param[in]  DmcIdx     Instance number of the DMC-620.
> +**/
> +STATIC
> +VOID
> +EFIAPI
> +Dmc620SetupDramErrorDescriptor (
> +  IN  EFI_ACPI_6_3_GENERIC_HARDWARE_ERROR_SOURCE_VERSION_2_STRUCTURE *ErrorDesc,
> +  IN  UINTN     DmcIdx
> +  )
> +{
> +  UINTN  ErrorBlockData;
> +
> +  //
> +  // Address of reserved memory for the error status block that will be used
> +  // to hold the information about the DRAM error. Initialize this memory
> +  // with 0.
> +  //
> +  ErrorBlockData = FixedPcdGet64 (PcdDmc620DramOneBitErrorDataBase) +
> +                     (FixedPcdGet64 (PcdDmc620DramOneBitErrorDataSize) *
> +                      DmcIdx);
> +  SetMem (
> +    (VOID *)ErrorBlockData,
> +    FixedPcdGet64 (PcdDmc620DramOneBitErrorDataSize),
> +    0
> +    );
> +
> +  // Build the DRAM error source descriptor.
> +  *ErrorDesc =
> +    (EFI_ACPI_6_3_GENERIC_HARDWARE_ERROR_SOURCE_VERSION_2_STRUCTURE) {
> +      .Type = EFI_ACPI_6_3_GENERIC_HARDWARE_ERROR_VERSION_2,
> +      .SourceId = FixedPcdGet16 (PcdDmc620DramOneBitErrorSourceId) + DmcIdx,
> +      .RelatedSourceId = 0xFFFF,
> +      .Flags = 0,
> +      .Enabled = 1,
> +      .NumberOfRecordsToPreAllocate = 1,
> +      .MaxSectionsPerRecord = 1,
> +      .MaxRawDataLength = sizeof (EFI_PLATFORM_MEMORY_ERROR_DATA),
> +      .ErrorStatusAddress = ARM_GAS64 (ErrorBlockData + 8),
[SAMI] This initialisation style is not portable and some compilers may 
not support it. Please change this.
> +      .NotificationStructure =
> +        EFI_ACPI_6_3_HARDWARE_ERROR_NOTIFICATION_STRUCTURE_INIT (
> +          EFI_ACPI_6_3_HARDWARE_ERROR_NOTIFICATION_SOFTWARE_DELEGATED_EXCEPTION,
> +          0,
> +          FixedPcdGet32 (PcdDmc620DramErrorSdeiEventBase) + DmcIdx
> +          ),
> +      .ErrorStatusBlockLength =
> +        sizeof (EFI_ACPI_6_3_GENERIC_ERROR_STATUS_STRUCTURE) +
> +        sizeof (EFI_ACPI_6_3_GENERIC_ERROR_DATA_ENTRY_STRUCTURE) +
> +        sizeof (EFI_PLATFORM_MEMORY_ERROR_DATA),
> +      .ReadAckRegister = ARM_GAS64 (ErrorBlockData),
> +      .ReadAckPreserve = 0,
> +      .ReadAckWrite = 0
> +      };
> +}
> +
> +/**
> +  MMI handler implementing the HEST error source descriptor protocol.
> +
> +  Returns the error source descriptor information for all supported hardware
> +  error sources. As mentioned in the HEST Error Source Decriptor protocol this
> +  handler returns with error source count and length when Buffer parameter is
> +  NULL.
> +
> +  @param[in]   This                Pointer for this protocol.
> +  @param[out]  Buffer              HEST error source descriptor Information
> +                                   buffer.
> +  @param[out]  ErrorSourcesLength  Total length of Error Source Descriptors
> +  @param[out]  ErrorSourceCount    Total number of supported error spurces.
> +
> +  @retval  EFI_SUCCESS            Buffer has valid Error Source descriptor
> +                                  information.
> +  @retval  EFI_INVALID_PARAMETER  Buffer is NULL.
> +**/
> +STATIC
> +EFI_STATUS
> +EFIAPI
> +Dmc620ErrorSourceDescInfoGet (
> +  IN  MM_HEST_ERROR_SOURCE_DESC_PROTOCOL *This,
> +  OUT VOID                               **Buffer,
> +  OUT UINTN                              *ErrorSourcesLength,
> +  OUT UINTN                              *ErrorSourcesCount
> +  )
> +{
> +  EFI_ACPI_6_3_GENERIC_HARDWARE_ERROR_SOURCE_VERSION_2_STRUCTURE *ErrorDescriptor;
> +  UINTN                                                          DmcIdx;
> +
> +  //
> +  // Update the error source length and error source count parameters.
> +  //
> +  *ErrorSourcesLength =
> +    FixedPcdGet64 (PcdDmc620NumCtrl) *
> +    FixedPcdGet64 (PcdDmc620ErrSourceCount) *
> +    sizeof (EFI_ACPI_6_3_GENERIC_HARDWARE_ERROR_SOURCE_VERSION_2_STRUCTURE);
> +  *ErrorSourcesCount = FixedPcdGet64 (PcdDmc620NumCtrl) *
> +                       FixedPcdGet64 (PcdDmc620ErrSourceCount);
> +
> +  //
> +  // If 'Buffer' is NULL return, as this invocation of the protocol handler is
> +  // to determine the total size of all the error source descriptor instances.
> +  //
> +  if (Buffer == NULL) {
> +    return EFI_INVALID_PARAMETER;
[SAMI] Depending on how this function is designed to be used, this check
should be moved to the beginning of the function.
Is the function expected to return the length and count to the caller
so that the caller can allocate a buffer of the required size?
[/SAMI]
> +  }
> +
> +  // Buffer to be updated with error source descriptor(s) information.
> +  ErrorDescriptor =
> +    (EFI_ACPI_6_3_GENERIC_HARDWARE_ERROR_SOURCE_VERSION_2_STRUCTURE *)*Buffer;
> +
> +  //
> +  // Create and populate the available error source descriptor for all DMC(s).
> +  //
> +  for (DmcIdx = 0; DmcIdx < FixedPcdGet64 (PcdDmc620NumCtrl); DmcIdx++) {
> +    // Add the one-bit DRAM error source descriptor.
> +    Dmc620SetupDramErrorDescriptor (ErrorDescriptor, DmcIdx);
> +    ErrorDescriptor++;
> +  }
> +
> +  return EFI_SUCCESS;
> +}
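
For reference, the usage the question above hints at (query the size with a
NULL Buffer, allocate, then call again to fill) can be sketched roughly as
below. This is only an illustration under stated assumptions, not the
driver's code: the Status enum, the ErrorSourceDesc struct, the NUM_CTRL and
ERR_SOURCE_COUNT constants and the QueryThenFill helper are all hypothetical
stand-ins for the EDK2 types, PCDs and protocol caller, and malloc stands in
for whatever MM pool allocator a real caller would use. The sketch keeps the
quoted behaviour of returning an "invalid parameter" status on the size
query, and assumes the length and count pointers should be validated before
they are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for EFI_STATUS values; not the EDK2 definitions. */
typedef enum {
  STATUS_SUCCESS = 0,
  STATUS_INVALID_PARAMETER = 1,
  STATUS_OUT_OF_RESOURCES = 2
} Status;

/* Hypothetical, much reduced stand-in for the GHESv2 descriptor structure. */
typedef struct {
  uint16_t SourceId;
  uint32_t ErrorStatusBlockLength;
} ErrorSourceDesc;

#define NUM_CTRL          4u  /* placeholder for PcdDmc620NumCtrl */
#define ERR_SOURCE_COUNT  1u  /* placeholder for PcdDmc620ErrSourceCount */

/* Reports the required size and count; fills *Buffer only when Buffer is set. */
Status ErrorSourceDescInfoGet(void **Buffer, size_t *Length, size_t *Count)
{
  ErrorSourceDesc *desc;
  size_t idx;

  /* Validate the output pointers before dereferencing them. */
  if (Length == NULL || Count == NULL) {
    return STATUS_INVALID_PARAMETER;
  }

  *Count = (size_t)NUM_CTRL * ERR_SOURCE_COUNT;
  *Length = *Count * sizeof(ErrorSourceDesc);

  /* Size query: Length and Count are already valid, nothing to fill. */
  if (Buffer == NULL) {
    return STATUS_INVALID_PARAMETER;
  }

  desc = (ErrorSourceDesc *)*Buffer;
  for (idx = 0; idx < *Count; idx++) {
    desc[idx].SourceId = (uint16_t)idx;
    desc[idx].ErrorStatusBlockLength = 0;
  }

  return STATUS_SUCCESS;
}

/* Hypothetical caller: first call sizes the buffer, second call fills it. */
Status QueryThenFill(ErrorSourceDesc **Out, size_t *Count)
{
  size_t length = 0;
  void *buffer;
  Status status;

  assert(Out != NULL && Count != NULL);

  /* First call with a NULL buffer only reports the length and count. */
  status = ErrorSourceDescInfoGet(NULL, &length, Count);
  if (status != STATUS_INVALID_PARAMETER || length == 0) {
    return STATUS_INVALID_PARAMETER;
  }

  buffer = malloc(length);
  if (buffer == NULL) {
    return STATUS_OUT_OF_RESOURCES;
  }

  /* Second call populates the freshly allocated buffer. */
  status = ErrorSourceDescInfoGet(&buffer, &length, Count);
  if (status != STATUS_SUCCESS) {
    free(buffer);
    return status;
  }

  *Out = (ErrorSourceDesc *)buffer;
  return STATUS_SUCCESS;
}
```

If this is the intended contract, the NULL check has to stay after the
length and count are written, and the function header should describe the
status returned on a size query accordingly.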
> +
> +//
> +// DMC-620 MM_HEST_ERROR_SOURCE_DESC_PROTOCOL protocol instance.
> +//
> +STATIC MM_HEST_ERROR_SOURCE_DESC_PROTOCOL mDmc620ErrorSourceDesc = {
> +  Dmc620ErrorSourceDescInfoGet
> +};
> +
> +/**
> +  Allow reporting of supported DMC-620 error sources.
> +
> +  Install the HEST Error Source Descriptor protocol handler to allow publishing
> +  of the supported Dmc(s) hardware error sources.
> +
> +  @param[in]  MmSystemTable  Pointer to System table.
> +
> +  @retval  EFI_SUCCESS            Protocol installation successful.
> +  @retval  EFI_INVALID_PARAMETER  Invalid system table parameter.
> +**/
> +EFI_STATUS
> +Dmc620InstallErrorSourceDescProtocol (
> +  IN EFI_MM_SYSTEM_TABLE *MmSystemTable
> +  )
> +{
> +  EFI_HANDLE mDmcHandle = NULL;
> +  EFI_STATUS Status;
> +
> +  // Check if the MmSystemTable is initialized.
> +  if (MmSystemTable == NULL) {
> +    return EFI_INVALID_PARAMETER;
> +  }
> +
> +  // Install HEST error source descriptor protocol for DMC(s).
> +  Status = MmSystemTable->MmInstallProtocolInterface (
> +                            &mDmcHandle,
> +                            &gMmHestErrorSourceDescProtocolGuid,
> +                            EFI_NATIVE_INTERFACE,
> +                            &mDmc620ErrorSourceDesc
> +                            );
> +  if (EFI_ERROR(Status)) {
> +    DEBUG ((
> +      DEBUG_ERROR,
> +      "%a: Failed installing HEST error source protocol, status: %r\n",
> +      __FUNCTION__,
> +      Status
> +      ));
> +  }
> +
> +  return Status;
> +}
